
Chen Jianlin|Types of enterprise data empowerment from the perspective of general artificial intelligence


With the advent of the era of general artificial intelligence, enterprise data has become increasingly important in the digital governance ecosystem, and industrial development has created a need for enterprise data empowerment, that is, the legal conferral of rights over enterprise data. Enterprise data rights are a new type of intellectual property born of the intelligent era, with distinct temporal and territorial characteristics. Compared with existing intellectual property rights, they are weak rights with only relative exclusivity: the holder can exclude others from using the data only after proving its openness, its value, and the management measures applied to it. Enterprise data empowerment from the perspective of general artificial intelligence should aim to promote industrial development, and should build a scientific and rational system of enterprise data rights that balances leniency with strictness. Within that system, the right to hold data resources is the foundation of enterprise data rights, the right to process and use data is their core, and the right to circulate and trade data is the means by which their value is realized. This paper argues that a typed system of enterprise data rights can effectively incentivize data investment, promote data flow, and accelerate society's entry into the intelligent era, and it systematically analyzes the legal basis of enterprise data empowerment.


I. Formulation of the problem

With the application of ChatGPT (Chat Generative Pre-trained Transformer), artificial intelligence has gradually moved from specialized artificial intelligence toward artificial general intelligence. Specialized AI extracts information from a specific data set and runs in a specific scenario to perform a single task. Artificial general intelligence, by contrast, refers to intelligent systems with cognitive abilities equal to or higher than those of humans, capable of understanding, learning, planning, and solving problems in order to complete tasks across a variety of application scenarios.

Data is the foundation of artificial general intelligence. ChatGPT and Sora, for example, are respectively a language model and a text-to-video model, both trained by deep learning on big data. The "Opinions of the Central Committee of the Communist Party of China and the State Council on Constructing a Basic Data System to Better Play the Role of Data Elements" (hereinafter referred to as the "20 Data Articles") divides data into public data, enterprise data, and personal data according to data source and data characteristics. Enterprise data refers to data that market entities collect and process in the course of production and business activities and that does not involve personal information or public interests. In artificial general intelligence, both the datasets used to train models and the data products that models generate are enterprise data.

As for enterprise data, China currently protects it mainly as a legal interest under the Anti-Unfair Competition Law. This conduct-regulation model, however, suffers from unclear boundaries of rights and insufficient stability, and it struggles to meet the industrial needs of general artificial intelligence. Scholars disagree sharply on whether and how rights in enterprise data should be conferred. Opponents of data empowerment argue that the protection China affords enterprise data already far exceeds the level afforded by property-rule protection, and that confirming data rights would only make data harder to use rather than promote its use. Supporters, for their part, differ widely on how empowerment should proceed. Most scholars favor a rights model, but there are currently three views on what kind of right an enterprise data right is. Intellectual property scholars argue that enterprise data is deeply compatible with the industrial property rights that protect information, and that enterprise data rights should be included among industrial property rights. Civil law scholars hold that data rights differ from intellectual property rights in the structure of rights and interests, the term of protection, the rationale for protection, and the object of rights. Drawing on the experience of property rights, they advocate confirming and protecting the property interests of data processors, such as the rights to hold, use, benefit from, and dispose of data, and allocating rights according to the source and degree of each party's contribution to the formation of the data, so that the data originator owns the data and the data processor has the right to use it.
Still other scholars argue that data property rights have the basic attributes of being proprietary, time-limited, of limited control, and of limited exclusivity, and that they constitute a third type of property right, temporal in nature, alongside property rights and intellectual property rights. Outside the rights model, some scholars argue that data protection is in essence a property-governance paradigm for constructing an order of data circulation and utilization. As to which rights should be granted, some scholars propose a dual structure of data producer rights and data user rights, in which the data product producer's right is the core right; others construct data rights along the data generation cycle, with a resource holding right, a data processing and use right, and a data product management right corresponding to data collection, data processing and utilization, and data product transactions.

It can be seen that, for enterprise data as a type of data, there is wide disagreement over the necessity of empowerment, the attributes of the rights, and the content of the rights. At present, most scholars who discuss enterprise data empowerment do not construct a structured, typed system of rights that matches the value of enterprise data, the characteristics of the artificial intelligence industry, and the characteristics of each data type. The theory of data property rights advocates protecting data as property, which is not conducive to the circulation and use of data in the artificial intelligence industry. The theory of new data rights treats enterprise data rights as a new civil right but fails to take full account of the historical origin and technical background of enterprise data, which makes it hard to reconcile with the existing civil law system. The data holding theory emphasizes the governance paradigm; what it protects is still an interest in data, and it does not fundamentally clarify the scope and content of enterprise data rights.

Enterprise data empowerment is closely tied to the development of science and technology: it is precisely because artificial intelligence has brought a disruptive revolution to traditional industries that enterprise data empowerment has entered the field of legal research. In view of this, this paper analyzes the attributes of enterprise data rights against the objective background of enterprise data empowerment and the development history of enterprise data. Following the generation cycle of enterprise data, and from the perspective of the right of self-use and the right to exclude others, it classifies the three sub-rights that make up enterprise data rights, namely the right to hold data resources, the right to process and use data, and the right to operate data products, and finally proposes a structured system of rights for enterprise data empowerment, with the aim of examining the problem in greater depth from the perspective of general artificial intelligence.

II. Enterprise data rights are a new type of intellectual property right born in the era of intelligence

The essence of enterprise data is information. Early enterprise data lacked scale, creativity, and value, and was not covered by traditional intellectual property rights. In recent years, with the development of algorithms, computing power, and data storage capacity, human society has gradually entered the era of intelligence. The emergence of large models has caused enterprise data to grow explosively and to generate increasingly prominent value. The needs of industrial development are the basic reason for enterprise data empowerment, and empowerment must in turn be grounded in the laws of the industry and the characteristic attributes of enterprise data.

(1) Enterprise data empowerment is an objective need in the era of general artificial intelligence

1. Artificial general intelligence generates massive amounts of data that need to be protected by law

The training of large general artificial intelligence models is divided into two stages, general intelligence and professional intelligence: the former focuses on learning language comprehension, reasoning ability, and general knowledge, and the latter on learning instructions for specific tasks. Compared with specialized artificial intelligence that is proficient in one field, general artificial intelligence involves a wider range of user data, more diverse user data collected in advance and obtained during human-computer interaction, and a larger volume of data derived by specific algorithms through self-learning. This enterprise data has use value and needs to be regulated by law to prevent its disorderly use.

2. The rapid iteration of data determines the timeliness of data empowerment

From ChatGPT to Sora, the ability of large models to process data has steadily strengthened, and it is foreseeable that data will iterate even faster as the era of general artificial intelligence arrives. On the one hand, the orderly development of data must be promoted; on the other hand, the protection period of enterprise data rights must be limited, so that data that has been superseded enters the public domain as soon as possible and can be shared and used in common.

3. Expand from protecting knowledge to protecting the ability to use data

Artificial intelligence accelerates human knowledge discovery, but it also devalues knowledge. With its powerful retrieval and computing capabilities, artificial intelligence can easily generate large numbers of patent applications that pass the utility examination, whether by splitting and recombining technical scenarios or by mining and processing cross-domain data, and it likewise makes the creation of mediocre works meaningless. The era of general artificial intelligence has pushed human society from the pursuit of knowledge into a new stage of pursuing wisdom: human value lies not only in how much knowledge a person has, but also in the ability to obtain data, process data, and use knowledge. By improving the enterprise data governance system and establishing a legal system for conferring rights in enterprise data, we can promote the acquisition, processing, and circulation of data and encourage data to play a greater role through use.

4. The risk of enterprise data infringement increases

The algorithmic model characterized by deep learning is essentially an end-to-end black box. In the process of pre-training large models, one-sided and false data will produce wrong feedback results, resulting in the risk of data bias and algorithm discrimination. With the advent of the era of general artificial intelligence, it is necessary to build an enterprise data compliance system with enterprise data empowerment as the main line, resolve the risk of enterprise data infringement, and effectively promote the healthy and orderly development of the digital economy industry.

(2) Enterprise data rights are new types of intellectual property rights

As a new type of right generated by the development of science and technology, the most suitable path for enterprise data empowerment is still intellectual property. In terms of the object of protection, both enterprise data and intellectual property protect information, and the two present an incremental relationship in creativity. Information with a certain degree of creativity can be the object of intellectual property, while information with a weak level of creativity may become the object of enterprise data. Historically, the protection of intellectual property rights has facilitated the disclosure and circulation of works, trademarks, and patents. Like intellectual property empowerment, the importance of enterprise data empowerment is to demonstrate legitimacy, so as to confirm the legitimacy of subsequent data processing, use, circulation and transactions, and promote the further use of data. By assigning intellectual property rights to enterprise data, it is possible to balance the private interests and social public interests of relevant operators investing in and processing data, so as to build a data property rights system with Chinese characteristics.

1. Enterprise data rights sprout in intellectual property law

The Feist case (Feist Publications, Inc. v. Rural Telephone Service Co.), decided by the United States Supreme Court in 1991, was a landmark in the determination of originality under copyright law and contained the germ of data protection. In that case, the Court held that arranging data such as telephone numbers in a directory alphabetically by subscriber name lacked the originality that copyright law requires, and that the directory was therefore not protected by copyright. It is true that enterprise data that is neither original nor uniquely arranged is difficult to protect under copyright law. However, intellectual property law protects not only original works and technical solutions, but also commercial marks and trade secrets that carry commercial interests.

Intellectual property law is an ever-expanding field, and the development of business practice has led to the gradual improvement of the intellectual property legal system. Until the middle of the 19th century, commercial marks were not recognized as property. In the second half of the 19th century, as commercial marks came to be recognized and used in commercial practice, trademark law became an independent legal field and was gradually incorporated into intellectual property law. Trade secrets, such as business information, are not necessarily inventive in themselves; what matters more is that they are distinct from information in the public domain. Although trade secrets have a long history of protection, they were not initially included in the Paris Convention as a separate type of intellectual property. Like trademarks, trade secrets were legislated separately and eventually incorporated into the intellectual property law system after a long period of business practice. Intellectual property law has thus expanded continuously along with technological and commercial development. It protects not only creative works and patented technical solutions, but also trademarks with commercial interests, as well as trade secrets, goodwill, and other objects covered by anti-unfair competition law.

Enterprise data is closely linked to intellectual property law. In the industrial age, enterprise data, typified by the customer phone numbers in telephone directories, was not initially protected by copyright law or other intellectual property laws. The reason was that, given the level of productivity at the time, the role of data was not fully exploited and its value was not sufficiently appreciated. Even so, the fact that the question of protecting enterprise data first arose under copyright law shows that enterprise data and intellectual property rights are closely related. With the development of artificial intelligence, data has never been more valuable. As society develops, personal data, enterprise data, and public data serve different functions and call for different legislation. At present, personal data and public data have each been legislated separately, and it is entirely possible for enterprise data with market value to be legislated separately as well and systematically incorporated into the intellectual property legal system.

2. Enterprise data rights are rooted in intellectual property law

Under existing law, a portion of enterprise data is already covered by intellectual property law. Data with a unique arrangement can be protected by copyright law as a compilation work, and undisclosed enterprise data can be protected as a trade secret. Enterprise data is therefore not the completely new area of law it is sometimes imagined to be, but a limited gap left outside existing intellectual property laws.

At present, rights in enterprise data have not been conferred, but the need for judicial protection has already arisen. In China, enterprise data is protected mainly by the Anti-Unfair Competition Law, which forms part of the intellectual property legal system; in the Dianping.com and Meijing cases, for example, protection was granted under the general clause in Article 2 of China's Anti-Unfair Competition Law. The proposed draft amendment to the Anti-Unfair Competition Law also contains a special article on commercial data. For some time to come, enterprise data will still be protected mainly under the Anti-Unfair Competition Law. That law protects enterprises' data interests by regulating conduct, which avoids the absolute control over data that property rights in enterprise data might create, but it cannot escape the limitations of case-by-case determination and after-the-fact regulation. With the advent of the era of general artificial intelligence, intelligent technology and governance scenarios are deeply integrated and the importance of enterprise data in the digital governance ecosystem is increasingly prominent, so enterprise data should still be protected by conferring rights through separate legislation.

It is worth noting that, during the drafting of the General Provisions of the Civil Law, paragraph 2 of Article 108 of the first draft provided that intellectual property rights are the rights enjoyed by the right holder in accordance with the law with respect to the following subject matter: "...... (8) Data information; ......". In view of the considerable controversy over the concept, scope of protection, attributes, and rights and obligations relating to data and network virtual property, the second review draft instead devoted a separate article to data and virtual property, stipulating that "where the law has provisions on the protection of data and network virtual property, follow those provisions", and the General Provisions of the Civil Law as finally adopted retained this provision. The provision was then carried over into the General Provisions of the Civil Code. Data is divided into personal data, enterprise data, and public data, each with its own focus: personal data focuses on information protection, public data on information disclosure, and enterprise data on circulation and use. Enterprise data, like intellectual property, is a "structured but non-material" thing that has an information structure but no physical form, and it is protected by intellectual property law because of the commercial interests it carries.

(3) The characteristics of the rights of enterprise data rights

1. Enterprise data rights have a distinct temporal nature

Enterprise data rights are time-sensitive and have a short protection period. In the era of artificial intelligence, computers, Internet of Things sensors, network users, large models, and other sources generate data continuously, and data is updated and superseded quickly. Granting an enterprise data right an excessively long protection period may over-protect data that is no longer timely and so impede the circulation and use of data. The term of protection of enterprise data rights should therefore be shorter than that of other intellectual property rights.

2. Enterprise data rights have territorial characteristics

The territoriality of enterprise data rights is reflected first in the prerequisites for the creation of the right: enterprise data should be generated or registered in China in order to be protected in China. It is also reflected in the review and management of enterprise data. In the era of intelligence, enterprise data covers a large amount of information about social and economic life and bears on national security. Countries have therefore introduced data supervision measures and implemented classified and graded protection systems for data. For enterprise data that may affect national security, security reviews should be conducted during data processing and data transmission, and a mechanism should be built for secure, compliant, and orderly cross-border data flows.

3. Enterprise data rights focus on the protection of prior rights

According to the Interim Measures for the Administration of Generative AI Services issued by the Cyberspace Administration of China, enterprise data must not infringe the intellectual property rights or personal information rights that others enjoy in accordance with the law. In the era of intelligence, much enterprise data originates from user-generated data. If user-generated data contains personal information, enterprises should obtain the individual's consent before collecting it and should filter out private information during processing. Standardized data processing should promote the anonymization of personal information and fully protect privacy, reputation, honor, and other personality rights. The pre-training of algorithmic models relies on data resources, and those data resources must be lawfully obtained. At present, disputes over the unauthorized use of data in algorithmic models have arisen in various countries. In February 2023, Getty Images, one of the world's largest stock image companies, sued Stability AI for using its images to train algorithmic models without permission.

In June 2023, a dispute also arose in China over Xueersi's alleged unauthorized use of data to train an artificial intelligence model.

4. Enterprise data rights have weak rights attributes

Enterprise data rights protect data sets or data products that enterprises hold publicly, that have market value, and for which management measures have been taken; they supplement the protection offered by the existing intellectual property system. Enterprise data that would otherwise fall outside intellectual property law has come to warrant rights because of the development of the AI industry. If enterprise data were given the same level of protection as existing intellectual property rights, it could create conflicts with existing rights and eventually hollow out the existing system of legal protection. Enterprise data that meets the originality or inventive step requirement should therefore be protected by copyright or patent rights instead. Compared with existing intellectual property rights, the enterprise data right is a weak right.

5. Enterprise data rights are weakly exclusive

Among intellectual property rights, copyrights, patents, and trademark rights are strongly exclusive: once the right is obtained, no one may use its subject matter without permission. Trade secrets are weakly exclusive: whether a trade secret exists must be determined case by case before the right to exclude can be exercised, so the exclusivity is not absolute but relative and conditional. As a right that has evolved from a legal interest, the enterprise data right is also weakly exclusive. The holder must prove in each case the openness, value, and management of the enterprise data, and only on that basis can others be excluded from using it. Compared with general civil rights, the enterprise data right is therefore a weaker right.

In general artificial intelligence, the processing and use of enterprise data has become a standardized process. Data is desensitized, cleaned, classified, sorted, and integrated to become data resources; data resources are processed and used to become standardized data products; and data products realize the value of data through circulation and transactions. Following this data generation cycle, three rights arise: the right to hold data resources, the right to process and use data, and the right to operate data products. This is the same classification that the 20 Data Articles adopt for enterprise data rights. The conferral of rights in enterprise data can be interpreted legally from the perspective of two intellectual property concepts, the right of self-use and the right to exclude others.

III. The right to hold data resources from the perspective of general artificial intelligence

In the era of artificial intelligence, massive amounts of enterprise data are classified and integrated to form data resources. By managing this data, enterprises acquire the right to hold data resources, which is the most common form that enterprise data rights take. As general artificial intelligence accelerates, data resources mainly serve to pre-train large models, and the corresponding right to hold data resources is the foundational right among enterprise data rights.

(1) The connotation of the right to hold data resources

The right to hold data resources has two elements: the data held by the data resource holder must be lawfully obtained, and the holder must manage and control the data on the basis of that lawful acquisition. Lawful acquisition of others' data includes acquiring others' original data as well as acquiring others' data elements and data products. Managing and controlling data refers to the enterprise's right to desensitize, clean, classify, sort, and integrate lawfully collected data. Data management is the prelude to and preparation for data processing: by cleaning, classifying, and sorting enterprise data, an enterprise transforms data into data resources that can be used in production. Having acquired and managed the data, the enterprise holds the right to hold data resources and can prevent others from using the relevant data without permission.
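The management steps named above (desensitize, clean, classify, sort, integrate) can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration rather than a description of any enterprise's practice: the `Record` fields, the function names, and the choice of a truncated SHA-256 digest are assumptions. Note in particular that hashing an identifier is only pseudonymization, which may fall short of the anonymization that personal information rules require.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Hypothetical raw record, e.g. one user review on a platform."""
    user_id: str
    category: str
    text: str
    timestamp: int


def desensitize(record: Record) -> Record:
    """Replace the user identifier with a truncated SHA-256 digest (assumed scheme)."""
    digest = hashlib.sha256(record.user_id.encode("utf-8")).hexdigest()[:12]
    return replace(record, user_id=digest)


def clean(records: list[Record]) -> list[Record]:
    """Drop records with empty text and exact duplicates, keeping first occurrences."""
    seen: set[Record] = set()
    kept: list[Record] = []
    for record in records:
        if not record.text.strip() or record in seen:
            continue
        seen.add(record)
        kept.append(record)
    return kept


def classify_and_sort(records: list[Record]) -> dict[str, list[Record]]:
    """Group records by category, then order each group chronologically."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        groups[record.category].append(record)
    return {name: sorted(group, key=lambda r: r.timestamp)
            for name, group in sorted(groups.items())}


def build_data_resource(raw: list[Record]) -> dict[str, list[Record]]:
    """Integrate the steps: desensitize, clean, then classify and sort."""
    return classify_and_sort(clean([desensitize(r) for r in raw]))
```

The point of the sketch is only that the output of `build_data_resource` is an organized resource distinct from the raw input, which is the kind of managed result the holding right is said to attach to.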

(2) The powers conferred by the right to hold data resources

1. It is forbidden for others to obtain data by illegal means

The data holder has the right to prohibit others from obtaining the data by unlawful means. Protective measures include not only management measures that physically prevent others from obtaining enterprise data, but also contractual and other arrangements that reasonably restrict others' acquisition and use of it. The data that others are prohibited from obtaining includes not only personal account and password data on an internet platform, but also the platform's back-end data. Although publicly available data is offered to the public without discrimination, this does not mean that other enterprises may obtain and use it without restriction; an enterprise may set access permissions and take other measures to restrict others from accessing, copying, tampering with, or destroying the data. The data holder likewise has the right to prohibit others from copying, tampering with, or destroying the data it manages and controls, so that its managed control over the data is not undermined.

Others' data is illegally obtained in three typical ways. The first is breaking technical measures to obtain others' data by theft, electronic intrusion, and similar means. Platforms holding large amounts of data resources often use specific technical measures to prevent access by others, and how such measures are broken varies with their strength and effectiveness. The second is breaching the agreement between the parties by obtaining others' data beyond the scope or term of the contract: for example, scraping data outside the agreed scope and conditions, as in the data scraping at issue in the "Sina Weibo v. Maimai case", or obtaining data outside the agreed time limit, as where a defendant continues to scrape data after the license contract has ended. The third is violating business ethics by obtaining others' data in a misleading manner, for example by scraping data in violation of the Robots protocol. In special circumstances, however, the Robots protocol may itself be improperly configured, and the defendant may raise this as a defense and provide evidence to prove it.
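To make the Robots protocol point concrete, the following Python sketch shows how a crawler might check a site's robots rules before fetching a page, using the standard library module `urllib.robotparser`. The rules in `ROBOTS_TXT`, the user agent name `ExampleBot`, the helper `is_allowed`, and the `example.com` URLs are invented for illustration and are not drawn from any of the cases discussed. Passing such a check is a technical signal only; it does not by itself establish that the acquisition is lawful.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots rules: every crawler is barred from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the lines of a robots file that has already been retrieved.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/reviews/1",
                "https://example.com/private/accounts"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A scraper that fetches the second URL anyway would be acting against the site's stated rules, which is the situation the third circumstance describes.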

Illegally obtaining others' data is a typical act that infringes an enterprise's right to hold data resources. For example, in the case of Tencent v. Zhenfenduo Company for trademark infringement and unfair competition, Zhenfenduo Company used similar trademarks, similar software names, and similar slogans to mislead users of the WeChat Official Account platform into downloading its "Official Account Assistant" software, and through the download process it collected and stored data such as the users' WeChat Official Account accounts and passwords. These acts can be found to constitute illegally obtaining and storing data managed by the enterprise, infringing the enterprise's right to hold data resources. In another example, in the unfair competition dispute between Beijing Weibo Vision Co., Ltd. and Shanghai Liujie Information Technology Co., Ltd., Xiamen Scrap Abdominal Muscle Network Technology Co., Ltd., and Zhejiang Taobao Network Co., Ltd., Liujie Company used technical means to illegally obtain data on the live broadcast tipping records of Douyin users and the income of anchors on the "Douyin" platform, and publicly displayed the data after sorting and calculating it on its own.

2. It is forbidden to copy data controlled by others in a way that constitutes substantial substitution

Illegally copying others' data in a way that substantially substitutes for it also infringes an enterprise's right to hold data resources. In the Dianping case, for example, the user review information on Dianping.com does not necessarily belong to Dianping, but it has become the enterprise's core competitive resource. Dianping.com cleans the user review information, screens out false and illegal information, and sorts the review data by merchant location, merchant category, and the chronological order of the reviews. By cleaning, classifying, and sorting the data, Dianping acquired managed control over it. Baidu's search engine crawled the information at issue from Dianping.com. Although this did not violate the Robots protocol, Baidu copied data controlled by Dianping.com and thereby substantially substituted for Dianping.com's managed data, which can be regarded as infringing Dianping's right to hold data resources.

(3) The characteristics of rights in the ownership of data resources

1. Based on the integration of data resources, the enterprise has the right to hold data resources

In the era of intelligence, everything can be digitized, but not all data is a data resource that can be used directly. Personal information carried by the data needs to be masked, and fragmented data needs to be further collected and integrated. Enterprises collect, desensitize, clean, classify, sort, and integrate data, turning it into a data resource that can be used for further processing. At Google, at least 40 percent of engineers process data every day, gain knowledge from that data, and use that knowledge to make computers smarter. On the one hand, enterprises invest time, money, management and other resources in collecting and sorting data; on the other hand, they bring technology (algorithms) and intellectual effort to bear, which enables enterprise data to generate commercial benefits. It is therefore necessary to protect these interests by granting data resource holders the right to hold data resources.

The right to hold data resources is the basic right in enterprise data. An enterprise's processing and use of data resources gives rise to the right to process and use data, and the trading of data resources or data products gives rise to the right to circulate and trade data. Both are derivative rights of the right to hold data resources. Compared with those two rights, the right to hold data resources focuses on protecting data as a factor of production, which is why it is the foundation of enterprise data rights. In the general artificial intelligence industry, the main role of data resources is to supply pre-training material for large models, so the corresponding right to hold data resources should reflect an appropriate degree of openness and inclusiveness to promote the growth and development of the industry.

2. The ownership of data resources is premised on the legality of the content

Through the pre-training of massive data, the large model extracts the solutions to various problems or the characteristics of different types of data products, and generates the required results according to the instructions. In massive data, it is inevitable that illegal and bad information, personal information and intellectual property rights protected by law will be included. The premise of the existence of data resource ownership is that the data resource is legitimate, and it is difficult for illegal data to be protected by law.

Data resources shall filter negative information. In 2022, Stanford University released an AI index report pointing out that many large language models of artificial intelligence are biased, and this bias comes from the basic data of intelligent models, which reflects the systemic bias of human society or the bias of data filterers. In order to ensure the legitimacy of data resources, the bad information in the corpus can be filtered and cleaned by using keywords and classification models in the data collection process.
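The two-stage filtering described above, keywords first and a classification model second, can be sketched as follows. This is a minimal illustration under stated assumptions, not a production pipeline: the blocklist entries, the `toy_classifier` heuristic, and the 0.5 threshold are all hypothetical placeholders for a reviewed lexicon and a trained model.

```python
from typing import Callable, Iterable, List

# Hypothetical blocklist; a real deployment would maintain a reviewed lexicon.
BLOCKED_KEYWORDS = {"badword", "scamlink"}


def keyword_hit(text: str, blocked: Iterable[str] = BLOCKED_KEYWORDS) -> bool:
    """Return True if any blocked keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(word in lowered for word in blocked)


def toy_classifier(text: str) -> float:
    """Stand-in for a trained classification model; returns a score in [0, 1].

    Assumption for illustration only: the score grows with the share of
    exclamation marks in the text.
    """
    if not text:
        return 0.0
    return min(1.0, 10 * text.count("!") / len(text))


def filter_corpus(
    records: Iterable[str],
    classifier: Callable[[str], float] = toy_classifier,
    threshold: float = 0.5,
) -> List[str]:
    """Keep only records that pass both the keyword stage and the model stage."""
    kept = []
    for text in records:
        if keyword_hit(text):
            continue  # stage 1: cheap keyword screen
        if classifier(text) >= threshold:
            continue  # stage 2: model-based screen
        kept.append(text)
    return kept


corpus = [
    "A normal review of a restaurant.",
    "Click this scamlink now",
    "BUY!!!!!!!!!!",
]
cleaned = filter_corpus(corpus)
```

Running the keyword screen first keeps the more expensive model call off records that are already excluded.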

Data resources shall protect personal information. Web users leave behind a great deal of data when using model products, which enables general AI to provide more personalized services. However, if this data is misused or leaked, it creates serious security risks. The right to hold data resources therefore presupposes that personal information has been collected lawfully and that personal privacy has been anonymized. If the data was acquired without the consent of the personal information subject, or if it was not anonymized when personal privacy was purged and specific biometric information was not deleted, the data resources themselves are not legitimate, and they can hardly give rise to legally protectable interests.
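The conditions in the preceding paragraph (consent, removal of identifying details, deletion of biometric information) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the field names, the 11-digit phone pattern, and the salted hash are hypothetical. Note that salted hashing is pseudonymization and may fall short of the legal standard for anonymization.

```python
import hashlib
import re
from typing import Optional

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{11}\b")  # assumption: 11-digit mobile numbers
# Hypothetical field names for biometric data that must be deleted outright.
BIOMETRIC_FIELDS = {"face_embedding", "fingerprint", "voiceprint"}


def scrub_text(text: str) -> str:
    """Mask e-mail addresses and phone numbers embedded in free text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def prepare_record(record: dict, salt: str) -> Optional[dict]:
    """Return a cleaned copy of the record, or None if consent is missing."""
    if not record.get("consent", False):
        return None  # no consent: the record may not enter the data resource
    cleaned = {}
    for key, value in record.items():
        if key in BIOMETRIC_FIELDS or key == "consent":
            continue  # drop biometric fields; the consent flag is not carried over
        if key == "user_id":
            digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
            cleaned[key] = digest.hexdigest()[:16]  # pseudonymous identifier
        elif isinstance(value, str):
            cleaned[key] = scrub_text(value)
        else:
            cleaned[key] = value
    return cleaned


sample = {
    "user_id": "u1",
    "consent": True,
    "comment": "Call 13800000000 or mail a@b.com",
    "fingerprint": "xx",
}
result = prepare_record(sample, salt="demo-salt")
```

Records without consent are rejected before any cleaning, which mirrors the text's point that unlawfully collected data cannot become a legitimate data resource however it is processed afterwards.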

Data resources shall protect intellectual property rights. Whether the data is open source, self-collected, commercial, or input by users, enterprises should protect prior intellectual property rights when acquiring and using it. Enterprises should exercise due diligence when ingesting data, identify and review whether the data infringes intellectual property rights such as copyrights, trademark rights, patent rights, and trade secrets, and prevent the illegal copying and use of intellectual property in the subsequent processing and use of data. At the same time, summary information on the intellectual property contained in the data resources should be disclosed, and third parties should be able to inquire about the relevant intellectual property rights through a complaint channel.

3. The ownership of data resources contributes to the protection of trust interests

Large-scale data training is the basis on which AI models generate accurate results, and massive training materials may contain content that infringes personal information or intellectual property rights, or that violates regulations. The right to hold data resources arises from the collection, cleaning, and sorting of data, but some content that violates laws and regulations will inevitably remain. Manual review of large-scale datasets is not practicable. Establishing the right to hold data resources objectively insulates the right to process and use data and the right to circulate and trade data from prior rights: if a data resource turns out to be infringing, the corresponding liability is borne by the data resource holder. This protects the reliance interests of downstream data processors, users and traders, and also lowers the threshold for data processing and use. In this way, establishing the right to hold data resources effectively increases the value of data reuse.

IV. The right to process and use data from the perspective of general artificial intelligence

Artificial intelligence technology relies on systems such as computer vision, speech technology, natural language understanding, and planning and decision-making to achieve human-computer interaction, and the process of human-computer interaction is also a process of data processing and use. In generative AI, the data products produced by large models need to be protected by the right to process and use data, and in the future the data products generated by general AI will also need to be governed by that right. The right to process and use data is therefore the core of enterprise data rights, and it is also what the general artificial intelligence industry most needs to have protected.

(1) The connotation of the right to process and use data

The right to process and use enterprise data has two aspects: processing and production on the one hand, and use on the other. Data processing is the process of analyzing and processing data through large models according to user needs. It can involve the analysis and processing of data elements, or the reprocessing of data products, and it results in new data products. In data use, enterprises apply the data products they have processed and produced to specific scenarios to meet specific needs and realize their digital transformation.

(2) The powers of the right to process and use data

1. It is forbidden for others to process data resources without permission

By giving enterprises the right to process and use data, enterprises can prohibit third parties from producing, processing, and using data without permission. For example, in the unfair competition dispute between Beijing Otis Brand Management Consulting Co., Ltd. and Beijing Cheqq.com Information Technology Co., Ltd. and Zhao, Cheqq.com collected a large amount of complaint information about various automobile brands, reviewed, analyzed, sorted and edited the complaints one by one, and finally displayed them on the front end of its website in a unified format after professional editing. This process is not simple data collection, but the processing of consumer complaint information into a specific format and content. As a competitor in the same industry, Otis copied and reposted the complaint information accumulated by Cheqq.com, took it as its own, and openly displayed and used it as its own business resource, infringing the right to process and use data that Cheqq.com had obtained through its data processing.

2. It is forbidden for others to use data products without permission

Data can be used in many application scenarios, and enterprise data rights holders have the right to prohibit others from applying data products to specific scenarios without permission. In current generative AI, data input and text organization products can be applied to office scenarios to improve work efficiency; data analysis and decision support products can be applied to enterprise management to assist decision-making by identifying patterns and trends in data; and natural language processing products can be applied to customer service to improve the customer experience through question answering and communication. With the development of general artificial intelligence, data products can be used in any scenario of study, work, production, and life, and unauthorized third parties are not allowed to apply data products to such scenarios.

(3) The characteristics of the right to process and use data

1. The right to process and use is the core of an enterprise's data right

Data processing is the core link of data utilization, and enterprises acquire data rights on the basis of substantial investment in data processing. Processed data can become the object of sharing and transfer, but not all processed data has exchange value: only data that achieves generally recognized, genuine "added value" can become such an object. Data processing yields two types of data products. One is model products, such as ChatGPT, a text-to-text model, and Sora, a text-to-video model. The other is knowledge products, such as analytical reports or solutions. As a factor of production, enterprise data is processed to produce new data, so the right to process data sits at the core of the whole data utilization system. Granting the right to process and use data gives enterprises' data processing a legal guarantee and allows the processed products to circulate smoothly, continuously promoting the development of the digital economy.

Data use is an important way to add value to data. As a kind of incorporeal object, data will not reduce the value of the data itself due to the use of others, but will generate new derivative value and added value due to the continuous use of countless people. On the premise of respecting the statutory prior rights and interests of the data source, data holders may use data according to their own commercial production and operation needs, including using data to analyze production and operation rules, training artificial intelligence models, processing data products, and many other ways of use. Data use is the purpose of data processing, and the right to use data is also an important power of data empowerment.

2. Data pollution and algorithmic discrimination lead to infringement of the right to process and use data

The quality of data products depends largely on the quality of the training data and the compliance of the algorithm. In the era of intelligence, the logic of big data has changed from causality to strong correlation, which objectively leads to the algorithm black box of intermediate processes. Based on the characteristics of autonomous intelligence, data dependence, and algorithm black box, enterprise data will face severe challenges in terms of application derivation, data security, and privacy protection. Data pollution and algorithmic discrimination may infringe on the right to process data and reduce the value of data. Therefore, at the source, it is necessary to improve the quality of data to promote algorithm compliance.

Data pollution infringes on the right to process and use data. In the artificial intelligence industry, data processing includes three stages: data learning, training, and result output, and the essence of data learning is the replication of data elements. Artificial intelligence generates natural and fluent dialogue content by learning language expression patterns, improves interactive performance through self-learning, and is made safer through supervised fine-tuning of the underlying language model on high-quality annotated data together with reinforcement learning from human feedback. High-quality data resources are thus the basis for AI processing and use. At present, most of the training data for AI technology comes from public data on the Internet. The quality of this data varies, and it often conceals disinformation and value-laden political and ethical positions, creating a potential risk of infringement for AI-generated content. In the general artificial intelligence stage, protecting enterprises' right to process and use data requires reducing these data risks and continuously improving the quality of the underlying data through data labeling, data classification and data trading.

Algorithmic discrimination may also infringe on the right to process and use data. That right arises from the lawful processing and use of data. Where the processing and use of data infringes the rights of users, the process also undermines the right to process and use data itself. Algorithms are a form of human-computer interactive decision-making, comprising mechanisms such as code setting, data operation, and automated decision-making. Artificial general intelligence algorithms focus on the technical problem of how to simulate and reproduce intelligence, and in the decision-making process data analysis may lead to unfair treatment of specific groups. For example, a Shanghai business company once used an algorithm to discriminate against customers in hotel reservations, so that the room rates customers paid through its app were much higher than the market rate. Airlines in the United States have also profited from personal data: when an airline finds that a person searching for a ticket needs to travel soon and has not been very price-sensitive in the past, it quotes a much higher price than it quotes to others. Because of algorithmic discrimination, the processing and use of data infringes the rights of network users, and also infringes on the right to process and use enterprise data.

V. The right to circulate and trade data from the perspective of general artificial intelligence

In the era of general artificial intelligence, the generation and optimization of model products are inseparable from data training, and the value of data products has been continuously valued. Whether it is data resources or data products, they all have transaction value because of the development of the industry. By giving data circulation and trading rights, it can not only promote the market-oriented circulation of data resources and data products, but also realize the return on data investment and promote the development of the data industry.

(1) The connotation of the right to circulate and trade data

The right to circulate and trade enterprise data refers to the right of an enterprise to transfer legally held data, to license others to use it, and to convert it into data assets. Because data is incorporeal and reproducible, the right to circulate and trade data differs from traditional property rights, and circulation and trading can maximize the value of data and allow data rights to be fully exercised. An enterprise has the right to circulate and trade data only if it lawfully holds the data, or has lawfully obtained the data products through circulation, a transaction, or another contractual arrangement. According to the local regulations on data circulation and trading released so far, data products may be circulated and traded if their content is compliant, authentic and usable, they have clear application scenarios or use cases, test data can be provided, they have a continuous supply or update capability, and they can be priced.

Circulation and trading is an important channel through which enterprises realize the value of data. Providing legal protection by granting this right helps realize the value of enterprise data, secures the return on data investment, and strengthens incentives based on the creation and realization of data value. There are many ways to circulate and trade enterprise data. The first is the transfer of enterprise data: a transfer achieves the market-oriented circulation of data, but the transferor also loses the right to use the product. The second is the licensing of enterprise data: licensing others to use data products is the most common mode of enterprise data transaction, and it allows rights holders to obtain the benefits of their data products while the data is circulated and reused. The third is the use of enterprise data as security for financing: providing security is an important means of data financing, and creating a security interest over data allows the market to assess its value. The fourth is investment with enterprise data: as a factor of production, enterprise data has value in the same way as land and capital, and contributing it as investment or equity helps diversify the allocation of data assets.

(2) The power of the right to circulate and trade data

1. Prohibit others from trading data without permission

The enterprise data rights holder has the right to prohibit the licensee from sub-licensing the data or conducting data transactions beyond the conditions, scope and duration of the license. Just as the circulation and trading of other intellectual property rights must comply with prior agreements, the licensing and trading of enterprise data must also comply with prior contractual agreements. Article 14 of the Data Act passed by the European Parliament in November 2023 also stipulates that data licensees shall generally not provide data products to third parties for commercial or non-commercial purposes.

There are also exceptions to prohibiting unauthorized trading of data. For example, in order to protect specific interests, data may be traded without the permission of the right holder on the premise of ensuring that the interests that are prioritized for protection are proportional to the losses and data security risks suffered by the data holder. For example, in order to promote economic development and innovation, build digital government and smart cities, increase social welfare and other public interests, data can be traded without the permission of the right holder.

2. Prohibit the cross-border flow of non-compliant data

When exercising the right to circulate and trade data, enterprises may transfer across borders large volumes of data or important data involving personal information collected in China only after making a compliance declaration. In the case of general AI models, if the data input during model training involves personal information, or if the resulting data products still contain personal information, they need to undergo a compliance review before cross-border circulation. There are exceptions: the export of data that only involves academic cooperation or international trade, data that was not collected domestically, and the reasonable and limited use of data by individuals under certain circumstances do not need to be subject to cross-border review.

(3) The characteristics of the right to circulate and trade data

1. The principles of fairness, reasonableness, and non-discrimination in data licensing

Compared with specialized artificial intelligence, general artificial intelligence has a greater demand for basic data elements, and the data products it produces are more concentrated. Data-holding enterprises should therefore be fair, reasonable, non-discriminatory and transparent in data circulation and transactions, so as to prevent data monopolies. The channels for enterprise data circulation and trading are diverse and include both on-exchange and over-the-counter transactions, and enterprise data that meets data compliance and data security requirements can also be circulated and traded across borders. Building a credible data circulation and trading system will enhance the availability, credibility, circulation, and traceability of data and allow cross-border data circulation and transactions to develop in an orderly way, so as to achieve data sharing and mutual benefit.

2. Objectively and reasonably determine the value of data

The pricing of enterprise data should be based on multi-dimensional considerations such as data quality, data cost, product level, and buyer heterogeneity, and further release the vitality of data production factors by encouraging market players to continuously try data pricing models that meet business needs and economic laws. In the existing adjudication, the value of enterprise data is mostly calculated by statutory compensation or discretionary damages. The reason for this is that it is difficult to fully prove the actual loss or illegal profit of enterprise data, and because data transactions are just emerging, there is also a lack of data transaction records for reference. With the maturity and improvement of the data trading market, enterprise data can be evaluated by referring to the pricing of license fees in the process of circulation and transaction, and at the same time, the value of enterprise data can also be evaluated with reference to the cost of data collection, storage, mining, processing and delivery, and the economic benefits that the data service recipient may obtain.
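The three reference points named above (comparable license fees, the cost of collection, storage, mining, processing and delivery, and the recipient's expected economic benefit) can be laid out as simple arithmetic. The sketch below is illustrative only: the class and function names are hypothetical, the figures are invented for the example, and the text does not prescribe any formula for combining the three values, so they are reported side by side rather than merged.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class DataCosts:
    """Cost items listed in the text; all amounts are in the same currency unit."""
    collection: float
    storage: float
    mining: float
    processing: float
    delivery: float

    def total(self) -> float:
        return (
            self.collection
            + self.storage
            + self.mining
            + self.processing
            + self.delivery
        )


def valuation_reference_points(
    costs: DataCosts,
    comparable_license_fees: List[float],
    expected_recipient_benefit: float,
) -> Dict[str, float]:
    """Collect the reference values a valuer might weigh; no weighting is applied."""
    points = {
        "cost_basis": costs.total(),
        "recipient_benefit": expected_recipient_benefit,
    }
    # License fees are only usable once a trading market has produced records.
    if comparable_license_fees:
        points["license_fee_reference"] = mean(comparable_license_fees)
    return points


# Invented figures, for illustration only.
costs = DataCosts(collection=10, storage=5, mining=20, processing=30, delivery=5)
points = valuation_reference_points(costs, [80, 100], expected_recipient_benefit=150)
```

When no comparable license fees exist, which the text identifies as the current situation, the license fee entry is simply absent and the valuer is left with the cost basis and the expected benefit.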

3. Strengthen the incentive orientation based on the value of data

In realizing the right to circulate and trade data, the distribution of income determines whether data is optimally allocated. In distributing profits, on the one hand, the relationship between enterprise data and personal data must be handled properly: if the enterprise data collected involves personal information, the individual should be compensated in a reasonable form. On the other hand, following the links in the data processing chain and the principle of "who invests, who contributes, and who benefits", the labor and capital that enterprises invest in holding, processing and using data should be fully taken into account, so as to ensure that data holders and those who process and use data can obtain their share of the benefits.

VI. Establish a scientific and rational enterprise data rights system

It is precisely because of the rise of artificial intelligence technology that the value of enterprise data has received growing attention and the need to protect enterprise data through empowerment has arisen. The enterprise data rights system should be built on the promotion of industrial development, and should be scientific, reasonable, and a balance of leniency and severity. Its guiding values should be to promote the development of the artificial intelligence industry and to realize the circulation and reuse of enterprise data.

(1) The construction of enterprise data rights should balance public and private interests

Enterprise data empowerment provides forward-looking institutional arrangements for the commercialization of data products and promotes the circulation and protection of data. It is necessary not only to ensure that the right to hold data resources, the right to process and use data, and the right to circulate and trade data are effectively exclusive in their respective links of data circulation, but also to impose the necessary restrictions on these rights so as to protect the rights and interests of other participants in the data industry. In terms of temporal and spatial limits, the three rights should be given differentiated terms of protection, so that an overly long term does not impede the circulation and reuse of data. In terms of fair use, the small-scale use of data in science, education, culture and health, and the processing and use of data by the public sector and for scientific research purposes, should be treated as fair use of enterprise data. In terms of statutory licensing, when a major epidemic occurs or the public interest is involved, enterprises can be required to disclose AI-generated data to assist public administration. At the same time, a necessary data sharing system should be established to ensure the supply of necessary data in key areas.

(2) The construction of enterprise data rights should optimize the rights structure

Enterprise data rights, which are dominated by the right to hold data resources, the right to process and use data, and the right to circulate and trade data, are essentially a rights system formed with data resources and data products as the distinguishing objects. Enterprises form the right to hold data resources through the management of data resources, and the data resources are processed and used to form data products, and the civil legal relations generated in the process of the formation of data products constitute the object of rights for data processing and use, and the operation of data products generates the need for the right to circulate and trade data of enterprises. Enterprises desensitize, clean, classify, sort, and integrate big data to form data resources, and no new data is generated in the process, which is the obvious difference between the right to hold data resources and the right to process and use data in the object. In the enterprise data rights system, the right to hold data resources is the basic right, the right to process and use data is the core right, and the right to circulate and trade data is the value realization. Among the three rights of enterprise data, the most important is the right to process and use data, and the construction of data processing and use rights is essential to protect AI-generated data products. The construction of enterprise data rights is essentially a rights system with the right to process and use data as the main body, the right to hold data resources and the right to circulate and trade data as the "two wings".

(3) The construction of enterprise data rights should coordinate the rights relationship

The positioning of enterprise data, personal data, and public data is different, and the corresponding rights system is also different. However, enterprise data rights are not completely independent from personal data and public data, and many of the data pre-trained by general artificial intelligence come from personal data, and the data products formed by large models will eventually flow to society and become public data after a period of protection. It is necessary to pay attention to the intersection of the right to hold data resources and the right to personal data, and protect personal information and compensate personal interests in the process of exercising the right to hold data resources. It is also necessary to clarify the relationship between the right to process and use data, the right to circulate and trade, and the right to public data, reasonably consider the timeliness of enterprise data, and form a dynamic assessment system for enterprise data rights.

Conclusion

At present, the world has embarked on a new round of technological revolution, the core of which is to turn the intelligence problem into a data problem. If capital and the steam engine are the driving force of global modernization since the Age of Discovery, then data is the core driving force of the technological revolution in the intelligent era. Data and algorithms are the two cornerstones of artificial general intelligence, and data, especially enterprise data, is crucial to the development of artificial general intelligence.

As a new type of intellectual property right, enterprise data rights are essentially weak rights with a short term of protection. Enterprise data empowerment from the perspective of general artificial intelligence needs to be interpreted legally in light of the large data volumes, wide range of sources, and rapid iteration that characterize the intelligent era, and on the basis of the three-way separation of the right to hold data resources, the right to process and use data, and the right to circulate and trade data set out in the "Data 20 Articles". Granting limited exclusivity to enterprise data can effectively incentivize data investment and facilitate data flow. Just as the emergence of trademark rights promoted better product quality and more prosperous business operations, the empowerment of enterprise data will promote the development of general artificial intelligence and accelerate the arrival of the intelligent era.
