
Risk and Governance of Generative AI – The Case of ChatGPT

Author: Imagine 008

Generative AI, represented by ChatGPT, creates social benefits but also brings many risks. It is therefore imperative, in light of the current state of development of generative AI in mainland China, to clarify the relationship between its application value and its potential risks, so that the risks can be resolved effectively without hindering the development of applications.

The operation of generative AI can be divided into three stages: a preparation stage of machine learning and human annotation; an operation stage in which algorithms process data to produce results; and a generation stage in which the outputs are released to society and take effect. At present, the most prominent risks of generative AI are data compliance risks in the preparation stage, algorithmic bias risks in the operation stage, and intellectual property risks in the generation stage.
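The three stages above can be sketched as a minimal pipeline. This is purely illustrative: every function name here is hypothetical, and a real system is vastly more complex.

```python
# Illustrative sketch of the three-stage mechanism described above.
# All names are hypothetical placeholders, not a real system's API.

def prepare(raw_corpus):
    """Preparation stage: collect data and apply human annotation."""
    return [(text, "label") for text in raw_corpus]  # human-labelled pairs

def operate(labelled_data, prompt):
    """Operation stage: an algorithm processes the data to produce a result."""
    # Stand-in for model training plus inference.
    vocabulary = {text for text, _ in labelled_data}
    return prompt if prompt in vocabulary else "generated answer"

def generate(result):
    """Generation stage: the processed output is released to society."""
    return f"published: {result}"

corpus = ["hello", "world"]
data = prepare(corpus)
output = generate(operate(data, "hello"))
```

Each of the three risks discussed below attaches to one of these stages: the corpus fed to `prepare`, the processing in `operate`, and the released product of `generate`.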

Data compliance risks in the preparation stage. The current data compliance system in mainland China rests on the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law, which require data processors to take the measures necessary to ensure data security, network security, and personal information security during processing. Within this legal framework, the data compliance risks of generative AI fall into three categories: data source compliance risks, data use compliance risks, and data accuracy risks.

The first is data source compliance risk. Generative AI such as ChatGPT typically collects a large amount of data for training at the outset, which raises three questions: first, whether users have consented to the collection of their personal information; second, whether the collection and use of publicly available information stays within a "reasonable scope"; and third, whether the use of copyrighted samples in training qualifies as "fair use".

The second is data use compliance risk. On one hand, there is the risk of data breaches. Users pass personal information, business information, and even trade secrets to ChatGPT, and an examination of its operating mechanism shows that iterative training also draws on users' inputs and interaction records. How to keep such data secure is therefore a serious problem. On the other hand, it is difficult for users to exercise their right to have personal information deleted. Although OpenAI's privacy policy grants users rights over their personal information, deleting data from a generative AI system is technically complex, and it remains highly uncertain whether developers can achieve true deletion of personal information and thereby satisfy regulatory requirements.

The third is data accuracy risk. Because the data fed into ChatGPT in its early training was obtained and selected from the internet by the developers, missing or erroneous data may cause the generated content to be inaccurate.
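The deletion difficulty mentioned above has a concrete technical shape: removing a user's records from the stored corpus is easy, but a model already trained on those records may still reproduce them, which is why true deletion generally requires retraining (or approximate "machine unlearning" techniques). A minimal sketch, with all names hypothetical:

```python
# Sketch: why "deleting personal information" from a trained model is hard.
# Dropping records from the corpus is cheap; the trained weights may still
# carry the information, so the model itself must be rebuilt.

corpus = [
    {"user_id": "u1", "text": "Alice's phone is 555-0101"},
    {"user_id": "u2", "text": "the sky is blue"},
]

def delete_user_data(corpus, user_id):
    """Step 1 (cheap): drop the user's records from the stored corpus."""
    return [rec for rec in corpus if rec["user_id"] != user_id]

def retrain(corpus):
    """Step 2 (costly): retrain from scratch on the filtered corpus,
    because the previously trained model may still reproduce the data."""
    return {"trained_on": [rec["text"] for rec in corpus]}

filtered = delete_user_data(corpus, "u1")
model = retrain(filtered)
```

The gap between the cheap step and the costly one is precisely the uncertainty the text describes: a deletion request can be honored at the corpus level long before it is honored at the model level.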

Algorithmic bias risks in the operation stage. "Machine learning" assisted by "human annotation" improves the intelligence and accuracy of generative AI, but the combination also sharply increases the probability of algorithmic bias. Compared with traditional machine learning, this approach reflects more of people's subjective judgments and preferences: annotators inject their preferences into the model, introducing human bias that is difficult to track and prevent. An analysis of ChatGPT's operation shows that algorithmic bias arises mainly in two ways. First, because the received data must be manually labelled, errors creep in as annotators interpret it. Second, during data processing, when ChatGPT's raw results conflict with public expectations they are corrected, and this correction process itself introduces a degree of algorithmic bias.
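One common way to surface the annotator-introduced bias described above is to compare each annotator's label distribution against the pooled distribution and flag outliers for review. The data, names, and threshold below are illustrative, not any platform's actual standard:

```python
from collections import Counter

# Sketch: flag annotators whose share of a given label diverges from the
# pooled share by more than a chosen tolerance. Illustrative values only.

annotations = {
    "annotator_a": ["positive", "positive", "positive", "positive"],
    "annotator_b": ["positive", "positive", "negative", "negative"],
    "annotator_c": ["positive", "negative", "positive", "negative"],
}

def label_share(labels, label):
    """Fraction of an annotator's labels equal to `label`."""
    return Counter(labels)[label] / len(labels)

pooled = [l for labels in annotations.values() for l in labels]
pooled_share = label_share(pooled, "positive")  # 8/12 here

TOLERANCE = 0.2  # hypothetical review threshold
flagged = [
    name for name, labels in annotations.items()
    if abs(label_share(labels, "positive") - pooled_share) > TOLERANCE
]
```

Flagging is only a trigger for human review, not proof of bias; an annotator may legitimately see a skewed slice of the data. That caveat mirrors the text's point that this kind of bias is hard to track conclusively.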

Intellectual property risks in the generation stage. The rise of generative AI poses new challenges to many industries, and the most consequential is the challenge to intellectual property at the generation stage. Because generative AI is highly intelligent, the question of who owns the intellectual property in its outputs has changed fundamentally compared with earlier AI systems. ChatGPT, as a generative AI, far exceeds analytical AI in processing and analyzing data; its content generation includes automatic compilation, intelligent trimming and processing, multimodal conversion, and creative generation, which directly reshapes how publishing content is produced and supplied. Although ChatGPT's outputs contain some creative contribution from natural persons, and in that sense come closer to meeting the constitutive elements of a copyrightable work, it remains disputed whether works created by generative AI can be granted rights at all, and research on specific criteria for granting such rights is still blank. Intellectual property risk has therefore become the third major risk that generative AI cannot avoid.

In view of the above risks of generative AI, the following three countermeasures are recommended.

Strengthen data compliance within generative AI enterprises. The development of generative AI should not pursue capability and efficiency while ignoring security; enterprises should rely on a sound data compliance system to ensure data security. Enterprise data compliance can be strengthened through three measures.

First, establish data compliance principles. There are four main principles: legal compliance, notification and consent, legitimate purpose, and minimum necessity.

Second, establish a multi-layered technical mechanism for data compliance. At the macro level, unify industry standards: the competent authorities in each industry should take the lead in compiling an authoritative data dictionary (a "Xinhua Dictionary" for data, as it were) so that data encodings and formats are consistent and the source, content, and processing logic of data can be traced and verified. At the meso level, build internal and external review systems: establish a dedicated internal data compliance body responsible for the enterprise's day-to-day compliance work, and introduce an external third-party mechanism to audit and ethically review the enterprise's data compliance. At the micro level, embed ethical norms: ethical norms and principles should be written into the behavioral logic of technology applications in legal form so that they can adapt to circumstances.

Third, improve the laws related to data compliance. In legislation, accelerate the introduction of basic laws on data and artificial intelligence to provide top-level guidance for enterprise data compliance. In enforcement, clarify the authority of each department as soon as possible to avoid the fragmented, unaccountable oversight captured by the Chinese idiom "nine dragons managing the waters" (jiulong zhishui). In the judiciary, improve the electronic evidence system to protect the litigation rights of rights holders.
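The requirement that a dataset's source, content, and processing logic be traceable and verifiable can be approximated with simple content-addressed provenance records. The sketch below uses a SHA-256 digest so an external auditor can later confirm the recorded content is unchanged; the field names are hypothetical:

```python
import hashlib

# Sketch: a provenance record binding a dataset's source and processing
# steps to a content hash, so an auditor can verify it later.

def provenance_record(source, content, steps):
    """Record where data came from, what it was, and how it was processed."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    return {"source": source, "sha256": digest, "processing": steps}

def verify(record, content):
    """An auditor recomputes the hash to check the content is unchanged."""
    return record["sha256"] == hashlib.sha256(content.encode()).hexdigest()

rec = provenance_record("public-web-crawl", "sample text", ["dedupe", "label"])
ok = verify(rec, "sample text")         # unmodified content verifies
tampered = verify(rec, "sample text!")  # any alteration fails verification
```

A hash alone proves only integrity, not lawful origin; in practice such records would sit alongside consent logs and licensing metadata, which is exactly why the text pairs the technical mechanism with internal and external review.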

Combine technology and management to correct the algorithmic bias of generative AI. This involves two main measures. First, for the innate algorithmic bias that arises during generative AI's machine learning, the learning path of the algorithm model should be adjusted, the relevant norms and technical standards complied with, and a substantive review conducted before the system enters the market. Given the characteristics of generative AI, bias correction can proceed on two fronts: use algorithmic program design to guard against innate biases in machine learning, and set standards for human annotation and raise practitioners' skill levels to address annotation-induced bias. Second, for the acquired algorithmic bias derived from generative AI's self-learning, an agile, automated, whole-process regulatory system should be established. First, automate supervision of the algorithm: apply automated governance to machine learning and human annotation, pause output whenever algorithmic bias occurs, and trace back to the root cause of the problem. Next, establish a multi-subject supervision model in which administrative bodies, platforms, industry associations, and the enterprises themselves all participate. Finally, implement a whole-process agile supervision mechanism: supervise the entire process by which generative AI produces conclusions, reduce the probability of erroneous conclusions caused by algorithmic bias, and advance the construction of a trustworthy algorithm system.
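The "pause output whenever algorithmic bias occurs" step can be sketched as a supervisor that scores each output against a bias metric and withholds anything over a threshold for root-cause review. The metric and threshold below are placeholders; a real system would use a trained classifier rather than a term list:

```python
# Sketch of automated supervision: score each candidate output for bias
# and hold flagged outputs back instead of releasing them.
# The flagged-term metric and the threshold are illustrative stand-ins.

FLAGGED_TERMS = {"biased-term"}  # hypothetical denylist
THRESHOLD = 0.2                  # hypothetical review threshold

def bias_score(text):
    """Fraction of words hitting the denylist; a placeholder metric."""
    words = text.split()
    return sum(w in FLAGGED_TERMS for w in words) / max(len(words), 1)

def supervise(outputs):
    """Release clean outputs; hold the rest for root-cause review."""
    released, held_for_review = [], []
    for text in outputs:
        if bias_score(text) > THRESHOLD:
            held_for_review.append(text)  # paused, per the text's mechanism
        else:
            released.append(text)
    return released, held_for_review

released, held = supervise(["a normal answer", "a biased-term answer"])
```

Gating outputs rather than silently rewriting them matches the article's design: the held items are returned to developers to find the root cause, which is what makes the supervision "whole-process" rather than a cosmetic filter.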

Adopt a limited protection model for the intellectual property risks of generative AI works. Compared with traditional AI technology, generative AI is innovative in that it has a certain degree of autonomy and participates in the processing and creation of its outputs. If all of its outputs were protected on that basis, generative AI companies might come to hold a "creative hegemony". Yet from a commercial perspective, it would also be unfair for companies that invest heavily in money and technology to build highly intelligent AI programs to receive no protection at all for the "works" those programs produce. The intellectual property attributes of ChatGPT's outputs should therefore be evaluated comprehensively according to their technical mode of operation, degree of human participation, degree of innovation, and so on, and a differentiated, limited protection model should be applied to the intellectual property in those outputs. Once generative AI reaches a more mature stage and its workings are deeply understood, a specific IP protection model can be determined.

Generative AI represented by ChatGPT is in the ascendant, and the many legal risks it poses should be properly addressed within the existing legal framework. Its development should not be restricted merely because the industry carries risks and the theory remains contested. What is needed is an integration of "law + technology" to create a healthy market environment and ensure that the generative AI market thrives.

