The challenge and governance path of generative AI to personal information protection

Author: Wan Meixiu
Affiliation: School of Law, Nanchang University

Abstract: Generative artificial intelligence technology, represented by ChatGPT, has brought disruptive changes to all walks of life, but it has also triggered personal information infringement crises such as personal information leakage, algorithmic bias, and the dissemination of false information. The traditional "rights-based protection" approach overemphasizes the protection of personal information and hinders the development of the AI industry, while the "risk-based prevention" path highlights the reasonable-use value of personal information and represents the better value choice. Only by combining rights protection with risk prevention, however, can a balance of interests be achieved and a long-term protection mechanism for personal information be established. Concretely, in the rules for personal information processing, rigid and strict informed-consent rules should be replaced by "weak consent" rules; in the purpose limitation principle, "purpose limitation" should be replaced by "risk limitation"; and in the personal information minimization principle, "purpose minimization" should be replaced by "risk minimization". On this basis, it is necessary to further strengthen compliance supervision of generative AI data sources, improve the transparency and explainability of algorithms, and strengthen technology ethics and the pursuit of liability for infringement.

0 Introduction

Generative AI, represented by ChatGPT, has set off the fourth wave of the global technological revolution and become a new engine of global economic growth [1]. However, as a new generation of artificial intelligence technology, generative AI has brought many legal risks to personal information protection even as it iterates and transforms relations of production. Generative AI runs on the personal information of massive numbers of users, and personal information is used at every stage: the input end, the simulation training end, the simulation optimization end, and the output end. Against the backdrop of large-scale data processing and opaque algorithmic black boxes, generative AI has given rise to problems such as the illegal collection of personal information, the creation of false and harmful information, and algorithmic bias and discrimination.

Regulators in many countries have taken notice: the governments of the United States, France, Italy, Spain, Canada, and others have announced investigations into ChatGPT and issued corresponding regulatory rules. On July 10, 2023, the Cyberspace Administration of China, together with six other departments, issued the Interim Measures for the Administration of Generative Artificial Intelligence Services (hereinafter, the "Interim Measures"), clarifying concrete measures to promote the development of generative AI technology and responding positively and forcefully to the need to both support and regulate it. It should be noted, however, that the provisions on personal information protection in the Interim Measures (Articles 4, 7, 9, 11, and 19) merely cross-reference the Personal Information Protection Law (PIPL); they lack specific rules for the new problems that arise when generative AI technology infringes personal information rights and interests, and the continued application of the PIPL faces many difficulties. How to strike a balance between promoting the innovative development of generative AI technology and securing personal information is a question of our times posed by the new generation of AI. Accordingly, this paper analyzes the challenges generative AI poses to personal information protection in light of the technology's operating logic, discusses governance principles and governance paths based on the spirit embodied in the Civil Code, the Personal Information Protection Law, and the Interim Measures, and proposes specific countermeasures, in the hope of offering preliminary solutions to the problems that the application of generative AI raises for personal information protection in the AI era.

1. The operating logic of generative AI

At present, there are two main types of AI technology: discriminative (analytical) AI and generative AI [2]. Discriminative AI uses machine learning, deep learning, and computer vision to learn the conditional probability distribution in the data and judge the probability that a sample belongs to a specific target class. Generative AI, by contrast, uses deep neural networks to learn from input and training data: it summarizes existing large-scale datasets, abstracts the underlying laws and probability distributions of the data, and then generates new data from those laws and distributions. The "generative adversarial network" (GAN) deep learning model proposed in 2014 has been the most influential: it pits a generator against a discriminator so that the generated data acquires a degree of originality. Since then, with breakthroughs in natural language processing techniques such as recurrent neural networks, pre-trained language models, and Transformers, generative AI has developed rapidly and is widely used in content generation, human-computer interaction, product design, and other fields. Taking ChatGPT as an example, GPT-4, released by OpenAI, is based on the Transformer architecture, pre-trained to predict the next token in a document, and fine-tuned with reinforcement learning from human feedback using publicly available data (such as Internet data) and data licensed from third-party providers [3]. After pre-training, when a user enters a question, ChatGPT converts the question into computer data, uses its algorithmic model to form candidate text, image, video, and other outputs, and, through continuous refinement and optimization, finally outputs new content with a certain degree of originality that meets the user's requirements. The principle of operation is shown in Figure 1.

Figure 1: How generative AI works
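To make the adversarial mechanism described above concrete, here is a minimal sketch of GAN training on a toy one-dimensional Gaussian, assuming PyTorch is available; the network sizes, the data distribution, and all hyperparameters are illustrative stand-ins, not any production system.

```python
# Minimal GAN sketch (assumes PyTorch): a generator learns to mimic a
# 1-D Gaussian by fooling a discriminator, as described in the text.
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "real" data: N(3, 0.5)
    fake = G(torch.randn(64, 4))            # generated samples

    # Discriminator step: label real as 1, generated as 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: push the discriminator to label fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(256, 4)).mean().item())  # drifts toward 3.0
```

The same generator-versus-discriminator loop, scaled up enormously, underlies the image and data generation described above.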

From ChatGPT's underlying operating logic, it is clear that the new generation of generative AI benefits from advances in algorithms, computing power, and data. At the algorithm level, a pre-trained language model (LM) serves as the initial model that generates content roughly meeting the requirements; data is then collected to train a scoring (reward) model (BM) that evaluates whether generated content is human-friendly; finally, reinforcement learning (RL) is used to iteratively update the model so that it generates high-quality content consistent with human cognition [4]. At the computing-power level, generative AI must efficiently execute complex computational tasks and optimize generated content through continuous training and inference. At the data level, training and optimizing AI models requires vast amounts of data, and web crawler technology can harvest massive data from social media, public institutions, sensors, and other channels. The continuous optimization and iteration of generative AI is thus inseparable from this troika: data is the foundation of generative AI training, algorithms are the core of its optimization, and computing power provides the technical support and guarantee for its development. However, the massive data underlying generative AI training is collected by developers through various means, involves large-scale processing of personal information, and is not always handled in accordance with the Personal Information Protection Law and other relevant rules, which creates many risks and challenges for personal information protection.
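The algorithm-level loop just described, generate, score with a reward model, update the policy toward higher-scoring outputs, can be caricatured in a few lines. The sketch below is a deliberately toy stand-in: it uses plain NumPy, a stub reward function, and a three-option policy, whereas real systems train a neural language model with algorithms such as PPO.

```python
# Toy caricature of the LM -> reward model -> RL loop (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
candidates = ["helpful answer", "rude answer", "off-topic answer"]
logits = np.zeros(3)  # stand-in for the pre-trained LM policy

def reward_model(text: str) -> float:
    # Stub for the scoring model trained on human preference data.
    return 1.0 if "helpful" in text else -1.0

for step in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax policy
    i = rng.choice(3, p=probs)                     # sample a response
    r = reward_model(candidates[i])                # human-preference score
    grad = -probs                                  # REINFORCE: grad log p = onehot - probs
    grad[i] += 1.0
    logits += 0.1 * r * grad                       # raise probability of rewarded choices

print(dict(zip(candidates, np.round(np.exp(logits) / np.exp(logits).sum(), 3))))
```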

2. The challenge of generative AI to the protection of personal information

2.1 Input end: illegal scraping and excessive collection

The input end of generative AI is the source of personal information leakage. Its legal risks concentrate in two stages: the initial database at the simulation training end, and the updated database at the simulation optimization end.

As for the initial database, generative AI carries a substantial "black history" of illegally scraping personal information, rendering the notice-and-consent rules for processing personal information ineffective. The Personal Information Protection Law and the Civil Code clearly provide that the processing of personal information requires notifying and obtaining the consent of the individual, and that while the reasonable processing of already-disclosed personal information does not require consent, the duty to notify must still be fulfilled [5]. Taking ChatGPT as an example, its initial database consists mainly of data obtained from public sources before 2021 using web crawler technology, which contains large amounts of personal information such as account information, social media information, and whereabouts. Yet most users are unaware that their personal data has been used for simulation training, let alone that they "consented" to it. Under deep learning and unsupervised learning, large volumes of public personal information with significant impact on individual rights and interests are illegally scraped, and the notice-and-consent rules become a dead letter. How to ensure the reasonable use of the initial databases that have already been scraped and applied to generative AI training, and to prevent infringement of individual rights and interests, has therefore become an urgent problem.

As for the updated database, generative AI has long engaged in the "bad practice" of excessively collecting personal information, hollowing out the principle of personal information minimization. Like humans, generative AI cannot live forever on a fixed body of knowledge; it must constantly ingest new data to improve the accuracy and credibility of its outputs. In practice, however, the rules on the collection and processing of personal information at this stage go unenforced.

First, the principle of purpose limitation is difficult to apply. Article 6, Paragraph 1 of the Personal Information Protection Law provides that the processing of personal information shall have a clear and reasonable purpose and be directly related to that purpose, and Article 17 provides that changes in processing shall be notified in a timely manner. The privacy policy published on OpenAI's official website states that personal information may be used for "purposes such as improving services, developing new projects, preventing abuse of services to commit crimes, and carrying out business transfers" [6], but this statement is highly general and vague; there is no corresponding explanation of retention periods, deletion, or notification of changes, and users can only accept it or stop using the service. Moreover, from a technical standpoint, generative AI currently cannot automatically identify "information related to the purpose of processing"; it instead scrapes everything under a blanket agreement, which undoubtedly exacerbates the risk of infringement of personal information rights and interests.

Second, the principle of personal information minimization is difficult to apply. Under Article 6, Paragraph 2 of the PIPL, the collection of personal information shall be limited to the minimum scope needed to achieve the purpose of processing, the so-called principle of personal information minimization. Judging from Articles 1, 2, and 3 of the privacy policy published on OpenAI's official website, it may collect users' account information, communication information, technical information, social information, input or uploaded content, and any other information provided. However, sweeping in all user information, such as the type of device used, the operating system, and how the user interacts with the service, is not necessary for providing a generative AI service; it is plainly excessive collection of personal information and violates the principle of minimum necessity.

Third, the rules for handling sensitive personal information are difficult to apply. The PIPL divides personal information into general and sensitive personal information and provides special handling rules for the latter, because the leakage of sensitive personal information poses a serious threat to the person and property of the individual. Under Articles 28 and 29 of the PIPL, sensitive personal information may be processed only for a specific purpose and when sufficiently necessary, subject to the individual's separate consent and strict protective measures. Generative AI, however, makes no such distinction when collecting personal information from users. Worse, it transmits users' entire usage history to terminal servers and saves it in the cloud in real time for future model optimization and training. Although Article 2 of OpenAI's privacy policy states that all personal information collected by ChatGPT will be aggregated or de-identified, Article 3 immediately adds that it will be shared with third parties. With the help of additional third-party information and related technical means, even anonymized information may remain identifiable [7]; de-identified personal information faces the risk of re-identification, which aggravates the risk of personal information leakage. On March 20, 2023, ChatGPT leaked sensitive personal information including some users' chat records, credit card payment information, and email details, raising concerns among regulators in many countries. Evidently, current legislation lacks specific provisions on generative AI's infringement of personal information rights and interests and cannot give individuals clear behavioral expectations.

2.2 Simulation training end: algorithm black box and excessive mining

Generative AI's simulation training is inseparable from algorithms, and the closed, opaque "algorithm black box" leads to crises of personal data infringement, making the principle of openness and transparency in personal information processing hard to implement. Under Articles 7 and 24 of the Personal Information Protection Law, the processing of personal information shall follow the principle of openness and transparency, and the use of personal information for automated decision-making shall ensure the transparency of the decision-making and the fairness and impartiality of the results. The essence of a generative AI algorithm's operation is a process of data input and output, but between input and output lies an unexplainable "black hole", which produces the "algorithm black box" problem [8]. More importantly, generative AI algorithms have advanced beyond earlier AI: rather than following the traditional pipeline of data input, logical inference, and prediction, they gradually acquire a degree of autonomous learning and decision-making capability through deep learning models and generate new works directly from the original data through self-learning [9]. As autonomous learning grows more frequent and the algorithms iterate, the hidden technical layers become ever more complex and their logic exceeds what the general public can understand; compounded by information asymmetry, this deepens the opacity and incomprehensibility of algorithms and intensifies their "black box" character. That plainly cannot ensure the fairness and justice of the results the algorithms produce, and it directly violates the principle of openness and transparency in personal information processing. At present, ChatGPT has not published its algorithmic rules, and neither have Baidu's "Wenxin Yiyan" nor Alibaba Cloud's "Tongyi Qianwen", which poses a serious challenge to the openness and transparency principle of the PIPL.

In the course of simulation training and optimization, generative AI excessively mines personal information through deep learning models, so that de-identified or even anonymized information is re-identified, aggravating the risk of personal information leakage. Generative AI's use of personal information goes beyond the simple processing of traditional AI: through powerful reasoning, it performs deep mining to uncover hidden internal connections among information subjects. For example, a study at the University of California, Berkeley showed that AI systems can analyze data about a user's movements in AR and VR environments, infer hundreds of relevant parameters, and reveal personal information with astonishing accuracy. Indeed, even when no information about a particular person appears in a generative AI training dataset, attributes such as gender, age, race, and education can be inferred by deep mining combined with other information. The new generation of AI thus exhibits strong capabilities of self-learning, deep synthesis, and logical reasoning, posing great challenges to personal information protection.
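To illustrate the inference risk described above, the following toy sketch (synthetic data; scikit-learn assumed; all feature names hypothetical) trains a simple classifier that recovers a sensitive attribute that was never collected directly, using only innocuous-looking behavioral features.

```python
# Toy attribute-inference sketch: a model predicts a sensitive attribute
# from correlated but innocuous-looking behavioural data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
sensitive = rng.integers(0, 2, n)  # attribute the service never asked for
# Hypothetical features that happen to correlate with it (typing speed,
# session length, vocabulary size) -- the "hidden internal connections".
X = np.column_stack([
    rng.normal(40 + 10 * sensitive, 8, n),
    rng.normal(12 - 4 * sensitive, 3, n),
    rng.normal(500 + 150 * sensitive, 100, n),
])

clf = LogisticRegression(max_iter=1000).fit(X[:1500], sensitive[:1500])
print("inference accuracy:", clf.score(X[1500:], sensitive[1500:]))
```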

2.3 Output end: algorithmic bias and false information

At the output end of generative AI, because the algorithm itself is not technologically neutral, the "algorithm black box" aggravates that non-neutrality and biases the output. First, in algorithm design, the underlying algorithms of generative AI are designed by developers who hold subjective preferences, and developers' inherent cognitive biases inevitably become algorithmic biases. Second, in deep learning, generative AI's self-learning capability keeps developing iteratively, but machine learning does not screen the value orientation of the information in its database, so training deepens the biases embedded by developers. Finally, as to data sources, the data used in simulation training is of uneven quality; large volumes of false, missing, polluted, or incomplete data lead to discriminatory generated content. Moreover, the closed and opaque "algorithm black box" dresses "algorithmic bias" in a cloak of technical reasonableness, making biased behavior hard to detect, thereby exacerbating discrimination and prejudice against specific groups and creating a crisis for the traditional protection of equality rights [10]. Although OpenAI states on its official website that ChatGPT has been optimized through algorithm settings and simulation training and can, to a certain extent, reject unreasonable user requests, such as generating sexist, racist, violent, gory, or pornographic content that violates law, public order, or good morals, the risks to users and non-users persist in practice. Amazon, for instance, was revealed to have used an AI-trained recruiting algorithm that discriminated against women. Algorithmic bias thus manifests as various forms of unreasonable discrimination, producing deep-seated inequality.

Also at the output end, perpetrators can use deepfakes, deep synthesis, and similar technologies to generate false information and commit insult and defamation, rumor-mongering, property fraud, and other offenses, so that the authenticity and accuracy of personal information required by Article 7 of the PIPL cannot be guaranteed. Because generative AI cannot screen the authenticity and accuracy of its input data, it cannot guarantee the authenticity and accuracy of its outputs, and may produce "earnest nonsense", plausible-sounding falsehoods, and fake news, thereby infringing personal information rights and interests. Worse, this flaw is easily exploited by criminals. On April 25, 2023, a man surnamed Hong in Gansu used AI technology to fabricate, for profit, the false report that "a train in Gansu hit road-repair workers this morning, killing 9 people", and was investigated by the police. The emergence of generative AI has thus enabled the generation and spread of large volumes of false information, infringing personal information rights and interests and causing serious social problems.

The challenges of generative AI to personal information protection are illustrated in Figure 2.

Figure 2: Challenges of generative AI to personal information protection

3. The governance path of personal information protection in the context of generative artificial intelligence

3.1 "Rights protection" and "risk prevention" are jointly governed

As shown above, generative AI brings many risks and challenges to personal information protection, and the traditional protection rules in the Civil Code, the Personal Information Protection Law, and the Interim Measures are all difficult to apply. The root cause is that an individualistic, static approach to personal information protection cannot keep pace with technological development, and a more reasonable protection system is urgently needed to ease the tension between the two. A people-centered philosophy demands strengthening the protection of personal information; a philosophy of promoting and regulating the AI industry and encouraging innovation demands placing certain limits on that protection. Only by correctly understanding and coordinating the relationship between personal information protection and the innovative development of generative AI can AI better serve economic development and social progress.

At the level of overall regulatory principle, countries have taken two legislative attitudes toward generative AI, "conservative" and "open", and have issued corresponding laws and regulations. Shaped by the tragedies of the two world wars and the massive human rights violations of fascism, European countries attach great importance to basic human rights such as human dignity and personal freedom [11]; they have therefore long been cautious in AI supervision, following the principle of "regulate first, then develop; advance supervision steadily". The General Data Protection Regulation and the Ethics Guidelines for Trustworthy AI established the EU's ethical framework for AI development, and the Artificial Intelligence Act has further strengthened its operational legal rules. The United States, given ChatGPT's enormous impact and the need to maintain its international lead in AI, is relatively open toward AI governance, adopting the principle of "prudential supervision to promote industrial innovation"; it has successively issued the American AI Initiative and the Artificial Intelligence Capabilities and Transparency Act, promoting the AI industry through a combination of corporate self-regulation and government regulation [12]. Judging from Article 3 of the Interim Measures, China on the whole adheres to an open and inclusive attitude toward generative AI and steadily promotes the development of the AI industry. On the one hand, it adheres to a people-centered philosophy, protecting basic human rights, personal information, and individual interests, and realizing individual autonomy; on the other hand, taking into account the new environments and new modes of personal information use in the AI era, it places necessary limits on personal information protection to safeguard public and social interests. In other words, on the premise that personal information is kept reasonably safe, the rules of strong protection should be adjusted, and the rational development and use of personal information should be harnessed to drive the AI industry, seeking a balance between protecting individual rights and interests and protecting enterprise interests.

At the level of specific protection rules, China has two paths for personal information protection in the context of generative AI: "rights-based protection" and "risk-based prevention". The "rights-based protection" approach originates in the Fair Information Practice Principles born in the United States in 1973, which guarantee individual control by granting individuals information rights and imposing obligations on information processors [13]. However, because personal information implicates not only individual interests but also public and social interests [14], rules of strong protection struggle to safeguard the public interest or adapt to the AI era. A "risk-based prevention" approach was therefore proposed and gradually incorporated into personal information protection legislation in various countries. In 2013, the industry association DigitalEurope proposed reforms to the EU's personal data protection law that started from strengthening corporate responsibility rather than the control rights of information subjects, requiring companies to design rules to prevent risk [15]. The European Union subsequently introduced this "risk-based" approach when revising its data protection law in the General Data Protection Regulation (GDPR), and the EU Artificial Intelligence Act likewise establishes a risk-based regulatory path with differentiated supervision at each tier. China's Personal Information Protection Law also embodies "risk-based" prevention: its distinction between "general personal information" and "sensitive personal information", with different processing rules for each, implies an a priori, abstract presumption of risk in specific scenarios, namely that processing sensitive personal information may have a more serious adverse impact on individuals and society [16].

In the author's view, the theory of "risk-based prevention" is better suited to the problems arising from generative AI's infringement of personal information rights and interests, and its application is justified. First, the Interim Measures reflect Chinese policymakers' attempt to address the personal information protection challenges of generative AI through a "risk-based" governance path. As Article 5, Paragraph 2 of the Interim Measures shows, personal information processors remain obliged to take appropriate measures to prevent the various social risks that may arise in processing; in a sense, the promulgation of this policy also offers guidance for future AI legislation and for applying risk-prevention theory. Second, a "risk society" requires "risk control". Contemporary society is a "risk society" in which risks are ubiquitous and unpredictable and often cause irreparable damage; once personal information collected by generative AI is leaked or misused, the information subject suffers irreversible harm. It is therefore advantageous to move beyond the former model of unilateral empowerment and after-the-fact accountability and to strengthen ex ante risk prevention, building a comprehensive protection system from the dimension of risk control and reinforcing both the risk-prevention responsibility of information processors and the self-protection responsibility of information subjects. Third, the "risk-based prevention" path helps balance interests and promotes the development of the AI industry. By contrast, the "strong protection" of personal information under the "rights protection" path ignores the reasonable-use value of personal information and cannot cope with the demands of the new era or the personal information infringement crises of an increasingly risky modern society. The "risk-based prevention" approach is a compromise: by appropriately expanding the scope of reasonable use of personal information, it strengthens processors' risk-prevention obligations and subjects' own risk responsibilities from the perspective of risk control, providing ex ante prevention and allocation of responsibility for risks in specific scenarios, and thereby makes the better value choice between preventing risks and remedying them afterwards. It should be stressed, however, that the "risk-based prevention" path advocated here does not abandon "rights-based protection"; it merely softens the "strong rights" model in order to realize the reasonable-use value of personal information. Personal information rights and interests, as among the most basic personality rights of natural persons, should of course still receive baseline protection. Only by adhering to joint governance along both paths, "rights-based protection" and "risk-based prevention", can the interests of all parties be balanced and a long-term protection mechanism for personal information be built.

3.2 Establish a compliance supervision mechanism for data sources

To solve the problems of illegal scraping and excessive collection at the input end of generative AI, prevention must start at the data source, with a compliance supervision mechanism for data sources. For the initial database, since the information subject has already lost self-control over the personal information, post hoc remedial measures should be sought to protect legitimate rights and interests. First, at the technical level, service providers should take strict protective measures to prevent the leakage of personal information, for example further anonymizing already de-identified information through desensitization, encryption, and other technical means so that it can no longer be re-linked to a specific natural person. Second, as to liability for infringement, account should be taken of factors such as the provider's fault in collecting personal information without permission beforehand, its failure to exercise the necessary duty of care when infringement occurred, and its failure to take remedial measures afterwards, thereby compelling service providers to conduct regular compliance monitoring of original databases of personal information collected without permission and strengthening their obligation to secure personal information.
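As one concrete reading of the desensitization and encryption measures mentioned above, the sketch below (Python standard library only; the field names and salt handling are illustrative assumptions) masks a direct identifier and replaces a user ID with a salted one-way hash, so that records cannot easily be re-linked to a specific natural person without the separately stored key.

```python
# Minimal desensitization sketch: masking plus keyed one-way hashing.
import hmac
import hashlib

SECRET_SALT = b"rotate-and-store-this-outside-the-dataset"  # hypothetical key

def pseudonymize_id(user_id: str) -> str:
    """Keyed hash: stable enough for joins, not reversible without the salt."""
    return hmac.new(SECRET_SALT, user_id.encode(), hashlib.sha256).hexdigest()[:16]

def mask_phone(phone: str) -> str:
    """Keep only the digits needed for support scenarios."""
    return phone[:3] + "****" + phone[-4:]

record = {"user_id": "u-10086", "phone": "13812345678", "query": "..."}
safe = {
    "user_id": pseudonymize_id(record["user_id"]),
    "phone": mask_phone(record["phone"]),
    "query": record["query"],
}
print(safe)
```

Keyed hashing preserves the ability to join records for training while making re-identification contingent on access to the secret, which should be stored and rotated outside the dataset.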

For the updated database, service providers should likewise strengthen compliance supervision of data sources and strictly follow the rules for the collection and handling of personal information. First, establish a personal information impact assessment mechanism. Article 55 of the Personal Information Protection Law imposes on personal information processors an obligation of prior assessment for specific processing activities, including the processing of sensitive personal information and processing that has a major impact on individual rights and interests. A personal information impact assessment is a prerequisite for a service provider's processing of personal information and the basis for its continued stable operation. Service providers should therefore conduct impact assessments before processing, independently evaluating whether crawled data sources are compliant and whether they infringe personal information rights and interests, others' intellectual property rights, fair competition interests, and so forth, and adopt protective measures commensurate with the impact. Second, establish a tiered supervision mechanism for personal information. Articles 3 and 16 of the Interim Measures mention "categorical and hierarchical supervision" twice but do not elaborate. In the author's view, service providers should distinguish between types of personal information when collecting it and establish different processing mechanisms (a toy sketch of such tiered handling logic follows this paragraph): (1) Distinguish general from sensitive personal information. For general personal information, the rigid and strict informed-consent principle cannot satisfy the needs of the public interest and the development of the digital economy [17]; a "weak consent" rule should be struck between protection and use, with the "risk-based prevention" approach requiring service providers to assess in advance the legality, compliance, and reasonableness of processing. In the purpose limitation principle, "purpose limitation" should be replaced by "risk limitation": an enterprise's subsequent use of personal information must not exceed the original level of risk or a scope that users could not have foreseen, keeping risk within reasonable bounds for achieving the specific purpose. In the personal information minimization principle, "purpose minimization" should be replaced by "risk minimization": for secondary use of personal information, enterprises should take measures such as anonymization to reduce the risk to the lowest level needed to achieve the purpose [18]. For sensitive personal information, by contrast, the notice-and-consent rules must be strictly observed to avoid infringement of personality rights and interests; where sensitive personal information must be handled, strictly apply desensitization and encryption measures such as anonymization rather than simple de-identification. (2) Distinguish processing that has a major impact on individual rights and interests from processing that does not. Service providers shall conduct a risk assessment before processing personal information: where there is a major impact on individual rights and interests, strictly follow the notice-and-consent rules and obtain the individual's separate consent; where there is no major impact, separate consent is not required, but technical measures must still be taken to prevent infringement of individual rights and interests. Third, conduct regular monitoring of enterprise data compliance. Generative AI service providers should establish long-term mechanisms for preventing personal information processing risks, periodically conduct compliance reviews of conduct involving the handling of personal information in their products or services, and promptly take necessary measures when potential hazards or security risks are discovered.
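As flagged above, the following sketch (hypothetical categories, fields, and thresholds) renders the tiered handling rules as executable decision logic: sensitive information or processing with a major impact triggers separate consent and stronger technical measures, while other processing follows the "weak consent", risk-control path.

```python
# Illustrative tiered-consent decision logic (all thresholds hypothetical).
from dataclasses import dataclass

@dataclass
class ProcessingContext:
    is_sensitive: bool   # e.g. biometrics, financial accounts, whereabouts
    major_impact: bool   # outcome of the prior impact assessment
    risk_score: float    # 0.0 - 1.0 from the impact assessment

def required_safeguards(ctx: ProcessingContext) -> list[str]:
    steps = ["record purpose", "run impact assessment"]
    if ctx.is_sensitive or ctx.major_impact:
        # Sensitive or high-impact processing: strict notice-and-consent path
        steps += ["obtain separate consent", "apply anonymization/encryption"]
    else:
        # General, low-impact processing: "weak consent" / risk-control path
        steps += ["weak consent (notice + opt-out)", "de-identify at collection"]
    if ctx.risk_score > 0.7:  # hypothetical escalation threshold
        steps.append("escalate to compliance review before processing")
    return steps

print(required_safeguards(ProcessingContext(is_sensitive=True, major_impact=False, risk_score=0.4)))
```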

3.3 Improve the transparency and explainability of algorithms

The essence of the "algorithm black box" problem in generative AI simulation training is that complex algorithms are neither observable nor understandable by ordinary people. Therefore, to govern the "algorithm black box", we must first open the "black box" and promote the openness and transparency of algorithms. However, it should be noted that the openness and transparency of the algorithm does not mean that the specific code and programming of the algorithm should be disclosed, but that the algorithm should be explained and explained as necessary [19]. The reason for this is that, on the one hand, the source code of the algorithm is extremely complex, which is difficult for the public to understand, and even if it is made public, it can even lead to hacker attacks and be used by criminals to commit crimes. On the other hand, the disclosure cost of algorithms is relatively large, and most of them involve the company's trade secrets, which are generally not consciously disclosed by enterprises based on their own interests. Therefore, to promote the transparency of generative AI algorithms, it is necessary to publicly explain the major interests of users from aspects such as algorithm design, algorithm functions, algorithm risks, algorithm logic, and algorithm types, and accept the review of algorithm regulatory departments and social supervision, so as to ensure that algorithms are fair, just, and responsible. Second, it is necessary to strengthen the interpretability of algorithms. Due to the high degree of technical and complex nature of algorithms, it is difficult for the public to know the decisions behind the algorithms only by making them public, so it is necessary to strengthen the interpretability of algorithms, and use the interpretability technology of algorithms to reveal the process, results, and application process of algorithm development to the greatest extent, so as to unveil the inequality of internal groups in algorithmic automated decision-making [20]. Article 12 of the European Union's General Data Protection Regulation, for example, obliges algorithm controllers to provide information in "concise, transparent, understandable, accessible and clear language". In other words, algorithmic explanations must be carried out to a degree that can be known to the general public, otherwise algorithmic explanations lose their meaning. Of course, the scope of application and technical requirements of algorithm interpretability still need to be further studied. Finally, a third party is introduced for algorithm supervision. Explore the introduction of third-party independent organizations, support academic organizations, non-profit organizations, and other professional bodies to assess, review, and file algorithms, to resolve the risk of personal information infringement brought about by "algorithmic black boxes", and to achieve algorithm security and controllability. In Germany, non-profit organizations led by technologists and senior media professionals have been developed to evaluate and monitor algorithmic decision-making processes that affect public life [21]. The US state of New York has also enacted the Algorithmic Accountability Act, which requires representatives of citizen organizations to be included in the working group that oversees automated decision-making to ensure that algorithms are open and transparent [22]. 
At present, the supervision of algorithms in mainland China is still insufficient, and the establishment of a third-party independent agency for supervision needs to be further explored. In addition, the issue of excessive mining of personal information is similar to the above-mentioned regulatory mechanism for the compliance of data sources, and the scope, purpose, and method of personal information capture should be further restricted in the design of generative AI algorithms, and technical risks should be prevented by means of legal regulation.
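As a brief illustration of the interpretability techniques referred to above, the sketch below (scikit-learn assumed; synthetic data and feature names are stand-ins) computes permutation importance, one common model-agnostic method that reports how strongly each input feature drives a model's decisions and thus gives a provider something concrete to disclose and justify.

```python
# Permutation-importance sketch: which features drive the model's decisions?
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # e.g. 4 hypothetical user features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # decision truly depends on f0, f2

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for name, imp in zip(["f0", "f1", "f2", "f3"], result.importances_mean):
    print(f"{name}: {imp:.3f}")  # f0 and f2 should dominate
```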

3.4 Strengthen ethical norms and the pursuit of tort liability

At the output end of generative AI, algorithmic bias produces discriminatory outputs that seriously infringe personal information rights and interests, and only by curbing algorithmic bias can algorithms be used for the benefit of humankind. The transformation of algorithmic bias into algorithmic discrimination ultimately turns on people: the developers and users of algorithms bear responsibility for algorithmic discrimination [23]. The fundamental remedy therefore lies in optimizing the ethical governance of artificial intelligence and adhering to the concepts of "people first" and "technology for the people" in developing and designing AI. Article 4 of the Interim Measures responds to this: the provision and use of generative AI services must comply with ethical requirements. First, improve AI industry ethics and strengthen the ethical review and assessment of algorithm designers. Regular research ethics training should constrain designers' behavior, strengthen their ethical self-discipline, and raise the threshold for entry into the profession. Second, build an algorithm filing and review system and strengthen ex ante supervision. Before a developed algorithm is put into use, it should be reported to the relevant regulatory departments; only after passing preliminary review may it enter the market, and it should be returned for revision if it fails. Such prior supervision can effectively prevent seriously biased algorithms from reaching the market. Third, establish categorical and hierarchical management of algorithms and risk monitoring, and complete the accountability mechanism. Service providers should manage algorithms by category and tier and regulate the algorithmic discrimination produced by "information cocoons"; starting from the harm caused, accountability after the fact should follow the standard of "whoever designs is responsible; whoever is in charge is responsible", curbing algorithmic discrimination at its source [24]. Fourth, improve the AI ethics risk assessment mechanism and strictly review ethical norms. For the algorithmic models embedded in generative AI, service providers should conduct self-examination and regular assessment, sort out the sources, types, and causes of ethical risks, and formulate corresponding response plans. Algorithm design should uphold equality and fairness and prevent designers from using algorithms to discriminate.

The disinformation problem at the output end of generative AI likewise comes down to people: a perpetrator's unlawful purpose leads him to use generative AI as a tool to create or spread false information and commit crimes. Regulating generative AI's false information therefore requires ex ante prevention, in-process control, and post hoc handling of infringement liability. First, as to prevention, apply deep synthesis labeling to AI-generated works. Generative AI service providers should strictly follow the Provisions on the Administration of Deep Synthesis of Internet Information Services, the Interim Measures, and related rules to label deep synthesis content and manage it by category and tier, issue risk warnings for generated content that could cause public confusion or misidentification, and promote the transparency of generative AI (a minimal sketch of such labeling follows this paragraph). Deep synthesis labeling technology can also effectively trace the source of false information, improve its detection rate, and support holding the responsible parties to account. Second, as to in-process control, establish a multi-party coordination and co-governance mechanism that considers the behavior patterns and degrees of participation of governments, AI companies, users, and other actors in the generation, dissemination, and governance of disinformation, and that balances the interests of all parties. Third, as to post hoc handling, allocate responsibility reasonably among the parties. Developers, users, service providers, and other actors should bear legal responsibility for the generation and spread of false information within the scope of their own fault. Based on the concept of encouraging innovation, applying the principle of fault liability, and given the plurality of actors who may infringe personal information rights and interests through generative AI, each party's responsibility should be analyzed according to the specific circumstances, and the "notice-and-takedown" rule should be applied by analogy to service providers [25]. The tort liability system for infringement of personal information rights and interests through generative AI should thus be further improved.
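As flagged above, here is a minimal sketch of deep-synthesis labeling (the label format, field names, and provider/model names are hypothetical): generated content is stamped with both a visible notice and a machine-readable provenance record, including a content hash that supports later tracing.

```python
# Minimal content-labeling sketch: visible notice + provenance record.
import json
import hashlib
from datetime import datetime, timezone

def label_generated_content(text: str, model: str, provider: str) -> dict:
    """Attach a provenance record and a visible AI-generated notice."""
    record = {
        "provider": provider,
        "model": model,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "content_sha256": hashlib.sha256(text.encode()).hexdigest(),
    }
    return {
        "display_text": text + "\n[This content was generated by AI]",
        "provenance": record,
    }

out = label_generated_content("...", model="demo-llm", provider="ExampleCo")
print(json.dumps(out["provenance"], indent=2))
```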

In summary, the governance path of personal information protection under generative AI is shown in Figure 3.

Figure 3: The governance path of personal information protection under generative AI

4 Conclusion

Globally, the technological innovation of generative AI has brought enormous development opportunities, but it has also triggered many personal information infringement crises, such as personal information leakage, algorithmic bias, and the spread of false information. At bottom, the question is how to balance the protection of personal information rights and interests against scientific and technological innovation. The "rights-based protection" path places too much emphasis on personal information protection, and its rigid notice-and-consent rules struggle to adapt to the AI era; the "risk-based prevention" path moderately expands the scope of reasonable use of personal information and comprehensively weighs the risk-prevention obligations of each responsible party, making it both stable and forward-looking. Yet to meet the challenges generative AI poses to personal information protection, rights protection and risk prevention are both indispensable. We should adhere to the concepts of putting people first and encouraging technological innovation, and further strengthen risk management and control at generative AI's input, simulation training, simulation optimization, and output ends, so as to balance the protection and use of personal information. Looking ahead, more attention should be paid to the effects of technological development on ethics, morality, and the protection of personality rights, and research on the personality rights protection system should be strengthened, so as to balance the protection of basic human rights with scientific and technological progress.

Source: Cybersecurity & Data Governance Magazine, April 2024
