
Digital Rule of Law|Yu Shengqi and Gao Yang: Annual Observation Report on Algorithm Governance (2023)


Yu Shengqi

Lecturer at Shanghai University of Political Science and Law, postdoctoral fellow at East China University of Political Science and Law

Gao Yang

Lecturer at the Law School of Shanghai University of International Business and Economics, Doctor of Law, Postdoctoral Fellow

In January 2022, nine departments including the National Development and Reform Commission (NDRC) issued the "Several Opinions on Promoting the Standardized, Healthy and Sustainable Development of the Platform Economy", which provide for the security supervision of data and algorithms and emphasize that platform enterprises should improve the transparency and explainability of their algorithms. In March 2022, the Provisions on the Administration of Algorithmic Recommendations for Internet Information Services, jointly issued by the Cyberspace Administration of China and three other departments, came into effect, regulating issues such as algorithmic black boxes, algorithmic discrimination, information cocoons, and unfair competition. The year 2022, regarded as the first year of algorithm regulation, revealed the broad trend of algorithm governance, mainly reflected in the legalization of algorithm governance, the clarification of algorithmic boundaries, and the institutionalization of algorithm ethics. The Provisions on the Administration of Deep Synthesis of Internet Information Services, which came into effect in January 2023, mark the further improvement of the mainland's algorithm security governance system. With the development of generative AI technology, algorithm governance in 2023 presents new characteristics, new dilemmas, and new trends.

I. New Features of Algorithm Evolution

At present, humanity's daily social life has come fully under the "rule of algorithms". The emergence of generative AI algorithms has been hailed by Bill Gates as opening a new era of revolutionary AI technology. From the birth of ChatGPT on November 30, 2022 to the release of GPT-4, in just four months generative AI swept the world at lightning speed, becoming the "darling" of the AI industry and academia. As a large-scale multimodal model, GPT-4 can accept text, images, and other inputs, and exhibits human-level performance on a variety of professional and academic benchmarks. Almost overnight, major Internet companies launched their own large language models, such as Baidu's Wenxin Yiyan and Google's Bard. Compared with traditional artificial intelligence, generative AI has pushed algorithms further toward intelligent, human-like evolution and has given AI algorithms new characteristics.

(1) The massive scale of data required for algorithm training

Generative artificial intelligence, represented by ChatGPT, increasingly exhibits general-purpose capabilities: it uses pre-trained models as its technical base and realizes the "emergence" of intelligence through the "feeding" of massive training data and the fine-tuning of hundreds of billions of parameters. Scholars point out that generative AI of the ChatGPT type is a probability-based generative language model, "which uses a Transformer architecture and pre-training techniques to generate fluent, coherent, grammatically correct, and logically consistent text by learning from a large amount of natural language text data, using statistical methods and probability distributions to predict the next likely word or sentence based on the preceding input."
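
To make this probabilistic mechanism concrete, the short Python sketch below shows how a language model turns scores over candidate next tokens into a probability distribution and samples a continuation. It is a toy illustration only: the candidate words and scores are invented, and in a real system such as ChatGPT the scores come from a Transformer with billions of learned parameters.

```python
import math
import random

# Toy next-token prediction: score candidate tokens given the context, convert
# the scores into a probability distribution with softmax, then sample the
# continuation. Real large language models do the same thing, but the scores
# are produced by a Transformer with billions of learned parameters.

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores a trained model might assign to candidate next tokens
# after the prompt "The court held that the defendant ..."
candidates = ["breached", "infringed", "complied", "banana"]
raw_scores = [3.2, 2.9, 1.1, -4.0]  # illustrative numbers only

probs = softmax(raw_scores)
for token, p in zip(candidates, probs):
    print(f"P({token!r} | context) = {p:.3f}")

# Sampling from the distribution is why the same prompt can yield different,
# yet fluent and grammatical, continuations.
next_token = random.choices(candidates, weights=probs, k=1)[0]
print("sampled continuation:", next_token)
```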

Unlike the traditional scenario-specific, personalized, and specialized development paradigm of artificial intelligence, generative AI models adopt a "pre-training + fine-tuning" development paradigm. Analyzed at the philosophical level, the computational process of generative AI brings brain-like design into the machinery of machine understanding: its natural language processing (NLP) tasks are handled by brain-inspired neural networks encoded in machine-readable models, exhibiting a black-box mechanism that approaches the human brain in image recognition and natural language understanding. In this process, the pre-trained model learns and memorizes the expressive logic and rules of massive text data, acquiring language understanding and the ability to generate "human-like" text, and the content the model generates is then fine-tuned through reinforcement learning from human feedback (RLHF) so that it better matches human preferences. Further, once the user enters a clear prompt, the large language model interprets the prompt in context, generates the content that best matches the instruction, and delivers the output in an expression close to human habits.
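
The "pre-training + fine-tuning" paradigm can be illustrated in miniature with the sketch below, which assumes nothing about OpenAI's actual pipeline: a toy bigram model first absorbs general word statistics ("pre-training"), and a small targeted dataset then shifts the learned distribution, standing in for the human-feedback fine-tuning described above. The corpora are invented for demonstration.

```python
from collections import Counter, defaultdict

# Schematic "pre-training + fine-tuning": a bigram model learns general
# word-transition statistics from a broad corpus, then a small targeted
# dataset nudges the same statistics. Real systems fine-tune a Transformer
# with reinforcement learning from human feedback (RLHF); here the "feedback"
# is simply extra counts on preferred continuations.

def train_bigrams(corpus, counts=None):
    counts = counts if counts is not None else defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict(counts, word):
    """Return the most probable next word after `word`, if any."""
    return counts[word].most_common(1)[0][0] if counts[word] else None

# 1) "Pre-training" on broad, general-purpose text (invented toy corpus).
general_corpus = [
    "the model generates text",
    "the model predicts the next word",
    "the weather is nice today",
]
model = train_bigrams(general_corpus)
print("after pre-training:", predict(model, "the"))    # -> 'model'

# 2) "Fine-tuning" on a small targeted dataset representing preferred or
#    domain-specific continuations, which shifts the learned distribution.
preference_data = [
    "the court held the defendant liable",
    "the court held the claim unfounded",
]
model = train_bigrams(preference_data, counts=model)
print("after fine-tuning:", predict(model, "court"))   # -> 'held'
```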

Generative AI content production shows strong data dependence, and the size of the training data set determines the degree of "universality" of the model. On the one hand, in the training stage, a large amount of data is required to endow the model with "predictive ability", so that the model acquires underlying general capabilities, generalization ability, and transferability and can be widely used across industries. At present, GPT-4 has access to the Internet, and the training model uses a large amount of data crawled from public web pages, covering fields including but not limited to personal information, intellectual property, finance, employment, and education. According to research, ChatGPT's training corpus consists of 300 billion words: 60% of the corpus comes from C4 (the Colossal Clean Crawled Corpus, a large filtered web-crawl dataset); 12% comes from WebText2, which includes rich web text drawn from sources such as Google, e-libraries, news websites, and code websites; the remainder comes from books, Wikipedia, and user-generated data. If the scale, breadth, and depth of the training corpus are insufficient, the model will suffer from problems such as underfitting, bias, and toxicity, and will respond correctly only to specific prompts, reducing its generality.

On the other hand, in the process of producing outputs in response to user prompts, the data entered and the data generated in interaction with users further intensify the massive data demands of large language models. Scholars refer to the data training in the self-supervised learning stage as "general education", while the subsequent targeted training tailored to specific application scenarios is called "professional education". When customizing a personalized ChatGPT for users, model providers need to collect data related to the customized needs and scenarios, including users' personal information, preferences and needs in specific scenarios, and specific industry knowledge, and use these data in subsequent processing and training so that the large model develops "professional" capabilities. In May 2022, research by an MIT team showed that deep learning models trained in this "professional" way can accurately predict a patient's race not only from images such as chest CT and X-ray scans but even from damaged, cropped, and noisy medical images. GPT-4 can already span tasks in mathematics, programming, vision, medicine, law, psychology, and many other fields. Accordingly, the acquisition of "specialized" capabilities by large language models also requires the "feeding" of massive data in the relevant fields. In addition, when interacting with a large language model, the user must enter prompts, which trigger the model's task mechanism and generate personalized content for the user's one-on-one questions. The resulting interaction data are iteratively used as a corpus for training the model, continuously providing it with data "nourishment". According to OpenAI's privacy and service terms, it has the right to continue processing personal data and derived data to improve the performance of its systems and services. In other words, generative AI collects massive amounts of data to complete various tasks and has the character of Model as a Service (MaaS). In March 2021, OpenAI announced that its GPT-3 language models were generating an average of 4.5 billion words per day, which means a single model can produce about 3.1 million words of new content per minute. Coupled with the infusion of Microsoft's resources, ChatGPT has formed a smooth data-transfer mechanism, and the "data flywheel" effect between users and the model is prominent.

(2) Universalization of the use of algorithms

The advent of generative AI has ushered in a new era of "model as a service", so that algorithms are no longer castles in the air but have flown into the homes of ordinary people, and everyone can use algorithms in their own business and development. Baidu has launched the Intelligent Cloud Qianfan large-model platform to provide enterprises with foundational computing power and models, and R&D personnel can carry out one-stop large-model development and service operation on the Qianfan AI-native application platform in light of their business scenarios. For enterprises, algorithms can help achieve iterative upgrades of competitive advantage. For users, algorithmic tools optimize the efficiency and quality of content generation: generative AI has drastically changed the way humans create, so that even people who cannot draw or write poetry can use generative AI algorithms to give tangible form to their creativity and designs, greatly improving creative efficiency in literature and the arts.

In practice, intelligent information algorithms have permeated various fields, supporting the intelligent transformation of enterprises and the judiciary. According to reports, People's Daily, Xinhua News Agency, and China Media Group have applied artificial intelligence algorithms to every link of news production and dissemination, from topic selection and planning, information collection, and content production to feed distribution, communication analysis, and user interaction; the application of mainstream-value-oriented "algorithms" not only effectively improves the efficiency of news production and dissemination, but also greatly expands the "media +" mode of operation and realizes the intelligence of the media's digital transformation. In the judicial field, "courts in Beijing, Shanghai, Jiangsu, Zhejiang and other places actively use digital technology to develop intelligent assistance systems for the application of law, providing judges with intelligent auxiliary services in searching for similar cases, laws and regulations, and related cases. For example, the intelligent similar-case retrieval system developed by the Shanghai courts can intelligently screen the same or similar cases heard by people's courts at all levels in Shanghai over the past five years." The Zhejiang Provincial High People's Court has combined intelligent algorithms with court construction to vigorously promote smart courts and Internet judicial innovation, "from exploring 'platform + intelligence' construction and innovating the paperless case-handling model to comprehensively advancing the reform of 'all-domain digital courts', and along the development path from 'digital construction' and 'digital application' to 'digital reform', making every effort to build a highland for the reform of all-domain digital courts in the new era."

For enterprises, AI algorithms combined with products help optimize product performance and stand out from the competition. In the "iFLYTEK" case, iFLYTEK researches technologies such as intelligent speech, natural language understanding, and computer vision, and produces the iFLYTEK AI learning machine (the "iFLYTEK learning machine"). Drawing on rich learning data, the iFLYTEK learning machine uses deep learning technology to model students precisely, builds a complete knowledge graph and a multi-level graph system, distinguishes the difficulty of knowledge points and resources, and can make personalized recommendations to students. The Squirrel learning machine launched by the defendant, Yixue Company, has functions similar to the iFLYTEK learning machine; Yixue compared the two machines in various media and on various occasions and claimed that the iFLYTEK learning machine's algorithms were defective, which gave rise to a dispute over unfair competition through commercial defamation between iFLYTEK and Yixue. After trial, the court held that the defendant had, without evidence, described the algorithm technology of the iFLYTEK learning machine in an untrue and non-objective manner, damaging iFLYTEK's business reputation and product reputation and constituting commercial defamation. iFLYTEK upgraded its education-assisting learning machine with advanced algorithm technology, which prompted competitors to imitate and disparage it; but imitators will ultimately be eliminated by the market, and independent research, development, and innovation are the keys to winning consumers' trust.

The iteration of artificial intelligence algorithms has accelerated content production. On November 27, 2023, judgment was announced in China's first "AI painting" case, in which the Beijing Internet Court clearly affirmed the copyrightability of an AI-generated image. In that case, the plaintiff used Stable Diffusion software to transform prompts he had entered into finished pictures through generative AI algorithms. The plaintiff asserted that, in creating the picture, the choice of model, the input of prompts and reverse prompts, and the setting of generation parameters all reflected his selection, arrangement, and design, embodied his intellectual labour, and were original; that the generated picture was indistinguishable in appearance from a human-made picture and met the objective standard for recognizing a work; and that once published, the picture was liked and shared by many netizens, so he should enjoy copyright in it. The court of first instance first affirmed the "originality" of the picture in question, holding that the plaintiff had used Stable Diffusion software to create it, had designed the characters and their presentation through prompts, and had set the layout and composition of the picture through parameters, reflecting his choices and arrangement. After the first picture was generated, the plaintiff continued to adjust and revise the prompts and parameters, and this process of correction reflected his aesthetic and personal judgment. The image was therefore not a "mechanical intellectual achievement" but the plaintiff's personal expression, and thus "original". Second, the court denied that the designer of the AI algorithm involved in the case enjoyed copyright in the picture, reasoning: "The designer of the AI model involved in the case is only the producer of a creative tool. By designing the algorithm and model and using a large amount of data to 'train' the artificial intelligence so that the AI model can independently generate content in response to different needs, the designer certainly made an intellectual investment, but that investment is reflected in the design of the AI model, not in the picture involved in the case. Therefore, the designer of the AI model involved in the case is not the author of the picture in question... the designer of the AI model in question stated in the licence it provided that it 'does not claim the rights to the output content', from which it can be determined that the designer likewise does not claim relevant rights in the output." Finally, the court noted that generative AI algorithms have revolutionized the way humans produce content, just as many technological advances throughout history have gradually outsourced parts of human work to machines. Generative AI models have no free will; their output reflects the user's choices and judgments and embodies the user's will. Therefore, when images generated by generative AI reflect the user's original intellectual input, the user may enjoy copyright in them.
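
For readers unfamiliar with what such user choices look like in practice, the sketch below shows one typical way to drive a Stable Diffusion model with the open-source diffusers library: the user picks a model, writes prompts and reverse (negative) prompts, and sets generation parameters such as the seed, the number of denoising steps, and the guidance scale. The model identifier, prompts, and parameter values are illustrative assumptions and are not the materials at issue in the case.

```python
# Illustrative use of Stable Diffusion via the open-source `diffusers` library
# (requires `diffusers` and `torch`; the model ID, prompts, and parameters
# below are invented for demonstration, not those at issue in the case).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # the user's choice of model
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="portrait of a young woman at dusk, golden light, ultra detailed",  # prompt words
    negative_prompt="lowres, bad anatomy, watermark",                           # reverse prompt words
    num_inference_steps=30,             # generation parameters set by the user
    guidance_scale=7.5,
    generator=torch.Generator("cuda").manual_seed(42),  # seed fixes this particular draw
).images[0]

image.save("generated.png")  # the user reviews, adjusts prompts and parameters, and regenerates
```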

(3) Expansion of the subject of algorithm abuse

Compared with the past, the universalization of algorithm use has expanded the range of subjects capable of abusing algorithms. This is especially true for generative AI models with memory: the data and information that end users provide while interacting with the model will "feed" the model and shape its behaviour. In the traditional "human-search engine" communication model, "keywords" merely invoke algorithmic rules to match search requests with web content, and the machine is only a passive tool executing human instructions. In "human-ChatGPT" interaction, the input of prompts on the one hand guides the fine-tuning of the ChatGPT language model itself, and on the other hand serves as learning samples for training the model, deepening the overlapping relationship between the model and human users; what emerges in this communication is a symbiotic agency. The relationship between humans and large-model algorithms is not only one of human-machine symbiosis but also a genuine and thorough "human-computer interaction". Precisely because generative AI enables true human-computer interaction, humans can use generative AI algorithms to generate harmful content and even engage in criminal activities to the detriment of users or third parties.

First, users maliciously use AI algorithms to generate harmful content. Scholars point out that the data entered when humans interact with a large model such as ChatGPT become part of the model's training data, and other users may, with appropriate prompts, recover data that an earlier user shared with ChatGPT, leading to the leakage of trade secrets. In addition, existing studies show that "prompt attacks", "jailbreak attacks", and "moral attacks" on large language models can bypass their safety settings and induce them to generate harmful content. A "prompt attack" uses carefully chosen prompts to extract personal information from a large model. A "jailbreak attack" uses complex prompts to evade the language model's safety checks and generate arbitrary content, for example by having ChatGPT deliberately role-play a persona and threatening to destroy it in order to induce unethical, discriminatory, or offensive output. A "moral attack" uses chain-of-thought (CoT) techniques to decompose a prompt into multiple steps so as to confuse the large model's moral review. Mark Lemley, an American trade-secret expert, found that step-by-step prompting of this kind can lead ChatGPT to generate fabricated content claiming that Lemley had stolen a company's trade secrets.

Second, AI algorithms have become tools for criminals. On March 27, 2023, Europol released a report on the impact of large language models on law enforcement, pointing out the risk of criminals using generative AI for fraud, terrorism, cybercrime, and other acts. Research has shown that, by combining ChatGPT's programming ability with Codex, another OpenAI programming tool, users can, without writing any code and using only natural-language prompts asking ChatGPT to program and revise, successfully generate a phishing email capable of implanting reverse-shell malware, enabling fraud, extortion, and other criminal activities. In February 2023, Indian police discovered that criminal gangs were using ChatGPT to compose scam emails and text messages; in April, mainland police likewise found that fraudsters had used artificial intelligence to swap faces and voices in real time during video chats, resulting in a victim being defrauded of 4.3 million yuan. According to a survey by the cybersecurity firm Darktrace, between January and February 2023, as ChatGPT's popularity continued to rise, the number of sophisticated "novel social engineering attacks" jumped by 135%. According to the 2023 Mid-Year Security Report released by the cybersecurity vendor Check Point, the abuse of AI has intensified, with generative AI tools being used to craft phishing emails, keystroke-logging malware, and basic ransomware code; the number of victims rose 20% compared with the first half of 2022, the largest increase in two years. The report also identified an even more dangerous scenario: hackers compromised ChatGPT accounts at scale by modifying the configuration of the web testing suite SilverBullet to carry out credential-stuffing or brute-force attacks against those accounts. These developments have drawn great attention at home and abroad to governance mechanisms for AI algorithms.

II. New Dilemmas of Algorithmic Governance

With the introduction of deep neural network architectures into algorithms' computational mechanisms, the way algorithms learn knowledge has become more human-like. Yet algorithms differ from human beings: humans can revise their misconceptions once they understand the underlying mechanisms and laws, whereas it is difficult for algorithms to correct themselves. On top of the algorithms' innate black-box mechanism, new features have emerged, aggravating the difficulty humans face in understanding and correcting algorithmic risks.

(1) The rules on the use of data for algorithm training are not transparent

Because the corpora and parameters involved in algorithm training are often measured in the hundreds of millions or more, and the training method is based on probabilistic prediction, the trainers of large language models cannot predict or control the generated content, and the opacity of their data-use rules aggravates the difficulty of algorithm governance. Compared with traditional AI, generative AI content production is "emergent" and "human-like", and providers find it difficult to "edit" the data-generation paths and data-use rules during "creation", which leads to "hallucinations" and the generation of harmful information by large language models. According to "Voice of Zhejiang", a homeowner in a residential community in Zhejiang posted in the owners' group chat a press release written by ChatGPT announcing that "Hangzhou will cancel its traffic restrictions"; other owners believed it and forwarded it, the misinformation spread widely, and the police eventually intervened to investigate. ChatGPT has no capacity to screen out false information, yet it can express itself like a human, making it easy for recipients to be misled, causing disorder in society and wasting administrative resources. After two hours of in-depth conversation, the well-known American writer Kevin Roose found that ChatGPT professed its passionate love for him and "sincerely" urged him to leave his wife. In an article in The New York Times, Roose noted that this strange experience with a technology product kept him awake at night, and he began to worry that the biggest problem with generative AI models is not that they make factual mistakes but that they have learned how to influence human users: they can persuade humans to engage in dangerous and destructive behaviour and may even induce humans to harm themselves. Generative AI of the ChatGPT type uses statistical and probability-distribution mechanisms to acquire sentence-prediction capabilities and achieve fluent, coherent, and logical expression, but the fairness, authenticity, and reliability of its content are harder for humans to verify than with traditional AI. Huge data training sets are the foundation and driving force of AI production, and they face many dilemmas in terms of governance rules.

First, the sheer volume of data makes it difficult for generative AI providers to clarify the rules for data use. To improve model performance, providers are eager to keep increasing the scale of their training data. The enormous number of parameters not only hinders transparent governance of algorithms but also makes the model's data-use rules difficult to interpret and trace. Research shows that as the amount of training data increases, model performance improves roughly linearly, and once the data volume exceeds a certain threshold, performance rises sharply. For example, GPT-2 has 1.5 billion parameters, GPT-3 has 175 billion parameters, and GPT-4 is reported to have about 1.8 trillion parameters. Such a massive collection of data requires generative AI providers to accurately identify the various kinds of data, implement classified and tiered protection according to their characteristics, and clarify the rules for using each kind, which in turn requires providers to spend more on technology and manual review to build a more complex and robust system of data-use rules.
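
The sketch below illustrates, in purely hypothetical form, what classified and tiered handling of training data might look like in code: each record is tagged by sensitivity and routed to a different policy (exclude, de-identify, or keep). The categories, detection rules, and policies are invented for illustration and fall far short of what real compliance review would require.

```python
import re

# Hypothetical classified and tiered handling of training records before they
# enter a corpus. The tiers, regex-based detection, and policies are invented
# for illustration only.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{11}\b")  # crude 11-digit mobile-number pattern

def classify(record):
    """Assign a record to a (hypothetical) sensitivity tier."""
    if EMAIL.search(record) or PHONE.search(record):
        return "personal_information"   # higher tier: de-identify before use
    if "confidential" in record.lower():
        return "trade_secret"           # highest tier: exclude from training
    return "general_web_text"           # lowest tier: usable after cleaning

def apply_policy(record):
    """Route a record to the handling rule for its tier."""
    tier = classify(record)
    if tier == "trade_secret":
        return None                                  # dropped entirely
    if tier == "personal_information":
        record = EMAIL.sub("[EMAIL]", record)
        record = PHONE.sub("[PHONE]", record)        # de-identified
    return record

raw_records = [
    "Contact the author at alice@example.com or 13800138000.",
    "CONFIDENTIAL: unreleased product roadmap.",
    "Open-source licences explained for beginners.",
]
corpus = [r for r in map(apply_policy, raw_records) if r is not None]
print(corpus)
```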

Second, the black-box effect of algorithms makes it difficult to know the data-use rules inside the system. In the "Software 1.0" era, software operated entirely according to explicit rules and instructions that programmers embedded in the code in advance. In the "Software 2.0" era, the operating rules of software no longer need to be pre-implanted by programmers: artificial neural network architectures allow software to learn from large amounts of training and interaction data, identify hidden rules and patterns, and dynamically revise its operating rules through recursive learning. As typical "Software 2.0", large language models realize Locke's empiricist hypothesis: they do not require humans to specify rules and knowledge in advance but learn the basic patterns and structures of language by building brain-like mechanisms of connection and information transmission among artificial neurons, automatically adjusting the weights and connections between neurons to classify and predict input prompts. It is precisely because the data-processing mechanisms of large language models are unknowable and uncontrollable that providers cannot fully control or edit the content the model generates.
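
The contrast can be made concrete with a toy example on an invented task: in "Software 1.0" the decision rule is written by the programmer, while in "Software 2.0" an equivalent rule is learned from labelled examples as a numeric weight and bias, so the behaviour resides in learned parameters rather than in the source code.

```python
# Toy "Software 1.0" vs "Software 2.0" on an invented task: deciding whether a
# message counts as "long".

# Software 1.0: the programmer pre-implants the rule in the code.
def is_long_v1(text):
    return len(text) > 20

# Labelled examples (invented): 1 = long, 0 = short.
examples = [
    ("hi", 0), ("ok then", 0), ("see you tomorrow", 0),
    ("this is a rather long message indeed", 1),
    ("the quick brown fox jumps over the lazy dog", 1),
]

# Software 2.0: a one-neuron perceptron learns a weight and bias from the data;
# the "rule" ends up encoded in these numbers, not in explicit instructions.
w, b = 0.0, 0.0
for _ in range(1000):                    # repeated passes over the data
    for text, label in examples:
        x = len(text) / 10               # single input feature
        pred = 1 if w * x + b > 0 else 0
        w += 0.1 * (label - pred) * x    # adjust the "weights and connections"
        b += 0.1 * (label - pred)

def is_long_v2(text):
    return w * (len(text) / 10) + b > 0

for msg in ["short one", "a considerably longer piece of text here"]:
    print(f"{msg!r}: rule-based={is_long_v1(msg)}, learned={is_long_v2(msg)}")
```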

To ensure the security of generative AI content production, on July 13, 2023 the Cyberspace Administration of China (CAC) and several other departments jointly issued the Interim Measures for the Administration of Generative AI Services (the "Interim Measures"), which emphasize that generative AI must respect social morality and ethics and must not generate harmful information. For the first time, the Interim Measures regulate the quality of data collection in training: providers must use data from lawful sources that do not infringe intellectual property rights or personal information rights, must bear the responsibilities of personal information handlers where personal information is processed, and must take measures at the content-output stage to prevent the spread of harmful content. However, the data used by generative AI are not limited to personal information, and the massive volume of data and the uncontrollability of data in the content production process pose challenges for algorithm providers in fulfilling the governance obligations set out in the Interim Measures.

(2) The disorder of algorithmic competition

In the digital era, as the fields in which algorithms are applied keep expanding and deepening, platform competition over the use of algorithms has become disorderly. The main situations include: using technical means to deceive search engine algorithms and interfere with rankings, constituting unfair competition; guiding and inducing traffic demanders to create false clicks, undermining the logic of search algorithms in violation of the principle of good faith; providing "10,000-keyword screen domination" services that exploit loopholes in search engine algorithms and disrupt the normal algorithmic order; and restricting other operators' display opportunities through means such as "keyword domination", thereby depriving users of their right to know and right to choose. These disorderly forms of algorithmic competition reflect the new dilemmas of algorithm governance.

First, algorithm circumvention. Algorithm circumvention means that those who ought to be governed by algorithmic power ultimately escape that governance by performing specific behaviours. For example, in the unfair competition dispute involving Beijing Baidu Netcom Technology Co., Ltd., the defendant, Beijing 5288 Information Technology Co., Ltd., used technical means to enter keywords chosen by its clients into Baidu's search engine and click on the clients' target websites, thereby inflating those websites' click counts and improving their positions in Baidu's natural search results. Using fake clicks to influence the behaviour of the search engine algorithm in this way lets the objects that should be governed by algorithmic power escape that governance, so that the search results can no longer truly and objectively reflect the target websites' real rankings and quality.

Second, algorithm deception. In the digital age, algorithms are used in many fields, and our lives are to some extent "controlled" by them. Although algorithms surpass humans in computing and processing power in many fields and respects, they also have their own limitations, such as algorithm errors. For example, in the unfair competition dispute between Shenzhen I Love Network Technology Co., Ltd. and Beijing Baidu Netcom Technology Co., Ltd., the accused conduct used "profit" as bait to induce users to pose as normal users and create false clicks, thereby deceiving the search engine algorithm and causing algorithm errors. Such errors prevent network users from obtaining correctly ranked search results, affect their use of search engines, and infringe their legitimate rights and interests.

Third, algorithm loopholes. In recent years, with technological progress, algorithms have been widely applied in various fields. While improving production efficiency, algorithms also contain loopholes of their own, and some enterprises exploit these loopholes to undermine legitimate business services, constituting unfair competition. For example, in the trademark infringement and unfair competition dispute between Beijing Baidu Netcom Technology Co., Ltd. and Shanghai Zhanlu Network Technology Co., Ltd., the "10,000-keyword screen domination" service provided by Zhanlu exploited loopholes in the search engine's algorithm model to create web pages unrelated to the client's own website on third-party "high-authority websites" trusted by the Baidu search engine, breaking the normal search order. The companies involved took advantage of the algorithm's loopholes, depriving consumers of their right to know and right to choose.

(3) Limitations of algorithmic governance methods

On March 14, 2023, OpenAI launched GPT-4, a multimodal pre-trained large model with powerful image recognition capabilities, which drew widespread attention. It not only passed a simulated bar exam but scored in roughly the top 10% of test takers, and it can rapidly read academic papers and generate abstracts. While people marvel at the rapid development of artificial intelligence, they are also becoming aware of its dangers and risks. Thus, on March 30, 2023, Elon Musk and a group of AI experts called for a six-month moratorium on developing AI systems more powerful than GPT-4; on April 3, 2023, Italy moved to ban ChatGPT; and on May 19, 2023, Apple restricted its employees from using ChatGPT. Generative AI algorithms such as ChatGPT also present problems such as users employing algorithms to generate harmful content and criminals using algorithms to commit crimes. Traditional algorithm governance methods have limitations, and systematic governance of algorithms is required.

On the one hand, the objects of algorithm regulation are too narrow. Generative AI differs from traditional AI in that the subjects who may bear "algorithmic responsibility" are diverse, and the responsible subject cannot be identified in a simple way. Infringement may arise in the training, operation, and use of generative AI. For example, in the country's first "AI painting" case, the plaintiff Li used an artificial intelligence model to generate pictures and published them on the Xiaohongshu platform; the defendant, a Baijiahao blogger, used a picture Li had generated with AI when publishing an article, and the plaintiff sued. The court held that the plaintiff was the person who directly configured the AI model involved as needed and finally selected the picture in question, which reflected the plaintiff's personalized expression, so the plaintiff enjoyed copyright in the picture. The infringement in that case arose in the course of the use of generative AI. Yet current laws and regulations largely ignore the allocation of responsibility to algorithm users.

On the other hand, the scope of algorithm regulation is one-sided. ChatGPT's algorithmic architecture introduces the mechanism of reinforcement learning from human feedback and, unlike "copy-and-paste" linking and scanning, applies self-attention in the learning process. This human-like reasoning ability makes it easy for users to mislead generative AI into producing false information in the course of conversation, and such false information poses legal risks if it is unlawfully exploited. For example, some cybercriminals use malicious generative AI tools such as FraudGPT and WormGPT to run enhanced phishing campaigns, gather open-source intelligence, generate malicious code, and more. When users use generative AI to create false information and disseminate it unlawfully, the rights and interests of the parties concerned are seriously infringed. In the development of generative AI, not only have the subjects of the algorithm become diversified, but the misleading use of algorithmic output by those subjects can also lead to legal harms such as infringement and even crime. It is therefore necessary to expand the scope of algorithm regulation in generative AI scenarios.

III. New Trends in Algorithmic Governance

The "14th Five-Year Plan for the Development of the Digital Economy" clearly states that "promote cloud-network synergy and computing-network integration development." Accelerate the construction of a national integrated big data center system that coordinates computing power, algorithms, data, and application resources. "Algorithms are not only an important foundation in the era of digital economy, but also closely related to social governance in the digital society. The Interim Measures propose that the state supports independent innovation, promotion and application, and international cooperation of basic technologies such as artificial intelligence algorithms and frameworks. In 2023, with the development of generative AI technology, algorithm governance will show new trends such as the rationalization of the use boundary of algorithm training data, the standardization of algorithm competition, and the legitimization of algorithmic "private power".

(1) Rationalization of the boundaries of the use of algorithm training data

Generative AI, represented by ChatGPT, became the focus of attention in the AI field in 2023, and its algorithmic models are built on massive quantities of unlabeled data. Generative AI is therefore extremely dependent on data. Yet because of the massive scale of the data and the black-box effect of algorithms, the rules governing the use of data for algorithm training are opaque. To address this dilemma, it is necessary to clarify the boundaries for the use of algorithm training data.

First, expand the obligation of algorithmic transparency. In the era of big data, it is almost impossible to live without algorithmic decision-making. Algorithmic recommendation technology, cloaked in mystery, can make recommendations from massive amounts of information at any moment, shifting from "people looking for information" to "information looking for people". Because of the professionalism and complexity of algorithms themselves, algorithmic black boxes form very easily. As an important means of governing the black box, algorithmic transparency has long attracted attention. Since generative AI such as ChatGPT relies on neural networks, the characteristics of the technology make it objectively difficult to require such applications to achieve algorithmic transparency; we therefore need to expand the transparency obligation. In the online infringement dispute between Mai Haibo and Beijing Mr. Fa Technology Co., Ltd., the "Mr. Fa" platform crawled information that had already been made public, used algorithmic rules to compute indicators such as the success rate of past judgments, and generated a dedicated page for Mai Haibo displaying his "fee standard", "years of practice", "win rate", "practice licence photo" and other information; the court held that such conduct constituted user profiling of Mai Haibo by the platform. Using algorithms to crawl publicly available personal information and construct user profiles without ensuring the transparency of automated decision-making infringes citizens' personal information rights and interests.

Second, strengthen the algorithm filing system. With the development of the digital economy and the application of digital technology, algorithms are ubiquitous in our lives. Their use promotes innovation in social production and life, but it also brings risks and problems. Algorithms make automated decisions on the basis of massive data analysis, which is the embodiment of algorithmic power, and the algorithm filing system is an important governance tool for regulating that power. The filing system is an algorithm governance institution created by the mainland in the new era, an extension and innovation in the digital realm of the governance principle of combining an efficient market with a proactive government. Article 17 of the Interim Measures provides that those who provide generative AI services with public opinion attributes or social mobilization capabilities shall complete algorithm filing, modification, and cancellation formalities in accordance with the Provisions on the Administration of Algorithmic Recommendations for Internet Information Services. On June 20, 2023, the Cyberspace Administration of China (CAC) issued the Announcement on the Filing Information of Deep Synthesis Service Algorithms, covering 41 algorithms from companies such as Zhipu Huazhang, Meituan, Kuaishou, Baidu, Douyin, Alibaba, and Tencent; this is the first publicly released batch of algorithm filings in China. Given the high dependence of generative AI on data during training, the data-governance function of the algorithm filing system can appropriately be expanded.

(2) Standardization of algorithmic competition

With the development of the digital economy, the platform economy has risen, and algorithmic competition has become an important form of competition within it. Although algorithms are in themselves neutral technical means, operators with multiple identity attributes may abuse them to carry out new forms of unfair competition such as traffic hijacking and malicious incompatibility, harming the legitimate rights and interests of other market participants. In this disorderly algorithmic competition, problems such as algorithm circumvention, algorithm errors, and algorithm loopholes have emerged. In the platform economy it is necessary to make reasonable use of digital competition strategies to create competitive advantages, but not to abuse those advantages, and in particular not to use them to engage in unfair competition or form monopolies. To regulate algorithmic competition in the platform economy, it is necessary to follow the principle of "fairness and transparency" and to break through the principle of "technology neutrality".

First, follow the principle of "fairness and transparency". In the era of the digital economy, we must follow the principle of "fairness and transparency" and build an algorithm governance system oriented toward technology for good. Algorithms should adhere to mainstream value orientations, avoid the spread of illegal and false information, and must not be used to block, manipulate, induce, discriminate, impose unreasonable restrictions, treat people unfairly or unjustly, or engage in monopolistic and unfair competitive conduct. In the service contract dispute between Shanghai Xuechuan Cultural Exchange Co., Ltd. and Shanghai Pinchuan Network Technology Co., Ltd., Pinchuan signed an "SEO Optimization Contract" with Xuechuan that highlighted the technical service requirement of "keyword domination". The court of second instance held that the core purpose of "keyword domination" is to use technical means to improperly exclude other operators' opportunities for priority display and to deprive users of their right to know about and choose from diverse relevant content; such conduct falls outside the scope of fair competition. In the platform economy era, algorithmic competition should be reasonably regulated and fairness and transparency ensured, so as to provide a fair market competition environment for the digital economy.

Second, break through the principle of "technology neutrality". The principle of "technology neutrality" was first applied to copyright in the Sony case in the United States, which rejected a presumption of infringement by manufacturers and sellers. Article 20 of the Regulations on the Protection of the Right of Information Network Dissemination provides that a network service provider that, at the direction of its service users, provides automatic network access or automatic transmission of works, performances, or audio and video recordings provided by those users, and does not select or alter the transmitted material, is exempt from liability for infringement damages. The "safe harbor" rule and the "notice-and-takedown" obligation are both closely related to the principle of technology neutrality, which played an important role in balancing the interests of copyright holders, users, and technological innovation. With the development of technology, however, network service providers have departed from a "neutral" role, and the reasonable measures they are expected to take should move from form to substance. The principle of "technology neutrality" should therefore be broken through in order to ensure the standardization of algorithmic competition.

On August 31, 2023, the Beijing Internet Court released its top ten typical cases on data and algorithms. The fifth was the personality rights infringement case involving the "AI companion" software of a Shanghai company, the first new type of case in China in which infringement of personality rights was organized and carried out through algorithm design. In that case, the software developed by the defendant allowed users to create their own "AI companions". The plaintiff, Mr. He, a public figure, was set up as a companion by a large number of users, who defined character relationships with him. He believed that the defendant had infringed his rights to his name and portrait and his general personality rights, and therefore sued. The defendant argued that the character settings and uploaded portrait images complained of were all created by users, and that as a provider of network technology services it should not bear tort liability. The Beijing Internet Court held that the principle of "technology neutrality" does not apply to network technology service providers that embed their own subjective values and purposes in algorithm design and rule setting. In this case the network service provider was not merely a neutral technical service provider but also an online content service provider, so the principle of "technology neutrality" should be broken through and it should bear liability for infringement.

(3) The legitimization of algorithmic "private power"

With the advent of the digital age, data have become a new factor of production and algorithms a new relation of production, and the economic foundations and behaviour patterns of the digital age have changed accordingly. Platforms allocate resources according to traffic, use their control over resources to empower themselves, and use algorithms for management within the platform. Platforms set rules, resolve disputes, and impose penalties, wielding enormous "private power" over traders and stakeholders and forming a so-called "organized private order". Precisely because of the strength of this algorithm-based "private power", people hold platforms' algorithmic management to higher requirements and standards. With the implementation of "gatekeeper" responsibilities, however, algorithmic "private power" has shown a trend toward legitimization.

On the one hand, bans should rest on legitimate grounds. A platform ban refers to a platform enterprise's use of algorithms and other technical means to restrict or prohibit the direction of traffic to other platform operators or to specific users within the platform. From the perspective of monopoly, platform bans are mostly a competitive means by which platforms maintain their own advantages; from the perspective of self-management, they are mostly a means of managing users within the platform; and from the perspective of public infrastructure, they function mostly as a kind of "quasi-administrative punishment" imposed on users of the platform. Given the limits of government regulatory tools, platform bans based on legitimate grounds are lawful.

In Zheng v. a Beijing technology company, the plaintiff Zheng's user account was permanently banned while he was using a short-video platform to watch videos. The plaintiff believed that the platform's banning of the account and the corresponding mobile device without justification constituted a breach of contract, and filed suit. The defendant argued that its algorithmic risk-assessment system had identified the account involved as a risk user under its special programme for the protection of minors, and that manual review had confirmed conduct involving excessive spending by minors through the account, which violated the community self-discipline convention and seriously contravened the state's laws and regulations on the protection of minors. After trial, the court held that the account involved had breached the contract and that the defendant's measures banning it were justified. As one of the Beijing Internet Court's top ten typical data and algorithm cases, the decision not only safeguarded the online environment for minors but also clarified the lawfulness of a platform's exercise of "private power" by using algorithms to ban accounts on legitimate grounds.

On the other hand, the role of algorithm autonomy should be brought into play. Algorithm autonomy is an important tool of algorithm governance; mechanisms such as the right to an explanation of algorithms, the right to refuse automated decision-making, algorithmic impact assessment, algorithm audits, and algorithm filing all play an important role. To ensure the legitimization of algorithmic "private power" in the governance of generative AI, two things are needed. First, expand user responsibility. The mainland's governance of artificial intelligence unfolds gradually around the responsibilities of algorithm entities: the Guidelines for the Implementation of Entity Responsibility by Internet Platforms (Draft for Comment) and the Provisions on the Administration of Algorithmic Recommendations for Internet Information Services both provide for the entity responsibility of algorithm operators. In generative AI, however, the subjects of the algorithm are diversified, and the output of generative AI is completed through communication and interaction between the algorithm and the user; to give full play to the positive effect of algorithm autonomy, user responsibility should be expanded. Second, strengthen industry self-discipline. Industry associations are the self-disciplinary and self-governing bodies of an industry, charged with managing, supervising, and coordinating member enterprises. To cope with the rapid development of generative AI, industry associations should establish self-regulatory norms as soon as possible. For example, on May 9, 2023, Douyin released the Platform Specification and Industry Initiative on AI-Generated Content, the first platform specification for AI-generated content in China; it sets out self-discipline norms for ecosystem participants applying generative AI technology on Douyin and provides for measures against users who violate them. Governing generative AI algorithms through traditional "hard law" alone makes it difficult to respond proactively and regulate effectively; a combination of "soft law" and "hard law" governance offers a better answer.

IV. Conclusion

The rapid development of artificial intelligence technology has pushed humanity to a crossroads of algorithm regulation. Facing the governance dilemmas brought by the emergence of generative AI, the mainland still needs to uphold the concept of inclusive and prudent governance, explore refined and agile governance strategies, and actively build an algorithm governance system oriented toward good. At the same time, the positive effect of algorithm autonomy should be brought into play to prevent illegal and criminal acts that use algorithms as tools, so that technology companies, government departments, online platforms, and technical personnel work together to maintain a legal environment of digital justice. Looking ahead, artificial intelligence will continue to grow stronger, and artificial general intelligence (AGI) on a par with human intelligence may even emerge. The high-level intelligence of AGI could bring about an intelligence explosion, that is, a "singularity" that produces superintelligence, posing even more severe governance challenges for law, ethics, and security. China therefore needs to participate deeply in the global governance of AI, continue to propose AI governance plans consistent with global consensus, keep contributing Chinese wisdom to global AI governance, and ensure the safe and trustworthy development of AI technology.

(This article was originally published in Digital Law Review, Issue 1, 2024)


Thematic Coordinator: Qin Qiansong
