
The wildly popular ChatGPT urgently needs "compliance brakes"

Author: Xiao Sa Legal Team

Key points:

ChatGPT, which is built on natural language processing technology, presents three main legal compliance issues that need to be addressed in the short term:

First, the intellectual property status of the answers provided by chat AI: the main compliance question is whether the answers chat AI produces give rise to corresponding intellectual property rights, and whether intellectual property authorization is needed;

Second, does the data mining and training that chat AI performs on huge volumes of natural language text (generally called a corpus) require corresponding intellectual property authorization?

Third, chat AI such as ChatGPT answers questions in part by building a statistics-based language model from mathematical statistics over large volumes of existing natural language text. This makes chat AI prone to "talking nonsense with a straight face", which in turn creates the legal risk of spreading false information.

In general, mainland China is still at the preliminary research stage of artificial intelligence legislation: there is no formal legislative plan or related draft, and the relevant departments are particularly cautious about regulating the field. As artificial intelligence develops further, the corresponding legal compliance problems will only multiply.

1. ChatGPT is not an "epoch-making artificial intelligence technology"

ChatGPT is a product of the development of natural language processing technology and, at its core, is still just a language model.

At the beginning of 2023, a huge investment by global technology giant Microsoft made ChatGPT the hottest topic in the technology field and pushed it into the mainstream. As the ChatGPT concept sector surged in the capital markets, many domestic technology companies also began to move into this field. While the capital markets are enthusiastic about the ChatGPT concept, as legal practitioners we cannot help but ask: what legal and security risks might ChatGPT itself bring, and what is its compliance path?

Before discussing ChatGPT's legal risks and compliance path, we should first examine how the technology works: can ChatGPT really answer any question a user asks, as the news reports suggest?

In the team's view, ChatGPT is far less "miraculous" than some reports claim. In a word, it is an integration of natural language processing technologies such as the Transformer and GPT; in essence it is still a language model based on neural networks, not an "epoch-making leap in AI".

As mentioned earlier, ChatGPT is a product of the development of natural language processing technology. That technology has passed through roughly three stages: grammar-based language models, statistics-based language models, and neural-network-based language models. ChatGPT belongs to the neural-network stage. To understand more directly how ChatGPT works and the legal risks this may create, we must first clarify how its predecessor, the statistics-based language model, works.

In the statistics-based stage, AI engineers determine the probability that one word follows another by counting a huge amount of natural language text. When a person asks a question, the AI analyzes which word collocations have a high probability given the words that make up the question, then stitches these high-probability words together to return a statistically derived answer. This principle has run through the development of natural language processing since its emergence; in a sense, the neural-network-based language models that came later are also refinements of the statistics-based language model.
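
To make the idea concrete, below is a minimal sketch in Python of a statistics-based "bigram" model: it counts how often each word follows another in a tiny invented corpus and then greedily appends the most probable next word. The corpus and the greedy decoding are our own illustrative assumptions; this is not ChatGPT's actual code, only the counting idea described above.

```python
# A minimal statistics-based (bigram) language model: count successive word
# pairs, then pick high-probability continuations. Illustrative only.
from collections import defaultdict, Counter

corpus = [
    "dalian zhongshan park is a historic park",
    "the park has gardens lakes fountains and statues",
    "dalian has many tourist attractions",
]

# "Training": count how often each word follows another.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def most_likely_continuation(word, length=5):
    """Greedily append the statistically most probable next word."""
    output = [word]
    for _ in range(length):
        followers = bigram_counts.get(output[-1])
        if not followers:
            break
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)

print(most_likely_continuation("park"))
# e.g. "park has gardens lakes fountains and" -- a purely statistical answer,
# with no check of whether the statement is true.
```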

To give an easy-to-understand example, the team typed the question "What are the tourist attractions in Dalian?" into the ChatGPT chat box, as shown in the screenshot below:

In the first step, the AI analyzes the basic morphemes in the question: "Dalian", "which", "tourist", "attractions". It then searches the existing corpus for the natural language texts in which these morphemes appear, finds the most probable collocations within that set, and combines those collocations into the final answer. For example, the AI may find that "Zhongshan Park" appears with high probability in corpus passages containing "Dalian", "tourist" and "attractions", so it returns "Zhongshan Park"; and because "park" most often collocates with words such as gardens, lakes, fountains and statues, it further returns "This is a historic park with beautiful gardens, lakes, fountains and statues."
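
The retrieval-and-collocation step in this example can be mimicked in a few lines. The sketch below is a toy illustration under our own assumptions (an invented mini-corpus and a crude keyword match); it only shows why the words that come back are high-frequency collocates rather than verified facts.

```python
# Toy illustration: find corpus sentences that share morphemes with the
# question, then surface the words that co-occur with them most often.
from collections import Counter

corpus = [
    "dalian tourism attractions include zhongshan park",
    "zhongshan park in dalian is a historic park with gardens and fountains",
    "dalian is a coastal city popular with tourists",
]

def top_collocates(question, k=3):
    query_tokens = set(question.lower().replace("?", "").split())
    co_occurring = Counter()
    for sentence in corpus:
        words = sentence.split()
        if query_tokens & set(words):  # sentence shares a morpheme with the question
            co_occurring.update(w for w in words if w not in query_tokens)
    return [word for word, _ in co_occurring.most_common(k)]

print(top_collocates("what tourism attractions does dalian have"))
# e.g. ['park', 'zhongshan', 'include'] -- frequent collocates, not checked facts
```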

In other words, the whole process is probability statistics over the existing natural language text (the corpus) behind the AI, so the returned answers are themselves "statistical results". This is why ChatGPT will "talk nonsense with a straight face" on many questions. In the answer to "What are the tourist attractions in Dalian", for example, Dalian does have a Zhongshan Park, but there are no lakes, fountains or statues in it. Dalian did historically have a "Stalin Square", but Stalin Square was never a commercial square, nor did it have any shopping centers, restaurants or entertainment venues. Clearly, the information ChatGPT returned is false.

2. The application scenarios currently most suitable for ChatGPT as a language model

Although the previous section frankly described the shortcomings of statistics-based language models, ChatGPT is a neural-network-based language model that greatly improves on them, and its technical foundations, the Transformer and GPT, represent the latest generation of language models. ChatGPT essentially combines massive data with a Transformer model of strong expressive power, producing a very deep model of natural language. Although the sentences it returns are sometimes "nonsense", at first glance they still look very much like "human replies", so the technology has broad prospects in scenarios that require massive human-computer interaction.
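
For readers who want to see what "a Transformer-based language model generating a reply" looks like in code, the sketch below uses the openly available GPT-2 model via the Hugging Face transformers library as a stand-in. This is only an approximation of the same family of technology: ChatGPT itself is a far larger, instruction-tuned model that is not publicly downloadable.

```python
# Generate a continuation with an open Transformer/GPT model (GPT-2) as a
# stand-in for the kind of neural language model discussed here.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "What are the tourist attractions in Dalian?"
result = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)

# The model continues the prompt with statistically plausible text; nothing
# in this pipeline checks whether the named attractions actually exist.
print(result[0]["generated_text"])
```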

For now, there are three such scenarios:

First, search engines;

Second, the human-computer interaction mechanism in banks, law firms, various intermediaries, shopping malls, hospitals, and government service platforms, such as customer complaint systems, guidance and navigation, and government consultation systems in the above-mentioned places;

Third, the interaction mechanism of smart cars and smart homes (such as smart speakers and smart lights).

Search engines combined with AI chat technology such as ChatGPT are likely to take the form of a traditional search engine supplemented by a neural-network-based language model. Traditional search giants such as Google and Baidu already have deep accumulations in neural-network-based language models; Google, for example, has Sparrow and LaMDA, which are comparable to ChatGPT. With these language models behind them, search engines will become more "human".

Applying AI chat technology such as ChatGPT to customer complaint systems, to guidance and navigation in hospitals and shopping malls, and to government consultation systems would greatly reduce the human resource costs of the relevant organizations and save communication time. The problem is that statistically generated answers may be completely wrong, and the resulting risk-control exposure may need further assessment.

Compared with the two scenarios above, the legal risk of using ChatGPT-style applications as the human-computer interaction mechanism of smart cars and smart home devices is much smaller: the application environment is relatively private, erroneous AI feedback will not create large legal risks, such scenarios do not demand high content accuracy, and the business model is more mature.

3. A tentative exploration of ChatGPT's legal risks and compliance paths

First, the overall regulatory picture of artificial intelligence in mainland China

Like many emerging technologies, the natural language processing technology represented by ChatGPT faces the "Collingridge dilemma", which comprises an information dilemma and a control dilemma. The information dilemma is that the social consequences of an emerging technology cannot be predicted while the technology is still in its early stages; the control dilemma is that by the time the adverse social consequences of an emerging technology are discovered, the technology has often become part of the overall social and economic structure, so those consequences can no longer be effectively controlled.

At present the field of artificial intelligence, especially natural language processing, is developing rapidly and may well fall into the "Collingridge dilemma", while the corresponding legal regulation has not kept pace. There is currently no national-level legislation on the artificial intelligence industry in mainland China, but there are already legislative attempts at the local level. Just last September, Shenzhen announced the country's first special legislation for the artificial intelligence industry, the Shenzhen Special Economic Zone Artificial Intelligence Industry Promotion Regulations, and Shanghai then passed the Shanghai Regulations on Promoting the Development of the Artificial Intelligence Industry; similar local legislation can be expected elsewhere soon.

As for the ethical regulation of artificial intelligence, the National New Generation Artificial Intelligence Governance Professional Committee issued the "New Generation Artificial Intelligence Ethics Code" in 2021, proposing to integrate ethics into the whole life cycle of AI research, development and application. Perhaps in the near future something like the "Three Laws of Robotics" from Asimov's novels will become an iron rule of regulation in the field of artificial intelligence.

Second, the legal risk of false information brought about by ChatGPT

Shifting the focus from the macro to the micro, and leaving aside the overall regulatory picture of the AI industry and the ethical regulation of AI, the practical compliance issues in chat AI products such as ChatGPT also demand urgent attention.

As mentioned in the second part of this article, ChatGPT's working principle means that its replies may be complete nonsense delivered with a straight face, which is extremely misleading. A false answer to a question such as "What are the tourist attractions in Dalian" may not cause serious consequences, but if ChatGPT is applied to search engines, customer complaint systems and similar fields, the false information it returns may create extremely serious legal risks.

Such legal risks have in fact already materialized. Galactica, a language model for scientific research released by Meta in November 2022 at almost the same time as ChatGPT, was taken offline after only three days of testing following user complaints that its answers mixed truth and falsehood. Given that this underlying technical limitation cannot be overcome in the short term, if ChatGPT and similar language models are to be applied to search engines, customer complaint systems and other fields, they must undergo compliance modifications: when the system detects that a user may be asking a professional question, it should guide the user to consult the corresponding professional rather than rely on the AI for answers, and it should remind the user that the authenticity of the AI's replies may need further verification, so as to minimize the corresponding compliance risks.
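
As a rough illustration of what such a compliance modification could look like, the sketch below wraps a model call with a keyword-based referral notice and a standing verification reminder. The keyword list, the ask_model interface and the wording are all hypothetical placeholders of our own, not a feature of any existing product.

```python
# Hypothetical guardrail: refer professional questions to a human expert and
# always remind the user to verify AI-generated answers.
PROFESSIONAL_KEYWORDS = {
    "lawsuit": "legal", "contract": "legal", "diagnosis": "medical",
    "prescription": "medical", "tax": "financial", "investment": "financial",
}

DISCLAIMER = ("Note: this reply is generated statistically by an AI model; "
              "please verify important facts independently.")

def answer_with_guardrails(question: str, ask_model) -> str:
    """Wrap a model call with a referral notice and a verification reminder."""
    domains = {d for kw, d in PROFESSIONAL_KEYWORDS.items()
               if kw in question.lower()}
    reply = ask_model(question)
    if domains:
        referral = (f"This looks like a {', '.join(sorted(domains))} question; "
                    "please consult a qualified professional.")
        return f"{referral}\n\n{reply}\n\n{DISCLAIMER}"
    return f"{reply}\n\n{DISCLAIMER}"

# Example usage with a stub model standing in for the chat AI:
print(answer_with_guardrails(
    "Can I break my rental contract early?",
    lambda q: "You may be able to, depending on the terms."))
```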

Third, the IP compliance issues brought about by ChatGPT

Turning from the macro to the micro, in addition to the authenticity of AI replies, the intellectual property issues of chat AI, especially large language models such as ChatGPT, also deserve the attention of compliance officers.

The first compliance challenge is whether "text data mining" requires a corresponding intellectual property license. As explained above, ChatGPT's working principle relies on a huge volume of natural language text (the corpus): ChatGPT must mine and train on the data in the corpus, which requires copying the corpus content into its own database. In the field of natural language processing this behavior is usually called "text data mining". Given that the text involved may constitute copyrighted works, whether text data mining infringes the right of reproduction remains controversial.

In comparative law, both Japan and the European Union have expanded the scope of fair use in their copyright legislation, adding "text data mining" by AI as a new fair use situation. Although some scholars advocated changing mainland China's fair use system from "closed" to "open" during the 2020 revision of the Copyright Law, that proposal was not adopted. The current Copyright Law retains a closed fair use system: only the thirteen circumstances enumerated in Article 24 can be recognized as fair use. In other words, the current law does not bring AI "text data mining" within the scope of fair use, so text data mining still requires a corresponding intellectual property license in mainland China.

The second compliance conundrum is whether the responses ChatGPT generates are original. On the question of whether AI-generated content is original, the team believes the criteria should be no different from the existing standards: whether a response is produced by AI or by a human, it should be judged against the same originality standards. Behind this question lies another, more controversial one: if an AI-generated response is original, can the AI be the copyright owner? Clearly, under the intellectual property laws of most countries, including mainland China, the author of a work can only be a human being, and AI cannot become the author of a work.

Finally, if ChatGPT splices third-party works into its responses, how should the intellectual property issues be handled? The team believes that if ChatGPT's reply splices together copyrighted works from the corpus (although, given how ChatGPT works, this is unlikely to happen), then under China's current Copyright Law, unless the use constitutes fair use, the copyright owner's authorization must be obtained before the work is reproduced.
