
AI Contract Theory (8): Industrial development needs point the way for AI policy. Can national databases offer latecomers a path to a breakthrough?

Reported from Shanghai and Beijing by Southern Finance Omnimedia reporter Wu Liyang, 21st Century Business Herald reporters Zheng Xue and Wang Jun, and intern Yang Piaopiao

Privacy, copyright, and content security are certainly the compliance priorities of regulatory attention. But facing an industrial upgrade widely expected to serve as the springboard for the next technological revolution, countries formulating AI regulatory policy must also weigh another primary goal: how to promote AI research, development, and application suited to the domestic industrial system, and how to protect and foster the local AI industry.

AI industry development rests on three major elements: computing power, algorithms, and data. Computing power tests each country's level of infrastructure construction; its main constraints are cost and external factors such as the US ban on chip exports to China, leaving little room for maneuver at the industrial-policy level. Algorithms depend mainly on the momentum of the domestic AI industry and market support; here policy chiefly guides and stimulates, helping related enterprises grow by creating a favorable development environment.

Data, the foundation of AI development, and especially high-quality datasets closely matched to domestic AI needs, has therefore become an important policy lever for industrial development. How to integrate domestic data resources, set standards for data use, and manage cross-border data flows have all become entry points for building AI regulatory systems, revealing the attitude each country takes toward this emerging industry.

Policy questions

As the architects of the GDPR regime, EU member states have carried their characteristically conservative control of data resources into the AI industry. Take Italy: at the end of March this year, the Italian data protection authority (Garante) announced a temporary ban on the chatbot ChatGPT and opened an investigation into OpenAI's alleged violation of data collection rules, restricting OpenAI's processing of Italian users' data and making Italy the first Western country to ban an AI chatbot.

Explaining the ban at the time, the Italian authority said the ChatGPT platform had suffered a breach of user conversation data and payment information, had failed to inform users about how their information was collected and processed, and lacked a legal basis for collecting and storing personal data.

On April 12, the Italian authorities issued a series of demands to OpenAI: disclose ChatGPT's data-processing logic, verify users' ages, and clarify the rights of data subjects.

ChatGPT resumed service in Italy at the end of April after meeting those conditions, but the Garante said it would conduct a further, broader review of generative AI and machine learning to determine whether these new tools raise data-protection issues and comply with privacy law.

In fact, even the United States, widely seen as having a policy environment focused on innovation and development and prizing flexibility, has signaled a more proactive stance on specific regulatory actions in recent years: the prominent antitrust scholar Lina Khan was appointed chair of the FTC, and a number of AI-risk researchers were invited to join the White House Office of Science and Technology Policy.

"After having achieved a certain technological lead and built a set of industrial development logic led by itself, strengthening supervision in order to build its own competitiveness and technological dominance, and raising the entry threshold for other competitors, is the main way for some countries to implement technology monopoly." A legal researcher in the field of science and technology in Beijing said when communicating with reporters.

By contrast, Japan has sent extremely relaxed regulatory signals on AI data, especially on copyright. Japan's Minister of Education, Culture, Sports, Science and Technology recently reiterated at a meeting that the Japanese government will not enforce copyright protection over data used in AI training.

On June 10, at a meeting of its Intellectual Property Strategy Headquarters, the Japanese government launched an intellectual property promotion plan covering, among other things, how to avoid copyright infringement and when AI-generated output can be considered a "work."

The Japanese government said it "will explore necessary measures" to leverage AI technology while protecting intellectual property rights. Generative AI can parse literature, paintings, music, and many other "works" and generate new content. Analyzing authors' data without authorization is permitted in the course of AI development, but infringement laws and regulations must still be observed, and Japan will explore what constitutes improper infringement.

Wu Shenkuo, doctoral supervisor at Beijing Normal University Law School and deputy director of the research center of the Internet Society of China, pointed out in an interview with Southern Finance Omnimedia that, on the whole, countries are designing and implementing governance frameworks and propositions that match the positioning and strategic demands of their own AI industries, and introducing corresponding governance mechanisms.

Barriers and gaps

Sufficient, high-quality training data is the wellspring of the AI industry and the main factor sustaining its momentum. In the early stage of the industry, data accumulation and development could still be driven by individual enterprises and research institutions. But as the AI industry chain extends into different application scenarios, data volumes grow exponentially and involve ever more diverse and complex data subjects and types, and macro-level database integration is gradually being written into national AI policy frameworks.

Indeed, integrating and developing public data resources to provide basic data support for the AI industry has become a consistent strategy in many countries.

Strategy 5 of the US National Artificial Intelligence Research and Development Strategic Plan, released as early as 2016, proposed developing shared public datasets and environments for AI training and testing, including "developing rich datasets that meet diverse AI interests and applications." The plan noted that the integrity and availability of training and testing datasets are critical to scientifically reliable results, and that the lack of vetted, publicly available datasets with verified provenance to guarantee reproducibility is a key factor holding back AI's full development.

The UK's 2021 National AI Strategy likewise lists "investing in the long-term needs of the AI ecosystem" as a key medium- to long-term action, with measures including publishing a framework on the government's role in promoting better data availability in the wider economy, consulting on the role and options for a National Cyber-Physical Infrastructure Framework, and supporting AI, data science, and digital skills through the Department for Education.

Years of industrial construction and support have allowed the AI industry chains led by Western countries such as the UK and the US to lay out and accumulate databases ahead of others, producing to a certain extent the current pattern: English-language databases lead the rest in both quantity and quality.

"First of all, as an international lingua franca, English is spoken in more countries, covering a wider range of fields, and relatively more comprehensive sources of information; Secondly, the UGC base of the English corpus is larger, which can also support more high-quality Q&A community ecology, thereby contributing more data. Finally, professional databases such as Github's high-quality codebase are still mainly in English, and it is difficult to find alternatives for professional content in vertical fields. An algorithm engineer in Beijing pointed out in an exchange with reporters that data categories, bases and professionalism are the advantages of the current English corpus for artificial intelligence training, and it is also the reason why some non-English-speaking countries still need to rely on English databases to a certain extent when developing artificial intelligence.

Gu Dujuan, director of NSFOCUS's Tianshu Lab, said that thanks to years of accumulation, foreign databases are not only richer and more diverse; their data quality and industry recognition are often higher as well, and some of these corpora routinely serve as training and evaluation data for algorithms.

Macro integration

Given the visible gap in overall data accumulation, how latecomers can catch up has become one of the primary questions countries weigh when formulating macro policy, and initiatives such as professional databases and national databases have become a shared focus of government, industry, academia, and research.

"The establishment of the national database is crucial to narrow the gap between domestic and foreign AI industry datasets and promote the importance and construction of domestic corpora." Gu Dujuan said that the construction of the national corpus needs to integrate different data resources in multiple fields, which puts forward high requirements for the quality, scale, diversity, accuracy and consistency of the corpus.

In fact, many localities in China have already begun coordinating and opening public data at the dataset level, and local data-integration practice is advancing step by step.

The Regulations on Promoting the Development of the Artificial Intelligence Industry, issued by Shanghai last October, proposed promoting the construction of high-quality AI datasets and supporting relevant entities in deeply integrating data with industry knowledge, developing data products, and serving needs such as algorithm design, model training, product verification, and scenario application.

Beijing's recent Several Measures to Promote the Innovation and Development of General Artificial Intelligence (2023-2025) (Draft for Comments) likewise states that the municipal government will work with relevant units to build large-scale basic pre-training datasets and high-quality fine-tuning datasets, and will establish coordination mechanisms for the supply and use of training data, strengthening communication and collaboration among industry regulators, relevant district governments, key R&D units, platform enterprises, data-trading institutions, and other market entities.

"For all kinds of Internet entities, high-quality data sets are often difficult to integrate due to barriers between platforms, and it is more difficult to rely on market forces for comprehensive utilization, and it may be relatively feasible to rely on administrative forces to open up and supervise." The above artificial intelligence algorithm architect said.
