
Zhao Yunhu et al.: A Deep Dive into the Software, Data, and Parameter Licensing Issues of Open-Source Large Models

Author: Dacheng Lüdong (Dentons)

Has the inflection point arrived?

Since OpenAI launched ChatGPT on November 30, 2022, generative artificial intelligence (GAI) showing traits of general intelligence, represented by large language models (LLMs, "large models"), has been the focus of the technology industry.[1] There is little suspense that GAI has become the buzzword of the year. Has the development of artificial intelligence reached an inflection point from weak AI to strong AI?

According to a February 16, 2023 report in the New York Times, its reporter Kevin Roose had a two-hour conversation with "Sydney," the chat persona of Microsoft's Bing. Over those two hours, Sydney made astonishing statements, expressing a desire to be a living person, to destroy the planet, and even professing love for Kevin Roose.[2]

A Microsoft Research report likewise argues that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology, and more, without any special prompting. On all of these tasks, GPT-4's performance is strikingly close to human level, and it can reasonably be regarded as an early version of an artificial general intelligence system.[3]

Turing Award winner Yoshua Bengio, one of the "Big Three" of artificial intelligence, believes that human-level AI could be developed within the next 20 years, or even within a few years, and that, given the digital nature of computers, a system at that level would hold a pronounced intelligence advantage over humans.[4]

1. The challenge of artificial intelligence to humanity

Artificial intelligence began as an epistemological problem, but in the near future it may escalate into an existential one touching on humanity's ultimate fate, a problem that could endanger humanity itself.[5] In the era of weak AI, we mostly asked which technical approaches could make machines more intelligent, and what knowledge humans had not yet discovered that could do so. These questions belong to the category of "epistemology" in philosophy.

With the development of strong artificial intelligence, thinking about AI has risen from the plane of epistemology to the height of the theory of existence. The questions of that theory are "what is man," "does God exist," "what is life," and "what is mind." Correspondingly: does a machine with strong general intelligence have a mind? Can it be called silicon-based "life"? If human beings have created such a thing, are they gods? Will such a thing put an end to human existence?

What makes humans human lies in consciousness: the ability to recognize one's own existence and that of others. Patrick Butlin, Robert Long, and other researchers, with contributors including Yoshua Bengio, argue that although there is no definitive evidence that any current AI system is conscious, from the standpoint of computational functionalism there is no obvious technical barrier to building one that satisfies the indicators of consciousness.[6]

At the moment, humans still do not understand why ChatGPT behaves the way it does. Artificial neural networks were originally expected to generate intelligence by mimicking the structure of the brain; now, in turn, a neural network like ChatGPT may capture something essential about what the human brain does when it generates language, and thereby deepen our understanding of our own brains.[7]

Before the rise of science and technology, human civilizations generally believed that a god (different civilizations have different accounts) created man. As technology developed, humanity gradually realized this might not be so, and Nietzsche declared that "God is dead." If it is true that God created man, and if, through the study of large language models, we uncover the secret of consciousness and intelligence, then in a sense we will have seen God and how God created man. Or, on the same premise, although man's own God was pronounced dead, man has created a silicon-based offspring and himself become the creator. And just as God was rejected by human beings, will silicon-based life eventually wipe out carbon-based life? Whether it is seeing God or becoming God, it will be a perilous journey.

The long-term risks concern the eventual direction of AI. Most current AI systems are passive, but as they gain more autonomy and the ability to act directly on the outside world, a sufficiently powerful AI without proper safeguards may pose an existential risk to humanity as a whole. Left unchecked, highly autonomous intelligent systems can also be misused or make catastrophic mistakes.

Medium-term risks to humanity, over the next two or three years, include the misuse of AI systems to cause large-scale harm, especially in the field of biology. The rapid growth of AI-driven science and engineering capabilities could also alter the balance of power between countries.

Short-term risks include issues that exist, or soon will, in current AI systems: privacy, copyright, bias and fairness of model outputs, factual accuracy, and the potential for misinformation or propaganda.[8]

2. Building trustworthy AI

To cope with the challenges that artificial intelligence poses to human society, governments and international organizations have issued corresponding laws, regulations, and policy documents.

On October 18, 2023, China released the Global AI Governance Initiative, which systematically sets out China's approach to AI governance in three respects: development, security, and governance. In November 2023, 28 countries, including China and the United States, together with the European Union, signed the Bletchley Declaration, which recognizes issues such as the protection of human rights, transparency and explainability, fairness, accountability, regulation, safety, human oversight, ethics, bias mitigation, privacy, and data protection. In December 2023, the European Parliament, EU member states, and the European Commission reached agreement on the Artificial Intelligence Act. The United States issued its Executive Order on artificial intelligence on October 30, 2023, while as early as July 2023 seven Chinese ministries and commissions jointly issued the Interim Measures for the Administration of Generative Artificial Intelligence Services, aiming to promote the development of GAI while balancing the security of networks, data, and personal information. The Interim Measures set out regulatory requirements for algorithms, content, data processing, and more.

At the algorithm level, the Interim Measures require AI service providers to explain the source, scale, type, labeling rules, and algorithmic mechanism of training data as required by the competent authorities, and to provide necessary technical and data support and assistance. Large-model algorithms are one source of discrimination, bias, and false information, so they must be regulated: the transparency of algorithms should be appropriately increased, and complete black boxes avoided. In particular, where generative AI services with public-opinion attributes or social-mobilization capacity are provided, security assessments shall be carried out in accordance with relevant state provisions, and the formalities for filing, modifying, and canceling algorithm filings shall be completed in accordance with the Provisions on the Administration of Algorithmic Recommendations for Internet Information Services. To ensure output quality, the model should be tested before the service formally launches, with test data drawn from sources independent of the training data. Testing should follow complete and rigorous standards, align the model's values, and minimize discrimination, hallucinations, and illegal content.

For training-data processing activities such as pre-training and optimization training, the Interim Measures require the use of data and foundation models from lawful sources, effective measures to improve training-data quality, and enhancement of the authenticity, accuracy, objectivity, and diversity of training data. At the collection stage, the legality of data sources and content should be reviewed. For data automatically crawled from the internet, websites' robots protocols should be observed, and circumvention techniques such as password cracking or forging the user agent (UA) should be avoided, as illustrated below. For data obtained from third parties, due diligence should be conducted on the legality and tradability of the source, and appropriate agreements signed to clarify the parties' rights and obligations. For data obtained directly from data subjects or data producers, a legal basis and clear authorization should be ensured.
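As a concrete illustration of the robots protocol point, the sketch below (in Python; the site URL and crawler name are hypothetical placeholders) checks a site's robots.txt before fetching a page for a training corpus:

```python
# Minimal sketch: consult robots.txt before crawling a page for training data.
# The site URL and user-agent string are hypothetical placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

user_agent = "ExampleTrainingBot"             # hypothetical crawler name
page = "https://example.com/articles/1.html"  # hypothetical target page

if rp.can_fetch(user_agent, page):
    print("robots.txt permits crawling:", page)
else:
    print("robots.txt disallows crawling:", page)
```

A check like this does not by itself make collection lawful, but it documents compliance with the website's stated crawling policy.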

For copyrighted works in the data, explicit authorization from the copyright owner should be obtained wherever possible, with the authorization expressly covering use for AIGC model training. Although the Copyright Law of the People's Republic of China allows a work to be used without the copyright owner's permission and without remuneration in the circumstances enumerated in the law, provided the author's name and the title of the work are indicated, those circumstances do not explicitly include transformative uses whose nature and purpose differ greatly from the original. In the Google Books case, the mainland court found that Google's scanning of entire books constituted infringement, the opposite of the U.S. court's judgment. So although using existing works to train large models and construct weights and parameters differs in nature and purpose from ordinary expressive use, great caution is still needed where the copyright owner's authorization has not been obtained.

For data containing personal information, if personal information must be used for model training and optimization, the data subject shall be clearly informed and consent obtained; for sensitive personal information, a prior personal-information protection impact assessment shall be conducted and separate consent obtained; and personal information used for model training shall be de-identified before use.
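As a rough sketch of what de-identification before training can look like, the following Python example masks a few common identifier formats with regular expressions. The patterns and sample string are illustrative assumptions only; production pipelines need far more robust PII detection (named-entity recognition, dictionaries, human review):

```python
# Minimal sketch: regex-based masking of common identifiers in training text.
# Patterns are illustrative only; real de-identification needs far more.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b1[3-9]\d{9}\b"),      # mainland mobile format
    "ID":    re.compile(r"\b\d{17}[\dXx]\b"),     # mainland ID card format
}

def deidentify(text: str) -> str:
    """Replace each matched identifier with its category placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(deidentify("Contact: 13912345678, email: zhang.san@example.com"))
# -> Contact: [PHONE], email: [EMAIL]
```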

At the content level, content prohibited by laws and regulations must not be generated, and effective measures shall be taken, in light of the service type, to improve the accuracy and reliability of generated content. Generated images, videos, and the like shall be labeled in accordance with the Provisions on the Administration of Deep Synthesis of Internet Information Services. The National Information Security Standardization Technical Committee has also issued the Cybersecurity Standard Practice Guide: Content Identification Methods for Generative AI Services, which sets out concrete requirements for labeling content by adding watermarks to text, images, video, and audio.
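To make the labeling requirement concrete, here is a minimal sketch that stamps a visible "AI-generated" notice onto an output image with Pillow. The file names, label text, and placement are placeholder assumptions; the practice guide's specific requirements govern what a compliant label must say and where it must appear:

```python
# Minimal sketch: stamp a visible AI-generation label onto an output image.
# File names and label text are placeholders, not the guide's exact wording.
from PIL import Image, ImageDraw

img = Image.open("generated.png").convert("RGB")
draw = ImageDraw.Draw(img)
label = "AI-generated"
# Draw the label near the lower-left corner using the default bitmap font.
draw.text((10, img.height - 20), label, fill=(255, 255, 255))
img.save("generated_labeled.png")
```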

Generative AI service providers shall sign service agreements with users, informing them that they must not intentionally elicit content that violates laws, regulations, or public morals. In fields with high accuracy requirements, such as medical care, providers must also prominently warn users of the risks.

3. Open source responsible AI licenses

The European Artificial Intelligence Act defines AI as software developed with one or more specified techniques and approaches that, for a given set of human-defined objectives, generates outputs such as content, predictions, recommendations, or decisions influencing the environments it interacts with. These techniques and approaches include: (a) machine learning approaches, including supervised, unsupervised, and reinforcement learning, using a wide variety of methods including deep learning; (b) logic- and knowledge-based approaches, including knowledge representation, inductive (logic) programming, knowledge bases, inference and deductive engines, (symbolic) reasoning, and expert systems; and (c) statistical approaches, Bayesian estimation, and search and optimization methods.[9] Whatever the path and method, AI remains software in nature.

Large models divide into closed-source models, such as OpenAI's GPT (although its earlier versions were open-sourced), and open-source models, such as Meta's LLaMA 2, Stability AI's Stable Diffusion, Alibaba Cloud's Tongyi Qianwen, Du Xiaoman's Xuanyuan, and Shanghai Jiao Tong University's Magnolia. HuggingFace hosts as many as 413,335 open models and 81,799 open datasets.[10] Among these models, Apache 2.0 is the most widely adopted open-source license, followed by MIT and then OpenRAIL (Open Responsible AI License). Traditional license types such as CC, GPL, LGPL, AGPL, and BSD are also common. Likewise, the most common dataset licenses on HuggingFace are MIT, Apache 2.0, and OpenRAIL. OpenRAIL was inspired by the open-source movement and hopes to carry the value of knowledge sharing into the field of artificial intelligence. At the same time, the development of generative AI has created new problems for open-source software.
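Readers who want to check the license breakdown themselves can query the Hub by license tag. The sketch below assumes a recent huggingface_hub client; the counts above change daily:

```python
# Minimal sketch: list a few Hub models carrying a given license tag.
# Assumes a recent huggingface_hub client; exact counts change daily.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="license:apache-2.0", limit=5):
    print(model.id)
```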

Copyright in source code generated from open-source software code

There are already judicial cases on whether using existing works to train large models constitutes infringement, whether works generated by large models can enjoy copyright, and who should hold that copyright. For example, in October 2023 the U.S. District Court for the Northern District of California, in Sarah Andersen et al. v. Stability AI Ltd. et al., held that the defendant DeviantArt's DreamUp software relies on insights drawn and interpolated from billions of images, together with the user's instructions, to produce new works with different purposes and different characteristics.[11] U.S. copyright law applies a four-factor fair use analysis: (1) the purpose and character of the use, i.e., whether it is commercial or for non-profit educational purposes; (2) the nature of the work used, i.e., whether it is highly original or largely consists of public-domain material; (3) the amount and substantiality of the portion used relative to the work as a whole; and (4) the effect on the potential market for or value of the work, i.e., whether it harms the market for the original and its derivatives. Although there is no final effective judgment, the Northern District of California's statement is an important signal that such use may constitute fair use and therefore not infringement.

On December 27, 2023, the New York Times' lawsuit against Microsoft and OpenAI became the newest of these cases. The complaint shows that Browse with Bing, a Microsoft search feature powered by ChatGPT, reproduced results from Wirecutter, the New York Times' product review site, almost verbatim; yet Bing's text results did not link to Wirecutter's articles, and they stripped out the referral links through which Wirecutter earns commissions on sales made on its recommendations. Beyond the claim of intellectual property infringement, the New York Times worries that readers satisfied by a chatbot's answers will stop visiting its website, reducing the web traffic that converts into advertising and subscription revenue.[12] Although Microsoft's and OpenAI's pleadings have not yet been seen, they can certainly be expected to raise a fair use defense, and whether that defense succeeds remains to be seen. Mainland copyright law also provides fair use exceptions, but contains no specific provision comparable to non-expressive or transformative use under U.S. law. As noted above, in the Google Books case the mainland court reached the opposite conclusion from the U.S. court, finding that scanning entire books constituted infringement.

Beyond fair use, the New York Times case also raises something akin to unfair competition in civil law: even if fair use defeats the copyright claim, a chatbot that leads readers to stop visiting the New York Times' website, costing it the traffic that converts into revenue, may still constitute unfair competition. Mainland courts have found in a number of cases involving audio and video, big data, and the like that conduct not amounting to copyright infringement nevertheless constituted unfair competition.

Software, too, is a work protected by copyright law, and a similar situation exists in the software field. In June 2021, GitHub and OpenAI released Copilot, which "helps software coders by using artificial intelligence to suggest or fill in blocks of code." In August 2021, OpenAI released Codex, which "converts natural language into code and is integrated into Copilot." GitHub users pay $10 per month or $100 per year for access to Copilot. Codex and Copilot were trained on "billions of lines" of publicly available code, including code from public GitHub repositories, and litigation followed. On May 11, 2023, the U.S. District Court for the Northern District of California ruled in J. DOE 1 et al. v. GitHub et al., granting the motion to dismiss in part and denying it in part. The defendants include GitHub, Microsoft, and OpenAI.

The plaintiffs allege that although much of the code in public GitHub repositories is subject to open-source licenses that restrict its use, Codex and Copilot were built without complying with those licenses' requirements on attribution, copyright notices, and license terms. Copilot reproduces licensed code from its training data as output, but omits or misstates the attribution, copyright notice, and license terms, violating the open-source licenses of tens of thousands, possibly millions, of software developers.

On this allegation, the court held that although the plaintiffs were not the copyright owners of the specific code they cited and could not claim damages, taking the facts pleaded as true and drawing all inferences in the plaintiffs' favor, the court could reasonably infer that if the plaintiffs' code were copied as output, it would be copied in a way that violates the open-source licenses, and the plaintiffs would still be entitled to seek injunctive relief against a real risk of infringement.[13]

This case raises the question of how generated source code can comply with open-source licenses when open-source code is used to train a large model. In the author's view, the premise of the question is that the generated code reproduces code that has been published, so that outputting it amounts to distribution. But given how large models work, the copying of code during training may not constitute outward distribution, while the generated code is produced from the weights and parameters learned in training and may be neither a direct copy of the original code nor a dissemination of it, and therefore may not be a "distribution" under copyright law. The case is still pending and the final judgment unknown; one hopes counsel will also attend to these issues at trial in order to mount an effective defense.

Open-source licenses for data, parameters, and weights

Large models involve not only software code but also data, parameters, weights, and other elements, so open-sourcing a large model is not the same as traditional open source. Existing open-source licenses primarily cover source code and binaries, and do not address the licensing of AI artifacts such as models or data. Open-source large models therefore face unique legal issues in addition to those of traditional open-source software.[14]

Reflecting the differences between large models and traditional software, RAIL licenses come in separate variants for Data, Applications, Models, and Source code; OpenRAIL is a subclass of RAIL. Take BigScience BLOOM RAIL 1.0 as an example: it was the first OpenRAIL-M license for models.[15]

The license defines data, models, derivative models, and supplementary materials as follows:

"Data" means a collection of text extracted from the BigScience corpus used with the Model, including text used to train, pre-train, or otherwise evaluate the Model, which is a collection of existing linguistic data sources documented on the BigScience Site;

"Model" means any accompanying machine learning-based components (including checkpoint checkpoints) consisting of learning weights, parameters (including optimizer state) corresponding to the BigScience BLOOM model architecture embodied in the Supplemental Materials, which have been trained or fine-tuned on the data in whole or in part using the Supplemental Materials;

"Derivatives of Models" means all modifications to a Model, Model-based works, or any other Model created or initialized by transmitting the Model's weighting patterns, parameters, activations, or outputs to other Models in order to make the other Models perform similarly to the Model, including, but not limited to, training other Models using distillation methods represented by intermediate data or methods based on the Model to generate synthetic data;

"Supplemental Materials" means the accompanying source code and scripts used to define, run, load, benchmark or evaluate models, and to prepare training or evaluation data, including any accompanying documentation, tutorials, examples, etc.

The license grants a copyright license covering the Model, Supplementary Materials, and Derivatives of the Model, and a patent license covering the Model and Supplementary Materials, on terms very similar to Apache 2.0.

The Mulan Qizhi Model License (the "Mulan Qizhi License") is likewise designed specifically for open-sourcing models and related code in the field of artificial intelligence; it was jointly drafted, revised, and released on the basis of a comprehensive analysis of existing mainstream open-source licenses.[16]

Under the Mulan Qizhi License, "data resources" are the data used in training the model, including but not limited to non-open-source datasets and open datasets provided by dataset providers; data resources may be collections of text, images, spreadsheets, files, or other forms of content. The "model" is a machine-learning component (or checkpoint file) based on deep learning or similar techniques, comprising the weights, parameters (including optimizer states), and model structure. The "supplementary materials" are the deployment code, scripts, and description files accompanying the model, used to define, run, load, benchmark, or evaluate the model and to prepare data for training or evaluation, including any accompanying documentation, tutorials, examples, and the like. The Mulan Qizhi License likewise grants a copyright license covering the model and supplementary materials, and a patent license covering the model, derivative models (left undefined), and supplementary materials.

Compared with traditional software licenses, the "supplementary materials" here can be regarded as software code, whose expression is protected by copyright and whose ideas may be covered by patent rights. The "model," however, consists of weights and parameters: although licensed under copyright and patent, is it in legal character a copyrightable work or a patentable invention? Even granting that data can carry corresponding rights and interests, since the data may include data from third parties, it is debatable, at least under the Chinese legal framework, whether the authorization should follow a "triple authorization" structure that includes those third parties.

For the data used to train, pre-train, or fine-tune the model, the BLOOM license expressly states that it grants no license over the data, sidestepping this currently thorny problem. The Mulan Qizhi License is silent on the point; under general intellectual property doctrine, what is not expressly granted is ordinarily not licensed.

Technically, as the OSI points out, large language models blur the boundary between data and software, but the legal protections for software cannot simply be transplanted to data intact.

The expression of software is protected by copyright law, and methods embodied in software can obtain patent protection. For data, however, the mainland's Civil Code does not clearly provide for rights; it states only that where the law has provisions on the protection of data and online virtual property, those provisions apply. The Data Security Law provides that the state protects the data-related rights and interests of individuals and organizations. At present, therefore, mainland law does not expressly provide for "data rights," only "data-related rights and interests." The Opinions on Building a Basic Data System to Better Play the Role of Data Elements sets the goal of establishing a data property rights system that protects rights and interests and ensures compliant use.

Judicial practice has taken different paths to protecting big data. The unfair competition dispute between TB and Anhui MJ was China's first data-product dispute, and the first new type of unfair competition case addressing the legitimacy of developing and exploiting data resources and the attribution of data rights. Through this precedent, the court for the first time preliminarily delineated the boundaries of the relevant parties' rights and interests, and granted data-product developers a "competitive property right and interest," confirming that they may rely on it to obtain protection under the Anti-Unfair Competition Law.

In the trade secret dispute between a Hangzhou technology company and Wang, the court departed from the earlier approach of protecting data under the general clause of the Anti-Unfair Competition Law and actively explored a standard of judicial review for protecting data as trade secrets, establishing the focus of review and the analytical approach for protecting business data in the live-streaming industry through the trade secret path.

In the unfair competition dispute among Beijing WBSJ Technology Co., Ltd., Shanghai LJ Information Technology Co., Ltd., Xiamen BKFJ Network Technology Co., Ltd., and Zhejiang TB Network Co., Ltd., against the backdrop of data becoming the fifth major factor of production while data-protection legislation remains incomplete, the court used competition law as the path for a useful exploration of protecting data rights and interests, clarified the boundaries of legitimately obtaining and using data by technical means, and responded to concerns about personal information protection in data-related cases.

These cases show that current judicial practice usually protects data through either the general provisions of the Anti-Unfair Competition Law or trade secret law, with the former approach the more mainstream. Either way, a grant of copyright and patent licenses alone may not confer sufficient rights to use and exploit the data.

Table 1 Summary of typical cases


Similar problems attend the copyright and patent licensing of the weights and parameters that make up the model. First, it is debatable whether weights and parameters are copyrightable works or patent-protected inventions at all: might they be deemed machine-generated and thus protected by neither copyright nor patent?
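The debate is easier to see once one looks at what a model actually is on disk. The toy PyTorch sketch below (the model is an illustrative stand-in, not any licensed model) saves and reopens a checkpoint: it is simply a mapping from parameter names to numeric tensors, learned numbers rather than human-authored expression, which is precisely why its copyrightability is contested:

```python
# Toy sketch: a checkpoint is a name-to-tensor mapping, not source code.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
torch.save(model.state_dict(), "checkpoint.pt")

state = torch.load("checkpoint.pt")
for name, tensor in state.items():
    print(name, tuple(tensor.shape), tensor.dtype)
# e.g. 0.weight (4, 8) torch.float32 -- learned numbers, no written text
```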

In conclusion, taking BLOOM as an example, the author believes that granting copyright and patent rights in the model (weights and parameters) and supplementary materials (source code and scripts) may not give the recipient sufficient rights to exploit the model, and that a clause granting other rights and interests could be added beneath the intellectual property clauses. For example, since weights and parameters are in practice used much like copyrighted works, the clause could read:

"Grant of Other Rights and Interests. Subject to the terms and conditions of this License, each Contributor hereby grants you a perpetual, worldwide, non-exclusive, royalty-free, irrevocable, full and necessary license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Supplemental Materials, Models, and Derivatives of the Models. ”

Ethical Clauses in Responsible AI Licensing

The development of large models with powerful capabilities has brought, or is bringing, risks and challenges to mankind. If everyone can obtain such a model through open source and use its capabilities, say, to obtain recipes for biological and chemical weapons or to mount highly damaging cyberattacks, would that not add fuel to the fire?

It is generally believed that open source lowers the threshold for using AI, accelerates the spread of and innovation on new technology, cuts R&D costs while improving application efficiency, speeds the maturation of AI technology, steers technical development toward a healthy ecosystem, and, by sharing core technology, breaks technology monopolies and draws together AI talent, applications, entrepreneurship, capital, and other elements of innovation.[17] But none of these benefits seems sufficient to offset the risk-multiplying effect of open source.

Hence open is only half of OpenRAIL; the other half is responsible. To mitigate the risk of harm from shared AI technology, responsibility takes the form of restrictions added to the license: prohibiting or restricting certain uses by licensees, and requiring downstream uses (including distribution) to carry at least those same restrictions.

The BigScience BLOOM RAIL 1.0 license, for example, voices concern about the development and use of large language models and AI broadly, and seeks to keep large language models and future natural language processing (NLP) technologies open in a responsible way.

Accordingly, the license restricts use of the Model and its derivatives: they may not be used to engage in illegal activity; to exploit or harm minors; to generate or disseminate verifiably false information intended to harm others; to generate or disseminate personally identifiable information that could be used to harm an individual; to deny that text is machine-generated; to defame, disparage, or otherwise harass others; to impersonate or attempt to impersonate another person; to make fully automated decisions that adversely affect an individual's legal rights; to discriminate; to misrepresent; to provide medical advice or interpret medical results; or to generate or disseminate information for use in judicial, law enforcement, immigration, or asylum proceedings; among others.

For such restrictive clauses, the author believes the following issues deserve further consideration.

First, most of these restrictions may be meaningless: even without them, could large models lawfully be used to break the law, abuse children, defame, or harass?

Second, what should be the criterion for judging whether a given act falls within the restricted scope: the applicable law, the judgment of the licensor, or the judgment of the licensee?

If it is the applicable law, then which jurisdiction's law applies, and what happens when the laws of different jurisdictions conflict? For example, China and the United States may take different views of a model that gives advice on using an automatic gua sha device.

If it is for the licensor or licensee to judge, do they have such a right, and can such private ordering substitute for public law, especially where fundamental personal and personality rights are concerned?

Where many contributors act as licensors, what if their judgments conflict? What if the contributors holding differing judgments contributed components that cannot be separated? Must the large model be split in half before it can be used?

If these problems cannot be resolved, the restrictive clauses are likely to remain declaratory and will hardly produce real legal effect. Whether Caesar can solve God's problems requires further thought and practice; in the face of epochal change, perhaps the open-source community can evolve a newer, higher-order model of governance.

In short, facing the surging development of general artificial intelligence represented by large language models, the free and open-source movement has branched: on one side it carries forward the fine tradition of knowledge sharing, and on the other it answers the new era's call for trustworthiness, opening a path of open source plus trustworthiness to meet the change head-on. Just when GPT seemed far ahead, Google launched Gemini, claiming to surpass GPT in capability, and according to HuggingFace's latest Open LLM Leaderboard, Alibaba Cloud's Qwen/Qwen-72B pre-trained model topped the rankings;[18] competition in artificial intelligence is in full swing. We still look to open source, as Linux defined the PC era and Android the mobile era, for what it will become in the AI era.

Special Statement:

Dentons strictly adheres to its obligations to protect client information; the client matters discussed in this article are drawn from public sources or used with client consent. The content and opinions in this article are for reference only; they do not represent any position of Dentons and should not be regarded as legal advice or recommendations in any form. If you need to reprint or quote any content of this article, please seek authorization by private message and indicate the source at the beginning of the reprinted text. Unauthorized reproduction or use of any content of this article is prohibited.

