
Zhu Tianhua et al.: How far is artificial intelligence open source from the public?

Author: Shangguan News

A youth artificial intelligence and programming practice activity in Shanghai, 2021. Photo by The Paper reporter Zhou Pinglang

"Open source", someday will come?

At the newly opened 2023 World Artificial Intelligence Conference, Yann LeCun, winner of the 2018 Turing Award and chief AI scientist at Meta's fundamental AI research lab, joined a roundtable discussion via remote link. In the dialogue, he argued that strict regulation does not by itself make AI platforms safe, good, and practical; "in the long run," the only way to achieve that goal is open source.

"Open source" reflects the demand for the details of AI technology to be made public. In fact, in 2022, the Cyberspace Administration of China and other four units jointly promulgated the Provisions on the Administration of Internet Information Service Algorithm Recommendations, marking that China became one of the first countries to legally require the disclosure of AI technical details. The Provisions require relevant service providers to disclose rules related to algorithm recommendation services, and have established an Internet information service algorithm filing system to publicize to the public.

However, among the first 30 "algorithms" published, the general public could not find in the browsable information the expected specifics of how data is processed (such as the ranking of evaluation weights for different types of data), but only rather general descriptions of what information is collected and what results are produced. Such content is already clearly stated, and intuitively displayed, in the "privacy policies" drawn up under the relevant regulations and in the software's own user interface. Judging from the basic meaning of an algorithm as a determinate method of processing data, what these publicity notices describe is something quite different from an "algorithm."

Compared with such publicity, the requirement of "open source" undoubtedly goes a step further. In April 2023, Twitter announced that it had published the code of its recommendation algorithm on the Internet. Musk stressed that this was meant to "improve the transparency of the platform and enhance the trust of users, customers and media." After careful study, however, researchers pointed out that the code was not exhaustive, and in particular omitted the crucial underlying model. Twitter responded that this was to "ensure that the security and privacy of users are protected," even though code describing the structure of an underlying model contains no user data.

On the other hand, just as Microsoft has not open-sourced its speech synthesis model VALL-E and OpenAI has not open-sourced ChatGPT, LeCun himself, a scientist specializing in computer vision, has not disclosed the training method of his team's latest work, the Segment Anything Model. Perhaps the phrase "in the long run" is a hint that the road to "open source" remains a long one.

Talking about "open source" cannot be separated from the context of reality

As a term for "geek fan", the term "open source" is gaining popularity in different fields. The stalwarts shaping this trend are still Richard Stallman and his Free Software Foundation. There is not much controversy surrounding the two, but its contribution to the open source movement is still undeniable.

But if we broaden our view to the history of computer technology, it is not hard to see that "open source" was once entirely natural. In the era when computer hardware architectures were still "a hundred schools contending," the systems owned by different institutions differed to varying degrees. For a program to run on different systems, source code was the only viable form in which to deliver a software product: only with the source code could users, who were themselves experts, solve the problems they encountered on their own systems.

Since the 1970s, however, the standardization of hardware and the popularization of personal computers have transformed the computer industry. Users are no longer technical specialists, and software products have become a thriving business. The enormous interests at stake demanded legal protection. Between 1974 and 1981, the United States established a series of software-related laws that recognized copyright protection for software works and provided for the application of patents. Against this background, in 1983 IBM announced an "object code only" delivery policy (i.e., shipping only the code that runs on a computer, not the source code). Since then, almost no software company has provided source code when shipping software products.


Screenshot from Word's About window

The unfolding of the "open source" movement is partly a protest against this change. Also in 1983, Stallman began working full-time for the Free Software Foundation. Based on the legal provisions protecting software copyright, the Free Software Foundation has creatively proposed the GPL-covered license with the intention of making source code an integral part of software delivery again.

In this way, "open source" is born in a specific technical and legal context, and is closely related to this context. Once detached from this historical fact, the result of moving "open source" directly to other fields will either be mundane or meaningless.

At the same time, simply advocating the ideal of "open source" does not automatically solve problems. For "open source" to take root, open source projects generally need good organizational management, and sustaining the operation of a team has always been a difficult business. Some open source projects that once served as goodwill gestures by large companies have gone closed-source under the pressure of commercial interests, often triggering huge chain reactions.

AI models face a similar situation. They are usually only one part of a piece of software, there is no strong requirement that models be open-sourced, and this fine ideal can even be said to run fundamentally against the interests of R&D institutions. It is hard to imagine that a desire to "do good" alone could drive return-seeking companies to choose "open source" of their own accord. Perhaps it is precisely the regulation that LeCun preemptively rules out that could set them on the road to open source.

Closed code, open mind

While LeCun sketched the prospect of "open-sourcing" artificial intelligence, critics countered that open source cannot truly resolve the crisis of trust facing AI technology. To use a deliberately imperfect analogy: the danger of radioactivity does not disappear just because it is "open"; it is only magnified if everyone in a society can use it. This analogy is no fantasy: in the 1950s, X-rays were used for everything from patients treating their own headaches to fitting customers with the right shoes. The improper use of X-rays in everyday life had widespread consequences, and it was this that prompted people to adopt protective measures and develop operating norms.

Yang Likun put forward "open source" as a solution, and what needs to be paid more attention to is the question to be answered behind it. "Open source" makes sense because it provides a way of expressing it to provide insight into the technological processes that are actually happening in the process of building some kind of AI model for good.

To gain this insight, however, source code may not be irreplaceable. The source code emphasized by "open source" is itself merely a means of conveying ideas. Moreover, the legal protection of software code covers only the expression (the program code), not the ideas, processes, methods of operation, or mathematical concepts behind it.

In recent years, some studies of the "platform economy" have attempted to use descriptions of algorithms to explain individual perception and system behavior, and to show how, step by step, the value commitments of the platform side are embedded in the automatic operation of the program. This offers a promising avenue for understanding the role of AI technologies in society. The call for the disclosure of algorithm details shows that such design-based criticism still has room to grow.

To truly reach a comparable understanding of AI technology, one needs neither a mass of code that "omits" key content nor a "description" so general that it covers only inputs and outputs. Openness of thought matters more than code kept closed for one reason or another: understanding technical details is not primarily a matter of code, but of "processes, methods, and concepts."

Of course, the understanding discussed here does not touch the deeper problem of the explainability of AI technology itself. From the standpoint of the public interest, however, this deep technical explainability may be a smokescreen: after all, one does not need to know every physical process that occurs when a gun is fired in order to understand the danger of a shooting, and the design of the gun itself is no grounds for immunity. Likewise, if an AI model causes harm, the key is still to hold accountable those who determined how it was designed. Only then will models, algorithms, code and the endless procession of other technical elements cease to become successive information barriers and excuses for evading responsibility, and only then is the future LeCun depicts through "open source" likely truly to arrive.

(Zhu Tianhua, Assistant Researcher, Institute of Literature, Shanghai Academy of Social Sciences; Chen Hanyang, independent software developer)
