A first among big tech companies: Meta open-sources an AI model with as many parameters as GPT-3

Large language models, which can generate paragraphs of text, simulate human dialogue, and solve mathematical problems with striking fluency, are clearly one of the hottest areas of AI development in recent years. But such models can not only generate harmful content on their own; they can also spread it through the downstream applications built on top of them.

In theory, getting more people involved should help solve the problem. But because language models require enormous amounts of data and computing power to train, they have so far remained the preserve of large technology companies. Broader communities, including academic researchers as well as the ethicists and social scientists who worry about the misuse of AI, have been left watching from the sidelines.

"I believe the only way to build trust is to be extremely transparent." Joelle Pineau, managing director of Meta AI, said. On May 3, local time, Meta AI opened the large language model OPT-175B (Open Pretrained Transformer, OPT) with 175 billion parameters.

For a big tech company, it is an unprecedented move. Even in the history of large language models, this is the first time that a pre-trained model, its training code, and its usage code have all been made public without reservation.
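In practical terms, the smaller OPT checkpoints published alongside the release can be loaded directly by researchers. The snippet below is a minimal sketch, assuming the Hugging Face transformers library and the openly downloadable facebook/opt-125m checkpoint; the full 175B weights themselves are distributed under a research license and must be requested from Meta.

```python
# Minimal sketch: loading one of the smaller, openly downloadable OPT checkpoints
# with the Hugging Face transformers library. (The 175B weights are gated behind
# a research access request and need far more hardware than a laptop.)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # a small sibling of OPT-175B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```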

"A lot of us are university researchers," Pinault said, "and we know that there is a clear gap between universities and industry in terms of their ability to build these models." The benefits of having researchers discuss this technique together are obvious. She hopes others will take a closer look at their work, disassemble and analyze it, or build on it. She believes that when more people are involved, breakthroughs will be achieved faster.

The OPT language model has about 175 billion parameters (the values in a neural network that are tuned during training), making it essentially the same size as OpenAI's pioneering neural network GPT-3, and it shares both the extraordinary capabilities and the inevitable flaws of GPT-3, which is offered as a paid service.
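For a concrete sense of what "parameters" means here, the sketch below counts the tunable weights of a network, again using the small facebook/opt-125m checkpoint as an illustrative stand-in for the 175B model.

```python
# Sketch: counting a model's parameters, i.e. the values tuned during training.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # about 125 million here; OPT-175B has about 175 billion
```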

Pineau makes no secret of the fact that "this was carefully designed": when building OPT, the team set out to match GPT-3 both in its accuracy on language tasks and in its toxicity. OPT is meant to give researchers a similar language model to study.

OpenAI declined to comment on Meta's statement.

Google, which is exploring the use of large language models in its own search products, has also been criticized for a lack of transparency. The company has been mired in controversy on this front since it fired Timnit Gebru, an AI ethics researcher who wanted to publish a paper about the possibility that Google's language systems of the time could pick up biased and hateful language from the websites they were trained on, and it recently fired another employee who disputed a published study.

So why is Meta doing this? After all, Meta is also a tech company that rarely says much about how the algorithms behind Facebook and Instagram work, and it has been known to have its internal research teams bury findings that are unfavorable to it.

According to MIT Technology Review, a big reason for Meta's different approach is Pineau herself, who has been pushing for transparency in how AI is developed for years.

Pineau has helped change how research is published at major academic conferences, asking researchers to submit their code and details of how experiments were run along with the results. She has been advocating for this culture in Meta's AI lab since she joined the company (then Facebook) in 2017.

"Meta's commitment to open science is why I'm here," Pino said, "and I wouldn't be here to work because of other conditions." ”

In addition to the code, Meta also published its development logbook. The logbook contains daily updates from team members about the training data: how it was added to the model and when, what worked and what didn't. In more than 100 pages of notes, the researchers record every error, crash, and restart during the three-month training run that ran non-stop from October 2021 to January 2022.

Percy Liang, director of the Center for Research on Foundation Models at Stanford University, summarizes the openness of large models in four levels:

The first level is open papers, which prove the feasibility of certain ideas and provide blueprints for building on them. The second level is open APIs, which allow researchers to probe and evaluate the capabilities (such as reasoning) and limitations (such as bias) of existing models. The third level is open model weights and open training data, which allow researchers to incrementally improve existing models, develop deeper interpretability techniques and more effective fine-tuning methods, and better understand the role training data plays in model behavior. The fourth level is open computing power, which allows researchers to experiment with new architectures, training objectives, and procedures, perform data fusion, and develop entirely new models in different domains.

"Higher levels of openness allow researchers to focus on deeper issues while also creating more risks." Percy Liang makes this clear.

Open-sourcing a large language model to this extent is a very bold move for Meta, and it may create risks that are hard to imagine today. That is also the reason OpenAI gave for not releasing GPT-3's predecessor, GPT-2.

"I can't tell you that this model doesn't create other terrible risks." Pinault refuted the idea that "just because it's too dangerous" and therefore shouldn't be released. "I understand the weaknesses of these models, but it's not a research mindset," she says. ”

According to MIT Technology Review, Margaret Mitchell, an AI ethics researcher who was fired by Google for "violating its code of conduct," sees OPT's release as a positive move. But she believes there are limits to transparency. She asks: Has the language model been tested rigorously enough? Do the foreseeable benefits outweigh the foreseeable harms? And how can misinformation, or racist and misogynistic language, be avoided along the way?

Emily M. Bender, a computational linguist at the University of Washington who has collaborated with Mitchell, worries about how potential harms will be handled. "The real key to mitigating the risks of any machine learning technology is to evaluate and explore specific use cases: What is this system designed for? Who will use it? How will the system's output be presented to them?"

For Pineau, such concerns should be addressed through more open discussion, not less communication. "People around the world have different views about what kind of speech is appropriate, and AI is part of that conversation," she said. She does not expect the language model to say things that everyone agrees with. "But how do we deal with that? By listening to many voices in the course of the discussion."
