
A Linux community bans large-model code! "Shit" appears 7 times in the proposal email

Author: InfoQ

Authors | 褚杏娟 (Chu Xingjuan), 核子可乐

The Gentoo Linux distribution has officially stopped accepting AI-generated and AI-assisted code contributions.

On April 14, the Gentoo Council unanimously adopted a new AI policy that explicitly prohibits contributing to Gentoo any content created with AI natural language processing tools. If a tool emerges that addresses the copyright, ethics and quality concerns, the motion may be revisited.

The policy applies to contributions to Gentoo and to official Gentoo projects, but it does not prohibit adding packages for AI-related software or for upstream software developed with the help of AI tools.

Gentoo Linux is a Linux operating system built on the Portage package management system. Because of its almost unlimited adaptability, the project officially describes it as a metadistribution. The Gentoo Council is the elected committee that governs the distribution.

Banning AI code contributions was originally proposed by Gentoo Council member Michał Górny on February 27. In the email, he said:

In light of the recent rapid spread of the "AI" bubble, the Gentoo Linux project has begun to seriously consider the issues that come with it. In my opinion, the only reasonable course of action at the moment is to completely ban contributions created with "AI". Specifically, it should be explicitly forbidden to use ChatGPT, Bard, GitHub Copilot, etc. to create ebuilds, code, documentation, messages, and bug reports for Gentoo Linux.

To be clear, this applies only to "original" content created for the Gentoo Linux project; we have no influence over upstream projects' use of AI technology.

Here's why:

1. Copyright issues. At present, the copyright status of generated content is still unclear. What is certain is that almost all large language models have been trained on a large amount of copyrighted material, and the well-known "AI" vendors on the market are obviously not concerned about copyright infringement. The output of these AI tools may not be legally usable at all.

2. Quality issues. Large language models are particularly good at producing plausible nonsense. I think that, used with a lot of care, large models can help, but we can't expect every contributor to the Gentoo Linux project to be that aware of the risks.

3. Ethical issues. As mentioned earlier, "AI" vendors are not concerned with copyright, nor with people's rights and interests. The AI bubble is causing a huge waste of energy, and it is being used as an excuse for layoffs and further exploitation of IT workers. AI technology is driving a flood of junk content on the Internet, and spam and fraud are appearing at an unprecedented rate.

Gentoo has always had its own values and has served people for whom mainstream distributions fall short. I think that "hand-developed by real people" will be one of the features and strengths of the Gentoo Linux project, but we need policies in place to ensure that no junk gets into the project.

Michał Górny also linked to examples of AI spam in the email. The linked example contains many erroneous descriptions:


Source: https://github.com/pkgxdev/pantry/issues/5358

In addition to banning the submission of AI-generated code, Górny hopes that Gentoo will make other unique contributions to the entire Linux community.

"I think it's a good opportunity to get the word out about the project," Górny said in an interview. "A lot of projects are embracing AI at the moment, and I've found that a lot of Gentoo's users actually appreciate the traditional approach to software engineering, which says that people are more important than 'productivity'."

The ban is a precautionary measure; so far, AI-generated code has not caused any specific problems in the Gentoo community. "We're taking early precautions."

AI is banned outright, but the ban may be lifted in the future

Copyright is undoubtedly becoming a long-term challenge for AI models. Most of these models used copyrighted material during training, and even Nvidia has faced a lawsuit over it. In addition, AI is known to generate all sorts of nonsensical text and code, and has even been observed to "hallucinate" entire software packages.

The Council first discussed Górny's proposed ban at its scheduled monthly meeting on March 10. However, since the exact terms of the ban had not yet been worked out, several Council members wanted to discuss the details further and took no action at that time. The ban was finally adopted at the April 14 Council meeting by a 6-0 vote, with one member absent.

"My personal view is that we're just starting to focus on this issue. Once the ban is actually announced and reaches the majority of users, there should be more user feedback for our reference."

The Gentoo community also discussed the potential AI ban in email threads and IRC chat rooms. Górny noted that there was agreement that "certain restrictions" should be imposed. With the ban now in effect, more Gentoo community members are likely to share their views on AI technology in the future.

Of course, enforcing the ban will be challenging, since it is not easy to tell human-written code from machine-generated code. In Górny's view, the main significance of the ban does not lie in its practical enforcement.

"Our main goal is to be clear about what behaviors are acceptable and what are not, and to politely ask contributors to respect community norms," he said. "Specifically, the AI ban is primarily an extension of the existing rules for copyrighted code."

Górny adds, "If we receive a contribution that contains a 'weird' error, one that doesn't seem likely to have been made by a human, we're going to ask questions about it, and I'm afraid that's all we can do."

It is worth mentioning that the policy explicitly includes a provision allowing it to be reviewed in the future, reflecting the forward-looking concerns of some Council members. According to Council member Sam James, "No one can predict whether things will change significantly over the course of a year or stay the same."

The Council has anticipated what might happen in the future and would consider opening the door to AI if warranted, for instance with models trained on Gentoo code. In theory, this would both eliminate concerns about copyright infringement and lead to higher-quality code.

Netizens: A wise move!

"After looking at the linked thread, I completely agree with Gentoo." This is a top comment on Hacker News. Others followed up: "The content of that thread is unbelievable to me. How can anyone think that an automatically generated, meaningless description is better than no description at all?"

"It's very wise to discard meaningless descriptions, and it's wise to try some kind of policy to prevent them," some netizens said. People are really tired of the junk produced by large models. For example, Górny used the word "shit" 7 times in his original email on February 27. Although some netizens said he was being a little emotional, it also shows how fed up he is with the problems of large models.

Of course, there are also those who argue that "banning LLM content" is wasted effort. "If you want to ensure the quality of your code, you should focus on making sure that the code review and merging process is more thorough and filters out subpar contributions more effectively, rather than wasting time trying to implement policies that simply can't be enforced, which will only give a false sense of trust and security." Netizen Tooster said that this is a legitimate concern, but that it should also be addressed at the organizational level.

Most discussions about large models and copyright revolve around the core question of what it means to learn. Put simply: if learning through human memory does not infringe copyright, does learning through algorithmic scraping infringe it? Gentoo's ban has brought the topic back into discussion.

Some netizens wrote, "It's fair that no one may use copyrighted code copied verbatim, whether it was reproduced from human memory or copied by a computer. But prohibiting humans, AI, or other agents from learning from code freely shared on the Internet goes against the spirit of open source."

Humans who learn by reading code (copying knowledge into their brains in some way) do not infringe copyright, but for deep learning algorithms that learn by processing code tokens scraped from public sources like GitHub, the answer is less obvious. "Is the human brain a copyright laundering machine?" asked netizen "zdimension". He believes that algorithmic scraping is also a form of learning and should not be banned, but he does not deny its consequences: "We have seen a lot of bad results brought about by the democratization of GPT."

So far, this question remains unresolved.

On the other hand, Linus Torvalds, the creator of Linux and a leader of the open source movement, is optimistic on this question. In an interview in February, Torvalds said that he does not see large language models as a threat, but as a useful tool. Tasks such as reviewing code and maintaining subsystems are areas where large models can help, for example by finding obvious, stupid bugs.

"The way most of us work is, in some way, a powerful version of auto-correction. I see it as a tool that can help us do better," Torvalds said. Nor is he bothered by the hype around artificial intelligence; he sticks to his passion for low-level hardware.

Torvalds is also relaxed about large-model hallucinations and erroneous output: "I see errors every day that can occur without large language models. So I'm probably not too worried about that. I think we've done a good job ourselves." It's not hard to understand what he means when you recall that he gets angry from time to time about some of the bugs submitted by the community.

Original article: "A Linux community bans large-model code! 'Shit' appears 7 times in the proposal email. Netizens: This move is very wise!" (AI & Large Models, Chu Xingjuan, InfoQ selected articles)
