Large-model review paper signed by a hundred scholars is accused of "plagiarism"; Zhiyuan Research Institute issues an official letter of apology

Report from Machine Heart

Machine Heart Editorial Department

Zhiyuan Research Institute said: "In response to this matter, the institute immediately organized an internal investigation, confirmed that some of the articles have problems, and has begun inviting third-party experts to conduct an independent review and pursue the relevant accountability."

Yesterday, news of suspected "plagiarism" in a review paper triggered heated discussion in academic circles at home and abroad.

Daphne Ippolito, a Ph.D. student at the University of Pennsylvania and a Google student researcher, tweeted that "A Roadmap for Big Model", a review paper with 100 authors led by the Zhiyuan Research Institute, allegedly plagiarized content from several papers, including one of her team's studies, "Deduplicating Training Data Makes Language Models Better." That paper had previously been accepted to ACL 2022.

The incident escalated quickly and attracted widespread attention and discussion in the community.

In response to the allegations, on April 13 the Beijing Zhiyuan Artificial Intelligence Research Institute (Beijing Academy of Artificial Intelligence, BAAI) issued a "Letter of Apology on the Problems of the 'A Roadmap for Big Model' Review Report", which said: "In response to this matter, the institute immediately organized an internal investigation, confirmed that some of the articles have problems, and has begun inviting third-party experts to conduct an independent review and pursue the relevant accountability."

The preliminary results of the internal investigation of Zhiyuan Research Institute are as follows:

1. The report is a review of the field of large models that aims to cover, as far as possible, all important literature in the field at home and abroad. The Zhiyuan Research Institute led the effort and was responsible for the framework design and for compiling the manuscript. It invited 100 researchers at home and abroad to write 16 independent special articles, each written and signed separately by its own group of invited authors, for a total of 200 pages. Since the report was published, it has been continuously revised and refined based on feedback, and by April 2 it had been updated to the third version on the arXiv website.

2. On April 13, we learned that Google researcher Nicholas Carlini had pointed out on his personal blog that the report plagiarized several paragraphs of his team's paper, and that other paragraphs and statements were plagiarized from other papers. We checked these claims item by item and, after re-examination, confirmed that 179 words in section 3.1 of article 2, 74 words in section 3.1 of article 8, 55 words in section 2.3 of article 12, 159 words in section 2 of article 14, and 146 words in section 1 of article 16 duplicate existing text and should be regarded as plagiarism. We decided to remove this content from the report immediately, and the revised report will be submitted to arXiv today. The authors of all articles have been notified to conduct a comprehensive review of all content, and new versions will be released after rigorous review.

3. As the organizer of the report, Zhiyuan should have strictly reviewed all the content of each article, and we cannot escape responsibility for a problem of this kind. We deeply regret it, and we especially thank our friends in academia and the media for helping us identify the problems. We will learn from this, overhaul our research management and paper publication processes, and hope that people from all walks of life will supervise our work.

Details of the alleged plagiarism

Nicholas Carlini, one of the authors of the paper suspected of being plagiarized, said: "One of my co-authors was reading the Big Models paper and noticed that some of the text seemed familiar, and after a quick look, we found that there was actually a bunch of text that was copied directly from our paper."

Currently, on the arXiv page of the "Big Models" paper, the administrators have flagged two articles with a high degree of text overlap.

In the blog post, the authors who say they were plagiarized also gave evidence: the "Big Models" paper copied from the references and related-work sections of Carlini's paper. The post presents side-by-side comparisons, with the text from the "Big Models" paper on the left, the corresponding text from the original paper on the right, and the "copied" text highlighted in green.

After the incident sparked widespread discussion, Nicholas Carlini himself said in an update to the blog post:

This article received a lot more attention than I expected. (This page is getting more visitors per hour than I got in all of last week.) ... Since I don't know what went on behind the scenes, I want to avoid passing judgment. Perhaps some junior authors copied the text in good faith, believing that a single citation was enough. Perhaps pressure from above made some students feel that their only option was to turn in their manuscripts on time. As for the senior authors, they may have read the text, thought it looked perfectly reasonable, and made only a few adjustments without knowing where it came from.

I hope this article will draw attention to this kind of problem. For example, about 1% of published and accepted papers have a higher rate of duplicated text than this report. I should have given this background when I first wrote the blog post. So, again, please don't criticize this paper particularly harshly.

Finally, we believe this matter should serve as a wake-up call for everyone, and the community should strictly uphold academic norms. As UC Berkeley Professor Ma Yi said on Weibo: "Places that strictly uphold academic norms will be respected by their peers. The domestic academic atmosphere is relatively impetuous and awareness of academic norms is weak. I hope that other institutions can take this as an example so that together we can improve our academic environment."

Reference link: https://nicholas.carlini.com/writing/2022/a-case-of-plagarism-in-machine-learning.html
