
Authors are the best "reviewers" at top AI conferences! A Chinese scholar proposes a new peer-review mechanism


Reporting by XinZhiyuan

Authors: Dong Wanping, Wen Gang

Editor: Sleepy

In recent years, the number of papers submitted to top machine learning conferences has exploded, reviewing loads have become enormous, and the peer-review system has come under question. In response, a professor at the University of Pennsylvania has proposed a new peer-review mechanism assisted by the authors of the papers themselves.

Have you had enough of the review comments from conferences such as NeurIPS, ICLR, and ICML?

Have you ever had your best paper rejected while a noticeably weaker one was accepted?

For many practitioners in machine learning and artificial intelligence, this phenomenon has long since ceased to be surprising.


Artificial intelligence expert Ian Goodfellow complains about peer review on Twitter

Machine learning's success depends on its large conferences, and the field evolves very rapidly. Journal review cycles are comparatively long, so most new work appears first at conferences such as NeurIPS, ICLR, and ICML, which have played a very important role in the field's development and growth.

Generally speaking, academic conferences invite experts in the relevant field to review each manuscript, that is, they use a peer-review system, to decide whether a paper deserves publication. The success of these conferences is owed in large part to peer review.

Conversely, publishing research without reliable peer review can cause many problems: most people, being non-experts, cannot tell good research from bad; the literature becomes muddled; and later researchers may cite erroneous results and conclusions, all of which would hinder progress in machine learning.

As the numbers of researchers and papers multiply, the reliability of peer review matters more than ever. Analyses of the system's reliability, and proposals to improve it, have become a hot topic attracting attention from both academia and industry.

How, then, can the peer-review mechanism be improved and the review process made more reliable?

Recently, Weijie Su, a professor at the Wharton School and in the Department of Computer Science at the University of Pennsylvania, published a paper at this year's NeurIPS offering a new idea for improving peer review: a simple, practical approach that combines ideas from statistics and optimization.

The paper argues that since it is unrealistic to recruit more reviewers or to assign more papers to each reviewer, conferences can instead ask the submitting authors themselves to provide information that aids decision-making, putting an untapped resource to use. The catch is ensuring that authors do not supply false information for their own benefit. How should such a mechanism be designed?


Paper: https://arxiv.org/abs/2110.14802

Professor Su's answer is the Isotonic Mechanism, which comes with theoretical guarantees that it both incentivizes authors to report truthful information and improves the reliability of the review results.

Background

As the saying goes, "what makes you can also break you": peer review is supposed to be the filter for high-quality, high-impact research, but with the recent boom in artificial intelligence and machine learning conferences and the surge in submissions, the system seems to have changed for the worse.

The famous 2014 NeurIPS experiment, for example, found review scores to be surprisingly arbitrary: had the accepted manuscripts been re-reviewed, more than half of the papers NeurIPS accepted in 2014 would have been rejected!

Contributors are the best "reviewers" of ai top clubs! Chinese scholars have proposed a new mechanism for peer review

In the 2014 NeurIPS experiment, a second program committee re-reviewed papers already decided by the first: of the randomly selected papers the first committee accepted, the second committee rejected 50.9%.

The root cause is that the flood of submissions has created a shortage of qualified reviewers, forcing conferences to enlist many novices who have never published a paper themselves. Meanwhile, the growing per-reviewer load has slashed the time spent on each paper; a paper is often sentenced to death within minutes!

Submissions to a top machine learning conference like NeurIPS have grown from 1,673 in 2014 to 9,122 this year, with some individual researchers submitting 10 or more papers at once, but the pool of qualified reviewers simply cannot grow that fast.

The result is a plunge in the quality of peer review, which critics say no longer serves its original purpose. If the system is not reformed, in the long run it is bound to erode public confidence in machine learning and hinder the development of artificial intelligence.


An explosion of ML/AI conference papers

Poster: http://www-stat.wharton.upenn.edu/~suw/paper/iso_poster.pdf

Of course, the academic community has long noticed these shortcomings and proposed various remedies: replacing volunteer reviewing with paid reviewing, making reviews more public (as ICLR does on OpenReview), and so on. But these fixes are either impractical or create new problems of their own.

For now, peer review remains the "least bad" system available.

An introduction to the Isotonic Mechanism

Suppose an author submits n papers with true scores R_1, R_2, ..., R_n, and suppose the author knows the ranking of these true scores (mathematically, a permutation of 1, 2, ..., n).

The mechanism then asks the author to report that ranking π. Combining π with the raw average scores y_1, y_2, ..., y_n given by the reviewers, it solves a convex problem and outputs the final scores.

Formally, this convex optimization problem is:

\hat{R} = \underset{r \in \mathbb{R}^n}{\arg\min}\ \frac{1}{2} \sum_{i=1}^{n} (y_i - r_i)^2 \quad \text{subject to} \quad r_{\pi(1)} \ge r_{\pi(2)} \ge \cdots \ge r_{\pi(n)}
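
As a concrete illustration, here is a minimal Python sketch of this scoring step (my own illustration, not the paper's code; the function name isotonic_mechanism is hypothetical). It uses scikit-learn's IsotonicRegression to project the raw scores onto the reported ordering:

```python
# Minimal sketch of the Isotonic Mechanism's scoring step (illustrative,
# not the paper's code). Given raw review scores y and the author-reported
# ranking pi (pi[0] = index of the claimed-best paper), project y onto
# {r : r[pi[0]] >= r[pi[1]] >= ... >= r[pi[n-1]]}.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotonic_mechanism(y, pi):
    """Solve min ||y - r||^2 subject to r following the reported ranking."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi)
    # Reorder so the constraint becomes a simple non-increasing chain.
    y_ordered = y[pi]
    # Non-increasing isotonic regression (pool-adjacent-violators inside).
    r_ordered = IsotonicRegression(increasing=False).fit_transform(
        np.arange(len(y)), y_ordered
    )
    # Map the adjusted scores back to the original paper order.
    r = np.empty_like(y)
    r[pi] = r_ordered
    return r

# Example: raw scores for three papers; the author ranks paper 0 best,
# then paper 2, then paper 1. Scores that conflict with the reported
# order are averaged (pooled) together.
print(isotonic_mechanism([6.2, 7.5, 5.0], pi=[0, 2, 1]))
```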

In addition, the mechanism assumes that authors are rational: an author reports whichever ranking π maximizes his or her own interest. Mathematically, the author seeks final scores from the mechanism that maximize the following utility function:

\text{Util}(\pi) = \sum_{i=1}^{n} U(\hat{R}_i)

Here U is assumed to be a nondecreasing convex function.

Theoretical guarantees of the Isotonic Mechanism

Deferring the discussion of the assumptions and of rationality for a moment, the article's main results establish the theoretical superiority of the Isotonic Mechanism over the raw scores:


1. An author's optimal strategy is to truthfully report the true ranking of his or her papers; even an author who is not fully certain of the complete ranking does best by truthfully reporting whatever is known.


2. The adjusted final scores produced by the mechanism are strictly more accurate than the raw scores given by the reviewers.

That merely reporting a ranking of the scores improves accuracy speaks for itself about the mechanism's practicality. Beyond this, the paper extends the results to more general settings: the author knows only a coarse block-wise ordering of the true scores; the mechanism must be robust to heavier noise; and the utility cannot be written as a sum of per-paper utilities. Together these demonstrate the mechanism's strong corrective power and its rich practical relevance.
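
Both guarantees are easy to probe numerically. The sketch below (my own illustration, not from the paper; it assumes Gaussian score noise and the hypothetical convex utility U(r) = exp(r)) averages over many noisy draws, since the guarantees are statements about expectations:

```python
# Hypothetical Monte Carlo probe of the two guarantees (illustrative,
# not from the paper). Assumes Gaussian review noise and a nondecreasing
# convex utility U(r) = exp(r).
import numpy as np
from sklearn.isotonic import IsotonicRegression

def adjust(y, pi):
    # Isotonic Mechanism: project y onto the reported ordering.
    r = np.empty_like(y)
    r[pi] = IsotonicRegression(increasing=False).fit_transform(
        np.arange(len(y)), y[pi]
    )
    return r

rng = np.random.default_rng(0)
R = np.array([8.0, 6.0, 4.0])       # true scores, best to worst
truthful = np.array([0, 1, 2])      # the true ranking
reversed_ = np.array([2, 1, 0])     # a deliberately false report
U = np.exp                          # hypothetical convex utility

util_t = util_f = mse_raw = mse_adj = 0.0
trials = 5000
for _ in range(trials):
    y = R + rng.normal(0.0, 2.0, size=3)   # noisy raw review scores
    r_t, r_f = adjust(y, truthful), adjust(y, reversed_)
    util_t += U(r_t).sum()
    util_f += U(r_f).sum()
    mse_raw += ((y - R) ** 2).mean()
    mse_adj += ((r_t - R) ** 2).mean()

# Guarantee 1: truthful reporting yields the higher average utility.
print(util_t / trials, util_f / trials)
# Guarantee 2: adjusted scores sit closer to the true scores than raw ones.
print(mse_raw / trials, mse_adj / trials)
```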

Now let us return to the assumptions. Besides the requirement on the function U, the author must have some knowledge of the true ranking (in order to assist), and the noise of the reviewers' scores relative to the true scores must be exchangeable, i.e., its joint distribution must be invariant under permutations. These assumptions are fairly realistic.

Special attention should be paid to the assumption that U is convex, which is crucial to the conclusions above. It seems to contradict the diminishing marginal returns of classical economics. But here utility measures not a "quantity" but a score that determines whether a paper ends up as a poster, an oral presentation, or even a plenary talk. For many researchers, the pursuit of greater impact for their conference papers reflects a genuine preference, so the convexity of the utility function has its own legitimacy.
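
A toy calculation makes the point (the utility U(r) = 2^r is purely illustrative):

```python
# Why convexity matters (illustrative): with a convex U, one high-impact
# paper plus one weak paper is worth more to the author than two average
# papers, even though the total raw score is identical.
U = lambda r: 2 ** r
print(U(8) + U(4))   # 272: one near-oral paper and one weak paper
print(U(6) + U(6))   # 128: two middling posters with the same score sum
```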

Background to the proposed mechanism

The Isotonic Mechanism takes its name from the fact that the corresponding convex problem has the form of isotonic regression in statistics.

Conceptually, isotonic regression finds a non-decreasing, piecewise-constant function, an order-preserving fit, that is as close as possible to the sample points.
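
For intuition, here is a minimal sketch of the classical pool-adjacent-violators algorithm (PAVA) that solves isotonic regression; this is a textbook illustration, not the paper's implementation:

```python
# Minimal pool-adjacent-violators (PAVA) sketch for isotonic regression:
# fit the closest (in squared error) non-decreasing step function to y.
def pava(y):
    blocks = []  # each block is [sum, count]; block means must not decrease
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while a block's mean exceeds its successor's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # each block's points take its mean
    return fitted

print(pava([1.0, 3.0, 2.0, 4.0]))  # -> [1.0, 2.5, 2.5, 4.0]
```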


Isotonic regression: https://en.wikipedia.org/wiki/Isotonic_regression

Coincidentally, shortly after this article was written and submitted to NeurIPS, the conference organizers asked all authors to rank the quality of their own submissions, exactly the input the Isotonic Mechanism requires. The article was thus very timely, although in the end the rankings were not used in this year's NeurIPS decisions.


The article's timing happened to coincide with NeurIPS 2021's author-ranking request

It is worth noting that roughly half of Professor Su's publications are in journals in statistics, optimization, and information theory, while many others appear at top machine learning conferences, so he has a deep understanding of how review quality and the quality of accepted papers differ between the two.

In general, the average quality of papers accepted at top machine learning conferences, with their enormous submission volumes, is considerably lower than that of journals. At the same time, a distinctive feature of machine learning is that a single author or research group often submits many papers at once; reinforcement learning researcher Sergey Levine, for example, submitted 32 papers to ICLR 2020! The theory of the Isotonic Mechanism also shows that the larger the number of papers n, the greater the improvement.
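
A rough simulation can illustrate this scaling (my own sketch, under assumed standard-normal true scores and noise and a truthful ranking; it is not from the paper):

```python
# Hypothetical simulation of how the mechanism's total squared-error
# reduction over raw scores grows with the number of papers n.
# Assumptions (mine): standard-normal true scores and noise, truthful ranking.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
for n in [2, 4, 8, 16, 32]:
    gain = 0.0
    trials = 1000
    for _ in range(trials):
        R = -np.sort(-rng.normal(0.0, 1.0, n))   # true scores, descending
        y = R + rng.normal(0.0, 1.0, n)          # noisy raw review scores
        # Truthful ranking is the identity since R is already sorted.
        r = IsotonicRegression(increasing=False).fit_transform(
            np.arange(n), y
        )
        gain += ((y - R) ** 2).sum() - ((r - R) ** 2).sum()
    print(n, gain / trials)  # average error reduction grows with n
```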

It is also worth noting that the theoretical proofs behind the Isotonic Mechanism use mathematical tools such as convex functions and majorization inequalities, which will feel familiar to anyone with a background in mathematical olympiads.

Summary and outlook

This paper proposes the Isotonic Mechanism to improve the peer-review system by using information provided by the authors themselves: it incentivizes authors to report their papers' true ranking, yielding better decisions.

The mechanism is easy to implement and carries theoretical optimality guarantees; if deployed in practice, it holds real promise for alleviating, at least in part, the low-quality reviewing that plagues today's top machine learning conferences.

That said, using additional author-provided information to improve peer review is a new research direction, and work remains before it can be put into practice. For the Isotonic Mechanism, open tasks include:

Although a convex utility function matches the preferences of researchers who chase impact, for researchers who mainly care about their count of accepted papers the utility function may be a special non-convex function (such as a step function). How can the mechanism be adapted to such cases?

Some initial results on improving peer review already exist; how can they be incorporated?

The mechanism's accuracy is measured in L2 error. Is there a more realistic error metric?

How should the mechanism respond to strategic behavior by authors, such as deliberately submitting low-quality papers to indirectly inflate the scores of the others?

With interdisciplinary reviewing, multiple reviewers, and multiple authors, how can the exchangeability of the noise be ensured, and how should the Isotonic Mechanism be modified?

Does requiring authors to rank the quality of their own papers bring side benefits? For example, forcing authors to form a clearer view of their own papers' quality might reduce the "guest authorship" common on conference papers.

In any case, the mechanism targets a question of major importance to the future of machine learning. Solving it would have enormous impact, and the scoring scheme could even be applied to all sorts of other evaluation settings, giving it great practical significance.

About the Author

Weijie Su is an assistant professor in the Department of Statistics and Data Science at the Wharton School of the University of Pennsylvania and in the Department of Computer Science in the School of Engineering, and a co-director of Penn's machine learning research center. He received his undergraduate degree from Peking University and his Ph.D. from Stanford University, and has received the NSF CAREER Award and a Sloan Research Fellowship.

Resources:

https://arxiv.org/pdf/2110.14802.pdf

https://www.toutiao.com/i7039916197835506209/?timestamp=1639147753&app=news_article&group_id=7039916197835506209&use_new_style=1&req_id=202112102249130101310380762754C599&wid=1639647590857

https://arxiv.org/pdf/2109.09774.pdf

https://www.reddit.com/r/MachineLearning/comments/r24rp7/d_peer_review_is_still_broken_the_neurips_2021/

https://hub.baai.ac.cn/view/10481

https://zhuanlan.zhihu.com/p/90666675

https://cloud.tencent.com/developer/article/1172713

http://eprints.rclis.org/39332/
