
PNAS: Hundreds of millions of papers are becoming a barrier to scientific innovation

The scientific boom has produced hundreds of millions of papers, and this flood is hindering the emergence of innovative ideas. Johan S. G. Chu of Northwestern University's Kellogg School of Management and James A. Evans of the University of Chicago's Department of Sociology studied 1.8 billion citations among 90 million papers across 241 disciplines and found that the massive growth in publications does not lead to turnover of a field's central ideas, but instead entrenches the classical canon. The study argues that scientific innovation now requires disruptive scholarship and a deliberate focus on new ideas.

Written by | Guo Ruidong

Reviewed by | Yue Ran, Liu Peiyuan


Paper title:

Slowed canonical progress in large fields of science

Paper link:

https://www.pnas.org/content/118/41/e2021636118

1. The Matthew effect in the scientific community

In most research areas, the number of papers published each year has grown significantly over time. Many of the scientific community's inputs and incentives, such as the growing number of scientists and rising research funding, ultimately have their output measured in papers. Publication counts shape the career trajectories of scholars and the evaluation of academic institutions and national research capacity.

But does the growth in the number of scientists and papers translate into an expansion of our cognitive frontier? If so, how? The traditional view describes scientific progress with a sandpile model: even if not every paper rewrites the textbooks, each new paper adds a grain of sand to the pile and increases the chance of an avalanche. After the avalanche, a new scientific paradigm emerges, much as our understanding leapt from Newtonian mechanics to relativity.

Under this assumption, publishing more papers within a given period becomes the most reliable path to tenure and promotion, and citation counts become the central measure of the importance of individuals, teams, and journals in a field: the more citations, the better.

However, the assumption underlying these criteria turns out to be wrong. Consider a representative example. When the field of electrical and electronic engineering published about 10,000 papers per year, the top 0.1% most-cited papers received 1.5% of all citations and the top 1% received 8.6%. When the field grew to 50,000 papers per year, the top 0.1% received 3.5% of citations and the top 1% received 11.9%. By the time the field reached 100,000 papers per year, the top 0.1% received 5.7% of citations and the top 1% received 16.7%. In contrast, the share going to the bottom 50% of least-cited papers shrank as the field expanded, from 43.7% at 10,000 papers per year to just over 20% at 100,000 papers per year.
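To make these concentration figures concrete, here is a minimal sketch (not taken from the paper; the citation counts below are simulated) of how the share of citations captured by the top 0.1% or top 1% of papers in a field-year can be computed:

```python
# Minimal sketch: share of all citations received by the top `fraction` most-cited
# papers in one field-year. The citation counts are simulated, not real data.
import numpy as np

def top_share(citations, fraction):
    counts = np.sort(np.asarray(citations))[::-1]      # most-cited first
    k = max(1, int(round(fraction * len(counts))))     # size of the top slice
    return counts[:k].sum() / counts.sum()

rng = np.random.default_rng(0)
citations = rng.zipf(2.0, size=100_000)                # heavy-tailed toy distribution

for frac in (0.001, 0.01):
    print(f"top {frac:.1%} of papers receive {top_share(citations, frac):.1%} of citations")
```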


Figure 1: The number of papers published per year in different fields (horizontal axis) versus the decay rate of the citations papers receive (vertical axis); colors mark different citation tiers. Higher values on the vertical axis indicate a weaker year-over-year decline in citations: for example, papers outside the top 1% most-cited lose on average about 17% of their citations each year, while papers below the top 5% tend to lose about a quarter each year.
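The annual decay rate quoted in the caption can be estimated from a paper's yearly citation counts. The sketch below uses a made-up citation series and a simple log-linear fit; this is an illustrative assumption, not the authors' exact method.

```python
# Minimal sketch: estimate an annual citation decay rate d from yearly citation
# counts by fitting c_t ~ c_0 * (1 - d)^t. The series below is hypothetical.
import numpy as np

yearly_citations = np.array([120, 100, 85, 70, 59, 50])    # citations received each year
years = np.arange(len(yearly_citations))
slope, _ = np.polyfit(years, np.log(yearly_citations), 1)  # slope = log(1 - d)
decay = 1 - np.exp(slope)
print(f"estimated annual decay rate: {decay:.1%}")         # roughly 16-17% for this series
```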

This suggests that the flood of new papers may deprive readers of the cognitive slack needed to absorb new ideas. Just as the mind needs room before it can take in new knowledge, researchers need spare attention to notice non-mainstream research. When the yearly volume of papers is very large, the rapid churn of new publications may force the academic community to focus on work that is already widely cited, limiting attention to lesser-known papers, even when those low-profile papers turn out in hindsight to contain novel and transformative ideas.

Return to the sandpile model mentioned earlier. When sand falls too fast, small avalanches in different neighborhoods interfere with one another, and no single grain can set the whole pile in motion; the faster the grains drop, the smaller the region each one can affect. In the research community, if papers are published too quickly, no new paper can accumulate influence through the local processes of diffusion and preferential attachment and thereby shift the field's overall paradigm.
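This mechanism can be illustrated with a short simulation. The sketch below uses standard Bak-Tang-Wiesenfeld toppling rules and adds a drive_rate parameter, an assumption introduced here only to mimic "sand falling faster"; it illustrates the metaphor, not the paper's analysis. Under fast driving, each recorded avalanche is the merged result of many grains dropped at once, so no single grain can be credited with a system-wide cascade.

```python
# Minimal Bak-Tang-Wiesenfeld sandpile sketch. `drive_rate` (an assumption added
# for this illustration) controls how many grains are dropped before the pile
# is allowed to relax, mimicking "sand falling faster".
import numpy as np

def relax(grid, threshold=4):
    """Topple every over-threshold cell until the pile is stable; return avalanche size."""
    topplings = 0
    while True:
        unstable = np.argwhere(grid >= threshold)
        if len(unstable) == 0:
            return topplings
        for r, c in unstable:
            grid[r, c] -= threshold
            topplings += 1
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= rr < grid.shape[0] and 0 <= cc < grid.shape[1]:
                    grid[rr, cc] += 1
                # grains pushed past the edge are lost

def avalanche_sizes(drive_rate, steps=3000, size=30, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.zeros((size, size), dtype=int)
    sizes = []
    for _ in range(steps):
        for r, c in zip(rng.integers(0, size, drive_rate), rng.integers(0, size, drive_rate)):
            grid[r, c] += 1          # drop `drive_rate` grains at random sites
        sizes.append(relax(grid))    # then let the pile settle, recording one merged avalanche
    return np.array(sizes)

slow, fast = avalanche_sizes(drive_rate=1), avalanche_sizes(drive_rate=20)
print("slow drive: mean avalanche", slow.mean(), "max", slow.max())
print("fast drive: mean avalanche", fast.mean(), "max", fast.max())
```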

When the number of papers published in a field grows each year, citations flow disproportionately to papers that are already heavily cited, a Matthew effect in which the rich get richer and the poor get poorer. The mass of newly published papers does not accelerate paradigm change in the field; it consolidates the academic position of the most-cited papers. Scientific progress may therefore slow and become trapped in existing research paradigms. With yearly publication counts still growing in most fields, this conservative drift looks set to continue. After presenting its data analysis, the paper discusses policy measures for reorganizing the scientific production value chain so that collective attention is directed toward promising, novel ideas.

2. Too many papers make the scientific community more conservative

The Gini coefficient is used in economics to measure income inequality: the higher the coefficient, the more unequal the distribution. Here it is used to measure inequality in the number of new citations papers receive each year. Figure 2A shows that the more papers a field publishes, the more unequal the process of gaining citations becomes: certain papers, especially those already highly cited, attract a disproportionate share of new citations. Figure 2B shows that when more papers are published in a field, the citation ranking of the field becomes more stable (more strongly correlated from year to year), and each newly published paper disproportionately adds citations to the most-cited papers.


Figure 2: Each point represents a field in a given year. Left: a scatter plot of the total number of papers published in the field that year (horizontal axis, log scale) against the Gini coefficient (vertical axis) of the share of new citations each paper receives that year. Right: the Spearman correlation coefficient (vertical axis) between the number of new citations each paper receives in a given year and its total accumulated citations, plotted against the total number of papers published in the field that year (horizontal axis); the colored lines are fitted curves for the ten disciplines with the largest numbers of papers.
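Both statistics in Figure 2 are standard measures. As a minimal sketch, the snippet below computes a Gini coefficient and a Spearman rank correlation on simulated citation data; the distributions and the preferential-attachment assumption are illustrative, not the paper's data.

```python
# Minimal sketch of the two statistics used in Figure 2, on simulated data.
import numpy as np
from scipy.stats import spearmanr

def gini(x):
    """Gini coefficient of a non-negative 1-D array (0 = perfectly equal, 1 = maximally unequal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

rng = np.random.default_rng(1)
total_citations = rng.zipf(1.8, size=50_000)        # hypothetical accumulated citations per paper
new_this_year = rng.poisson(0.1 * total_citations)  # new citations roughly proportional to existing ones

print("Gini of new citations this year:", round(gini(new_this_year), 3))
rho, _ = spearmanr(total_citations, new_this_year)
print("Spearman correlation (accumulated vs. new):", round(rho, 3))
```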

Under the earlier understanding, neither picture of citation predicts this pattern. If being cited is a process of gradually accumulating influence as the academic community's attention converges on a paper, then the new citations received each year should be distributed relatively evenly. If instead a study is cited heavily in a given year because its innovative value is recognized, and is later displaced by newer work, then its citation growth will be unequal from year to year, but this should not depend on how many papers the field publishes in total that year. And no matter how disruptive a single paper is, it is unlikely to significantly increase the number of papers its field publishes that year; if it did, that would be a special case rather than a trend.

But the facts overturn this traditional picture of citation. A more accurate description is that papers that do not cite the existing, highly cited classics find it hard to gain new citations and to become classics themselves. When many papers appear in a short period, scholars are forced to rely on heuristics to keep up with the field. Cognitively overloaded readers process only new studies that fit the existing paradigm, rather than judging each study on its own merits. A novel idea that does not conform to the existing framework is therefore less likely to be published, read, or cited.

Further supporting this explanation, the more papers a field publishes, the harder it is for a new paper to become one of the field's most-cited, widely known classics. The probability that a newly published paper ever enters the top 0.1% most-cited decreases as the field's total publication volume grows, and the time required to get there changes accordingly, as shown in Figure 3:


Figure 3: Scatter plots of the number of papers published in a field (horizontal axis) against (a) the probability that a newly published paper becomes one of the most-cited papers (vertical axis) and (b) the expected number of years it takes to do so (vertical axis).

When a field is small, papers rise slowly into the top 0.1% most-cited over time, reflecting a gradual accumulation of attention from the scientific community. In contrast, in fields with high publication volumes, the papers that do reach the top belong to the mainstream research paradigm and peak quickly, which is at odds with the cumulative process by which scholars discover new work through the references cited in other studies.


Figure 4: (a) Red and blue points show the proportions of developing and disruptive papers (vertical axis) against the total number of papers published in the field that year (horizontal axis). (b) The average probability that a paper published in a given year attains the highest values of the disruption index (vertical axis), against the total number of papers published in the field that year (horizontal axis).

Following a 2019 paper by Lingfei Wu, Dashun Wang, James Evans, and others [1], a disruption index D can be computed for each paper. Figure 4 shows that when a field publishes 1,000 papers per year, about 49% of its papers are disruptive (D > 0). At 10,000 papers per year the predicted share of disruptive papers drops to 27%, and at 100,000 papers per year to 13%. The share of papers ranking among the most disruptive 5% falls from 8.8% at 1,000 papers per year to 3.6% at 10,000, and to only 0.6% at 100,000.
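For reference, the disruption index D of a focal paper is commonly defined as D = (n_i - n_j) / (n_i + n_j + n_k), counted over later papers: n_i cite the focal paper but none of its references, n_j cite both, and n_k cite at least one of its references but not the focal paper. The sketch below computes D on a made-up toy citation graph.

```python
# Minimal sketch of the disruption index D = (n_i - n_j) / (n_i + n_j + n_k).
# The toy citation graph is made up for illustration.

def disruption_index(focal, references, later_papers):
    """`later_papers` maps each later paper's id to the set of ids it cites."""
    refs = set(references)
    n_i = n_j = n_k = 0
    for cited in later_papers.values():
        cites_focal = focal in cited
        cites_refs = bool(refs & cited)
        if cites_focal and not cites_refs:
            n_i += 1                      # cites the focal paper only
        elif cites_focal and cites_refs:
            n_j += 1                      # cites the focal paper and its references
        elif cites_refs:
            n_k += 1                      # cites only the references
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0

# Hypothetical focal paper "F" with references "A" and "B"
later = {
    "P1": {"F"},        # n_i
    "P2": {"F", "A"},   # n_j
    "P3": {"B"},        # n_k
    "P4": {"F"},        # n_i
}
print(disruption_index("F", ["A", "B"], later))   # (2 - 1) / (2 + 1 + 1) = 0.25
```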

3. Summary and outlook: possible ways forward

Recent evidence [2] suggests that ever more research effort and funding are now needed to generate comparable scientific gains; research productivity is falling sharply. Are we missing potential new research paradigms because we are trapped in overcrowded, hyper-competitive fields? The study's findings offer a partial answer, which can be summarized in the following six points.

Compared with a field that publishes very few papers per year, when a field publishes many new papers each year:

1) New citations flow disproportionately to the most-cited papers rather than to less-cited ones;

2) The list of the most-cited papers changes very little from year to year, an ossification of the canon;

3) The likelihood that a new paper eventually becomes a classic declines;

4) New papers that do enter the ranks of the most-cited do not get there through a gradual, cumulative process of diffusion;

5) The proportion of new papers that develop existing scientific ideas increases, while the proportion that disrupt existing ideas decreases;

6) The likelihood that a new paper becomes highly disruptive decreases.

These findings are troubling for the current direction of scientific development. When too many papers appear in a short period, new ideas cannot be carefully weighed against old ones, and the process of cumulative advantage fails to select the most valuable innovations. Ironically, today's "more is better", quantitatively assessed science may hinder revolutionary change in mature fields. The proliferation of journals and the popularity of preprints and open online access, by blurring the journal hierarchy, may exacerbate the problem.

It should also be noted that well-known scholars pass their view of the field to students through field-centered reading lists, syllabi, and course sequences, and field boundaries are further reinforced by career models of promotion and reward. The study's conclusions should therefore not be over-generalized: even if the most-cited articles in a field remain the same, progress can still occur. Although the most-cited article in molecular biology was published in 1976 and has been the field's most-cited article every year since 1982, it would be hard to argue that the field has stagnated.

Cutting the number of papers or publication venues, closing journals or research institutions, and reducing the number of scientists are all infeasible measures. Limiting the number of articles without changing other incentives may instead suppress the publication of novel, important ideas and favor low-risk papers that fit the existing research paradigm.

Changes in how scholarship is conducted, disseminated, consumed, and rewarded may help. A clearer journal hierarchy, with the most prestigious and visible venues deliberately showcasing less mainstream research, could foster disruptive scholarship and direct attention to novel ideas. Reforming reward and promotion systems to avoid quantity-based metrics and to value fewer, deeper, and more novel contributions could reduce the number of papers competing for attention while encouraging innovative work that departs from existing research paradigms.

A widely adopted measure of novelty, as an alternative to the traditional h-index, could be used to assess researchers. Such indicators would push researchers to engage with unsettling new ideas that are less rooted in established norms. One example is the Epsilon index co-proposed by Stefani Crabtree of the Santa Fe Institute, in which the Greek letter ε alludes to the residual in statistics [3].

The new indicator accounts for differences across fields of study to allow fairer comparison. It is available as a free, ready-to-use tool: entering data for a sample of researchers from open databases such as Google Scholar is enough to produce results. This makes it possible to compare researchers more fairly at any career stage and across disciplines, including interdisciplinary research, and could help spur more disruptive innovation.

References

[1] Wu, Lingfei, Dashun Wang, and James A. Evans. "Large teams develop and small teams disrupt science and technology." Nature 566.7744 (2019): 378-382.

[2] Bloom, Nicholas, Charles I. Jones, John Van Reenen, and Michael Webb. "Are Ideas Getting Harder to Find?" American Economic Review 110.4 (2020): 1104-1144.

[3] Bradshaw, C. J. A., J. M. Chalker, S. A. Crabtree, et al. "A fairer way to compare researchers at any career stage and in any discipline using open-access citation data." PLoS ONE 16.9 (2021): e0257141.

This article is reproduced with permission from the WeChat public account "Jizhi Club".

