AI Report: Machine Translation Amplifies Social Bias, Global Ethics-Related Publications Increase Fivefold

2022-03-31 11:26:28

Stanford University recently released the Artificial Intelligence Index Report 2022. This is the fifth consecutive year that Stanford has published a report on the field of artificial intelligence, and in this year's report China ranked first in both the number of journal publications and journal citations.

The report also points out that as the capabilities of AI systems such as natural language processing and image recognition rapidly increase, so do their biases and potential for harm. This has pushed countries to build up their ethical and legal frameworks: 25 countries have passed 55 AI-related bills in five years.

China publishes the most papers, and China-US collaborations rank first in the world

The rapid development of artificial intelligence rests on research and development strength. From 2010 to 2021, the total number of AI publications doubled, from 160,000 to 330,000, counting journal articles, conference papers, and academic papers.

Among them, publications in pattern recognition and machine learning have grown rapidly, with the total doubling since 2015. In contrast, other areas more affected by deep learning have seen smaller increases, including computer vision, data mining, and natural language processing.

By region, East Asia and the Pacific led the way with 42.9% of journal publications in 2021, followed by Europe and Central Asia (22.7%) and North America (15.6%). The increase was most pronounced in South Asia and in the Middle East and North Africa, where the number of AI journal publications has grown about 12-fold and 7-fold, respectively, over the past 12 years.

China, meanwhile, maintains its leading position in the number of papers, a position it has held for many years since 2010. Last year, China continued to lead the world in the number of AI journal, conference, and repository publications. Across all three publication types combined, its total was 63.2 percent higher than that of the United States. Its share of journal paper citations also ranks first in the world, at 27.84%.

Notably, despite heightened geopolitical tensions, China and the United States produced the largest number of cross-country AI collaborations in the 11 years from 2010 to 2021, a fivefold increase since 2010. The number of publications from US-China collaboration is 2.7 times the number from UK-China collaboration, which ranks second.

Large language models are more likely to reflect bias, and ethical regulation needs to keep up

Natural language processing (NLP) did not fare well in this year's report, which highlights bias issues in its training data.

The data show that large language models are more likely to reflect bias from their training data. A 280-billion-parameter language model developed in 2021 showed 29% higher toxicity than a 117-million-parameter model developed in 2018. The phenomenon is common across multiple public web corpora, and the harmfulness of language models comes in large part from unfiltered underlying training data.

Machine translation systems have also been shown to reflect and amplify social biases in their datasets. Stanford University used the WinoMT benchmark to measure bias in machine translation by comparing gendered pronouns in the original text with those in the translation, for example checking whether "she" becomes "he", or "he" becomes "she", when a sentence is translated into another language.

The results showed that in most of the languages tested, translation accuracy for the male gender was slightly higher than for the female gender.

Biases learned by multimodal models have also caught the attention of researchers. In recent years, multimodal vision-language models have progressed rapidly, setting new records on tasks such as image classification and generating images from text. They also reflect social stereotypes and biases: the report notes that images of Black people are misclassified as non-human at more than twice the rate of any other race.

Thankfully, research on AI transparency and fairness has also exploded since 2014. Publications related to AI ethics have increased more than fivefold. Publications by industry researchers at AI ethics conferences have increased 71% year over year in recent years. In addition, automated fact-checking datasets have been growing year on year since 2010, with 25 new datasets added in 2021, including 12 non-English datasets.

Beyond ethics, countries are also stepping up the legal governance of artificial intelligence. From 2016 to 2021, a total of 25 countries passed 55 AI-related bills, with the United States taking the top spot. Since 2017, the United States has passed three new bills every year, and 13 bills have been passed so far. The United States was followed by Russia, Belgium, Spain and the United Kingdom. In terms of the number of laws enacted in 2021, Spain, the United Kingdom and the United States led the way, with three each.

This does not mean that other countries have no intention of regulating AI. The report notes that in the legislative proceedings of the 25 countries surveyed, artificial intelligence was mentioned a total of 1,323 times in 2021. Spain, the United Kingdom, the United States, Australia and Japan lead the list.