Using big data to study Tang poetry and Song ci: the conclusions overturn conventional wisdom

Analyzing Tang and Song poetry with big data yields conclusions that may be beyond your imagination. Bai Juyi, who wrote more poems than any other Tang poet, ranks outside the top ten in influence. The most sought-after ci (song lyric) poet is not Su Shi or Xin Qiji but Zhou Bangyan. And the comprehensive impact index puts Du Fu ahead of Li Bai and Xin Qiji ahead of Su Shi...

Du Fu has the highest comprehensive impact index

These new findings come from analysis by Wang Zhaopeng, chief expert of the National Social Science Fund of China major project "Construction of a Chronological Geographic Information Platform for Tang and Song Literature" and chair professor at the School of Literature and Journalism, Sichuan University.

Tang poetry is the first peak in the history of Chinese poetry. The Tang dynasty produced more than 50,000 poems by more than 3,000 poets, an unprecedented number of both. The Song dynasty had nearly 1,500 ci poets, who composed more than 21,000 ci.

Which individual poets left the most works? Wang Zhaopeng's data show that Bai Juyi ranks first among Tang poets, with nearly 3,000 poems, followed closely by Du Fu and Li Bai, each with more than 1,000. Among Song ci poets, Xin Qiji ranks first with more than 600 ci, followed by Su Shi and Liu Chenweng. In Song shi poetry, Lu You dominates with more than 9,000 poems, followed by Liu Kezhuang and Yang Wanli.

Lu You leads in the number of Song shi poems

By comprehensive impact index, Du Fu is the most influential Tang poet, followed by Li Bai and Wang Wei, while Bai Juyi, first in number of works, ranks outside the top ten. Among Song ci poets, Xin Qiji ranks first in both number of works and influence, with Su Shi and Zhou Bangyan second and third. Among Song shi poets, Su Shi tops the influence list, followed by Lu You, who has the most works.

When naming the great masters of Tang shi and Song ci, people habitually say "Li-Du" and "Su-Xin", as if Li Bai outranked Du Fu and Su Shi outranked Xin Qiji. The comprehensive impact index, however, puts Du Fu ahead of Li Bai and Xin Qiji ahead of Su Shi. More surprising still, the most sought-after ci poet is neither Su Shi nor Xin Qiji but Zhou Bangyan: among the top 100 and top 300 Song ci, Zhou Bangyan accounts for 15 and 40 pieces respectively, a far larger share than either Su Shi or Xin Qiji.

Is it scientific, or even feasible, to use objective data to measure something as subjective as the appreciation of poetry? In an exclusive interview with a Beijing Youth Daily reporter, Wang Zhaopeng stressed that although data can describe the development of literary history to some extent, they also have obvious limitations.

Xin Qiji ranked first among Song poets in the number of ci composed

The research began 30 years ago and has accumulated more than a million data records

Q: What was your original motivation for "The World of Tang and Song Poetry in Big Data"?

A: I began doing quantitative analysis of Tang and Song poetry in 1992. My starting point was that everyone carries their own list of famous Tang and Song poems in mind. I wanted to use statistical data to analyze and measure which poems history has actually treated as masterpieces.

Q: So how do you use big data to measure the quality of Tang and Song poetry? How is this data calculated?

A: We have not yet found effective data for evaluating and measuring the quality of Tang and Song poems. I am currently trying to build an indicator system for evaluating the quality of literary works, which will then guide data collection. This is a fairly long process. Moreover, an evaluation system built by one individual needs the recognition and consensus of the academic community.

Q: What is the current state of academic research on literary indicator systems?

A: In the big-data era, literary data need to be classified and graded, so that an indicator system for literary-historical data can ensure their reliability and validity. At present, however, few scholars use big data to study Tang and Song poetry, and the Tang and Song poetry datasets shared by the academic community are quite limited.

Since 1992 I have accumulated more than a million data records related to Tang and Song poetry, but they are still incomplete and uneven: some periods are well covered and others thinly; some types of data are plentiful and others scarce; some poets have abundant data and others very little. We often lament that "only when you need books do you regret having read too few", and this is even truer of data. When analyzing Tang and Song poetry comprehensively, I often feel the data fall short.

In my view, a literary evaluation indicator system should be centered on the work, because a writer's influence rests on the influence of his works. A work can be evaluated along two dimensions: its intrinsic literary value, which is relatively stable, and its external influence, which changes over time. Literary value can in turn be considered at the two levels of content and form.

A work's influence is measured at three levels: creators, critics, and ordinary readers. The first is its influence on later writers, including citation, allusion, imitation, adaptation, and translation, which reflects the work's value as a model and its appeal. The second is commentary by critics and study by scholars, which reflects the work's reputation and the attention it receives in literary criticism and academic research. The third is its circulation and recognition among ordinary readers. Once the basic elements and structure of a work's value and influence are defined, a calculation model is built; a computer then runs it over the relevant databases, corpora, and the web to mine and extract the data, and finally computes a score for each work.
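The three-level influence model just described can be sketched as a weighted composite score. This is a minimal illustration, not Wang Zhaopeng's actual model: the interview gives neither the concrete indicators nor the normalization nor the weights, so the min-max scaling, the 0.4/0.3/0.3 weights, and the field names `creators`, `critics`, and `readers` below are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights for the three levels of influence; the source does not
# specify how (or whether) the levels are weighted.
WEIGHTS = {"creators": 0.4, "critics": 0.3, "readers": 0.3}


@dataclass
class WorkIndicators:
    title: str
    creators: float  # e.g. later citations, imitations, adaptations (assumed measure)
    critics: float   # e.g. critical comments and scholarly studies (assumed measure)
    readers: float   # e.g. anthology inclusions or web mentions (assumed measure)


def min_max(values):
    """Rescale raw counts to [0, 1] so indicators on different scales can be combined."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # no variation: the indicator cannot separate works
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(works, weights=WEIGHTS):
    """Return {title: weighted sum of normalized indicators} for each work."""
    normalized = {dim: min_max([getattr(w, dim) for w in works]) for dim in weights}
    return {
        w.title: sum(weights[dim] * normalized[dim][i] for dim in weights)
        for i, w in enumerate(works)
    }


if __name__ == "__main__":
    # Toy numbers for illustration only, not real data.
    sample = [
        WorkIndicators("work_a", creators=10, critics=5, readers=100),
        WorkIndicators("work_b", creators=2, critics=1, readers=20),
        WorkIndicators("work_c", creators=6, critics=3, readers=60),
    ]
    for title, score in sorted(composite_scores(sample).items(), key=lambda kv: -kv[1]):
        print(f"{title}: {score:.2f}")
```

Min-max scaling keeps an indicator with large raw counts (such as web mentions) from swamping the others; z-scores or rank-based scaling would be equally defensible choices.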

Data cannot measure artistic content or aesthetic value

Q: I noticed your project states that, by one count, the nearly six hundred years from the Eastern Han to the end of the Sui produced just over 5,000 poems in total, whereas in the Tang the number of poems passed 10,000 for the first time and leapt to more than 50,000, an unprecedented peak. Tang poetry grew more than sevenfold over the poetry of the preceding eight dynasties, and the number of poets rose from just over 600 to more than 3,000. Where do these data come from, and are there any key references?

A: The data comes from two papers by my old friend Professor Shang Yongliang: "Quantitative Analysis of the Distribution and Development Trend of Eight Generations of Poetry" and "Quantitative Analysis of the Hierarchical Distribution and Generational Group Development of Famous Poets of Tang Dynasty".

Q: Bai Juyi wrote the most poems, yet his influence ranks outside the top ten. How was this determined?

A: We weighed it with data, using a variety of data to rank the influence of Tang poets. Bai Juyi's influence is greater in modern and contemporary times than it was in premodern times, and his comprehensive influence falls far short of Li Bai's and Du Fu's.
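The interview does not say how the various influence indicators were combined into a single ranking. One simple way, sketched below under that assumption, is to rank poets separately on each indicator and then order them by mean rank. The indicator names and poet labels are placeholders. Because the number of works is not one of the indicators in this sketch, a poet who wrote the most can still land far down the list.

```python
def ranks(scores):
    """Map each poet to a 1-based rank, highest score first; ties broken by name."""
    ordered = sorted(scores, key=lambda poet: (-scores[poet], poet))
    return {poet: i + 1 for i, poet in enumerate(ordered)}


def aggregate_by_mean_rank(indicators):
    """indicators: {indicator name: {poet: score}}.

    Returns the poets present under every indicator, ordered by mean rank (best first).
    """
    poets = set.intersection(*(set(scores) for scores in indicators.values()))
    per_indicator = [ranks({p: scores[p] for p in poets}) for scores in indicators.values()]
    mean_rank = {p: sum(r[p] for r in per_indicator) / len(per_indicator) for p in poets}
    return sorted(mean_rank, key=lambda p: (mean_rank[p], p))


if __name__ == "__main__":
    # Hypothetical indicators and placeholder poets, for illustration only.
    toy = {
        "anthology_inclusions": {"poet_a": 30, "poet_b": 20, "poet_c": 10},
        "critical_comments": {"poet_a": 5, "poet_b": 9, "poet_c": 1},
        "modern_studies": {"poet_a": 7, "poet_b": 8, "poet_c": 2},
    }
    print(aggregate_by_mean_rank(toy))
```

Mean rank ignores how far apart the raw scores are, which makes it robust to indicators measured on very different scales but also discards information; a weighted score is the usual alternative.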

Q: Then on what basis does big data determine the quality of Tang and Song poems, and is there a tree diagram to support it?

A: There is no tree diagram yet; we are still working on it. At present, big data can measure only the influence of Tang and Song poems, including their appeal to later poets, their reputation among later critics, and their popularity among ordinary readers. Data cannot yet measure the artistic content and aesthetic value of Tang and Song poems.

Su Dongpo's creative peak came during his Huangzhou period

War was not the only factor driving the cultural center southward

Q: Did you run into academic difficulties in using big data to study Tang and Song poetry, and how did you overcome them?

A: Literary research has traditionally lacked a data mindset. The difficulty lies not only in where to find data but also in what kind of data to look for. Deciding which data are useful and valid requires both theoretical support and testing in practice. In theory, we keep looking to statistics, informetrics, and quantitative history for theoretical and methodological insight; in practice, we proceed by trial and error, with many failures. The most painful moments came when the database had been built and the paper written, and we suddenly discovered that the data source was incomplete, so we had to rebuild the data from scratch and tear the paper down and start again.

Q: What other new findings has your big data research produced?

A: Data can not only confirm traditional conclusions but also revise them, uncover new problems, and change established views. For example, a well-known conclusion in Chinese cultural geography holds that China's cultural center moved gradually from the northern Central Plains to the south: the first southward shift followed the Yongjia Disorder, which led to the founding of the Eastern Jin; the second followed the An Lushan Rebellion of the Tang; and the third followed the Jingkang Incident of the Song. On this view, three wars pushed the cultural center south, and only after the Jingkang Incident did it move south completely. Our data show that the literary center had already moved south completely at the beginning of the Northern Song, when southern authors came to outnumber northern ones; there was no need to wait until after the Jingkang Incident. Moreover, war was not the only factor driving the cultural center south.
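The north/south finding comes down to counting authors by period and region and locating the first period in which the south pulls ahead. The sketch below assumes each author has already been assigned one period and one region; where the north/south line is drawn, and whether native place or place of activity is used, are choices the interview does not specify. The period labels and records are placeholders.

```python
from collections import Counter


def first_southern_majority(authors, periods):
    """Return the first period in which southern authors outnumber northern ones.

    authors: iterable of (period, region) pairs, region being "north" or "south".
    periods: period labels in chronological order.
    Returns None if the south never pulls ahead.
    """
    counts = Counter(authors)  # missing (period, region) keys count as 0
    for period in periods:
        if counts[(period, "south")] > counts[(period, "north")]:
            return period
    return None


if __name__ == "__main__":
    # Placeholder records for illustration only, not real author counts.
    records = (
        [("period_1", "north")] * 3 + [("period_1", "south")] * 1
        + [("period_2", "north")] * 2 + [("period_2", "south")] * 3
    )
    print(first_southern_majority(records, ["period_1", "period_2"]))
```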

We also found that the literary center of the Song gradually shifted toward the southeast coast. Mapped onto today's prefecture-level divisions, Nanping in Fujian had the most Song authors, ranking first, with Fuzhou second, which is quite surprising. Correspondingly, Fuzhou ranked first and Nanping second in the number of Song jinshi (successful candidates in the highest imperial examination). Evidently Nanping and Fuzhou had well-developed education at the time, producing many scholars and many poets. Education and literary output are strongly positively correlated.
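The claim that education and literary output are strongly positively correlated can be checked with a correlation coefficient computed over prefectures. The interview does not name the statistic used; Pearson's r, sketched here, is one standard choice (Python 3.10+ also provides `statistics.correlation`). The inputs would be per-prefecture jinshi and author counts from the project's databases; the prefecture labels and numbers below are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def education_literature_correlation(jinshi_by_prefecture, authors_by_prefecture):
    """Correlate jinshi counts with author counts over prefectures present in both dicts."""
    shared = sorted(set(jinshi_by_prefecture) & set(authors_by_prefecture))
    return pearson(
        [jinshi_by_prefecture[p] for p in shared],
        [authors_by_prefecture[p] for p in shared],
    )


if __name__ == "__main__":
    # Invented counts for placeholder prefectures, for illustration only.
    jinshi = {"pref_a": 120, "pref_b": 80, "pref_c": 15, "pref_d": 40}
    authors = {"pref_a": 95, "pref_b": 70, "pref_c": 10, "pref_d": 30}
    print(round(education_literature_correlation(jinshi, authors), 3))
```

A high r shows association only; populous prefectures can score high on both counts without one driving the other.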

In addition, we found that Su Dongpo's creative peak came in Huangzhou: one-third of his ci were written during his banishment there, as were half of his famous pieces. For example, "Niannujiao: Meditating on the Past at Red Cliff", ranked the foremost Song ci, was written in Huangzhou. Huangzhou made Su Shi's work what it is.
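The Huangzhou figures are shares of two different totals: all of Su Shi's ci, and his famous pieces. A minimal sketch of that arithmetic follows, assuming each dated piece has already been tagged with where it was written and whether it counts as famous; the `DatedWork` record, its fields, and the sample titles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatedWork:
    title: str
    place: str    # where the piece was written (assumed known from dating)
    famous: bool  # whether it appears on some list of famous pieces (assumed)


def shares_written_in(works, place):
    """Return (share of all works, share of famous works) written in `place`."""
    in_place = [w for w in works if w.place == place]
    famous = [w for w in works if w.famous]
    famous_in_place = [w for w in famous if w.place == place]
    all_share = len(in_place) / len(works) if works else 0.0
    famous_share = len(famous_in_place) / len(famous) if famous else 0.0
    return all_share, famous_share


if __name__ == "__main__":
    # Placeholder records for illustration only.
    toy = [
        DatedWork("w1", "Huangzhou", True),
        DatedWork("w2", "Huangzhou", False),
        DatedWork("w3", "elsewhere", True),
        DatedWork("w4", "elsewhere", False),
        DatedWork("w5", "elsewhere", False),
        DatedWork("w6", "elsewhere", False),
    ]
    all_share, famous_share = shares_written_in(toy, "Huangzhou")
    print(f"all works: {all_share:.0%}, famous works: {famous_share:.0%}")
```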

Text/Beijing Youth Daily reporter Zhang Enjie

Editor/Ying Qiao
