A remarkable Peking University major: she was the only one in her cohort to stay from enrollment to graduation

Author: China Science Daily

By Sun Tao and Chen Yuhan

"I have recited poetry and prose, survived advanced mathematics and linear algebra, savored the classics and histories, studied algorithms and data structures, gnawed on syntax and grammar, and trained neural networks."

This is how Lin Zi sketches her undergraduate years. Her major is tiny: Applied Linguistics (Chinese Information Processing) is the least visible of the four majors in the Department of Chinese of Peking University. It has only 4 or 5 graduates a year. Some students transfer out because they cannot stick with it, others with a soft spot for the Chinese department transfer in, and Lin Zi was the only one in the 2015 entering class who studied it from beginning to end.

In this major spanning the arts and sciences, she studied classical Chinese, theoretical linguistics, and modern Chinese grammar, as well as advanced mathematics, programming, and data structures and algorithms. "Fragmentation" became the keyword of her university years.

Lin Zi. Photo courtesy of the interviewee

Lin Zi carved out a path of her own.

In her sophomore year, she interned in a laboratory at the School of Information Science and Technology (hereinafter "Xinke"). In her junior year, she published two papers at top AI venues and one other paper. After graduation, she went to Google headquarters in Silicon Valley and worked full-time in the natural language processing (NLP) research department for two years; her work was also included in Kevin [...], a research scientist at Google [...]. In 2021, she returned to academia to pursue a Ph.D. in computer science at the University of California, San Diego, and her papers have been cited more than 2,000 times over the past three years.

What kind of remarkable major is this?

The beginning of enlightenment

Although she was on the science track in high school, Lin Zi loved literature. She won first prize in the New Concept Essay Competition, which earned her a place in the independent admissions process of the Department of Chinese of Peking University. After passing the written test and interview, she was admitted with a reduced score requirement.

In fact, her parents hoped she would study economics and go into finance, which would at least lead to a better job. But Lin Zi felt she had to do what she wanted, so she plunged resolutely into the Chinese department. What she did not expect was that, because she was a science-track student, she would be automatically assigned to applied linguistics, an interdisciplinary arts-and-sciences major jointly run by the Department of Chinese and the School of Information Science and Technology.

When she first heard that her major was applied linguistics, Lin Zi was bewildered. She had never even heard of linguistics, let alone the terms "applied linguistics" and "Chinese information processing".

Later, everyone realized that, as the English name on the diploma, "Computational and Applied Linguistics", indicates, computation is the more essential part: this is a major at the intersection of computer science and linguistics.

The first year's coursework hit everyone hard. There were no specialized courses in the freshman year, only foundational ones.

Lin Zi found that in the linguistics courses, some students were ill-suited to classes that required memorization and closed-book exams. This was especially true of classical Chinese, which was full of traditional characters that had to be not only understood but also traced back to their origins. The computational linguistics module required them to learn programming languages and algorithms, and the mathematics module required calculus, linear algebra, and probability and statistics.

Some students could not accept this split and applied to transfer to the literature or pure linguistics majors they had originally wanted, while others found that they loved mathematics and computer science more and also transferred out.

After being knocked about at the intersection of arts and sciences for a while, the students were still lost in the fog of fragmentation that the major brought. When they asked around, the graduation destinations of older students turned out to be all over the map: some went to big tech companies, some became professors, and some went into primary and secondary school mathematics education. There seemed to be no clear guiding light. So Lin Zi groped along for a year, like a blind man driving a night train.

Coursework alone really did not let her integrate linguistics and computing. Fortunately, in her sophomore year Lin Zi discovered the joy of scientific research, which to a large extent helped her appreciate the appeal of the major.

She finally saw that the major let her both engage with literature and train her logical thinking. It turned out she really could have both.

"As I studied, I drifted toward computing"

According to Zhan Weidong, the head of this mini-major, a professor in the Department of Chinese and deputy director of the Institute of Computational Linguistics, the curriculum of the applied linguistics major before 2005 was "mainly arts, supplemented by science", and it then moved onto a track of "attaching equal importance to arts and sciences and integrating knowledge". Even so, most of these students "drift toward computing as they study".

Zhan Weidong has witnessed and helped build this major from scratch. He teaches two courses: "Introduction to Natural Language Processing" (now renamed "Natural Language Processing") and "Language Engineering and Chinese Information Processing". The latter is compulsory for applied linguistics majors, but because there are so few students it is offered only once every two years.

Lin Zi found that the major suits science-track students who have a strong interest in language and writing and intend to keep training in science subjects, or arts-track students who are interested in computational linguistics and natural language processing and want to cultivate scientific thinking.

Lin Zi was one of those who drifted toward computing as she studied.

Since her sophomore year, she has taken part in research at the Institute of Computational Linguistics. Two of these projects were published at influential international conferences in natural language processing. One explored automatic semantic role labeling for Chinese interlanguage corpora. The other proposed a new method for combining a manually constructed linguistic knowledge base of Chinese morphemes with word-vector representations in deep neural networks, improving how well word vectors perform on the task of computing word-meaning similarity.

To this day, Lin Zi feels that although she has some understanding of this major, it is not enough.

She said the charm of interdisciplinary study is that it is a union, not an intersection: it requires knowledge of both sides at the same time. But, as in cooking, how much of which spice to add is up to you.

"Thinking Better"

Speaking of cooking, Ai Qi, a Beijing native from the 2009 entering class, is a professional: she studied Western cuisine after graduating with a degree in applied linguistics and now works in Chicago.

Gu Sen, of the 2007 entering class, is the one who went into primary and secondary school mathematics education after his undergraduate degree. He left Zhan Weidong with the impression that his knowledge and way of thinking were "far beyond the average". Zhan Weidong once suggested that he go abroad for further study and work in artificial intelligence, but he also saw that Gu Sen was genuinely good at teaching. He once gave a presentation in class and could explain a problem layer by layer, like peeling an onion, which was "an innate habit of thinking".

Weiwei Sun entered in 2002 as part of the first cohort of the applied linguistics major and was later one of Lin Zi's university teachers, teaching Introduction to Formal Grammar. In 2020 she went to teach at the University of Cambridge in the UK.

In June 2006, the first cohort of applied linguistics majors in the Department of Chinese of Peking University graduated. Fourth from the left is Zhan Weidong, and first from the right is Weiwei Sun. Source: Peking University

In addition to Weiwei Sun, Nanyun Peng, of the 2005 entering class, went on to Johns Hopkins University for a Ph.D. in computer science after graduating from the Institute of Computational Linguistics of Peking University, and now works in the Department of Computer Science at the University of California, Los Angeles.

Lin Zi also found her own path.

When asked whether she would have done better had she entered the computer science department directly, she said that precisely because no one told her how to move forward, she kept optimizing her own choices. Had she studied another major, "it might not be so interesting", since the growth paths of students in other majors are relatively well established.

Lin Zi said that because applied linguistics is so small, there is no fixed template for anyone's growth, which lets you "understand yourself better, explore what your needs are, and figure out why you ultimately choose to continue your PhD."

Lin Zi also took part in the undergraduate research competition held by Xinke at Peking University, and she and two other students won first prize. "I remember the teacher was quite surprised," she says. From then on, the students of Xinke knew that there was a Lin Zi in the Chinese department.

The project Lin Zi is working on today concerns large language models and involves in-depth evaluation of their safety and interpretability. But what interests her more is whether the mechanisms of large models resemble human language learning. This is a profound scientific question: children learn to speak after hearing only a small amount of everyday language input from their parents, which is very different from how large language models are trained.

The large models are coming

Compared with Lin Zi, Zhan Weidong already sees himself as a "front wave", a member of the older generation. When the era of large models arrived, he too came to a new understanding of the field.

Zhan Weidong entered Peking University in September 1993 to study for a master's degree and a doctorate. For more than 30 years, he has been at the intersection of language and computing.

To develop a machine translation system, Zhan Weidong taught himself C++ and solved many of the program's problems on his own. His doctoral dissertation, "Research on Modern Chinese Phrase Structure Rules for Chinese Information Processing", summarized and refined his work in machine translation and won the 2001 National 100 Excellent Doctoral Dissertation Award.

For 30 years, Zhan Weidong has been trying to take language apart, distill linguistic rules, and then reassemble them for applications. The approach has always been reductionist, and the process is transparent and clear.

However, the emergence of large models has shaken the beliefs of computational linguists. Distilled linguistic knowledge is of no use to a large model, which only needs to be fed enough data, backed by powerful computing power, to produce lively language output.

Zhan Weidong said that experts in the field have discussed this situation often in recent times, but there is no clear way to respond. Peers still know too little about large models: "everyone was in a state of confusion last year, and internally we actually lacked deeper and more frequent communication."

Zhan Weidong's mentor, the linguist Lu Jianming, is not so pessimistic. In an article last year, he said that artificial intelligence is driven by the "troika" of data, computing power, and algorithms, and that if "language knowledge" could be added to the data, it would greatly advance artificial intelligence. Lu Jianming was also a major proponent of creating the applied linguistics major.

Zhan Weidong said that once linguists have distilled this knowledge, it may become more effective knowledge that computers can use.

What worries Zhan Weidong, however, is that older scholars are retiring one after another, and the vacancies they leave cannot be filled by newcomers.

On the one hand, strong talent in natural language modeling has been monopolized by industry, which offers higher salaries and better resources (computing power and data). On the other hand, very few people are familiar with both linguistics and large model technology.

At the end of 2023, at Zhan Weidong's invitation, Lin Zi returned to her alma mater to share her experience of studying the major with younger students. Zhan Weidong pins high hopes on these "back waves": what if one day they come back to serve their alma mater?

Group photo taken after Lin Zi's lecture at Peking University on December 27, 2023. Lin Zi is in the middle of the front row, and Zhan Weidong is second from the left in the back row. Photo courtesy of the interviewee.