AI is transforming software engineering: An in-depth conversation between CSDN founder Jiang Tao and Peking University professor Xie Tao

Software quality coverage

Editor's note:

Professor Xie Tao is also a member of the technical committee of the AI+Software R&D Digital Summit (AiDD) and has long been committed to raising the standard and practical orientation of software engineering education in China.

The following article is from CSDN, by He Miao.


The explosion of generative AI has begun to change how many industries work, but in software engineering the transformation began long ago. Sixteen years ago, a leading Chinese scholar set out in a classic paper to solve concrete software engineering problems with intelligent techniques. Over the following decade, this early landmark work in intelligent software development inspired countless researchers and practitioners to break new ground in software engineering.

Panelists:

Xie Tao is a chair professor at Peking University

Chair Professor at Peking University; Chair of the Department of Software Science and Engineering, School of Computer Science; Foreign Member of Academia Europaea; Fellow of the Association for Computing Machinery (ACM); Fellow of the Institute of Electrical and Electronics Engineers (IEEE); Fellow of the American Association for the Advancement of Science (AAAS); and Fellow of the China Computer Federation (CCF). His main research areas include software engineering, system software, software security, and trustworthy artificial intelligence.

Jiang Tao is the founder and chairman of CSDN

He has 25 years of software development experience and has led the development of Giant Hanka (a Chinese-character card), Kingsoft PowerWord, and Super Jieba.

In 1999 he founded CSDN (China Software Developer Network), a professional Chinese-language IT community with more than 43 million registered members, covering about 90% of China's IT developers.

In 2011 he founded Geekbang Venture Capital. As a technically minded investor, he has invested in more than 100 high-tech startups, including Aggregated Data, SequoiaDB (Jusegi Database), Chuanzhi Podcast, LD Excellence (developer of the game "My Name Is MT"), and IT Orange. SequoiaDB, an industry-leading financial-grade distributed relational database, has been included in Gartner's database reports for two consecutive years; Chuanzhi Podcast and Lemon Microfun were listed on the SME Board and ChiNext in 2019 and 2017, respectively; and Tao Mobile Game was included by Guizhou Province in its first batch of filing candidates for the Science and Technology Innovation Board (STAR Market).

Editor | He Miao

Source | CSDN (ID: CSDNnews)

In 2021, ASE (the IEEE/ACM International Conference on Automated Software Engineering) gave its Most Influential Paper Award to a classic ASE 2007 paper by Xie Tao, now Chair Professor at Peking University, and his then PhD student Suresh Thummalapenta. ASE, ICSE, and ESEC/FSE are regarded as the three top international conferences in software engineering, and Xie Tao is among the first Chinese authors to win ASE's Most Influential Paper Award.

In this paper, Xie Tao and his student proposed an approach that uses machine learning to make software development more efficient by combining large-scale code search, machine learning, and data mining. The work made him one of the first scholars to pursue research in intelligent software engineering.
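The paragraph names the ingredients only at a high level. As a toy illustration of the data-mining side of such an approach, the sketch below counts how often each API call sequence recurs across code snippets that a code search might return, and ranks the most common (then shortest) sequences first. The snippet data and call names are hypothetical, and this is not a reconstruction of the algorithm in the 2007 paper.

```python
from collections import Counter


def rank_call_sequences(snippets):
    """Rank distinct call sequences by frequency (descending), then by length (ascending).

    `snippets` holds one call sequence per retrieved code example; here they are
    hand-written stand-ins for what a code search engine might return.
    """
    counts = Counter(tuple(seq) for seq in snippets)
    return sorted(counts.items(), key=lambda item: (-item[1], len(item[0]), item[0]))


# Hypothetical call sequences mined for the task "read the lines of a file".
snippets = [
    ["FileReader", "BufferedReader", "readLine"],
    ["Files.readAllLines"],
    ["FileReader", "BufferedReader", "readLine"],
    ["FileInputStream", "InputStreamReader", "BufferedReader", "readLine"],
]

ranked = rank_call_sequences(snippets)
for sequence, count in ranked:
    print(count, " -> ".join(sequence))
```

Frequency captures what many developers independently agree on; preferring shorter sequences among equally frequent ones is just one plausible tie-breaker, chosen here for simplicity.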

After 14 years of study and teaching abroad, Xie Tao returned to Peking University as a chair professor to continue high-level research. As one of the earliest researchers in intelligent software engineering, he understands AI and its applications in depth. Of ChatGPT, the newcomer in AI, he said: "ChatGPT is a big step forward in interaction mode. It can carry on a conversation, which lets users express what they really want, and getting the requirements right alone makes AI considerably more effective and usable." He believes that China will also be able to build its own ChatGPT in the future.

Over the past two or three years, large models have drawn great attention in intelligent software engineering. Copilot's striking debut showed everyone the potential of large models for code generation, code review, and code defect detection, and ChatGPT adds expert-level interactive dialogue on top of that. What new directions and opportunities does AI open up for basic software? What is the outlook for China's domestic basic software industry? The in-depth conversation below between Jiang Tao, founder and chairman of CSDN, and Xie Tao, Chair Professor at Peking University, takes up these questions.

The video is part of the "Science Mantra" series of "This Is Science", launched by the Toutiao Science Channel and the China Computer Federation (CCF).


Transcript of the conversation:

Jiang Tao: What is the current state of artificial intelligence in automatic code generation?

Xie Tao: Automatic code generation means automatically producing code that implements the functionality a user asks for. Suppose a user needs an implementation of quicksort whose input is an integer array. After the user states this requirement in natural language to an automatic code generation system, the system generates code that implements quicksort. This is a relatively common form.

There are other forms as well. Again taking sorting as an example, the system may instead be given a formal requirements specification stating that the input is an arbitrary integer array and the output is that array in sorted order. The system then generates code that implements sorting without being restricted to any particular sorting algorithm.

Today, code completion (sometimes called code continuation) is a common aid in industrial software development. It works much like an input method: you type the first few characters and the tool suggests what comes next. Code completion is one of the main scenarios in which automatic code generation has been deployed in industry. In terms of use, however, the two differ fundamentally. Code generation can be used even when users cannot read the generated code or do not know exactly what code they want to write, whereas code completion usually requires users to know what they intend to type; AI simply lets them type it faster.
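For concreteness, here is a hand-written sketch of the kind of code such a system might return for the request "implement quicksort for an integer array". It is not output from any particular tool; the not-in-place, three-way-partition variant is chosen for readability rather than performance.

```python
def quicksort(nums: list[int]) -> list[int]:
    """Return a new list containing the integers of `nums` in ascending order."""
    if len(nums) <= 1:
        return list(nums)
    pivot = nums[len(nums) // 2]
    # Partition into values below, equal to, and above the pivot.
    smaller = [x for x in nums if x < pivot]
    equal = [x for x in nums if x == pivot]
    larger = [x for x in nums if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```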
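A formal specification constrains only the relationship between input and output. The sketch below writes the sorting specification described above as an executable check: the output must be in non-decreasing order and must be a permutation of the input. Any implementation, whichever algorithm it uses, meets the specification on a given input if this check returns True; a genuine formal specification quantifies over all integer arrays, which a runtime check on individual inputs cannot do.

```python
from collections import Counter


def satisfies_sort_spec(inp: list[int], out: list[int]) -> bool:
    """True iff `out` is in non-decreasing order and is a permutation of `inp`."""
    is_ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    is_permutation = Counter(inp) == Counter(out)  # same elements, same multiplicities
    return is_ordered and is_permutation


print(satisfies_sort_spec([3, 1, 2], [1, 2, 3]))  # True
print(satisfies_sort_spec([3, 1, 2], [1, 2]))     # False: an element was lost
print(satisfies_sort_spec([2, 2, 1], [1, 2]))     # False: duplicates must be kept
```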
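The input-method analogy can be made concrete with a deliberately simple, frequency-based sketch: given the prefix typed so far and the tokens seen in a hypothetical code corpus, suggest the most frequent tokens that extend it. Industrial completion tools rely on trained language models rather than raw counts; this only illustrates the type-a-prefix, rank-the-continuations interaction.

```python
from collections import Counter


def suggest(prefix: str, corpus_tokens: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` tokens that extend `prefix`, most frequent first, ties alphabetical."""
    counts = Counter(tok for tok in corpus_tokens if tok.startswith(prefix) and tok != prefix)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:limit]]


# Hypothetical identifiers harvested from previously written code.
corpus = ["print", "println", "print", "private", "printf", "public", "print"]
print(suggest("pri", corpus))  # ['print', 'printf', 'println']
```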

Jiang Tao: Automatic code generation goes a step beyond AI-based code completion. Since GitHub, owned by Microsoft, released Copilot, major companies have launched their own automatic code generation engines, and DeepMind has released the AI programming system AlphaCode, yet these have not made a big splash. Is that because the technology still has limitations?

Xie Tao: There are several limitations, and they are also focal points of research.

To deploy automatic code generation inside a software company, the model may need to be relearned and retrained on the particular characteristics of that company's software development so that it can provide customized service. This is not easy, but I believe it can be solved in the future. HumanEval, a dataset commonly used to evaluate code generation, is self-contained: its code usually does not call third-party libraries. In real development, however, code is very likely to call third-party libraries or in-house libraries. If the generated code needs to call a company's internal libraries, the relearning and retraining described above can help, but good results are hard to achieve.

In addition, the conciseness and readability of generated code are not yet well addressed. If there were a mechanism to confirm that generated code is completely correct, so that users would not need to read and modify it, a great deal of time could be saved. Beyond inspecting code by eye, how can we ensure that generated code is completely correct? You can supply test cases, but they are not necessarily complete, and code that merely passes the test cases is not reassuring. It is also possible to write formal requirements specifications and use theorem proving to show that the generated code conforms to them, but this requires substantial human effort, and the supporting tools are not yet mature enough to handle large-scale industrial code.
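To illustrate what "self-contained" means, here is a hypothetical HumanEval-style task written for this example (not taken from the dataset): the prompt is a function signature plus docstring, a model supplies the body, and hidden unit tests in a `check` function decide pass or fail. The `evaluate` helper and the `acme_internal_utils` module are assumptions for illustration. Because the harness offers only the standard language, a completion that imports an in-house library fails here simply because that library is unavailable.

```python
PROMPT = '''def count_evens(nums):
    """Return how many integers in nums are even.
    >>> count_evens([1, 2, 4, 7])
    2
    """
'''

# A candidate body a model might produce: it uses only built-in Python.
COMPLETION = "    return sum(1 for n in nums if n % 2 == 0)\n"

# A candidate that depends on a hypothetical in-house library the harness does not have.
INTERNAL_LIB_COMPLETION = (
    "    import acme_internal_utils\n"
    "    return acme_internal_utils.count_evens(nums)\n"
)


def check(candidate):
    """Hidden unit tests for the task."""
    assert candidate([1, 2, 4, 7]) == 2
    assert candidate([]) == 0
    assert candidate([-2, 3]) == 1


def evaluate(prompt, completion, entry_point):
    """Return True if prompt + completion defines a function that passes `check`."""
    namespace = {}
    try:
        # exec runs arbitrary code: tolerable in this toy, never for untrusted input
        # (real evaluation harnesses run generated code in a sandbox).
        exec(prompt + completion, namespace)
        check(namespace[entry_point])
        return True
    except Exception:
        # Failed assertions, missing modules, syntax errors, etc. all count as failures.
        return False


print(evaluate(PROMPT, COMPLETION, "count_evens"))               # True
print(evaluate(PROMPT, INTERNAL_LIB_COMPLETION, "count_evens"))  # False
```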
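Why passing the given tests is not reassuring can be shown in a few lines. The hypothetical `buggy_sort` below passes every hand-written test case, yet random testing against a trusted reference (Python's built-in `sorted`, assumed correct here) quickly finds an input it gets wrong. Random testing only raises confidence; unlike theorem proving, it cannot show that code is completely correct.

```python
import random


def buggy_sort(nums):
    """Hypothetical generated code: plausible-looking, but it silently drops duplicates."""
    return sorted(set(nums))


# Hand-written test cases that happen to contain no duplicate values: all of them pass.
given_tests = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, -1], [-1, 5])]
assert all(buggy_sort(inp) == expected for inp, expected in given_tests)


def find_counterexample(impl, reference=sorted, trials=1000, seed=0):
    """Compare `impl` with a trusted `reference` on random small integer arrays."""
    rng = random.Random(seed)
    for _ in range(trials):
        inp = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if impl(inp) != reference(inp):
            return inp
    return None  # no mismatch found, which is still not a proof of correctness


counterexample = find_counterexample(buggy_sort)
print("counterexample:", counterexample)
```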

Jiang Tao: Can AI judge the quality of code, whether written by people or generated?

Xie Tao: Industry has started trying this, for example by building automated code review tools based on big data or AI. Some organizations have deployed such tools to assist human code reviewers in a data-driven way. Beyond code review, AI can also help with testing tasks. These remain open problems, however, and long-term research is needed before they work well enough in practice.

Jiang Tao: As artificial intelligence grows more and more capable, will it take ordinary people's jobs? Will it even take programmers' own jobs? What kinds of programmers might be threatened?

Xie Tao: Programmers who do little thinking or design when they write code, the "code farmers" in the traditional sense, will be threatened. Automatic code generation works well in specific areas, such as programming for end users. For example, BI departments are getting nervous: they have relied on data analysts to write SQL statements, visualization scripts, and so on according to analysis needs, but translating natural language into SQL now works quite well. Among software engineering tasks such as software testing, test data generation and test oracle generation may be automated by AI in the future, provided there is enough high-quality data to train it. Software engineers who work on complex software such as basic software, however, need not worry, because that R&D work is too complex to be threatened by AI.

Jiang Tao: Artificial intelligence is not only a helper for programmers; it also takes over some people's repetitive work. Some data analysis jobs may gradually be replaced by machines, while engineers who develop basic software need not worry too much. What is basic software, and what is the current state of its development in China?
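To make the BI example concrete, the sketch below pairs a natural-language question with the SQL a data analyst would traditionally write for it, and runs that query on an in-memory SQLite database. The `orders` table and its data are invented; a natural-language-to-SQL system would aim to produce the query from the question alone, which is not shown here.

```python
import sqlite3

# Hypothetical BI table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 200.0)],
)

question = "What is the total order amount per region, largest first?"

# The query an analyst would write by hand for `question`; NL-to-SQL tools target this step.
sql = """
SELECT region, SUM(amount) AS total
FROM orders
GROUP BY region
ORDER BY total DESC
"""

rows = conn.execute(sql).fetchall()
print(rows)  # [('East', 200.0), ('North', 150.0), ('South', 80.0)]
```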

Xie Tao: Definitions of basic software vary across the industry, but the general consensus is that it includes operating systems, programming languages, compilers, database management systems, office software, and browsers. In a broad sense, basic software also includes development tools, testing and operations tools, and so on. Office software is so widely used and so central to daily work that it is now considered an important part of basic software. In addition, some industrial software underpins industrial applications and is counted as basic software for those specific industries.

Many subfields of domestic basic software face "chokepoint" problems, for example operating systems and general-purpose industrial software such as MATLAB. People may ask: Linux, Android, and many other operating systems are open source, so why would there be a chokepoint? Largely because of ecosystem constraints, which leave us without a say.

Domestic operating systems have developed well over recent decades with sustained government support. But the "heart" (kernel) of mainstream domestic operating systems is still Linux. Chinese programmers and companies contribute a large share of the Linux kernel, and the domestic vendor Huawei now even ranks first among contributors. Yet, returning to the keyword above, the right to speak, the situation remains one of many contributors but little voice among the core decision makers. China does, however, have some say in emerging open source fields related to big data, AI, and cloud native. China's new generation of technology is catching up: companies often cannot afford to wait for others to lead, so they invest in and cultivate strong core R&D talent, and as a result they have achieved results in these emerging fields.

Jiang Tao: How would you define core talent in basic software?

Xie Tao: Here is an example. Before returning to Peking University, I taught in the Department of Computer Science at the University of Illinois Urbana-Champaign (UIUC). A 2005 PhD graduate of that department, Chris Lattner, came to be known as the father of LLVM. His PhD advisor, Vikram Adve, a former colleague of mine, described the LLVM compiler infrastructure as one of the top three compilers, alongside GCC. Vikram told me that Chris built the infrastructure of the LLVM compiler during his PhD. After graduating, Chris decided to go into industry, showed outstanding ability, received many offers, and eventually joined Apple. His thinking was: "I'll go wherever I am allowed and supported to keep carrying LLVM forward!" Apple backed this fully, which is why LLVM has had strong momentum. Here we can see the qualities of core talent in basic software.

Jiang Tao: Artificial intelligence is now very widely applied. As the saying goes, water can carry a boat but can also overturn it: AI has become a helper for some and a weapon for lawbreakers. CCTV has reported that a fraud ring used AI to have robots make scam calls, placing 17 million calls and ultimately defrauding more than 800,000 people of 180 million in total; the victims could not tell the robot's voice from a human's on the phone. Is there any AI technology that can help identify robot harassment calls and protect the general public?

Xie Tao: Criminals such as fraud rings are increasingly using big data and AI, incidents that victimize users keep occurring, and it is hard to prevent them completely. As with attack and defense in security, we can only contain the problem to some extent by raising the bar for fraud. AI has already made launching attacks and scams very easy: it can synthesize very realistic robot voices and produce very realistic conversations. For the general public, these AI scams are indeed hard to guard against.

Let me share an anti-fraud lesson from my own experience. A scammer once texted me asking for the phone number of one of my industry peers. Such requests are actually uncommon; people generally do not text someone directly to ask for another person's number. The lesson is to analyze carefully: if something happens in an unnatural or unusual way, there is likely a problem, and we should be more careful.

Jiang Tao: Counterfeiting technology keeps getting better, and deepfakes can swap faces in videos. What can counterfeiting technology achieve today, and how can people spot a convincing fake website or app?

Xie Tao: Counterfeit websites, apps, email, WeChat accounts, fraudulent text messages, phone calls, and so on are practically impossible to guard against by looking at their content alone, because fraudsters make them very realistic. Ordinary users should learn to check the source and other surrounding information, such as a website's URL. When an email contains a link, the link text that is displayed may differ from the address you get when you copy the link and paste it into your browser. I once received an email asking me to reset an account password. Because I am sensitive to software security, I did not click the link in the email directly; instead I copied it into the browser and quickly saw that something was wrong with the URL. So it is important to build the habit of opening links carefully.

For elderly people who are not familiar with IT, one direction for mobile apps and websites is age-friendly redesign: exploring how to make them friendlier to elderly users. First, there should be technical support for anti-fraud measures, including UI design, alert reminders, and similar features. When AI models are applied in these scenarios, their parameters should be set to protect elderly users more strongly, because scams against them succeed more often. As for elderly users themselves: when a call comes in, check where it comes from, since smartphones can now label the source automatically. Older people should pay attention to such details and not readily trust messages or calls from unknown sources.
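The check described here, comparing the link text an email displays with the address it actually points to, can be partly automated. The sketch below uses only Python's standard library to extract each `<a>` element's visible text and `href`, and flags links whose visible text looks like an address but names a different host. It is a simplified heuristic under stated assumptions (exact host comparison, no handling of redirects, subdomain variations, or look-alike characters), not a complete phishing detector; the sample email is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def host_of(address):
    """Lower-cased host name of a URL; a scheme is assumed if the text has none."""
    if "://" not in address:
        address = "http://" + address
    return (urlparse(address).hostname or "").lower()


def looks_like_address(text):
    """Heuristic: visible text with a dot and no whitespace is treated as a URL or domain."""
    return "." in text and not any(ch.isspace() for ch in text)


def suspicious_links(html):
    """Return links whose visible address names a different host than the real href."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [
        (text, href)
        for text, href in collector.links
        if looks_like_address(text) and host_of(text) != host_of(href)
    ]


email_html = (
    "<p>Your password expires today. "
    '<a href="http://example-login.test/reset">https://accounts.example.com/reset</a> '
    '<a href="https://accounts.example.com/help">https://accounts.example.com/help</a> '
    '<a href="https://accounts.example.com/">click here</a></p>'
)
print(suspicious_links(email_html))
# [('https://accounts.example.com/reset', 'http://example-login.test/reset')]
```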
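Setting model parameters to protect elderly users more strongly can be illustrated with a decision threshold. In this hypothetical sketch, an upstream scam-detection model (not shown) assigns each incoming call a fraud score between 0 and 1, and the warning threshold is lower for elderly users, so warnings fire earlier for them. The group names and threshold values are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical per-group thresholds: the lower the threshold, the earlier a warning fires.
ALERT_THRESHOLDS = {"default": 0.8, "elderly": 0.5}


@dataclass
class IncomingCall:
    caller_id: str
    fraud_score: float  # assumed output of an upstream scam-detection model, in [0, 1]


def should_warn(call: IncomingCall, user_group: str) -> bool:
    """Warn when the call's fraud score reaches the threshold for this user's group."""
    threshold = ALERT_THRESHOLDS.get(user_group, ALERT_THRESHOLDS["default"])
    return call.fraud_score >= threshold


call = IncomingCall(caller_id="unknown-number", fraud_score=0.6)
print(should_warn(call, "default"))  # False: 0.6 is below 0.8
print(should_warn(call, "elderly"))  # True: 0.6 reaches 0.5
```

The trade-off is deliberate: a lower threshold produces more false alarms in exchange for catching more scams, which may be worthwhile for users who are more likely to be deceived.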
