Shi Xiaoyang, a Stanford researcher born after 2000: I have always been restless, and I am determined to build smart robots

Demi Guo, founder of the text-to-video model Pika, and Zipeng Fu, Tony Zhao, and Lucy Shi, members of the R&D team behind the all-round housework robot "Mobile Aloha"... Young Chinese researchers at Stanford University's artificial intelligence laboratory have repeatedly attracted attention.

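The three, Shi Xiaoyang among them, all belong to Stanford's IRIS (Intelligence through Robotic Interaction at Scale) Lab, where their advisor is Chelsea Finn. Shi Xiaoyang's latest research result is a system called "Yell At Your Robot" (YAY Robot for short). With this system, a robot can be trained by "yelling" at it. The Paper recently spoke with her in an exclusive interview.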

Shi Xiaoyang

Shi Xiaoyang, 23, graduated from the High School Affiliated to Renmin University of China in 2019 and entered the University of Southern California to study computer science. During this time, she worked on multimodal large models at NVIDIA, collaborated with renowned AI scholars Chelsea Finn, Sergey Levine, and Dr. Jim Fan, and was invited by Google DeepMind to give a talk.

Shi Xiaoyang joined the Stanford University Artificial Intelligence Laboratory as a student researcher, and what has struck her most is Stanford's free academic atmosphere. Here she has found more support in terms of connections in artificial intelligence and computing, as well as research resources. At the same time, she can feel the heat of the AI startup scene in Silicon Valley, where opportunity and risk go hand in hand. An AI startup team in her circle attracted $70 million in investment before it even had a formal company name or a business plan.

As a young technological idealist, Shi's research goal is to create intelligent robots that can smoothly perform the complex, long-horizon tasks of daily human life, "from the home to the factory, to help people deal with tedious and dangerous things." She is a firm believer in human ingenuity and the potential of artificial intelligence.

The following is a transcript of the conversation between The Paper and Shi Xiaoyang:

"Housework scenarios allow robots to learn more deeply"

The Paper: Mobile Aloha, which went viral earlier, is a housework robot, and many of the experiments for the new Yell At Your Robot system are also set in housework scenes. Why are you targeting housework scenarios?

Shi Xiaoyang: The traditional approach to robotics prescribes a series of mechanical actions through a program. We prefer to train the robot through deep learning so that it gains the ability to generalize, in the hope that it will know how to respond in an unlimited range of scenarios.

Housework scenes can vary from day to day or even hour to hour. In a housework setting, the machine can explore and learn in scenes it has never seen before. It comes down to algorithms and data. We want to give the machine a few simple natural language instructions and have it do things it has never done before or is not yet good at.

Scenarios in a factory are relatively fixed, and traditional machines can handle some of them. Housework scenarios are more complex and hard for traditional robots, but machine learning makes them possible. Given models and data at sufficient scale, machines may be able to reach human-like generalization through deep learning.

YAY Robot experiment scenes

The Paper: Why did you name the system Yell At Your Robot?

Shi Xiaoyang: Originally, we had a fairly academic name, but considering that people might not understand what it referred to, we decided to go the other way and choose a name that is easier to grasp.

Anyone who builds robots knows how painful it is to get a robot to learn. Sometimes the robot is like a child. For example, when training a robot on a complex task such as cooking, it may be only "move the hand half a centimeter to the left" away from success. The simplest and most direct fix we could think of is to speak to it: "move a little to the left," or "use the spoon to hold the bag open a little more." Very everyday language like this serves as our instructions. At one of the model's input layers, we use a large language model so that the machine understands everyday language better. It is like having a large model translate what we say into a language the robot can understand.

The Paper: How long did it take you to develop?

Shi Xiaoyang: The project itself took nearly half a year. We moved through it very quickly.

The project team consists of 8 people, mainly doctoral students and researchers, along with two professors and two postdoctoral fellows.

We hold group meetings almost every week. Because we are building a full system, we do all the work ourselves, from the low-level software and hardware to the data collection system and data quality assessment. Once the system is built, we need to keep iterating on the algorithm and model so that the robot can be trained and evaluated in the real world.

The Paper: What are your plans for this system?

Shi Xiaoyang: There is more work to come. This includes enabling the YAY Robot system to handle more complex problems, such as having the robot complete a chore it has never done before given only very simple natural language instructions and a little teaching, so that it can serve the user as the user wishes. More technical issues are involved here: how to algorithmically tell good data from bad during data collection, how to turn useless data into useful data, how to make use of large language models and multimodal video models, and how to help robots learn how to learn better. These are things we may continue to explore over the next few months.

We will work with some universities and enterprises, and all code from university projects will be open-sourced. The advantage of working with enterprises is access to more computing resources, but whether all the technical details can be made public still needs to be discussed.

Whether at a university or a company, the "high cost of training data" is a problem

The Paper: What feedback did you receive after the release of this system?

Shi Xiaoyang: After we published the work, in addition to comments on social platforms and feedback from academic circles, we also received many emails from companies and venture capital firms. AI companies mainly asked whether we could help them train models, for example training robots to book air tickets, or whether the system could be used on humanoid robots, and some wanted to understand the details of the algorithms. The venture capital firms were very direct and asked whether we wanted to start a business.

This feedback gives me more hope for AI. Progress in many areas of science and technology depends on both capital and talent. Along the way I have met a lot of very talented people, and with capital and a market, AI will see more breakthroughs and more impactful products in the coming months or years, which is very important.

The Paper: So can you really train a robot to help book a ticket?

Shi Xiaoyang: In theory, yes, but in practice one of the hardest questions is where the data comes from. The advantage of large language models is that the Internet supplies them with a very large corpus of training data, but there is no such data for tasks that require decision-making, such as booking air tickets or doing housework. Many company projects and university research efforts now face the same situation: the cost of training data is very high.

The research environment at Stanford is generally quite good, though of course it depends on the research field and the specific laboratory. Personally, I really like the liberal academic atmosphere here, and the advisors encourage people to explore topics that have never been explored before.

There are also many connections in the computing field. Many AI companies are based in Silicon Valley, and Stanford has a strong culture of university-industry cooperation. Capital is very important to the development of artificial intelligence. Why was deep learning able to take off? Because of graphics cards and computing resources. The development of graphics cards owes part of its progress to the gaming world. So many people play games that game companies emerged, which led to better and better hardware and graphics cards, and better graphics cards make it possible to train bigger models. As those big models kept improving, we arrived at today's artificial intelligence.

Capital enthusiasm is running high, and robotics will develop rapidly

The Paper: You just said that Stanford students are very closely connected to Silicon Valley AI companies. What is the startup atmosphere like in Silicon Valley right now?

Shi Xiaoyang: Two startup directions are attracting the most attention in Silicon Valley: one is AI, and the other is Web3. There used to be many e-commerce platforms as well, but the focus has begun to shift toward AI.

All I can say about AI entrepreneurship in Silicon Valley is that capital is extremely enthusiastic. For example, I know of an AI startup that received a $70 million investment when it had neither a name nor a business plan. Startups are now springing up like mushrooms, but the competition is also quite fierce.

The Paper: Would you consider starting a business?

Shi Xiaoyang: Yes, but we will also weigh the risks and whether the technology has reached the right point. I think business success requires the right time, the right place, and the right people, and each of those conditions is hard to meet. I think I am still quite an academic person, and I still want to explore some academic questions in depth, such as whether robots or artificial intelligence can improve autonomously and use data efficiently.

The Paper: When did you first become interested in robotics?

Shi Xiaoyang: My interest in robotics probably grew out of my interest in aerospace. As a junior in high school I became interested in aerospace, a field with a lot of unsafe jobs that need to be handled by robots. But there are not many smart robots anywhere in the world, so I want to build them.

I am now doing AI research, and I believe AI is very important to the progress of both technology and society. But I am also sure that many other things in the world are just as important, such as the news media. I worked as a student journalist, studied philosophy and sociology in college, then crossed over to aerospace engineering, and spent time in business school. I have always been the restless type, asking questions and looking for answers, and now I am firmly committed to AI.

I think technology needs to evolve at a rapid pace, but it is also important to make sure it is safe and that it benefits as many people as possible, not just the elite. Artificial intelligence in particular raises many social issues, such as legal norms, public education, and social equity, which require more thought.

The Paper: What is your ideal situation for the development of artificial intelligence in the future?

Shi Xiaoyang: To sum it up in one term, it is IA (Intelligence Augmentation), in contrast to today's AI (Artificial Intelligence). In the future, AI will not only meet our physical needs, such as robots having already cooked and cleaned by the time we get home, giving us more free time and space, but will also promote innovation and scientific development. I hope intelligent systems can help us solve many of these problems.

The Paper: Embodied intelligence is a much-discussed concept in 2024. In your opinion, what will be the development trend of robotics in 2024?

Shi Xiaoyang: Overall, development should become more functional and application-oriented. For example, there may be more robots in housework scenarios in the future. Some robotics companies will also move into more specialized niches.

Robots will get more attention. The AI subfields that previously drew the most attention have seen breakthroughs in recent years and are close to being solved. Robotics can be regarded as one of the hardest problems remaining. More and more people are now trying to take on this tough problem, so there will be a large influx of talent and funding, and robotics will enter a stage of rapid development. This could lead to the Fourth Industrial Revolution.
