Andrew Ng's first Ph.D. student led three Chinese PhDs out of OpenAI to start a business

author:Venture State

Author丨Linfeng

Editor丨Sea waist

Image source: Covariant official website

Imagine casting a "smart" spell on a robotic arm used for sorting: it can communicate with humans in natural language, complete sorting work, tell garbage from goods, and load and unload cargo on its own, like a tireless factory worker who does two people's work for one salary.

Covariant (formerly known as Embodied Intelligence), a robotics startup in Silicon Valley, is committed to bringing a robotic version of ChatGPT into people's work, study, and daily lives and to developing general-purpose artificial intelligence suited to a wide range of scenarios. Its core product, the Covariant Brain, can be adapted to different hardware and is currently deployed mainly on industrial robotic arms. Covariant is starting with the automation of logistics warehousing and parcel sorting, helping humans with heavy and tiring work; its long-term vision is to develop a universal foundation model.

It was founded by Pieter Abbeel, a renowned Berkeley professor and pioneer of deep reinforcement learning, and three of his Chinese Ph.D. students: Peter Chen, Rocky Duan, and Tianhao Zhang. Notably, Abbeel, Chen, and Duan are all former OpenAI employees from its now-disbanded robotics team.

Image source: Covariant official website. From left to right: Yan Duan, Tianhao Zhang, Pieter Abbeel, Xi Chen

Their investor lineup reads like a "team outing" for the AI research community: Jeff Dean, a senior researcher at Google; Fei-Fei Li, a professor at Stanford; Yann LeCun and Geoffrey Hinton, two of the "big three" of deep learning; Michael Jordan, a distinguished professor at Berkeley; and Daniela Rus, director of the MIT AI Lab. Last year, Bill Gates also made a high-profile investment in their Series C funding round.

Source: Bill Gates shared on LinkedIn

Geoffrey Hinton felt he had invested too little and tweeted his regret: "I should have invested 100 times more."

Source: Twitter

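To date, Covariant has raised five rounds of funding totaling US$222 million. The most recent, a Series C+ round in April 2023, raised US$75 million; it was led by Index Ventures and Radical Ventures, with Amplify Partners, Gates Frontier Fund, and others participating. Earlier investors include Temasek, Radical Ventures, Amplify Partners, Samsung NEXT, and FreeS Fund (峰瑞资本).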

Given the strength of the team, it is clear they do not make empty promises. On March 12, Covariant released RFM-1, a general-purpose robotics foundation model that gives robotic arms a "brain", enabling them to understand and recognize the physical world and letting humans communicate with them in natural language. RFM-1 can be trained on more data to do more. On March 27, it learned to reflect on its actions and propose improved strategies, as if ChatGPT had stepped into the physical world.

1. One "brain" handles 5 kinds of logistics and warehousing work

Unlike Figure.ai, Tesla, Agility, and other robotics companies that also target factory work, Covariant builds the "brain" that controls robots purely as software and artificial intelligence. CEO Peter Chen (hereinafter "Chen Xi") believes this lets the company go deeper into AI than its competitors.

Their core product, Covariant Brain, gives robots the ability to see, think, and act. Having learned from enough data, they can "pick up" any object as easily as a chick pecking at rice, regardless of its size, shape, packaging, texture, or pattern.

Just as generative AI can write marketing copy on the fly, Covariant makes robot fleets more productive by using a single general-purpose AI model that lets them pick, sort, and place almost any item on the spot.

Thanks to the strength of its AI system, Covariant has partnered with well-known warehouse and logistics companies such as KNAPP, ABB, Bastian, and Fortna. "As the picking task became more and more tricky, each time we expected them to fail on the next product, but everything went very smoothly," commented Peter Puchwein, Vice President of Innovation at KNAPP.

Image source: Covariant official website

Let's look at Covariant's existing warehousing applications. The first is the Robotic Putwall, used for batch picking and returns processing, which automatically sorts batches of mixed SKUs. In October 2022, B2C e-commerce company Radial installed 12 Covariant Putwall robots, each of which is said to average about 100,000 picks per month when fully operational.

Source: Covariant Putwall

The second is Robotic Induction. Covariant's robotic induction system automates induction in the warehouse: it autonomously places items into unit sorters, bag sorters, automated guided vehicles, automatic bagging machines, and other equipment, identifies each product to determine the best grasp point and grasp speed, and sorts and groups items for the packing station. These systems work alongside KNAPP's mechanical sorting equipment in smart e-commerce warehouses deployed for GXO.

Source: Covariant Induction overview

The third, Goods-to-Person Picking, targets a more cognitively demanding task: picking. In traditional warehouses, humans pick goods delivered by shuttles, automated guided vehicles, and other automated storage and retrieval systems. Covariant turns this into "goods-to-robot" picking, taking over most of the repetitive work. McKesson, a U.S. pharmaceutical distributor, has strict requirements for warehouse picking; given the complexity of drug packaging and a shortage of skilled labor, it needs intelligent picking at scale.

Image source: Covariant Goods-to-Person Picking overview

Source: KNAPP. The intelligent Pick-it-Easy robot picking goods at McKesson in the United States

The fourth is Robotic Kitting, suited to packing work within a small footprint; it automates assembly for co-packing, meal preparation, and subscription services.

The fifth is depalletization: unloading mixed-SKU pallets onto a conveyor belt so that storage and picking areas are replenished in time. A major home improvement retailer has reportedly deployed multiple Covariant depalletization systems to improve efficiency.

Image source: Covariant official website

Across these use cases, Covariant has been deploying robots at real-world sites around the world to collect data since 2017. Just as ChatGPT had to learn from vast amounts of data, so do Covariant's robots. Building a high-performance robotics foundation model requires data from physical interaction in real environments.

Collecting data in the field also exposes the robots to rare events in the physical world and to edge cases seldom encountered in a laboratory. The data collected mainly includes multi-angle video, still images, site and task descriptions, motor encoder readings, and pressure sensor readings.

Traditional robotic arms in warehousing and logistics are bulky and slow, and they can only pick specific items along a prescribed path. Covariant instead puts its robots on site and has them take on the hardest cases directly. Its systems have long been "picking up" deformable objects under heavy occlusion, from cylindrical cups to irregularly shaped yellow ducks, placed chaotically and inconsistently. This fully tests the robot's ability to reason on its own about how much suction different materials require. Once the robot's success rate reaches 99%, it is roughly equivalent to human labor.

This March, building on the large volume of data collected by robots deployed on the Covariant Brain AI platform, the company launched the RFM-1 robotics foundation model. Chen Xi said the model is essentially a large language model, but one designed for the language of robots.

2. A robot model that actively solves problems

According to the company's website, Covariant Brain is powered by RFM-1 and trained on the largest multimodal robotics dataset, gathered from warehouses around the world, letting robots pick virtually any SKU or item on day one. The release of RFM-1, an 8-billion-parameter Transformer model, marks a solid step toward a general AI model that can accurately simulate, and operate in, the complex conditions of the physical world.

According to Covariant, RFM-1 is a multimodal any-to-any sequence model trained on text, images, video, robot actions, and a range of numerical sensor readings. It converts all modalities into tokens in a common space and performs autoregressive next-token prediction.

In plain terms, RFM-1 learns image-to-image mappings: it understands human text instructions, observes the images it is given, pairs scene images with target grasps, and returns the simulated result as video. According to an analysis by "Machine Power", RFM-1 can be thought of as a video generator: given a command to pick something up, the system uses what it learned in training (shape, color, size) to identify the object that best matches the description, then generates a video predicting what will happen when the object is picked up, and uses it to determine the best course of action.

Source: covariant official website, RFM-1 introduction, Figure 1 (top left), Figure 2 (top right), Figure 3 (bottom left), Figure 4 (bottom right)

For example, RFM-1 generates Figure 3 (a simulated video of the pick) from Figure 1 (the initial scene) and Figure 2 (the specified item); Figure 4 shows the pick it actually made in the real world. Importantly, RFM-1 reasons about and predicts how the tote will change over the next few seconds to inform its decision, rather than grasping mechanically.

Source: Covariant official website. Image generated by RFM-1 showing the predicted appearance of the tote (right) after a specific item (center) is picked from the starting tote (left)

Not only that, but anyone can collaborate with a robot using natural language, without the need for programming and engineering backgrounds. As shown in the figure below, the operator can ask the robot to pick up red apples and bathroom products in a few simple words of English.

If it runs into a problem, RFM-1 proactively asks a human for help, and the operator can explain what to do in natural language. As shown in the figure below, when the robotic arm cannot find a good grasp point on a tennis ball, it asks the human what to do and, once the operator offers guidance, carries on accurately according to the suggestion.

In its blog post, Covariant notes that RFM-1 has limitations and has not yet been deployed to customers, and that it expects the data it collects to speed up locating and learning from RFM-1's failure modes.

Because of its limited context length and its relatively low resolution and frame rate, RFM-1 can already capture the deformation of large objects but cannot yet model small objects or fast movements, so tasks such as screwing and peeling remain difficult.

On March 27, Covariant announced a major update to RFM-1 that lets robots come up with improved behaviors when they hit a difficult problem by reflecting on their recent actions. For example, after failing several times to grasp a new item, a pair of socks, the robot reflects, holds an internal dialogue, and concludes that it can pick the item up by applying suction to its paper packaging.

However, Covariant is still far from its goal. The team sees RFM-1 as a general-purpose robot brain and does not rule out connecting it to any embodied device, including humanoid robots. To get there, they need to collect data at least 10 times faster.

Chen Xi revealed that as RFM-1 matures they will open its API to other robotics companies: "In the future, a large number of robot developers and companies will access our API, and we hope to become their GPT platform."

3. Leaving OpenAI to collect real-world data and take the road OpenAI could not

Chen Xi said: "Besides ChatGPT, there are many natural language processing AIs on the market, used for search, translation, and spam detection. Their approach is to train a specific AI on a smaller subset of data for each use case. The foundation-model approach should be to train one large general model on more data, so that the AI itself becomes more general."

This idea is essentially the same as OpenAI's chosen path to AGI, which is no surprise, since three of the four founders come from the robotics team that OpenAI later abandoned. Covariant was founded in 2017 by the well-connected Pieter Abbeel together with three Chinese Ph.D. students, Peter Chen, Rocky Duan, and Tianhao Zhang, all from the Berkeley Artificial Intelligence Research Lab (BAIR) at the University of California, Berkeley.

Image source: Covariant. From left to right: Tianhao Zhang, Yan Duan, Xi Chen, Pieter Abbeel

The most prominent member of the team is Pieter, Andrew Ng's first Ph.D. student, who holds a Ph.D. in computer science from Stanford University. He founded the Berkeley Robot Learning Lab, is co-director of BAIR, and became a tenured professor at Berkeley in 2017. His research focuses on robotics and machine learning, and he has co-authored about 357 papers. According to his personal website, his main research interests include AI, reinforcement learning, and robotics; as early as 2008, his site discussed teaching robots to learn from demonstrations (apprenticeship learning) and through trial and error (reinforcement learning) for machine control. Pieter's research has clearly had a significant influence on Covariant Brain.

Source: Pieter Abbeel's personal website, https://people.eecs.berkeley.edu/~pabbeel/

Pieter won the ACM Prize in Computing in 2021. Outside academia, he is well connected in industry: he hosts the podcast The Robot Brains, where he has interviewed scientists and venture capitalists including Ilya Sutskever, Andrej Karpathy, and Geoff Hinton.

He is a scientist, serial entrepreneur (Covariant, Gradescope), media host, and VC partner. Pieter joined OpenAI in its second year as part of its robotics team, at the same time as the young prodigies Duan Yan and Chen Xi.

Apart from Pieter, the other three founders were probably all born in the 1990s and are around 30 years old.

Xi Chen, CEO: Berkeley Ph.D. (2016) and BAIR fellow. Like Pieter, he focuses on reinforcement learning, meta-learning, and unsupervised learning; he has published more than 30 papers, which have been cited more than 20,000 times.

Source: Forbes

Yan Duan, CTO: Berkeley Ph.D., completed in 2 years; he also spent 3 months as a software engineering intern at edX. At 21 he became one of OpenAI's earliest employees, and he was named to the Forbes "30 Under 30" list in 2024, with more than 15,000 citations of his research. Pieter describes him as "10 times more productive than anyone very productive".

Source: Duan Yan's personal webpage, http://rockyduan.com/

Tianhao Zhang, co-founder, holds a double degree from Berkeley and has been pursuing a doctorate there since 2016. He has worked as a software engineering intern at MongoDB and as a research intern at Microsoft. Like Chen Xi, he is currently on leave from his studies.

Source: Zhang Tianhao's personal webpage, http://tianhaozhang.com/

In May 2017, OpenAI released open-source software for simulating and controlling robots and built a system for use on physical robots, whose algorithms could learn from failure through reinforcement learning. Two years later, the team showed a robot in action for the first time, and it did not work very well. According to VentureBeat, OpenAI co-founder Wojciech Zaremba revealed in 2021 that the robotics team had been disbanded. From a business standpoint, robotics is capital-intensive, and it is a hard road for a startup.

Of course, OpenAI has not abandoned robots entirely. In March 2024, Figure 01, powered by OpenAI's GPT-4V vision-language model, was shown clearing trash off a table while chatting. Around the same time, Covariant's RFM-1 learned to "pick up" soft socks on its own. The two AI "brains" each have their own strengths and are applied flexibly to physical devices.

Source: TheAIGRID, Figure 01 Active Cleaning Desktop

On Pieter's podcast, Yan Duan talked about why he left OpenAI to start Covariant. He does not dispute the OpenAI team's efficiency; its focus was on advancing fundamental learning algorithms and taking on challenging tasks.

One day, he and Chen Xi were in a Chinese restaurant discussing how to take robot learning to the next level, and they concluded: "It's not enough just to develop and improve algorithms; it's more important to get the right data. And data means not only annotations but also the various tasks the robot performs."

The reasoning is simple: the way to bridge the gap between academia and industry is to learn by doing. Duan Yan's view is that for AI robots to really work, they have to scale out. "We need to deploy robots at scale in commercial settings and use that experience to improve the learning system. That is data you cannot get in an academic lab."

With core members of its robotics team leaving to start a business, OpenAI shifted its research focus after 2021 to other foundation models for which data is easier to obtain. Covariant went on to pursue AI robotics and, after years of quietly collecting data, made a strong comeback.

In 2018, Covariant developed the Covariant Brain and its first AI robotics solution for automated warehousing and pick-and-place, then waited for a chance to deploy it.

The first turning point came when Covariant got that chance. According to Fortune, industrial robot maker ABB held a picking competition in 2019 to evaluate potential partners and see whether AI was mature enough for robotic automation. ABB invited 20 robotics companies (10 European, 10 American) to tackle complex picking, packing, and sorting of 26 items, including apples, clamshell-packaged items, toys, and more, half of which were kept secret before the competition. Covariant was the only company to complete every challenge. Having won ABB as a major customer, Covariant soon deployed the joint solution at Active Ants (part of Bpost), an e-commerce fulfillment service provider.

Source: ABB

Covariant became the preferred provider of standard customer-facing solutions for ABB and Bpost, and large customers followed. It works closely with KNAPP on the Pick-it-Easy robot and is connected through it to well-known companies in North America, Australia, and Germany, such as GXO, McKesson, and Obeta.

Image source: Covariant official website

Large logistics companies have long had demand for AI-automated sorting. According to available data, manual tasks that could be automated (picking, packing, loading, unloading) account for 60% of fulfillment costs for retailers and logistics providers. According to KNAPP's Peter Puchwein, who has worked on automated warehouses for 19 years, AI robots can pick 95%-99% of goods, compared with only about 10% for non-AI robots. Puchwein does not trust AI startups' edited demo videos. Around 2020, KNAPP's engineers traveled the world looking for the best picking robot and finally chose Covariant, testing it for three or four months before bringing the machine to market.

Puchwein offered a comparison: a worker earns $40,000 a year, while the tireless KNAPP robotic solution costs $30,000, an offer customers "can't say no" to.

At the time, Covariant also claimed to have sorted 10,000 different items with an accuracy of over 99%, putting it almost on par with human labor.

As use cases multiplied, Covariant gradually rolled out new features. By the time it partnered with Capacity in 2022, it had fulfilled thousands of orders at more than 500 PPH (picks per hour), with fewer than 0.1% requiring human intervention. PPH is the main benchmark used in academia, industry, and standards bodies to measure robotic grasping, and at this point Covariant's PPH was very close to the human range (400-600 picks per hour).

Nevertheless, Chen Xi believes that MPPH (mean picks per hour) has become an outdated measure of a robotic grasping system's performance: "We focus more on the reliability of the system, that is, the number of interventions per hour, or how often a human has to get involved." His view was endorsed by Lael Odhner, CTO of RightHand Robotics.
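The difference between the two metrics is easy to see with a little arithmetic. The Python sketch below uses a hypothetical `ShiftLog` record and made-up numbers (4,000 picks, 3 interventions, 8 hours); none of these figures come from Covariant. It only illustrates how throughput and reliability are computed from the same log.

```python
from dataclasses import dataclass


@dataclass
class ShiftLog:
    """Hypothetical summary of one robot shift (illustrative values only)."""
    picks: int          # completed picks in the shift
    interventions: int  # times a human had to step in
    hours: float        # shift length in hours


def picks_per_hour(log: ShiftLog) -> float:
    """Throughput metric (PPH)."""
    return log.picks / log.hours


def interventions_per_hour(log: ShiftLog) -> float:
    """Reliability metric: how often a human gets involved per hour."""
    return log.interventions / log.hours


def intervention_rate(log: ShiftLog) -> float:
    """Share of picks that needed human help."""
    return log.interventions / log.picks


log = ShiftLog(picks=4000, interventions=3, hours=8.0)
print(f"PPH: {picks_per_hour(log):.0f}")
print(f"Interventions per hour: {interventions_per_hour(log):.3f}")
print(f"Intervention rate: {intervention_rate(log):.3%}")
```

Two systems can post the same PPH yet differ widely in interventions per hour, and it is the second number that determines how many robots one operator can supervise.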

Later, Covariant continued to broaden its enterprise partnerships. In 2022, Bpost's Radial integrated 12 Putwalls and achieved 425 PPH. In 2023, Otto, Europe's largest online retailer, entered into a strategic partnership with Covariant to deploy more than 100 AI robots for order picking.

Source: covariant official website, cobot with Otto

Covariant is also expanding globally, adding new offices roughly every two years. According to its official website, it opened offices in Shenzhen and London between 2019 and 2021.

Overall, Covariant currently uses its "brain" to work with large robot manufacturers such as KNAPP, collecting data and experience to keep upgrading its AI system; this is also its main business model.

4. AI that picks things up is chasing a 10-billion market

Don't you think it's a bit "overkill" to let such a powerful AI brain pick things up?

Jordan Jacobs, partner at Radical Ventures, explains the challenge: "It is very difficult to develop an AI system that can accurately operate a robotic arm to pick an object out of a cluttered pile of goods, flip it, roll it, and set it upright."

Teaching AI robots to pick things up looks easy, yet it has troubled most of the world's largest factories and research laboratories for years. What a 1-year-old baby can do is very hard for a robotic arm. One challenge is getting it to grasp on its own; the other is getting it to grasp most items. Both involve actuator control, gripper friction, interpretation of sensor data, and the effects of noisy data.

For example, in 2016 Google ran 14 robotic arms to learn how to grasp things, and in the same year the winning robot in Amazon's robot picking competition picked products at about 100 items per hour (compared with 400 per hour for humans).

Source: The Verge, on Google and on Amazon

Beyond that, robots that "pick things up" answer a real need in the warehousing and logistics operations of large companies. The U.S. Bureau of Labor Statistics (BLS) reports that the warehousing and storage industry employs more than 1 million workers at 17,000 locations nationwide and that 5% of warehouse workers suffer at least one workplace accident each year. In 2021, it had the second-highest turnover of any industry. Other data show that U.S. companies lose $62 billion a year to work-related injuries.

The robotics market is also huge. When Musk presented his Optimus robot in May last year, he said humanoid robots have long-term value and, under optimistic forecasts, could reach 10 billion units. According to one research report, global revenue from industrial robots was expected to approach US$43.8 billion in 2023. In addition, humanoid robots are expected to validate the logic of AI moving "from software into hardware" and become the ultimate application of "AI+", with the global market growing at a CAGR of 71% between 2021 and 2030; the categories most worth watching are machine vision, industrial robots, and service robots. According to a McKinsey report, about 400 million jobs worldwide will be replaced by automated robots by 2030.

This also reveals a hidden concern for Covariant: even with a powerful "brain", it has to think about the body. Some analysts note that when Covariant works with large customers today, its AI brain has to be installed together with robotic arm makers such as KNAPP. But KNAPP is itself a solution provider with highly automated machinery; if Covariant is confined to the software side as a secondary supplier, its commercial bargaining power will suffer. Moreover, as a start-up, Covariant will find it hard to compete on sales channels with an established logistics robotics player like Amazon.

When OpenAI disbanded its robotics team in 2021, Zaremba added: "If we were a robotics company, we would continue. I'm a big believer in the approach and direction of the robotics team, but we're missing some components in terms of what we're trying to achieve." The remark hinted that the robotics business, with its low return on investment, was not commercially attractive.

But Covariant's situation is different: at the time, OpenAI was not collecting field data with a finished product, and it had not commercialized. Chen Xi envisions AI becoming the catalyst for explosive growth in robot applications. A single foundation model (a general AI platform) can support robots across geographies and tasks, so that they work intelligently and autonomously. Unlike narrow AI, which can only find patterns in predefined ways, general AI can handle anomalies in its environment.

Chen Xi hopes that Covariant will one day provide brains for millions, tens of millions, hundreds of millions, or even billions of robots: "It is not just a single robot application, nor is it just hardware."
