ByteDance responds to being "banned" by OpenAI: use of GPT-generated data to train models stopped in the middle of the year | At the Forefront

Author: 36Kr

Text | Zhou Xinyu

Editor | Tang Wing-yee

On the morning of December 16, 2023, Beijing time, an article by Alex Heath, author of the technology newsletter Command Line, brought OpenAI's complaint against ByteDance into the open.

In the article, ByteDance is accused of secretly using OpenAI's model API to train and evaluate its models at almost every stage of Project Seed, its large language model development project.

The employees involved were aware of this. Alex Heath says he saw conversations firsthand on Feishu, ByteDance's messaging platform, in which employees discussed how to whitewash the evidence through data desensitization: "Abuse is so widespread that Project Seed's employees often reach the maximum number of times they can access the API."

The complaint ended with OpenAI suspending ByteDance's account. OpenAI spokesperson Niko Felix issued a statement through Alex Heath:

All API customers are required to adhere to our usage policy to ensure that our technology is being used well. While ByteDance has had minimal use of our API, we have suspended their account while further investigation continues. If we find that their use does not comply with these policies, we will ask them to make the necessary changes or terminate their account.

Statement from OpenAI spokesperson Niko Felix.

"Seed" is a foundation large language model development project that ByteDance launched at the end of 2022. The project has two main products: the chatbot "Doubao", which has already launched in China, and a bot platform that is still under development and is planned to be offered to external customers through Volcano Engine.

An industry insider told 36Kr that it is not uncommon for domestic companies to use the APIs of mainstream foreign models to test the waters and train their own models first: "Use advanced models to get the business running first, and then replace them once your own model training capabilities are up to standard."

Several people familiar with the matter told 36Kr that ByteDance's current model business, whether the product project Flow or the large model project Seed, aims to cover both the domestic and overseas markets. Because of policy requirements, the domestic business uses ByteDance's self-developed model, while the overseas business starts out on the model API services of foreign vendors.

OpenAI's terms of service do contain provisions on competition protection. To prevent customers from using its services to develop competing products, OpenAI strictly limits what customers may build: it permits only non-commercial AI models used for data governance, or models fine-tuned through the services OpenAI offers externally.


OpenAI's Terms of Service.

After the "ban" controversy broke, ByteDance spokesperson Jodi Seth responded quickly on the same day. She said that GPT-generated data was used for model annotation in the early days of Project Seed and was removed from ByteDance's training data around the middle of this year:

ByteDance has a license from Microsoft to use the GPT API. We use GPT to power products and features for non-Chinese markets, but use our self-developed model to power Doubao, which is only available in China.

The statement acknowledges that ByteDance trained a model with GPT-generated data, but places that use before OpenAI set out its service terms. The earliest version of OpenAI's service terms was released on August 28, 2023, and ByteDance says it stopped using GPT-generated data in training before the middle of the year.


OpenAI's first version of the service regulations was updated in August 2023.

Another key point of ByteDance's response is its emphasis that it obtains the GPT API service through Microsoft's cloud service Azure, rather than directly from OpenAI. In other words, OpenAI's "ban" appears to stand on weaker ground.

However, even Microsoft Azure has a competition protection clause similar to OpenAI's: "Customer shall not use, and shall not allow third parties to use, Microsoft generative AI services to create, train, or improve (directly or indirectly) similar or competing products or services."


Microsoft Azure Generative AI Terms of Service

Many people are now waiting for a response from Microsoft Azure. Microsoft's attitude will be crucial for ByteDance, whose overseas AI business relies on the APIs of foreign vendors.
