
OpenAI steals millions of users? Star big model becomes "data thief"!

author:LinkFocus

"Despite having agreements in place to purchase and use personal information, the defendants took a different approach: stealing." A law firm has taken OpenAI to court with a 157-page lawsuit, accusing the company of stealing vast amounts of personal information to train its AI models in pursuit of profit.


OpenAI's scraping of data was unprecedented in scale, the complaint alleges: the company stole about 300 billion words of content from the internet, including books, articles, websites, and posts, and even personal information collected without consent. The alleged data theft affected millions of people, with potential damages of $3 billion, and violated terms-of-service agreements as well as state and federal privacy and property laws.

"By collecting and misappropriating the previously obscure personal data of millions of people to develop unstable, untested technologies, OpenAI has put everyone at immeasurable risk, without any responsible measures for data protection and use. That is unacceptable," said Timothy K. Giordano, a partner at the law firm.


The plaintiffs therefore asked the court to temporarily freeze commercial access to, and further development of, OpenAI's products until safeguards are in place, including allowing people to opt out of data collection and preventing the products from surpassing human intelligence and harming others. In addition to OpenAI, Microsoft, its main backer, was also named as a defendant.

OpenAI is not the only company harvesting massive amounts of data from the internet to train AI models; Google, Meta, Microsoft, and a growing number of others are doing the same. But a partner at the law firm explained why they went after OpenAI first: last year, ChatGPT spurred larger competitors to launch their own AI products, making OpenAI the natural first target.


As data-driven models proliferate, data security is becoming increasingly important. The focus of the lawsuit is therefore likely to be whether OpenAI lawfully and reasonably collects and uses users' personal information in accordance with its privacy policy, and whether it effectively identifies and excludes personal information "incidentally" contained in its training data sources.

One wave had barely subsided before another rose. According to Reuters, two more authors sued OpenAI in federal court in San Francisco, arguing that OpenAI misused their works to train ChatGPT, mining data from thousands of books without permission and infringing the authors' copyrights.


According to public information, after ChatGPT was found to have accidentally leaked user chat histories, the Italian Data Protection Authority announced at the end of March this year that it would temporarily ban ChatGPT and investigate the tool for allegedly violating privacy rules. Canada is also investigating complaints that OpenAI "collects, uses and discloses personal information without consent."

In April, Reddit officially announced that it would charge companies that call its APIs, because OpenAI, Google, and others had been using data on the platform to train their models. For a time, problems with OpenAI's training data were exposed one after another.


Generative AI products built on large models are an "aesthetics of violence" powered by computing power and data. Data is the threshold, and the massive corpora these models consume carry a high degree of compliance risk. ChatGPT, with its 100 million users and billions of visits, was the first to suffer, because as the Chinese saying goes, a tall tree catches the wind.

However, this is not an isolated problem for OpenAI and ChatGPT. The data security issues it has exposed, such as privacy leakage, storage of sensitive information, and unauthorized access, are problems that any large-model product may face once deployed. Since the release of ChatGPT, Chinese companies have released more than 70 foundation models. How to achieve data compliance in the coming commercialization process has become a question every product must answer.


Summary

The wave of AI will not stop. How to steer the ship forward, finding a balance between enterprise survival and compliant production, has become a defining question of the fourth industrial revolution. For enterprises that have released, or are about to release, foundation models, ensuring data compliance will be one of the issues they must confront.
