
Suing OpenAI

Source: The Economic Observer

By Chen Yongwei

Recently, the California-based Clarkson Law Firm filed a 157-page class-action complaint against the popular artificial intelligence company OpenAI and its partner Microsoft in the U.S. District Court for the Northern District of California.

In the complaint, the plaintiffs allege that OpenAI and Microsoft illegally collected, used, and shared the personal information of hundreds of millions of Internet users, including children's information, while developing, marketing, and operating their AI products. The plaintiffs argue that the defendants' activities violated a number of laws. Accordingly, they ask the court to issue an injunction against the defendants and to award the plaintiffs compensation for their losses.

Because this lawsuit is the first influential one ChatGPT has faced since becoming popular, many news reports have called it "ChatGPT's first case." Strictly speaking, however, the title of "first case" may not be apt. On the one hand, almost at the same time that Clarkson Law Firm filed its suit, the writers Paul Tremblay and Mona Awad filed a copyright lawsuit against OpenAI in federal court in San Francisco; the Clarkson suit simply received more media exposure (perhaps thanks to the firm's litigation strategy), and so had greater impact. On the other hand, the lawsuit is not limited to ChatGPT but covers ChatGPT, Dall-E, Codex, and many other OpenAI products. On this basis, it would be more appropriate to call it the "OpenAI first case" than the "ChatGPT first case."

AI: The best and worst inventions

In October 2016, the renowned physicist Stephen Hawking said in a lecture: "The successful creation of AI may be the biggest event in the history of our civilization. But it could also be the last, unless we learn how to avoid the risks." In his view, "the rise of powerful artificial intelligence may be the best thing that has ever happened to mankind, but it may also be the worst thing."

The Clarkson lawyers open the complaint with an introduction to the basic circumstances of the class action, which quotes Hawking's famous remarks. According to the lawyers handling the case, with the success of products such as ChatGPT, a fierce AI arms race is unfolding among large technology companies. While this has greatly advanced AI technology, it has also forced people to take Hawking's prediction seriously: should we choose a safer, more prosperous, and sustainable path for AI development, or a path that leads to destruction?

The complaint notes that the defendants' products and technologies undoubtedly hold great potential for good, but that the defendants, unfortunately, wield this great power without recognizing the destructive capacity it carries.

The plaintiffs' lawyers quote a public statement by the head of OpenAI's safety team, which shows that OpenAI has long recognized that its AI products are "a fairly immature technology" and that it would be reckless to aggressively deploy AI models without adequate security precautions. Evidently, though, this recognition has not slowed OpenAI's development and deployment of AI. In the view of the plaintiffs' lawyers, it is precisely this disregard for and indulgence of risk that has led to the infringement of people's privacy, property, and other rights.

The plaintiffs' lawyers argued that the defendants acted illegally in pursuing financial gain at the expense of others and of the public interest. Accordingly, they asked the court to order the defendants to cease these practices immediately and to ensure that their future products meet standards of transparency, accountability, and control.

A review of the development of AI in the United States

After the "introduction," the complaint reviews the development of AI in the United States. Although the section is titled "AI Development in the United States," its focus is entirely on the two defendants, OpenAI and Microsoft. Specifically, the review highlights four basic facts:

(1) OpenAI's transformation from a nonprofit to a for-profit company

Originally, OpenAI was founded as a nonprofit research institute whose declared mission was to advance humanity safely and responsibly. Since 2019, however, OpenAI's strategy has made a 180-degree turn: it shifted from an open nonprofit to a for-profit corporate structure and began partnering with outside investors, most notably Microsoft.

Commercially, this transformation has been very successful. In just a few years, OpenAI went from an obscure AI research institute to a company valued at $29 billion. However, the plaintiffs' lawyers point out that the shift also raises a number of problems. Many worry that OpenAI now puts short-term financial interests ahead of human interests, in particular by suddenly and widely commercializing products it knows to be risky, which raises multiple moral, safety, and ethical issues.

(2) The development of ChatGPT relied on secretly scraping web data

The development and training of large language models relies heavily on personal data, especially conversational data between people. The complaint alleges that, to train large language models such as ChatGPT at relatively low cost, OpenAI bypassed the mature data-exchange markets and chose "theft," that is, secretly crawling data from the Internet. Over the years it scraped about 300 billion words of online text, including books, articles, and online posts. Beyond this, it secretly collected large amounts of personal data, including personal information, chat histories, online customer-service interactions, social media conversations, and images scraped from the Internet.

(3) ChatGPT is trained on users' interactions

The complaint notes that, initially, ChatGPT used its users to help train the model without their consent. When a user chats with ChatGPT, all of their behavior and information, including clicks, inputs, questions, usage patterns, cursor movements, keystrokes, searches, and geolocation, is secretly collected by OpenAI and used for model training.

It should also be noted that OpenAI does not fully disclose how it stores the user information it collects. Because these data may contain sensitive user information, they face a high risk of leakage without proper protection.

(4) Microsoft's role in promoting OpenAI's products

The complaint alleges that Microsoft, as OpenAI's most important collaborator, played a crucial role in promoting OpenAI's products, but in doing so also greatly accelerated the spread of their potential risks. Although GPT-4, the latest version of GPT, was released only recently, Microsoft has aggressively integrated it into core products in fields ranging from academia to healthcare. These integrations have caused the number of users reached by OpenAI's products to skyrocket, while also greatly amplifying the risks. Yet instead of paying sufficient attention to these risks, Microsoft laid off the team responsible for upholding ethical AI principles. And when other AI developers saw the "success" of OpenAI and Microsoft, they followed suit. Against this backdrop, the associated risks have reached unprecedented heights.

The main risks of AI

After reviewing "AI development in the United States," the complaint goes on to list the most significant risks under the current circumstances. These risks include:

(1) Large-scale invasion of privacy

The defendants' large-scale collection and tracking of users' personal information poses a huge threat to users' privacy and security. This information may be used for malicious purposes such as identity theft, financial fraud, and extortion.

In particular, OpenAI does not respect users' "right to be forgotten," that is, the right to have one's personal data deleted. While OpenAI ostensibly allows users to request the deletion of their data, in practice this deletion option may be illusory. Some companies have banned or restricted the use of ChatGPT out of concern that any content uploaded to AI platforms such as OpenAI's ChatGPT or Google's Bard will be stored on those companies' servers, where it can be neither retrieved nor deleted.

(2) AI-induced disinformation, targeted attacks, sex crimes, and bias

The complaint alleges that the defendants' products, including ChatGPT, have a serious defect: they generate all kinds of false information. A classic example is a rumor ChatGPT fabricated about sexual harassment by Jonathan Turley, a law professor at George Washington University. Eugene Volokh, a law professor at the University of California, Los Angeles, ran a test to study the legal implications of AI-generated content: he asked ChatGPT to generate a list of "legal scholars who have sexually harassed others." To check whether the output was genuine, he specifically asked ChatGPT to cite sources for its claims. When Volokh read the list, he found Turley's name prominently featured. According to ChatGPT, Turley had made sexually suggestive remarks on a class trip to Alaska and had tried to molest a student. Volokh was shocked: Turley is a well-known figure in the field, and as a colleague, Volokh had never heard any such gossip. He immediately set out to verify the story. It turned out to be entirely false: Turley had never been on any such class trip, let alone engaged in any sexual harassment. After Volokh shared the findings with the media, Turley, blamed out of the blue, learned that ChatGPT had described him as a sexual harasser. Deeply upset, he said in an interview: "It's chilling. Such trumped-up accusations are very harmful."

The complaint also states that, in addition to spreading misinformation, the defendants' products may be used by criminals in activities such as harassment, extortion, coercion, and fraud. For example, a new form of "sexual harassment" is emerging, in which private photos and videos obtained through social media are used to create pornographic deepfake content. The public dissemination of these images online has caused serious emotional and psychological damage to victims.

Of particular concern, the defendants' products have also been used to produce child pornography. For example, pedophiles have used Dall-E to create, at very low cost, large quantities of images and videos depicting child sexual abuse, and have spread them on the dark web. These actions have had very serious consequences.

In addition, the complaint states that the defendants' products, such as ChatGPT, also facilitate the spread of hatred and prejudice. This is because language models are trained on real-world corpora, which contain a great deal of hateful and prejudiced content. In training their models, the defendants failed to filter out such material, leaving the models themselves flawed.

(3) Helping to build super malware

The complaint states that the defendants' products also lend strong support to the creation of malware, that is, computer programs designed to damage or infiltrate computer systems. Over the past decade, malware has become increasingly sophisticated and difficult to detect.

The defendants' products can generate virtually undetectable malware at a fraction of the usual cost and at scale, posing an unprecedented risk to cybersecurity worldwide. Although OpenAI claims to have safeguards in place to prohibit the generation of polymorphic malware, malware developers can in practice bypass these filters with cleverly crafted inputs. Accordingly, the plaintiffs' counsel argued that handing this enhanced destructive capacity to the public without the necessary safety precautions amounted to gross negligence on the defendants' part.

(4) Autonomous weapons

So-called autonomous weapons, also known as "slaughterbots," "lethal autonomous weapons systems," or "killer robots," use AI to identify, select, and attack human targets without human intervention. They pose a serious threat to international security and human rights.

The complaint notes that this risk of unregulated AI is no longer remote but is becoming real, citing, for example, the near assassination of a foreign head of state (the complaint does not describe the incident; the author suspects it refers to the drone attack on Venezuelan President Nicolas Maduro during a speech). And the cost and difficulty of building and using such a lethal weapon are very low.

Experts warn that advances in related technologies will accelerate the development of autonomous weapons, while the large-scale commercialization of these products, which improves AI capabilities without adequate moral and ethical norms, will accelerate the spread of the risks.

The defendants' violations of the plaintiffs' property rights and privacy

After listing the significant risks the defendants' products may pose, the complaint turns to the defendants' violations of privacy and property rights.

(1) The defendants' data scraping should be considered theft

The complaint argues that the defendants' covert mass scraping of the Internet without consent is essentially an act of theft and misappropriation.

To illustrate the nature of the defendants' actions, the plaintiffs' lawyers drew an analogy to the Clearview AI incident of 2020. Clearview AI, a facial recognition company, scraped billions of publicly available photos from websites and social media platforms without user consent in order to develop its products. After The New York Times made its practices public, there was immediate public unease. In March 2020, the American Civil Liberties Union in Illinois and prosecutors in Vermont filed lawsuits against Clearview AI at almost the same time. Regulators in the United Kingdom, Italy, Australia, and other countries also launched investigations into Clearview AI and successively imposed fines on it.

The plaintiffs' lawyers argue that OpenAI's data collection is similar in nature to Clearview AI's and should therefore likewise be considered illegal.

(2) The defendants' conduct infringed the plaintiffs' property rights and interests

The complaint points out that courts have established in past cases that Internet users hold property interests in their personal information and data, so OpenAI's illegal scraping constitutes, first of all, an infringement of the plaintiffs' property rights and interests. On data markets, an individual Internet user's information is worth between $15 and $40, or more. Some surveys also show that a person's online identity can sell on the dark web for $1,200. At these valuations, the value of the property OpenAI is alleged to have misappropriated would be staggering.
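To see why, a back-of-the-envelope calculation helps. The per-user dollar figures below come from the complaint, but the affected-user count is a purely hypothetical assumption for illustration, since the complaint speaks only of "hundreds of millions" of users:

# Rough estimate of the property value at stake. The $15-$40 per-user range
# is from the complaint; the 100-million user count is an assumption.
LOW_PER_USER, HIGH_PER_USER = 15, 40       # dollars per user's data
ASSUMED_USERS = 100_000_000                # hypothetical "hundreds of millions"

low_total = LOW_PER_USER * ASSUMED_USERS   # $1,500,000,000
high_total = HIGH_PER_USER * ASSUMED_USERS # $4,000,000,000
print(f"${low_total / 1e9:.1f}B to ${high_total / 1e9:.1f}B at stake")

Even at the low end of the range, and before counting the $1,200 dark-web identity values, the implied figure runs into the billions of dollars.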

(3) The defendants' conduct infringed the plaintiffs' privacy rights and interests

Beyond property rights, Internet users retain privacy rights in their personal information even after it has been posted on the Internet. The defendants' illegal crawling therefore also constituted an infringement of the plaintiffs' privacy rights and interests.

The complaint states that aggregating and analyzing these data can reveal information that individuals do not want made public. For example, a person's mental health can be inferred from their public tweets. Even a small amount of "public" private information is therefore enough to harm Internet users' privacy. In addition, the complaint notes that when users post content online, they typically expect that it will not be seen by many people and that its impact will fade over time. The defendants' actions shattered this expectation, thereby harming users' interests.

(4) The defendants' business conduct offends reasonable people and ignores regulators' warnings

The complaint states that the public now feels fear and anxiety about how the defendants use, and may misuse, their personal information. People fear that their personal information will be embedded forever in the defendants' products, to be repeatedly accessed, shared, and misused.

In addition, the complaint notes that regulators have already warned against similar illegal practices. In a case against Amazon, for example, the Federal Trade Commission said: "Machine learning is not an excuse to break the law... The data used to improve the algorithm must be legally collected and retained. Companies would do well to learn this lesson." The defendants, however, apparently did not give this warning sufficient weight.

(5) The defendants took user data beyond the scope of reasonable consent

In addition to scraping information directly from the Internet, the defendants also collected the data users generate while using products such as ChatGPT; the complaint calls this the second category of theft. It manifests in two ways. On the one hand, consumers who use ChatGPT plugins or APIs on third-party websites are given no informed-consent notice, yet their information and personal data are illegally collected and used to train the defendants' large models. On the other hand, even those who signed up for OpenAI accounts and interacted directly with ChatGPT were not informed before their data was collected.

Moreover, although the defendants state that users may request that their private information not be used, in reality the collected data cannot be deleted from the language models' knowledge base. At the same time, the defendants cannot tell users how their data is used, in serious violation of the principle of transparency.

The defendants' violations of children's rights

After describing the property and privacy harms done to the plaintiffs, the complaint also highlights the privacy harms and risks the defendants pose to children. Specifically, these include the following:

First, deceptive tracking of children without consent. The complaint alleges that the defendants illegally collected large amounts of sensitive information about children, including their identities, locations, interests, and relationships.

Second, although OpenAI's terms of service and privacy policy clearly state that ChatGPT is for users aged thirteen and above, the platform has no verification mechanism, and underage users can easily gain access by misstating their age. This omission by the defendants exposes underage users to harmful information.

Third, the defendants extracted economic value from child users. The complaint states that children are more easily induced than adults to give up their own and other people's information, which allows the defendants to obtain higher-value data from children and exploit it for profit.

Fourth, the defendants violated reasonable privacy expectations in an offensive manner. The complaint states that parents' right to the care and custody of their children is a fundamental liberty. The defendants' mishandling of children's privacy thus violates parents' reasonable expectations of privacy protection, which is not only illegal but also a serious affront to social norms and morality.

Related allegations and legal remedies

On the basis of these facts, the plaintiffs' lawyers argued that the defendants OpenAI and Microsoft violated a number of laws, including the Electronic Communications Privacy Act and the Computer Fraud and Abuse Act, and brought fifteen counts against them.

At the same time, the plaintiffs proposed their legal remedies to the court. These include asking the court to issue an injunction requiring the defendants to temporarily freeze commercial access to and commercial exploitation of their products until a series of rectifications is completed and the court's requirements are met. The complaint also asks that the defendants pay compensation to the plaintiffs, including actual damages to be determined at trial, treble punitive damages, and exemplary damages as permitted by law. Although the complaint gives no approximate figure, if the courts uphold the relevant claims, the amount should be very large.

The prospects and significance of the litigation

Realistically, while the allegations in the complaint are serious, it will not be easy for the plaintiffs to bring down OpenAI and Microsoft. After all, Microsoft, as a defendant, has a formidable legal team and strong financial backing, and even a successful suit would likely require protracted litigation. In fact, judging by convention, the most likely ending is that the parties reach a settlement in which OpenAI and Microsoft accept some of the plaintiffs' claims and pay a modest amount of compensation. In other words, for all the case's thunder, the eventual rain may be light.

However, even if an anticlimactic ending is likely, the case itself is still very meaningful at this moment. Since last year, generative AI has exploded. Unlike past technology booms, this one is being led not by traditional giants such as Google and Facebook but by a small startup, OpenAI. With strong technology and an inspiring company image, it is easy to forget the risks behind the products it promotes. As generative AI models spread rapidly, the associated risks become ever harder to ignore. At this point in time, it is therefore very valuable to clarify the problems through a lawsuit like this and to let more people recognize the risks behind AI's development.

Admittedly, most of the problems raised in the complaint are real, but I personally believe the legal remedies it proposes are debatable.

At this stage, AI models of all kinds are already in wide use, and it is unrealistic to stop using them immediately, as the plaintiffs demand. A more prudent approach might be to strengthen governance gradually as the technology develops.

In fact, many of the problems identified in the complaint can be addressed by technical means: stronger identity verification at registration can address minors misstating their age; federated learning and similar methods can effectively mitigate the privacy leakage caused by data collection (see the sketch below); and technologies such as blockchain can track the flow of data. A better solution, I think, would be for OpenAI and Microsoft, having earned enormous revenues from AI, to invest a portion of their profits in such technologies, overcoming the earlier problems and striking a better balance between development and governance for their AI models.
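To make the federated-learning suggestion concrete, here is a minimal sketch of federated averaging in Python with NumPy, under illustrative assumptions (a simple linear model, three simulated clients, invented numbers); it is not drawn from the complaint or from any actual OpenAI system. The design point is that each client trains on its own data locally, and only model weights, never raw user data, travel to the server:

import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    # Train a linear model locally; the raw data (X, y) never leaves the client.
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(global_w, client_datasets):
    # The server aggregates the weights clients send back, weighted by dataset size.
    updates, sizes = [], []
    for X, y in client_datasets:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=sizes)

# Hypothetical demo: three clients, each holding private data the server never sees.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):  # communication rounds: only weights cross the network
    w = federated_average(w, clients)
print("learned weights:", w)  # converges toward true_w without pooling raw data

Real deployments add secure aggregation and differential privacy on top of this basic scheme, but even the bare version shows how a model can learn from user data without that data ever being centrally collected.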
