Can OpenAI Follow Zhang Yiming's Road?

Author: Lunch Box Finance (official account)
The spat between OpenAI and The New York Times is getting more and more interesting.

On January 9, local time, OpenAI finally broke nearly two weeks of silence and published a long post responding to The New York Times's accusations. On December 27 last year, The New York Times sued ChatGPT maker OpenAI and its partner Microsoft in the United States, accusing them of using millions of its articles to train AI without permission.

OpenAI's response was no bland PR boilerplate. It said pointedly that The New York Times had not told the full story, that the newspaper appeared to have deliberately manipulated ChatGPT's outputs, and that the entire lawsuit was groundless.

On one side is ChatGPT, representing new technology; on the other is The New York Times, representing the old guard of news organizations. Their courtroom showdown is destined to go down in the history of technology, and OpenAI's headstrong response has only added fuel to the fire.

Looking back, every new medium, whether radio, television, the Internet, or new media, has clashed with the interests of content copyright holders, and with journalism in particular.

Exactly 10 years ago in China, the fast-rising Toutiao was likewise taken to court by Guangzhou Daily. A number of news organizations and portals then followed suit, and the campaign against it gathered real momentum. The conflict between the two sides back then mirrors the one between AI and the news media today.

The dispute finally subsided as Toutiao bought up copyrights on a large scale; "cooperation" was the path Zhang Yiming chose. Two years later, Toutiao had partnered with more than 3,700 media outlets and was spending more than 1.5 billion yuan a year on copyright purchases.

Similarly, OpenAI is holding high the banner of "cooperation". Alongside its blunt response to The New York Times, it also emphasized its opt-out principle and a strong willingness to work with news organizations.

But this time, The New York Times will only be more cautious. To this day, social media platforms like Facebook and search engines like Google have not come to terms with journalism: the news industry wants more from the platforms and is reluctant to cooperate easily.

OpenAI has dangled the "pie" of cooperation, but The New York Times may not bite so easily.

A

OpenAI and The New York Times both have their fists clenched.

Since the launch of ChatGPT at the end of 2022, OpenAI has faced a string of copyright lawsuits. In September last year, more than a dozen writers filed suit against OpenAI, and a few months later, in December, 11 more American writers sued OpenAI and Microsoft in federal court in Manhattan, New York.

But The New York Times's complaint carries a different weight. First, The New York Times is one of the largest and most established mainstream media outlets in the West. Second, its lawsuit is an aggressive one.

In suing OpenAI, The New York Times submitted 22,000 pages of pleadings and attachments to the court in one go, including as many as 100 key exhibits alleging infringement by ChatGPT, which show ChatGPT output that is highly similar to New York Times articles.

In a typical exhibit, the output of GPT-4 is on the left and the original New York Times article is on the right, with the overlapping text shown in red. It resembles the side-by-side "color palette" comparisons that Chinese Internet users produce whenever they want to nail down a plagiarism claim.

According to the complaint, New York Times articles alone constitute the largest single proprietary data source in Common Crawl, a dataset used to train GPT that is maintained by a foundation that has archived nearly the entire web for 16 years. The New York Times demanded that OpenAI and Microsoft destroy the models and training data containing infringing material. It did not name a specific amount of compensation, but said the defendants should be held liable for "billions of dollars in statutory and actual damages" related to the unlawful copying and use of The New York Times's uniquely valuable works.

In addition, The New York Times pointed out that because of AI "hallucinations", ChatGPT sometimes attributes fake news and rumors to The New York Times, damaging its reputation.

The New York Times came prepared and punched hard. On the day it filed suit, it also published its own high-profile report on the case, catching OpenAI off guard. OpenAI later said that it had still been in talks with The New York Times on copyright issues in December, only for the other side to turn around and sue.

When it finally spoke up, OpenAI did not hold back. Its long post made four key points: 1. OpenAI is willing to cooperate with news organizations and create new opportunities; 2. Training AI models on publicly available Internet material is fair use, but OpenAI still provides an opt-out mechanism; 3. "Regurgitation" is a rare bug that OpenAI is working to drive to zero; 4. The New York Times is not telling the full story, and its lawsuit is baseless.

The "regurgitation" mentioned here refers to the AI "spitting out" its training material verbatim, as in the examples listed by The New York Times, where the AI's answers are almost identical to New York Times articles. OpenAI's position is that regurgitation does exist, but that OpenAI has already reduced it to a very low level, which makes it highly suspicious that The New York Times could produce a hundred or so examples at once.

OpenAI therefore voiced its doubts: "Interestingly, the regurgitations The New York Times induced appear to come from years-old articles that have circulated widely on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even with such prompts, our models do not typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts."

To sum it up: you say my kid is a thief? I'd say you stuffed the goods into the kid's hands to frame him.

In addition, there are two other points worth noting in OpenAI's response.

First, OpenAI emphasized the opt-out mechanism, noting that The New York Times had already opted out back in August last year. In fact, many mainstream news outlets, including The New York Times, Reuters, and CNN, have blocked OpenAI's GPTBot web crawler since last year to restrict its continued access to their content.

Second, OpenAI twisted the knife by denying that any one outlet such as The New York Times matters to ChatGPT's training: "Because models learn from the enormous aggregate of human knowledge, any one sector (including news) is a tiny slice of overall training data, and any single data source (including The New York Times) is not significant for the model's intended learning."

The classic triple denial, "It wasn't me, I didn't do it, don't talk nonsense", fits OpenAI perfectly here.

B

If AI is the future and OpenAI is willing to cooperate, why is The New York Times fighting to the bitter end?

"30% of AI comes from journalism. Let's not make the same mistake and give everything away for free again." "Our content is being stolen, and we have to say: not this time," reads the Media Innovation World Report 2023.

"Don't make the same mistake": similar wording was heard when OpenAI CEO Sam Altman testified before the U.S. Congress. At that hearing, members of Congress repeatedly expressed regret, saying they could not repeat the mistakes of the social media era. In that era, regulation lagged far behind technology: when Zuckerberg first testified over the Cambridge Analytica scandal in 2018, Facebook had already been online for 14 years.

From a certain point of view, OpenAI is indeed standing on the shoulders of giants: with the lessons of the past in mind, all sides became wary of ChatGPT the moment it rose to fame.

The New York Times does not want to repeat the mistakes of the past. When search engines and social media became the gateways to traffic, traditional media struggled to transform and also entered into "cooperation" with the big technology platforms, only to conclude later that it was not worth it.

Facebook began cooperating with traditional media early on, and The New York Times was among the first outlets to join. The model at the time was revenue sharing, with distribution handled on Facebook's platform. But with Facebook and Google's parent company together taking 60% of U.S. digital advertising revenue in 2018, media organizations began to feel that the platforms were taking too much and leaving them too little.

In 2019, The New York Times reported that the annual digital advertising revenue of the entire U.S. news industry was $5.1 billion, while Google's revenue from news content through its search and news aggregation services was $4.7 billion.

News publishers are fighting for a bigger share in many countries and regions. In 2020, Australia became the first country to require Facebook and Google to pay for news content. In 2023, Canada passed the Online News Act, after which Google reached an agreement with the authorities to pay $74 million to Canadian news publishers. Meta, Facebook's parent company, refused to compromise and simply blocked news content in Canada. In the United States, the Journalism Competition and Preservation Act was also pushed in Congress, but it never received a floor vote.

Juan Señor of Innovation Media Consulting Group, which produced the Media Innovation World Report 2023, said bluntly in a speech: "We can't build our own business on someone else's platform. Whether it's Facebook or Google, big tech companies don't care about our interests." "They have their own interests, so why should we expect them to look after ours? There is plenty of show, but far too little revenue."

It is worth remembering that The New York Times is itself a model of rebirth in an era of print media decline. After the 2008 subprime mortgage crisis, it mortgaged its headquarters building to borrow money and even sought outside buyers and investors on several fronts. Through a major digital transformation and the introduction of a paid subscription model, The New York Times eventually returned to profit. In 2022, more than 60 percent of its revenue came from paid subscriptions.

It is therefore not hard to understand why The New York Times is prepared to fight OpenAI to the death. "Cooperation" is easy to say, but how can the two cooperate in a way that ensures The New York Times's existing interests are not infringed and its new business opportunities are not taken away? There are many question marks and few answers.

The complaint accuses the defendants of free-riding on The New York Times's huge investment in its journalism. But The New York Times's resentment is not directed only at the newcomer ChatGPT.

C

For OpenAI, this is destined to be an uphill battle.

Beyond the copyright battles erupting in many places, the European Parliament voted to pass the draft AI Act in June last year. Under the bill, vendors such as OpenAI are required to disclose a list of the copyrighted data used in training their models.

Although OpenAI's statement insisted that The New York Times is "not important", copyrighted content remains essential to OpenAI's large-model training.

In a recent submission to the UK House of Lords Communications and Digital Select Committee's inquiry into large language models, OpenAI acknowledged that the development of AI tools like ChatGPT is inseparable from copyrighted material, and said that GPT would not exist without it: "Because copyright today covers virtually every sort of human expression, including blog posts, photographs, forum posts, scraps of software code, and government documents, it would be impossible to train today's leading AI models without using copyrighted materials."

Even as it trades blows with The New York Times, OpenAI is actively promoting "cooperation" with the news industry, and it has achieved some results.

In December, shortly before The New York Times sued OpenAI, OpenAI partnered with the German press and publishing giant Axel Springer. Springer is Europe's largest digital publishing company, with well-known news brands including Business Insider and Die Welt.

The two parties signed a multi-year agreement under which ChatGPT will provide users with summaries of reports from Springer's news outlets, with the original source and a link included in the reply, so that the news websites receive traffic. At the same time, Springer's content will be used by OpenAI to train its models. The Information quoted people familiar with the matter as saying that the deal is worth tens of millions of dollars.

This is already OpenAI's second major collaboration with a news organization. In July, it reached a similar deal with the Associated Press for an undisclosed amount.

Competition will also further drive up the cost of acquiring news content. In December, media reported that Apple had been negotiating with a number of major publishers to license their news content for training AI models. According to the report, Apple approached NBC News, IAC, and other organizations and proposed deals worth at least $50 million.

The "good old days", when merely beckoning with a share of advertising revenue was enough to bring mainstream media rushing in, belonged to social media and search engines. Today's OpenAI has to offer a bigger and more tempting pie.
