
Going head-to-head with Google I/O? OpenAI livestreams a day early, and ChatGPT may gain calling features

Author: HyperAI

This week's AI scene is destined to be buzzing.

On May 13, local time, OpenAI will livestream updates to ChatGPT and GPT-4; Google I/O follows the very next day. Unlike I/O, Google's long-scheduled annual event, OpenAI's hastily announced release looks like a deliberate bid to steal the spotlight. How will these long-entangled old rivals make their moves this time? Let's review the history between the two and hazard some bold guesses.

Round after round of confrontation, with the battle in full swing

Ever since OpenAI's breakout, Google has been saddled with labels like "failing to live up to its potential", "slow off the mark", and "playing catch-up". The most telling is the nickname "the Whampoa Military Academy of AI": it sounds like a compliment, but for Google it is a bitter one, implying that the company mainly trains talent for its rivals.

As is well known, ChatGPT, the product that put OpenAI on the throne, is built on the Transformer, the milestone architecture Google proposed in the paper "Attention Is All You Need". Beyond that, several former Google researchers appeared in the acknowledgments of the ChatGPT release page, and a string of key Google employees have since jumped ship to OpenAI. More interesting still, whenever Google attempts a counterstrike, some mishap seems to come along with it.

In February 2023, Google answered ChatGPT with Bard, but shortly after the announcement it emerged that the demo contained a factual error:

To the prompt "What new discoveries from the James Webb Space Telescope (JWST) can I tell my 9-year-old about?", one of Bard's answers claimed that JWST took the first photo of an exoplanet. But as Grant Tremblay, a researcher at the Harvard-Smithsonian Center for Astrophysics, pointed out, it was the European Southern Observatory's Very Large Telescope (VLT) that took the first picture of an exoplanet, back in 2004.

At the I/O conference in May 2023, Google showed off upgrades to Bard, such as support for more languages, image recognition, and access to Google apps plus some external ones. It also released PaLM 2, positioned as a rival to GPT-4, with improved math, coding, reasoning, and natural language generation.

Building on PaLM 2, the Google Health research team created Med-PaLM 2, capable of searching medical knowledge, decoding medical terminology, and more. Unsurprisingly, Google also wove these AI capabilities into office scenarios such as drafting copy and building spreadsheets across Google Workspace.

Netizens subsequently pitted PaLM 2 against GPT-4 in all sorts of comparisons, and OpenAI still came out ahead.

In December 2023, Google released Gemini, billed as its "largest and most capable" AI model. The demos were genuinely impressive, and the top-tier version could go toe-to-toe with GPT-4 on benchmarks; but it then emerged that the demo video had been post-processed and the results partly exaggerated.

On February 8, 2024, Google announced that Bard would be officially renamed Gemini, and a chatbot powered by its strongest model, Gemini Ultra, also opened to the public, priced at the same $20 per month as ChatGPT Plus in a clear head-to-head challenge. The deeper significance of the announcement was the consolidation of Google's AI under the single Gemini brand, covering both the model name and the product name.

On February 16, 2024, just days after shipping its most powerful Gemini 1.0 Ultra, Google pushed out Gemini 1.5. Gemini 1.5 Pro supports ultra-long contexts of up to 1 million tokens, crushing GPT-4 on context length and thereby excelling at audio, video processing, and other long-input tasks. Had Sora not appeared, Gemini 1.5 would likely have dominated AI-community chatter for a long while.

Yet just a few hours after Gemini 1.5's release, OpenAI launched its text-to-video model Sora, which instantly seized center stage with unprecedented video-generation capabilities; its one-minute demo videos stole Gemini's thunder outright.

In that round, the two were not even competing on the same technical axis, but in terms of buzz the winner was obvious: OpenAI further consolidated its position on the back of Sora.

Is OpenAI about to steal the spotlight again?

Notably, on May 1, X user Jimmy Apples, who had accurately predicted the release date of GPT-4, claimed that OpenAI's search engine might launch on May 9. He later said the date had slipped to May 13.

On May 8, Bloomberg likewise reported that OpenAI is internally developing a new search engine that uses generative-AI Q&A to deliver a new search experience, one feature being the ability to answer questions using both text and images. Per the report, the search product is an extension of the flagship ChatGPT, letting ChatGPT pull information, citations included, directly from the web. Earlier, The Verge had reported that OpenAI was poaching engineers from Google's search division to speed its AI search product to launch.

With this assault on Google's long-entrenched search business, is OpenAI striking straight at the heart of its rival's empire?

On May 11, however, OpenAI's official account tweeted that the May 13 event would bring only updates to ChatGPT and GPT-4, with no mention of a search engine. Still, May 13 is an interesting date: Google had already announced that Google I/O would take place on May 14.


Sam Altman then weighed in personally: not GPT-5, not a search engine, but something new the team has been working hard on that he thinks people will love, and that feels like magic to him.


With those two wrong answers ruled out by Sam Altman, netizens' guessing game around "what exactly will OpenAI release" remains as lively as ever, and more clues have surfaced, including voice interaction.

According to The Information, OpenAI has demonstrated to some users a new model capable of both conversing and recognizing objects, offering faster and more accurate image and audio understanding. According to The Verge, developer Ananay Arora said ChatGPT may gain a calling feature; Arora also found evidence that OpenAI has provisioned servers for real-time audio and video communication.


In addition, indigo, co-founder of Hallid, posted a more detailed prediction on his Twitter (X) account, mentioning not only GPT-4.5 but also a new OpenAI AI assistant with full voice interaction.


From another angle, though: while Sam Altman denied the "search engine", he never said he wouldn't add a search buff to ChatGPT. Indeed, netizens have recently dug up plenty of evidence that OpenAI has moved into search.

First, former Mila researcher and MIT lecturer Lior S revealed that OpenAI's latest SSL certificate logs show a newly created subdomain, search.chatgpt.com.
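Certificate-transparency logs are public, so anyone can reproduce this kind of discovery. Below is a minimal sketch, assuming the crt.sh JSON endpoint and its `name_value` response field behave as commonly documented, that builds a certificate query for a domain and collects the hostnames the logged certificates cover:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def crtsh_query_url(domain: str) -> str:
    # crt.sh serves certificate-transparency search results as JSON;
    # the pattern "%.example.com" matches any subdomain of example.com.
    return "https://crt.sh/?q=" + quote(f"%.{domain}") + "&output=json"

def extract_names(entries: list) -> set:
    # Each log entry's "name_value" field can hold several
    # newline-separated hostnames from a single certificate.
    names = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return names

# Example (requires network access):
#   with urlopen(crtsh_query_url("chatgpt.com")) as resp:
#       print(sorted(extract_names(json.load(resp))))
```

A hostname like search.chatgpt.com surfacing in such a listing is exactly the kind of clue Lior S pointed to.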


Visiting the subdomain currently returns a "Not found" page, rather than an unresolvable-domain error, suggesting a server is already answering there.

Some users in China have been included in a gray-release (limited rollout) test, and the official account "Cyber Zen Heart" published its hands-on results:


Source: Cyber Zen Heart

As the screenshots show, ChatGPT's answers are quite accurate, and "Cyber Zen Heart" notes that response speed is acceptable too. On real-time information, however, ChatGPT fell short: when "Cyber Zen Heart" searched for the price of Bitcoin and compared it against Google's result, the figures diverged:


Source: Cyber Zen Heart

In addition, some netizens posted on Twitter a demo claiming to show OpenAI's official AI search page, though its interface differs markedly from the gray-release test:


Whether OpenAI's search product will ultimately ship in the form seen in the gray-release test remains unknown. Either way, it will face competition not only from Google but also from Perplexity AI; in a sense, Perplexity AI is the product OpenAI's search business should be benchmarked against most directly.

That AI tool, self-styled "the world's first conversational search engine", is gaining traction with backing from big names such as Jensen Huang and Jeff Bezos, and is distinctive in combining ChatGPT-style Q&A with the link lists of traditional search engines.


In what form will OpenAI join the search-engine race of the AI era? We'll see whether ChatGPT's search feature is unveiled at the May 13 event.

Will Google I/O have to rely on Gemini to carry the show?

Whether OpenAI's pointedly timed event will deliver a major update is still unknown, but Google is surely going to watch the livestream closely. If there really is a surprise, can Sundar Pichai respond quickly and counterpunch at Google I/O just one day later?

By contrast, the annual Google I/O holds little mystery: its official page previews a focus on mobile, web, ML/AI, and cloud.


As usual, CEO Sundar Pichai is expected to cover Android updates, a new generation of hardware, Google's latest progress and achievements in AI, and the integration of those AI capabilities across the entire Google ecosystem.

* Gemini powers the entire Google ecosystem

There is no doubt that Gemini will be the highlight of this year's Google I/O. Gemini 1.5, updated only in February, already stretches context length to the million-token level and can rival GPT-4 on performance. Google's next step is figuring out how to weave Gemini into Search, its photo and video tools, Google Maps, and workspace tools like Gmail and Google Docs.

Google has also been gradually infusing AI into Google Assistant; can Gemini's power yield a more advanced, more human-like natural-language voice assistant?

Notably, as a company with both frontier large models and a hardware business, what sparks might Gemini strike with Google's own Pixel line? Reports last year suggested an AI assistant called Pixie could debut on the Pixel 9.

The Pixel 8, released last year, already carries Google's AI capabilities. Powered by Google's own Tensor G3 processor, it offers features such as Audio Magic Eraser, Best Take, and webpage translation and read-aloud. Best Take, for example, can merge multiple group shots, picking each person's best expression from different frames to compose the perfect group photo.

It remains to be seen whether the Pixel 9 will be announced at this year's conference; it has not surfaced in current leaks, while the Pixel 8a has drawn plenty of buzz. Whether the AI assistant Pixie makes its debut is also an open question.

In April this year, foreign media also reported that Apple and Google are in talks to bring Gemini to iOS, though neither company has officially confirmed the news. Whether Pichai will announce anything on this front at Google I/O is anyone's guess.

* Android and AR/XR

As a cornerstone of Google, Android is always an integral part of Google I/O. Android 15 has already been partially unveiled, with a developer preview and initial beta released, and Pichai is bound to walk through further major updates during his keynote. Previously disclosed information suggests the event will also cover Android Auto for cars and the Wear OS smartwatch software.

Media reports also suggest Pichai will share news of Google's AR software and introduce an Android XR platform for Samsung and other headset makers. Earlier this year, reports indicated that the AR hardware team was hit hardest in Google's round of layoffs, prompting speculation that the company has abandoned building its own AR hardware in favor of an OEM partnership model; in other words, Google will concentrate on the software side.

Beyond Pichai's keynote, this year's Google I/O also features a number of thematic sessions, such as what's new in Google AI, what's new in Android, and ML frameworks for the generative-AI era. These sessions will not be livestreamed; recordings will be published afterward. HyperAI will keep following the event and bring in-depth AI coverage, so stay tuned.

Final thoughts

Industry was once the key measure of a nation's strength; today, technological capability has joined the negotiating table and even become a bargaining chip in great-power competition. With large models still riding high, every move by the Silicon Valley giants draws intense attention. Back at the end of 2022, OpenAI, Microsoft, Google and others kept dropping blockbuster updates, and netizens joked: wake up, and the AI world has changed again.

Entering 2024, the battle keeps heating up, from technical competition to application scenarios, and from established powers to new unicorns; the companies that stay on top will be the ones with real moats. As for how the giants at the pyramid's peak will fight it out, let's grab a front-row seat and watch.

Resources:

1. https://36kr.com/p/2660898993824512

2. https://techcrunch.com/2024/05/09/google-i-o-2024-what-to-expect/

3. https://www.spiceworks.com/tech/tech-general/articles/google-io-2024-expectations

4. https://www.theverge.com/2024/5/11/24154307/openai-multimodal-digital-assistant-chatgpt-phone-calls
