laitimes

The new "Siri" battle has begun: Microsoft, Amazon, and OpenAI have entered the fray

The battle for a new generation of personal AI assistants has finally begun.

In September, three companies made major announcements that mark this turning point: Amazon, Microsoft, and OpenAI.

At its fall hardware event on the 21st, Amazon announced that its veteran voice assistant Alexa will finally be upgraded with a large language model. The new Alexa has lower latency, understands context, remembers previous conversations, no longer needs to be woken up again and again, and becomes more personalized the more you use it.

Also on September 21, Microsoft held a fall event in New York and announced that it will gradually roll out a series of updates to Windows 11 users starting on the 26th. One of the most important is the AI assistant Copilot. As its name suggests, Copilot will act as the user's digital butler: it appears in the Windows 11 sidebar and lets users talk to it by voice to control PC settings, launch applications, or get answers to questions. Copilot is powered by GPT-4, OpenAI's most advanced large language model.

A few days later, on the 25th, OpenAI unexpectedly published an announcement titled "ChatGPT can now see, hear, and speak", saying it will roll out new multimodal features to paying users over the next two weeks. Multimodal ChatGPT will be able to hold real-time voice conversations about images. For example, you can open the refrigerator, take a picture, and chat with ChatGPT about what to make for dinner. ChatGPT is already available as an app on Android and iOS, so this move shifts it from an all-knowing "know-it-all" AI toward a more capable personal assistant.

Beyond these three officially announced upgrades, technology outlet Axios reported last month that, according to internal emails, Google will use its latest large language model technology to overhaul Google Assistant. Apple has also been reported to have built a framework for large language models and to be reworking features such as Maps and Siri.

Tech giants are battling over personal assistants, and the last time this happened was nine years ago. Alexa was first released in 2014, built into Amazon's Echo smart speaker. That same year, Microsoft launched Cortana, the Windows voice assistant known as "Xiaona" in mainland China, and Google launched Google Now, the Android 4.1 voice assistant that later became Google Assistant. These three rivals followed Apple, which had launched Siri with the iPhone 4S in 2011, and together they kicked off the previous generation's voice assistant melee.

However, voice assistants, which initially drew great public interest, have in recent years become a standard but widely disliked feature of smart devices, mocked as "artificial stupidity". When OpenAI released ChatGPT at the end of November last year, its fluent conversation made the previous generation of personal assistants look poor by comparison. Shortly before ChatGPT's release, Alexa was reported to be losing money and laying off staff. After it, Microsoft "killed" Cortana, taking it offline in August this year.

The war of the previous generation of personal assistants has come to an end, and the war of the new generation has begun. With large language model technology behind it, will the story be different this time?

A

Let's start with the official examples to see what Amazon's Alexa, Microsoft's Copilot, and OpenAI's multimodal ChatGPT can do.

Connected to a large language model, Alexa sounds less robotic. Users can say something like "Alexa, I'm cold" to have Alexa adjust the air conditioning, or give a more abstract request, such as "make this room look like the colors of Team XX." Users can also issue several commands at once, such as "Alexa, turn on the sprinkler, open the garage door, turn off the exterior lights," and Alexa will recognize and carry out each task.

In addition, users can now say "Alexa, let's chat" and no longer need to address Alexa by name for the rest of the conversation. Alexa will also remember some of the user's information and preferences, so you don't have to introduce yourself to it every time.

In a demo ad, a user enters chat mode with Alexa. The user wants to throw a party and asks Alexa to recommend a party theme, then a suitable venue based on that theme. Once everything is settled, the user says, "Send my friend an invitation email for next Friday at 8 p.m., and make it mysterious." Alexa readily agrees and reads out an email that begins "Are you ready for a memorable night?" for the user to confirm.

In the past, users had to say "Alexa" before every sentence, the assistant kept no continuity between turns, and instructions had to be clear and unambiguous. Compared with that experience, the new Alexa is indeed "more human-like". Surely every smart home user is fed up with rephrasing commands over and over until their "Little X" or "Genie" finally understands.

The now-retired Microsoft Cortana, known as "Xiaona" in China, was basically similar to Siri on the iPhone. It could open an app for you, play a song, or answer a question (for anything beyond basics like today's date and weather, it mostly just showed web search results).

Copilot, on the other hand, is more like a hands-on assistant. For example, you can ask Copilot to "tidy up your desktop" and it will tile the windows. When you browse the web, you can call on Copilot to summarize, explain, and rewrite the page content. Drafting copy and summarizing charts are no problem either. More interesting still, Copilot can process images: after taking a screenshot, you can have it remove the background and cut out a portrait.

Like the upgraded Alexa, Copilot offers a more "human" conversation experience. You can simply ask Copilot to "play songs that help me concentrate", and it will find a matching playlist on Spotify.

As for OpenAI's ChatGPT, going multimodal has also brought it closer to everyday life. As mentioned earlier, you can open the fridge, take a picture, and discuss with ChatGPT what to have for dinner.

In another official example, a user sends ChatGPT a photo of a bike and asks how to lower the seat. ChatGPT tells the user to check the bike model and confirm whether the seat is held by a quick-release lever or a bolt, then gives detailed steps. Still unsure, the user photographs the seat clamp, circles one part, and asks whether it is a quick-release lever. ChatGPT recognizes it as a bolt and recommends using an Allen wrench. The user then photographs the toolbox and asks which tool is the Allen wrench, and ChatGPT identifies that correctly too.

Beyond solving everyday problems, ChatGPT can now "speak" thanks to its voice feature, so it can also tell children bedtime stories. More interesting still, when you are arguing with someone, you can bring in ChatGPT, which can now listen and speak, to help you sort out your thoughts and settle the argument.

B

However, the new Alexa, Copilot, and multimodal ChatGPT all come with certain barriers to use.

There is currently no word of a charge for the Copilot built into Windows 11, so it should be free for users, though they will have to wait for the update to roll out gradually. In the Microsoft 365 office suite, however, Copilot is a premium subscription feature that costs $30 per month.

OpenAI's multimodal ChatGPT is available only to premium subscribers, that is, ChatGPT Plus users, at $20 per month.

The new Alexa may also charge a fee in the future. After Amazon's fall launch, Bloomberg interviewed David Limp, the outgoing vice president of Amazon's devices and services division, who said Amazon was "absolutely" considering a subscription model for Alexa.

Limp declined to discuss how much Alexa might cost, saying that "the Alexa you know and love today will remain free." But powering AI chatbots isn't cheap, and he acknowledged that "the cost of model inference in the cloud is enormous."

Thinking about how to charge at the very start of the war may look like a rush for quick profit, but it is actually a hard-won lesson from the previous generation's personal assistant war.

Personal assistant products have existed for a long time. An early representative is Microsoft Bob, released in 1995, when Bill Gates was Microsoft's CEO and had just topped the Forbes list of the world's richest people for the first time. Looking back now, Bob seems bloated: the software shows a virtual room, like a personal office, with a cartoon dog assistant crouched in the corner, affectionately asking what help you need.

That approach failed, and within just a year Microsoft replaced Bob with Clippy, the big-eyed paper clip that was eager to help you with this and that (but did nothing well). Clippy was no more successful and became the target of widespread criticism and ridicule.

By 2011, everything had changed. Apple introduced the iPhone 4S with Siri built in: a voice assistant with no visual avatar that you summoned by long-pressing the home button. It could open apps, answer questions, and even playfully tell jokes, which felt quite "futuristic" at the time.

Tech giants took notice. In 2014, voice assistants exploded: Google, which dominated Android, launched Google Now (upgraded to Google Assistant two years later); Microsoft, which dominated PC operating systems, launched Cortana; and Amazon simply built a smart speaker and put Alexa inside it.

The movie "Her" was also released around this time. In it, the hero falls in love with the voice assistant of his computer's operating system, voiced by Scarlett Johansson, the Hollywood star known for playing Black Widow, and the assistant's interface looks a lot like Cortana. After the movie came out, Wired magazine even reported that some iPhone users thought Siri seemed to be becoming "self-aware."

By 2019, the companies had brought their voice assistants to more and more devices. Google, Apple, and Microsoft launched smart speakers, and Alexa and Cortana partnered so that users could call up one assistant from the other to get more done. In China, smartphone voice assistants and smart speakers also began to emerge, including Xiaodu, Xiaoai, and Tmall Genie.

However, it was also during this period, as voice assistants became more widespread, that public opinion of them slowly shifted from fantasy to disappointment. In China, these voice assistants earned the nickname "artificial stupidity". People posted online asking "Cortana pops up by itself, how do you turn it off," or shared short videos of smart speakers failing to understand the simple command "turn off the lights in the living room."

Taking Siri as an example, 2018 data from Verto Analytics showed that Siri's usage rate was 19.6%, user dependence was only 11%, and average monthly usage time per user was just 14 minutes.

C

Poor experience directly affects the business prospects of voice assistant products.

In November 2022, just before ChatGPT launched, Business Insider reported that, according to internal data it had obtained, Amazon's Worldwide Digital division posted an operating loss of more than $3 billion in the first quarter. The division covers everything from Echo smart speakers and Alexa voice technology to streaming services. People familiar with the unit said the loss was the largest of any Amazon business unit and that Alexa was responsible for most of it. The report estimated that the division would lose as much as $10 billion in 2022.

Alexa didn't get off to a bad start: the first generation of Echo devices sold more than 5 million units. But its business model has always been a problem, one that all voice assistants share: you provide a service, and then what? In 2018, Amazon projected that it would lose $5 per device in 2021.

The previous generation of voice assistants had no good way to make money, and the interactive experience was poor as well. Amazon hoped Alexa would tie into its e-commerce services, but the poor experience could not support that vision, and frequently asking users whether they wanted to buy something damaged the experience further. Most conversations between users and voice assistants ended up trivial and mundane, such as checking the weather or the date or opening an app, and none of that made Amazon money.

After the news of layoffs and losses, Amazon said it would continue to invest heavily in Alexa, but outsiders saw no new possibilities.

The advent of ChatGPT made all the difference.

On the one hand, it created a crisis. ChatGPT's excellent conversational ability, built on a large language model, led the public to compare it with the previous generation of voice assistants almost immediately, and the verdict on the older assistants was "far too poor". The question has changed: if the previous generation of voice assistants fails to move forward and adopt large models, it risks falling behind and being beaten. Even Apple and Google have to fear this possibility.

Besides the reports in August that Google will upgrade Google Assistant with a large language model, there are similar reports about Apple. In July, Mark Gurman reported that Apple has completed the basic framework for its large language models, called "Ajax", which is intended to support conversational AI systems and has already been used to make AI improvements to Maps, Siri, and other features.

Before the tech giants made their moves, a variety of third-party personal assistant apps had already integrated large models. In June this year, someone "resurrected" Clippy as a chatbot built on the GPT-3.5 model and released it on the Microsoft Store.

On the other hand, there are opportunities. The vision of a smart ecosystem that Amazon Alexa represents is now backed by 5G and large models, and its chances of being realized have never been better.

At this fall event, while announcing Alexa's full upgrade, Amazon also sketched out a broad vision. It has connected its large language model to more than 200 smart home APIs, giving Alexa the background information it needs to manage smart homes more proactively and seamlessly. Amazon also plans to launch tools that let Alexa control certain features of third-party products that fall outside its smart home toolkit. Amazon said it has worked with General Electric, Philips, Xiaomi, iRobot, and others to develop these features.

But there are now new challenges that did not exist 10 years ago: the threat AI poses to personal privacy and data security is being taken seriously. The new Alexa, Copilot, and multimodal ChatGPT will all inevitably run into this issue as they move toward personalized assistants. The new Alexa has only just been announced, and reports that Amazon will use users' voice interactions with Alexa to train its models are already under discussion.

"Whoever wins the personal agent, that's the big thing, because you will never go to a search site again, you will never go to a productivity site, you'll never go to Amazon again," Bill Gates predicted at an event in May.

The question is, who will win this war? Or will it, like the battle of the previous generation of voice assistants, end with no winner once the excitement fades?

