Why are your smart speakers so stupid?

Text | Hedgehog Commune, author | Chen Meixi, editor | Director

Whether the smart speaker, as a carrier of intelligent voice, will win more room to grow or gradually dissolve and be remembered only as intelligent voice's first form, only time will tell.

Right now, you might as well look up and ask the Xiao Ai, Xiaodu or Tmall Genie speaker at home a question. Will it give you the answer? Not necessarily. It depends on whether your question is complex, whether your grammar is standard, whether it catches the keywords, and whether it holds the rights to the song.

So you might hear a short, correct answer, a long and eerie recitation pulled from a browser, a song you have never heard before, or a crisp apology: "Oh hey, I don't know about that."

A few years ago, China's internet giants raced into the smart speaker market, as if whoever took this high ground would win a ticket to the era of intelligent voice.

After the melee, Alibaba, Baidu and Xiaomi were left in the final, but after several years of rapid growth smart speakers have reached a plateau. You have captured the market, and then what? This is the question the smart speaker giants need to answer.

A three-way standoff in smart speakers, with Tencent out early

When smart speakers first reached the market, they carried many expectations about the future: a new generation of interactive terminal, a traffic gateway reaching deep into the home, the door-opener for ambient intelligence... For an artificial intelligence industry that had been booming for years, the smart speaker was an excellent experimental product of the AI era. A modest price let it quickly occupy ordinary people's living rooms and bedsides, and even become an invisible "family member".

In 2014, Amazon released its first smart speaker, the Echo, with shipments exceeding one million units in two weeks. James Vlahos, a well-known American technology journalist, once described Apple's reaction to the launch: "First arrogant contempt, then panic." A number of internet technology companies then entered the fray and the smart speaker market descended into a melee. Amazon and Google ended up with the leading shares, followed by Apple, while Facebook was still struggling to catch up.

As has happened many times before, the smart speaker war spread to the Chinese market in the following years. In July 2017, Xiaomi released its smart speaker at its ecosystem conference. Today, five years later, the domestic smart speaker market is split three ways among Alibaba, Baidu and Xiaomi. Together the three hold more than 95% of the market, with no clear winner among them.

People are more used to calling these products by their wake words: the Xiao Ai, Xiaodu and Tmall Genie speakers that are active in the family living space. Coincidentally, the domestic smart speaker market almost mirrors the overseas one. A company that started in e-commerce, a company built on search and browsers, and a mobile phone maker have each taken a place, while an internet giant with social networking in its genes has been pushed aside.

Xiaomi's advantage lies in its years of accumulated experience in hardware. Of the three competitors, it has gone furthest in building its own smart hardware ecosystem. In that ecosystem the speaker is the controller, tying a wide range of Xiaomi appliances, such as air conditioners, ovens, robot vacuums, washing machines, desk lamps and refrigerators, into a single smart home system.

Strictly speaking, Xiaomi's move into smart homes predates the birth of Xiao Ai. In September 2013, Xiaomi released its smart TV, and in November of the same year it launched the Xiaomi smart router. At that point smart speakers were still in the pipeline, and Amazon's engineers had yet to pick "Alexa" from more than 50 candidate wake words.

Four years later, Xiao Ai was "born", and its success in the smart speaker market accelerated both Xiaomi's ambition and its pace in expanding its hardware territory. Xiao Ai and the other smart home products reinforce one another and give users more to choose from.

In October 2021, Xiaomi Chairman Lei Jun announced on Weibo that the number of Xiaomi Home stores had passed 10,000. According to its financial reports, Xiaomi's IoT business accounted for 25% of revenue in the first three quarters of 2021. Walk into any Xiaomi Home store in a shopping mall and the phone counters take up only two or three rows, while ovens, air purifiers and TVs sit in the most conspicuous spots, so that you could mistake the place for a miniature appliance store.

Alibaba's strengths lie in channels and supply chains. In contrast to Xiaomi's self-built ecosystem, Alibaba relies on a strong network of external smart device partners to connect Tmall Genie to more non-Alibaba devices. In 2021, Tmall Genie officially announced that more than 1,000 brands and more than 5,000 smart products had been connected.

The channel advantages of an e-commerce platform helped greatly with marketing and promotion. The first Tmall Genie speaker was released in August 2017, and during that year's Double 11 shopping festival Tmall Genie sales exceeded 1 million units, completing its initial accumulation of users. According to a report by big data service provider Aowei Cloud Network, annual smart speaker shipments in the domestic market in 2017 were only 1.76 million units. On that figure, the Tmall Genie units sold during Double 11 accounted for 56.8% of the year's total shipments.
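The share quoted above can be checked with one line of arithmetic. This is a minimal sketch that uses only the two figures given in the text; the variable names are illustrative, and because sales "exceeded" 1 million units the result is best read as a lower bound.

```python
# Figures as quoted in the article for 2017.
tmall_genie_double11_units = 1_000_000  # "exceeded 1 million", so a lower bound
market_shipments_2017 = 1_760_000       # annual domestic shipments reported for 2017

# Share of the year's shipments sold during Double 11.
share = tmall_genie_double11_units / market_shipments_2017
print(f"{share:.1%}")  # prints 56.8%
```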

In product terms, Baidu was the last of the three to start.

By then, Xiaomi and Alibaba had already been running for some time. According to data from market research firm Canalys, in the second quarter of 2018 Tmall Genie and Xiao Ai already ranked third and fourth in global smart speaker shipments, behind only Google Home and Amazon Echo.

The second card is the ability to deliver information and content on the back of search and knowledge systems. For Baidu, the massive user behavior signals and algorithmic capabilities built up in search can serve as the basis for optimizing the Xiaodu speaker's algorithms, and the text accumulated in Baidu's encyclopedia, Baijiahao and other systems can serve as a massive recall pool when Xiaodu answers questions.

Once the three companies had each taken favorable ground, very little room was left for Tencent. In April 2018 Tencent released a smart speaker called Tencent Listening (Tencent Tingting), priced at 699 yuan, far higher than its competitors. With no first-mover advantage, no price advantage and no distinctive strength, the product was uncompetitive, and less than a year after launch Tencent shelved the project internally. In overseas markets, Facebook has not given up. In September 2021 it launched Portal Go, a portable smart speaker with a screen, built around video calling and portability. Whether Tencent will return to the fight in the domestic market with a new product is still unknown.

But the smart speaker giants face another important problem. According to RUNTO's "Monthly Tracking of China's Smart Speaker Retail Market" report, sales in China's smart speaker market in 2021 were 36.54 million units, down 3.5% year-on-year. The decline did not ease at the start of 2022: sales in January 2022 were 3.05 million units, down 19.4% year-on-year and 3.5% month-on-month. Like many consumer products, smart speakers seem to have hit their growth ceiling.

Three hurdles behind the growth plateau

The challenges smart speakers face in this plateau come from several directions. The first is a technical problem that has yet to be overcome. Yu Yang, an engineer working in embedded development, believes the most critical technologies for smart speakers are speech recognition and NLP (natural language processing). "If you are after sound quality, the speaker hardware itself is also quite critical." In voice interaction, the user's spoken input is first recognized by the machine, and the recognized text is then processed and understood so that the device can output new speech or carry out the instruction the user expressed.
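The two stages described above, recognition followed by language understanding and then a response, can be sketched as a toy pipeline. This is only an illustration under loud assumptions: every function and class name here is hypothetical, the "audio" is already a text transcript, and keyword matching stands in for real speech recognition and NLP models, which no vendor in this article is claimed to implement this way.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str       # what the user wants, e.g. "light" or "play_music"
    slot: str = ""  # the detail attached to it, e.g. "on" or a song title


def recognize(audio: str) -> str:
    """Stand-in for speech recognition: the 'audio' is already a transcript."""
    return audio.strip().lower()


def understand(text: str) -> Intent:
    """Stand-in for NLP: naive keyword matching, not a real language model."""
    if "light" in text:
        return Intent("light", "on" if "turn on" in text else "off")
    if text.startswith("play "):
        return Intent("play_music", text[len("play "):])
    return Intent("unknown")


def respond(intent: Intent) -> str:
    """Turn the understood intent into a spoken reply (returned as text here)."""
    if intent.name == "light":
        return f"Light turned {intent.slot}."
    if intent.name == "play_music":
        return f"Playing {intent.slot}."
    return "Oh hey, I don't know about that."


def handle(audio: str) -> str:
    """Full pipeline: recognize, then understand, then respond."""
    return respond(understand(recognize(audio)))


print(handle("Turn on the light"))
print(handle("What are three woods?"))
```

Short, unambiguous commands map cleanly onto an intent, while anything outside the known patterns falls through to the apology, which is roughly the behavior the article describes.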

But in the Chinese market, dialects are a problem speech recognition cannot get around. The south has many dialect areas, and Wu, Cantonese, Min, Hakka and other dialects are each spoken by tens of millions of people. Among dialect speakers, many middle-aged and elderly people who have stayed in their hometowns struggle to talk to a smart speaker in standard Mandarin. For them, smart speakers are effectively unusable for now.

[Image: Smart speakers have begun a difficult attempt at dialect interaction. Image source: Xiao Ai official website]

Fujian, for example, has seven different dialect areas within a single province, and even local residents from different areas find it hard to communicate with each other in dialect. These many complicating factors mean there has been no breakthrough in dialect speech recognition and voice interaction.

In Mandarin, speech recognition is less difficult, but accurately understanding a sentence's pauses, meaning and even emotion, and then responding appropriately, is still a big problem. At the end of 2018, Lei Jun tried to show off Xiao Ai's real strength at the Xiaomi AIoT developer conference. After he asked "what are three woods", Xiao Ai suddenly burst into song: "You are electricity, you are light, you are the only myth." Smart speaker blunders do not pick their occasion. Whether you are the boss or an ordinary user, the treatment is exactly the same.

This immature technology makes smart speakers seem more endearingly dim than smart, and most users end up having the Xiao Ai or Tmall Genie at home carry out only mechanical instructions: playing a song, turning on the lights, setting a timer and so on. Because these instructions are clear and often only a word or two long, smart speakers rarely get them wrong.

Yu Yang said the only use Xiao Ai has for him is switching the lights on and off from bed, and he rarely even uses it to play music. "Because you have to think through what to say in advance, whereas on a phone you can just tap and browse the menus. If I want to listen to music I open the app, browse, and tap whatever song I like, but with voice I have to get the whole instruction out in one go."

The smart speaker's image and the way people use it mean it is not a product with strong upgrade demand. Sitting in a fixed indoor spot for years, it wears out very slowly. Its functions are static, so it never gives users the kind of reason to replace it that a phone running out of storage does.

Yu Yang bought his Xiao Ai Mini in 2018, when he was still working at Xiaomi. He has since changed jobs and moved from Beijing to Shanghai, and his household still uses the same speaker. After all, for a user like him, there is no difference at all between having a Xiao Ai born in 2018 turn on the light and having one born in 2022 do it.

These unsolved technical problems make it hard for the existing user base to generate upgrade demand, and at the same time they obstruct the development of new markets in dialect areas. Hitting the growth ceiling was only a matter of time.

Second, the market environment for connected devices means that much of the product's imagined value cannot yet be realized.

In many visions of intelligent voice, people picture the speaker as the commander-in-chief of the smart home: from the sofa you can control the curtains, turn on the TV, and even brew coffee and bake bread. In reality, most speakers are like the Yu Yang household's Xiao Ai Mini, connected to only a few devices such as desk lamps and televisions. For many families, the speaker is just a speaker.

Take smart curtains. A curtain track that connects to a smart control system currently costs 500 to 700 yuan. A window with two layers of curtains needs two tracks, bringing the cost to 1,000 to 1,400 yuan. In the short term, users willing to pay for this are concentrated in first- and second-tier cities, and it will take time for smart homes to spread into lower-tier markets.

Complete smart home systems are more often found in hotels. Some locations of business hotel chains such as Quanji, Atour and Qiuguo have built full smart living systems with the smart speaker at the core. For hotels, which sell space and service, the wave of smart living environments has clearly arrived earlier than it has in family homes.

Perhaps in the near future a full smart home will become a standard choice when ordinary people renovate, and demand for a "commander-in-chief" will then be stronger.

For companies, beyond the underlying technology and the market environment, the third problem is the business model. When the big manufacturers bet on smart speakers, what they valued was the potential for a new traffic gateway: just as users' time migrated from PC to mobile, traffic might migrate from mobile to smart speakers.

In internet business thinking, whoever wins the traffic wins the world: grab the position first and worry about monetization later. Now that the landscape has settled and growth has plateaued, the means of monetization are still being explored.

The first source is revenue from selling the hardware itself. To seize the market, every company has subsidized its smart speakers heavily, and Baidu, Alibaba and Xiaomi have all launched products priced below 100 yuan. A hardware enthusiast once took a Xiao Ai speaker apart into eight pieces and calculated that the components alone cost about 170 yuan, not counting assembly, packaging, software and hardware R&D, shipping and other costs.

On hardware, the profit margin of smart speakers is very thin, and it is hard even to break even. In the mobile internet era, traffic brings in revenue through value-added services, advertising, live streaming and e-commerce. For now, none of these paths seems to work well when the gateway is a smart speaker.

Memberships and paid content or features are a common monetization method and the preferred revenue source for many utility products. But in building a membership system for smart speakers, the first thing to overcome is the mindset users have already formed. What used to be free now has to be paid for, and users who already pay for membership in a mobile music app must buy another membership to play songs on the speaker.

For users, this is not an easy shift to accept. In most of its earlier variety show product placements, the smart speaker was presented as a fun and entertaining content provider that could chat with people, play games with them and play the music they asked for.

Once a payment system is in place, the speaker's identity shifts toward that of a tool and content carrier, which is not what users originally expected. Payment habits on mobile are still being cultivated, and a smart speaker, as a lower-frequency tool, clearly needs to offer a higher-order, differentiated experience if it wants to establish a willingness to pay directly.

As for advertising revenue, indispensable to many consumer products, it is not yet open to smart speakers. On the one hand, compared with short video, image-and-text feeds and similar product forms, content on a smart speaker requires a strong interactive action and is triggered one request at a time by the user, which leaves advertising little opportunity to appear. On the other hand, ads on a smart speaker may provoke stronger resistance from users. Having a so-called artificial intelligence speaker broadcast ads that match your user profile inside the private space of your home may heighten anxiety about privacy.

All in all, the underlying technology and the market environment have slowed the growth of smart speakers, while the business model is still being explored and no company has yet offered a standard answer. But challenges are also opportunities. Whether it is one of the current Big Three or an unknown new player, whoever clears these obstacles first may be the first to loosen the current market map.

For intelligent voice, the moment of disruption has not yet arrived

James Vlahos, who has followed and reported on voice technology for more than a decade and has dealt with the senior management of a number of internet technology companies, asserted excitedly in his book published in 2019: "Every decade or so, there will be a fundamental change in the way people interact with technology. The advent of the age of intelligent voice is a turning point in human history, because the use of voice is a trait of our species, an ability that distinguishes us from other species." He called voice the tool with the lowest learning cost, one that would overtake text and images to become the first choice for humans interacting with machines. "People will ditch keyboards and touchscreens for more natural and comfortable voice interfaces."

However, things do not seem to be moving in that direction.

The "comeback" the Xiaodu series achieved with speakers that have screens, and the energy the smart speaker giants are putting into new screen-equipped products, seem to declare that users find it hard to give up the screen. At least for now, vision is still an important part of human-computer interaction.

As Vlahos argues, voice is indeed the tool with the lowest learning cost. For most people it is a skill acquired naturally in life, with no extra effort required. But voice interaction is also inefficient. When people need to take in information quickly to help them decide, they will, like Yu Yang, open the song list and pick what they want to hear, rather than deciding everything on their own first and then telling the smart speaker exactly what to play.

In the final analysis, intelligent voice technology has not yet reached its disruptive moment and cannot replace interaction through text, vision and touch. It is also possible that the relationship between visual or tactile interaction and voice interaction is not an evolutionary one at all: neither will replace the other, and the different technologies will blend to meet users' needs for convenience, efficiency and comfort in different scenarios.

As more and more screen and touchscreen speakers are fitted with batteries and support long periods unplugged, smart speakers are becoming as portable as possible. Even the name "speaker" is just a holdover of habit: a portable touchscreen smart speaker looks more like a tablet with a loudspeaker and an intelligent voice system.

In the future, many screens may go through the same transformation. The screen that understands human speech might be a tablet, a mobile phone, a TV, or the LCD panel on a refrigerator. Alexa is merely a smart assistant that happens to live in Amazon Echo speakers, and it could just as well live anywhere else.

In actual use, the intelligent assistants living in speakers mostly handle entertainment and simple utility functions.

Wang Yanqing, a graduate student in the Department of Information Management at Peking University, pointed out in a study that curiosity is the initial motivation for most users who bring a smart speaker home, and 85% of the families surveyed said explicitly that they bought one out of curiosity. Through variety show product placements and advertising, smart speakers leave consumers with an impression of novelty and fun, and with low-price promotions as the catalyst, curiosity eventually turns into a purchase.

This matches how the speakers are used after purchase. According to a report released by iResearch in 2021, the most used function in lower-tier markets is playing music, at 69%, followed by asking about the weather, playing news or stories, and setting alarms. Controlling home appliances ranked sixth, at 34%. All of these frequently used functions were available when domestic smart speakers were first born. Five years later, no disruptive new technology has led users to revise their first impression of "novelty and fun" and take a more serious look at the intelligent assistants they wake up every day.

From a growth and revenue perspective, smart speakers have come to a crossroads. Whether the smart speaker, as a carrier of intelligent voice, will win more room to grow or gradually dissolve and be remembered only as intelligent voice's first form, only time will tell.

Intelligent voice has also reached a crossroads. As of now, the era of intelligent voice that Vlahos described has not arrived. Before users can hold a truly barrier-free dialogue with machines, and before voice assistants with human names behave like adult Homo sapiens rather than children who often get into trouble, occasionally amuse and manage simple tasks, we may need more than another decade.

References:

WANG Yanqing, LIU Chang. Research on information behavior of smart speaker users in home environment[J]. Literature and Data Report, 2021, 3(03): 116-128.

James Vlahos; translated by Yuan Dongming and Hu Weisong. The Era of Intelligent Voice[M]. Electronic Industry Press, 2019.

"Sinking" Special Series Report with Screen Sinking: Smart Speaker Consumption Behavior Report 2021[C]//iResearch Consulting Series Research Report (No. 5, 2021), 2021: 610-634.
