
"There are two directions for general artificial intelligence in the future"

Author: Observer.com

【Text/Observer.com, Zhou Yi, Editor: Lu Dong】

Compared with the buzz that greeted ChatGPT at launch, the charm of large-model "chat" seems to be quietly fading.

In an article published this month, web analytics firm Similarweb said that traffic to ChatGPT is declining as the novelty wears off. According to its preliminary estimates, global visits to the ChatGPT website fell 9.7% in June, the first month-on-month decline the site has recorded; in the US market, visits fell 10.3% month-on-month.

Obviously, "Chat" is not the whole of the big model, and human society cannot be reconstructed by writing poetry and painting alone.

"There are two directions for general artificial intelligence in the future"

Screenshot of the Similarweb article

In fact, since its debut, large-model "chat" has been pushing into vertical domains and steadily reshaping people's lives.

A study published in JAMA Internal Medicine, an authoritative international journal, found that when carefully answering patient questions, doctors' responses averaged 52 words while the chatbot's averaged 211 words. The chatbot's responses were not only longer, but were also rated higher in quality and empathy: in 78.6% of evaluations, assessors preferred the chatbot's answer over the doctor's.

Putting large models to work in industry is also becoming the choice of many companies at home and abroad. At the 6th World Artificial Intelligence Conference (WAIC), which recently concluded in Shanghai, JD.com and many other companies presented their solutions and thinking. Around the theme of "fitting the industry scenario", some choose to adapt general-purpose large models to specific industries, while others choose to build vertical large models from the start.

It also prompts a broader question: beyond better understanding the "scenario", where will artificial intelligence go next?

He Xiaodong, president of the JD Exploration Research Institute and head of JD Technology's intelligent service and product department, said that multimodality is the only path to true artificial intelligence. "People are always at the core, and all technology must ultimately serve people. Future AI needs to communicate with humans through language, vision, and speech, so it must also understand language and speech. Only by doing multimodality well can AI better serve human beings."

In the current competitive environment, "scenario landing" is the goal every large model is chasing.

When large-model applications are put into practice, is understanding the scenario the key?

One of the hottest topics at WAIC this year is how to get large-model applications to land in the real world.

This is not hard to understand: large models at home and abroad cannot stay confined to "chat". Built on intelligent interaction, a large model is itself a productivity tool, and any large model must eventually land in specific industry applications to deliver that productivity. In the large-model race, China has its own advantages: although a gap exists, its industries are numerous and mature, which may be an opportunity to "overtake".

The data suggests that, in terms of computing power, China is not necessarily at a disadvantage.

Wu Hequan, an academician of the Chinese Academy of Engineering, has pointed out that, according to data from the end of 2022, the United States accounted for 36% of global computing power and China for 31%. For intelligent computing power, which is dominated by GPUs and NPUs, the United States accounted for 15% of the global total in 2021 and China for 26%.

The gaps, however, still deserve attention: domestic deep learning frameworks still need to be battle-tested and refined; extending generative AI to industrial applications requires efficiently integrating multiple large models, which raises many problems; large models need massive training data, yet the Chinese-language corpora available for training remain insufficient; and the NVIDIA A100 chip, which large-model training relies on, is restricted from export to China... China still faces many challenges.

"There are two directions for general artificial intelligence in the future"

Source: NVIDIA website

In the new round of global AI competition, the large model itself is indeed a "tough battle" that must be fought. But China actually has a chance to field a "surprise force".

Data, computing power, and sheer spending power are indispensable for large models, but their development is also inseparable from "scenarios". In essence, large models change the way humans access information and services: they must not only match information to needs, but also accurately understand human intent and accurately complete the tasks humans hand over. That "accuracy" rests on an understanding of the scenario.

At WAIC, He Xiaodong said that with the emergence of large models, the world will inevitably move toward an era of intelligent interaction, in which machines help us complete more professional and broader tasks. Training a large model requires a scenario, he added: "Scenarios and data are the starting point for training large models in this era."

Perhaps this will be an opportunity for China.

Take manufacturing: China has 41 major industrial categories, 207 intermediate categories, and 666 subcategories, making it the only country in the world covering every industrial category in the United Nations industrial classification. On the internet side, China has a large number of companies in e-commerce, social networking, and search, with mature experience and huge volumes of data. Combining large models with these industries and scenarios may open up many opportunities.

"There are two directions for general artificial intelligence in the future"

A production workshop. Source: Xinhua News Agency

Take e-commerce. According to the "2022 China E-commerce Market Data Report" previously released by NetEconomics, the domestic online retail market reached 13.79 trillion yuan in transactions in 2022, and the number of online retail users in China reached 845 million, or 79.2% of all internet users. Within this "big scenario", sub-scenarios such as livestreaming, social networking, beauty, maternal and infant products, e-commerce, logistics, and customer service may all become entry points for large-model applications.

After understanding the scenario, is AI's next stop multimodal capability?

Around industry scenarios, some "solutions" have already been released.

According to He Xiaodong, with just five minutes of image and data collection, JD.com can use large-model capabilities to reconstruct a complete digital human and push it into application scenarios. In e-commerce, for example, the Yanxi virtual anchor has gone live in more than 4,000 brand livestream rooms on JD.com, accumulating 800 million yuan in GMV (gross merchandise volume).

However, scenario applications may only be the present of the global large-model competition, not its future.

He Xiaodong said people should not focus only on the large language models that ChatGPT brought to attention. Large-model technology is also rapidly being applied to many other modalities, such as speech recognition and speech synthesis, image recognition and video synthesis in the visual domain, and, of course, digital humans. "Digital humans combine image, voice, gesture, semantics, and emotion."

He Xiaodong argued that multimodality is the only way forward: whether it is the invention of neural networks or of attention mechanisms, these advances were inspired by our understanding of how people themselves learn, and that understanding led us to invent a series of models. Interestingly, many exhibitors this year seem to have a soft spot for "digital humans", which reflects how much weight all sectors now place on multimodal capabilities.

At this year's WAIC, Tencent Cloud upgraded its one-stop MaaS (Model-as-a-Service) offering. Tencent Cloud's industry large-model capabilities will be applied to scenarios such as financial risk control, interactive translation, and digital-human customer service. Using the platform's AI generation algorithms, generative motion driving, and industry large-model capabilities, enterprises can obtain personalized, professional, and lifelike digital employees. Digital humans, in fact, are a multimodal problem.

The report "Human-Machine Symbiosis - Ten AI Trends in the Big Model Era" pointed out that the development of multimodal technology is helping AI solve more complex problems. With the perception and input of images and voices, large models can analyze information such as actions, expressions, and emotions in the future to improve their interaction and performance capabilities. At present, text-based interaction will also move towards semantic-based interaction, strengthening the perception and expression of human emotions.

"There are two directions for general artificial intelligence in the future"

On-site photo

Admittedly, challenges remain on the track to the future, such as multimodality.

GPT-4 has already accepted images as an input modality and can succinctly point out what is incongruous in a picture. As shown in the figure below, when a user asks, "What is unusual about this picture?", GPT-4 succinctly responds, "A man is ironing clothes on the roof of a moving taxi."
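To make the idea of an "image plus question" multimodal query concrete, here is a minimal illustrative sketch using the OpenAI Python SDK's chat interface, which accepts mixed text and image content. The model name and image URL are placeholders chosen for illustration, not details from the article; any vision-capable chat model would work the same way.

```python
# Minimal sketch of a multimodal (image + text) query.
# Assumptions: OpenAI Python SDK v1+, OPENAI_API_KEY set in the environment,
# a vision-capable model name, and a publicly reachable image URL.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any vision-capable chat model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is unusual about this picture?"},
                {
                    "type": "image_url",
                    # placeholder URL standing in for the "man ironing on a taxi" photo
                    "image_url": {"url": "https://example.com/man-ironing-on-taxi.jpg"},
                },
            ],
        }
    ],
)

# The model returns a short textual description of the image's incongruity.
print(response.choices[0].message.content)
```

The point of the sketch is simply that text and image arrive in the same request: the model fuses both modalities before answering, which is the capability the article describes.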

"There are two directions for general artificial intelligence in the future"

At this year's WAIC, speaking about the gap between domestic and foreign large models, Tang Wenbin, co-founder and CTO of Megvii Technology, told the Science and Technology Innovation Board Daily that whether in basic language models or multimodal models, a certain distance remains between China and other countries. "But this gap can be closed. Application exploration is also still at a relatively early stage. The future will be a thriving ecosystem."

With the field on the rise, it is time for domestic companies to act.

In an interview with Observer.com, He Xiaodong said that general artificial intelligence will develop in two directions. One is multimodality: large models must gain visual capabilities, and may eventually extend even to smell and touch. The other is embodied intelligence, including robots, robotic arms, and autonomous vehicles, bringing general artificial intelligence into the physical world.

This article is an exclusive manuscript of Observer.com and may not be reproduced without authorization.
