On May 14, OpenAI's press conference unveiled its latest flagship model, GPT-4o, demonstrating how rapidly AI capabilities continue to grow. Amid this rapid development of artificial intelligence and the steady emergence of large models, the iFLYTEK Spark (Xinghuo) large model marks the first anniversary of its release. Over the past year, iFLYTEK Spark has brought many surprises and changes to virtual humans.
(Image generated by iFLYTEK Xinghuo)
Virtual humans are essentially digital simulations of people, characterized by three elements: appearance, behavior, and thought. Realizing these features depends on integrating a series of advanced technologies such as image recognition, 3D modeling, motion capture, natural language processing, and computer vision. At present, the Spark large model empowers virtual humans in the following ways.
(1) Lighter avatar customization: built in seconds
Relying on the Spark model, iFLYTEK has launched a "voice and avatar construction in seconds" feature, enabling rapid production of a voice and visual likeness.
The "voice & avatar construction in seconds" feature page in iFLYTEK Zhizuo
The AI algorithm needs less than 10 seconds to extract appearance features, voice characteristics, and other elements, after which the system generates a personal "digital clone" in a very short time. It also supports both self-service and standard training of avatar models to meet the needs of virtual humans in different application scenarios.
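The workflow described above, extracting features from a short sample and then assembling a reusable digital likeness, can be sketched roughly as follows. This is a toy illustration under assumptions: the function names, the four-dimensional embedding size, and the averaging "extractor" are all invented for demonstration and bear no relation to iFLYTEK's actual API.

```python
from dataclasses import dataclass

@dataclass
class DigitalClone:
    appearance_features: list  # embedding extracted from a short video (illustrative)
    voice_features: list       # embedding extracted from a short audio clip (illustrative)

def extract_features(samples: list, dim: int = 4) -> list:
    """Toy stand-in for the feature extractor: average samples into a fixed-size embedding."""
    chunk = max(1, len(samples) // dim)
    return [sum(samples[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]

def build_clone(video_frames: list, audio_samples: list) -> DigitalClone:
    # The real system reportedly needs under 10 seconds of input material for this step.
    return DigitalClone(
        appearance_features=extract_features(video_frames),
        voice_features=extract_features(audio_samples),
    )

clone = build_clone(video_frames=[0.1] * 16, audio_samples=[0.5] * 16)
print(len(clone.appearance_features))  # → 4
```

The point of the sketch is the shape of the pipeline, not the math: a short media sample goes in once, a compact feature bundle comes out, and everything downstream (rendering, speech synthesis) reuses that bundle.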
Virtual anchor "An Xiaojia", generated from a real anchor
Virtual host "Xiaojun", generated from a live streamer
The "digital twin" of Professor Wang Jinhuan of Heilongjiang University of Traditional Chinese Medicine
Scenarios as varied as education and training, media communication, technology services, customer-service guidance, and short-video production each have different content needs, and iFLYTEK can meet them well.
(2) More lifelike behavior: hyper-anthropomorphic voice + AI-generated motion
The Spark voice model released on January 30 achieves hyper-anthropomorphic dialogue: its output approaches the spoken register of everyday human speech, with paralinguistic abilities such as breathing, sighing, varying speech rate, pausing to think, stressing or de-stressing words, and filler words ("um", "ah"). In addition, the large model's emotion recognition exceeds 85% accuracy, letting it express emotions such as happiness, apology, playfulness, and confusion more vividly.
The hyper-anthropomorphic voice is now live on iFLYTEK Zhizuo with five male and female voice personas: "Ling Xiaoqi", "Ling Xiaoshan", "Ling Yuyan", "Ling Yuzhao", and "Ling Feizhe". Whether for casual chat or complex, specialized Q&A consultation, these voices convey personality and emotion more effectively.
Hyper-anthropomorphic voice makes content more realistic
Beyond voice, motion is another key element of virtual-human interaction. Supported by large-model technology, the system can deeply understand the semantics of a text and automatically match and generate corresponding actions, making virtual-human movement more natural, fluid, and lifelike.
Diverse postures and richer scenes
AI-generated actions make interactions more natural
iFLYTEK has now launched a variety of virtual-human avatars that support AI-generated actions and come with matching scene-based video templates, bringing the content closer to a real-world setting.
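As a rough illustration of "understand the text, then match an action": one minimal approach is keyword overlap against a small tagged action library. The actions and trigger words below are invented for demonstration; per the description above, the production system uses the large model itself for semantic matching rather than keywords.

```python
import re

# Illustrative action library: each avatar action is tagged with trigger words.
# These tags are assumptions for demonstration, not iFLYTEK's actual vocabulary.
ACTION_LIBRARY = {
    "wave": {"hello", "welcome", "goodbye"},
    "point": {"look", "here", "this"},
    "nod": {"yes", "agree", "exactly"},
}

def match_action(text: str) -> str:
    """Pick the action whose trigger-word set overlaps the sentence the most."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    best = max(ACTION_LIBRARY, key=lambda a: len(ACTION_LIBRARY[a] & words))
    # Fall back to an idle pose when nothing matches.
    return best if ACTION_LIBRARY[best] & words else "idle"

print(match_action("Hello and welcome, everyone!"))  # → wave
print(match_action("The weather is nice today."))    # → idle
```

A large model replaces the keyword sets with semantic understanding, so "greet the audience" would still map to a wave even though none of the literal trigger words appear.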
(3) Interaction with a "brain" and "awareness": the virtual-human intelligent interactive machine evolves again
Upgraded virtual interaction means that communication between users and virtual humans becomes more natural, efficient, and intelligent.
The virtual-human intelligent interactive machine is a device that integrates advanced speech recognition, natural language processing, and machine learning. Backed by the Spark large model, it keeps upgrading its perception, semantic understanding, and emotional expression, making "face-to-face" communication and Q&A between virtual humans and users more effective and open.
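The machine's flow as described, speech recognition, then semantic understanding, then a spoken reply, can be sketched as three chained stages. Every stage below is a placeholder stub (a tiny canned FAQ stands in for the Spark model), purely to show how the stages connect.

```python
def recognize_speech(audio_transcript: str) -> str:
    # Stub ASR: we assume the audio has already been transcribed to text.
    return audio_transcript

def understand(text: str) -> str:
    # Stub NLU: a canned FAQ stands in for the large model's reasoning.
    faq = {"where is the exit": "The exit is to your left."}
    return faq.get(text.lower().rstrip("?!. "), "Let me check that for you.")

def synthesize(text: str) -> str:
    # Stub TTS: tag the reply instead of producing audio.
    return f"[TTS] {text}"

def interact(audio_transcript: str) -> str:
    """One turn of the kiosk loop: hear -> understand -> speak."""
    return synthesize(understand(recognize_speech(audio_transcript)))

print(interact("Where is the exit?"))  # → [TTS] The exit is to your left.
```

In the deployed machine, each stub is replaced by a real component (microphone-array ASR, the Spark model, neural TTS), but the turn-by-turn loop keeps this same shape.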
The intelligent interactive machine is already widely used in fields such as finance, government affairs, cultural tourism, commerce, and exhibitions. It can be seen at scenic spots like the Old Summer Palace, Mingzhongdu, and Luogang Park, and at major events such as the National Two Sessions, the Beijing Winter Olympics, and the Chengdu Universiade.
The virtual tour guide at the Mingzhongdu Ruins Park guides visitors around the site
The virtual tour guide at the Old Summer Palace Ruins Park popularizes knowledge in an endearing way
iFLYTEK created "Xiaofu", the virtual volunteer of the Chengdu Universiade
The virtual human intelligent interactive machine was unveiled at the 2023 World Artificial Intelligence Conference
"Aijia", the Beijing Winter Olympics virtual volunteer, handled multilingual interactive inquiries
The advanced Spark model brings an all-round improvement to virtual humans: not only in external appearance, language, and motion, but also in upgraded interaction ability and an enhanced sense of "autonomous consciousness", leading virtual humans toward a "new consciousness".
As a representative of new quality productive forces, iFLYTEK has consistently pursued the "AI+" practice, so that virtual humans can become partners to people.