iFLYTEK Xinghuo upgrades again, launching the first long-text, long-image-and-text, and long-voice functions

On April 26, the iFLYTEK Xinghuo large model V3.5 received a spring update, releasing the industry's first large model for long text, long image-and-text, and long voice. It can quickly learn from massive volumes of text, image-and-text materials, meeting recordings and other information sources, and give professional, accurate answers in a range of industry scenarios. In addition, the Xinghuo voice model debuted multi-emotion super-anthropomorphic synthesis, which can express emotion, and launched a one-sentence voice reproduction function.

The first large model for long text, long image-and-text, and long voice

Why does iFLYTEK want to build a large model for long text, long image-and-text, and long voice?

Usage data from the iFLYTEK Xinghuo APP shows that usage peaks not on weekends but at 9:30 a.m. and 3:30 p.m. on weekdays. This suggests that most users turn to iFLYTEK Xinghuo for essential work-related needs. Efficient knowledge acquisition is a major concern for both users and developers.

iFLYTEK's analysis found that, when acquiring knowledge and learning, most users have access not only to ready-made long texts but also to the contents of newspapers and books, PPT slides from seminars, teachers' blackboard writing, students' notes, and meeting recordings, interviews, online press conferences, and training and education videos. Can these texts, pictures, and audio be uploaded to iFLYTEK Xinghuo to acquire knowledge quickly?

To this end, iFLYTEK has launched the first large model that supports long text, long image-and-text, and long voice, to meet users' need to obtain information from multiple sources in real scenarios.

After this upgrade, the iFLYTEK Xinghuo long-text function offers long-document information extraction, knowledge Q&A, summarization, and text generation. Overall, it has reached 97% of the level of the latest long-text version of GPT-4 Turbo as of April, and it has surpassed GPT-4 Turbo in knowledge Q&A tasks in several vertical fields, such as banking, insurance, automobiles, and electric power.

Implementing the long-text function requires solving the problem of efficient information processing: faced with millions or even tens of millions of words, a long-text large model consumes a great deal of computing resources.

To address the efficiency and accuracy of large-model applications, iFLYTEK chairman Liu Qingfeng said that, building on Xinghuo V3.5's ability to understand, learn from, and answer questions about long texts, iFLYTEK has carried out substantial model pruning and distillation and launched a 13-billion-parameter large model that it describes as the best-performing in the industry. With an effect loss of only 3%, Xinghuo has become much more efficient at document upload and parsing, first-response time in knowledge Q&A, and text generation. Tests show that, with long-text quality maintained, the performance of the Xinghuo large model is the best in the industry, whether at 10K, 64K, or 128K tokens or on longer texts.

For complex image-and-text scenarios, iFLYTEK has launched the Xinghuo image-and-text recognition large model for the first time, building on years of top international results in image-and-text recognition and formula recognition competitions. Whereas traditional small models are limited to recognizing text line by line, the Xinghuo image-and-text recognition large model can directly handle very complex layout analysis. It covers 31 typical scenarios, such as books, academic papers, patents, newspapers, posters, and PPT, and can automatically identify and mark 18 kinds of layout elements, such as headers, footers, titles, paragraphs, tables, formulas, seals, and handwriting.

In addition, to meet the demand for efficient access to a wide range of audio and video information, iFLYTEK has launched the long-voice function, which combines its world-leading speech recognition and translation technology to let users digest meeting recordings and learning videos with one click, enabling efficient knowledge acquisition in audio and video scenarios.

It can "resonate emotionally" and "reproduce a voice from one sentence"

In the era of the Internet of Everything, more realistic AI voice interaction is needed. At the launch of iFLYTEK Xinghuo V3.5 at the beginning of the year, iFLYTEK introduced the super-anthropomorphic dialogue function, which makes the AI voice more natural and realistic, with an anthropomorphism level of 83%, and it has been widely welcomed by users. In speech intelligibility, fluency, and expressiveness alike, the effect exceeds that of OpenAI and Microsoft.

This time, iFLYTEK released multi-emotion super-anthropomorphic synthesis, which further improves how perceptible emotional expression is: the perceptibility of emotions such as happiness, apology, comfort, coquettishness, and confusion exceeds 85%, making the AI voice more vivid and lifelike.

In addition to super-anthropomorphic dialogue, iFLYTEK has launched the "one-sentence voice reproduction" function, which lets users customize their AI assistant's voice from a single sentence. For example, it can imitate a child's voice to read books and newspapers to grandparents every day, or imitate a parent's voice to tell children stories while the parent is away on a business trip. This feature can make the world a warmer place.

Liu Qingfeng said that iFLYTEK has always been an industry leader in personalized speech synthesis and has now advanced to one-sentence voice reproduction. In the early days, iFLYTEK had to spend a week in Taiwan recording Lin Chiling's voice; later, imitating Guo Degang's voice took a day; after that, 5 minutes of recording was enough; and now a voice can be imitated from a single sentence. The feature is available to try on the iFLYTEK Xinghuo APP.

Dawan News reporter Xiang Lei

Edited by Wang Cui