iFLYTEK's large model gets a new upgrade: a PPT in 20 seconds, and human-like speech that surpasses ChatGPT | Frontline

author: 36Kr 2024-01-30 19:03:00

Author|Shizuru Takeshizu

Edited by Tang Yongyi

On January 30, iFLYTEK released Xinghuo V3.5, the newly upgraded version of its Xinghuo (Spark) cognitive large model, along with a self-developed speech model and an open-source model, Xinghuo Open Source-13B.

Over the past year, iFLYTEK has focused on large models, and the flurry of updates released near the end of the year signals, to some extent, its determination to keep investing. On January 29, the company released its 2023 performance forecast: expected revenue of 20 billion yuan, up 7% from 2022. However, heavy investment in large models has cut into profit: the company expects 2023 net profit after deducting non-recurring gains and losses to be 80 million to 120 million yuan, down more than 70% from 2022.

iFLYTEK said that Xinghuo V3.5 was trained on Feixing No. 1, a domestic computing power platform, and described it as the first publicly open large model trained on domestic computing power. The upgraded Xinghuo V3.5 improves on seven capabilities, including logical reasoning, language understanding, text generation, math answering, code, and multimodality.

After the upgrade, iFLYTEK Xinghuo's text generation and mathematical abilities have improved, and it can easily answer math and physics problems at the level of the third year of junior high school.

In the live demonstration, Xinghuo V3.5 easily handled the question: "If you walk forward 20 meters, turn right 60 degrees, and keep repeating this, can you return to the starting point, and if so, how many meters will you have walked?"
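
The puzzle has a short answer: turning 60 degrees each time, the walker faces the original direction again after 360 / 60 = 6 turns, tracing a regular hexagon, so the path closes after 6 × 20 = 120 meters. The following is a minimal Python sketch that checks this by simulation. The function name, step limit, and tolerance are illustrative assumptions and are not part of iFLYTEK's demo.

```python
import cmath
import math


def distance_to_return(step_m, turn_deg, max_steps=1000):
    """Walk step_m forward, turn right by turn_deg, and repeat.

    Returns the total distance walked when the walker first gets back
    to the starting point, or None if that does not happen within
    max_steps moves. (Hypothetical helper for illustration only.)
    """
    position = 0 + 0j   # start at the origin of the complex plane
    heading = 1 + 0j    # unit vector for the initial direction
    # A right turn is a clockwise rotation, i.e. a negative angle.
    turn = cmath.exp(-1j * math.radians(turn_deg))

    for n in range(1, max_steps + 1):
        position += step_m * heading   # move forward
        heading *= turn                # then turn right
        if abs(position) < 1e-9:       # back at the start (within rounding)
            return n * step_m
    return None


print(distance_to_return(20, 60))  # 120
```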

In terms of overall metrics, iFLYTEK said Xinghuo V3.5 has surpassed GPT-4 Turbo in language comprehension and math, and has reached more than 90% of GPT-4V's ability in code and multimodal comprehension.

Comparison of the capabilities of Xinghuo V3.5 and GPT

Building on Xinghuo V3.5, iFLYTEK also released a new AIGC tool, "iFLYTEK Zhiwen".

iFLYTEK also gave a live demonstration at the event. Given a source document, iFLYTEK Zhiwen produced a PPT deck of dozens of pages covering the newly upgraded skills in 20 seconds. Once the PPT is generated, a professional virtual presenter can also be added to explain it.

The PPT generation ability relies on Xinghuo V3.5's element extraction, concept understanding, knowledge reasoning, problem generation, and graphic generation capabilities. Xinghuo V3.5 can not only process document information logically, but also supply additional information beyond the document, deepening the PPT content.

iFLYTEK also announced new progress on its multimodal models.

The newly released "Xinghuo Speech Model" is built on the large language model framework and pre-trained using iFLYTEK's disentangled representations of multi-dimensional speech attributes such as language, timbre, and content. It supports multiple languages and achieves highly human-like speech synthesis. Its average MOS (mean opinion score, a measure of perceived audio or video quality, with 5 being the highest) across the first 40 languages increased by 0.25. In the human-likeness test, its MOS reached 4.5 with a human-likeness rate of 83%, surpassing ChatGPT in human-like speech synthesis.
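
For readers unfamiliar with the metric, a mean opinion score is the arithmetic mean of listener ratings on a 1 to 5 scale. The sketch below shows that calculation in Python. The ratings are hypothetical, chosen only so that the mean works out to 4.5, and they are not iFLYTEK's test data. The function name is likewise an assumption for illustration.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings, each on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from six listeners for one synthesized clip.
sample_ratings = [5, 4, 5, 4, 5, 4]
print(mean_opinion_score(sample_ratings))  # 4.5
```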

Spark voice model

On the open-source front, iFLYTEK released the iFlytekSpark-13B model. The model has 13 billion parameters, was pre-trained on a massive high-quality dataset of more than 3 trillion tokens, and supports chat, Q&A, text extraction, data analysis, and code generation.

Spark open source model

At the same time, iFLYTEK has upgraded hardware products such as translators and voice recorders, as well as its to-B (enterprise) services, based on the large model capabilities of Xinghuo V3.5.

The newly launched Xinghuo Smart Blackboard is an AI hardware product based on the Xinghuo model. It offers functions such as multimodal understanding and recommendation, fully natural interaction, virtual-human-assisted teaching, and intelligent lesson recording. These make explanations more intuitive and teaching more convenient, and can bring famous figures such as Albert Einstein into the classroom as virtual humans to help students learn more efficiently.

Spark Smart Blackboard

To support enterprise (to-B) deployment of the large model, iFLYTEK not only upgraded the general-purpose Xinghuo model, but also optimized a series of related suite services.

At the foundation layer, the Spark general large model comes in a range of sizes, such as 13B, 65B, and 175B, and supports heterogeneous computing power scheduling. For industry-specific large models, Xinghuo's end-to-end toolchain can improve training efficiency by 90% and supports application optimization for mainstream enterprise scenarios.

At present, the Spark model has been deployed in energy, government affairs, and other scenarios.

Spark large model to B application framework

Regarding future plans, iFLYTEK said it will continue working to improve model capabilities, and that in the first half of 2024 it will fully benchmark against GPT-4 and release iFLYTEK Spark 4.0.
