SALMONN is a large language model (LLM) that supports speech, audio-event, and music input, created by the Department of Electronic Engineering at Tsinghua University and ByteDance. Instead of accepting only speech or only audio events, SALMONN can perceive and understand a variety of audio inputs, enabling emergent capabilities such as multilingual speech recognition, speech translation, and audio-speech reasoning. This can be seen as giving LLMs "ears" and cognitive hearing, making SALMONN a step toward general artificial intelligence with auditory abilities.
SALMONN encodes a joint audio representation with a speech encoder and an audio encoder, then uses an audio-text aligner to map the audio features into the text embedding space. Finally, the large language model generates its response conditioned on both the text prompt and these auditory tokens.
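The pipeline above can be sketched in a few lines of numpy. This is a minimal illustration, not SALMONN's actual implementation: the dimensions are hypothetical, and a plain linear projection stands in for the audio-text aligner, which in the real system is a learned module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
T = 100        # number of audio frames
D_SPEECH = 8   # speech-encoder feature size
D_AUDIO = 6    # audio-event-encoder feature size
D_LLM = 16     # LLM text-embedding size

# 1. Two encoders produce frame-level features from the same audio clip.
speech_feats = rng.standard_normal((T, D_SPEECH))
audio_feats = rng.standard_normal((T, D_AUDIO))

# 2. Concatenate them into a joint audio representation.
joint = np.concatenate([speech_feats, audio_feats], axis=-1)  # (T, 14)

# 3. The aligner (here just a linear projection standing in for the
#    learned module) maps audio features into the LLM's text embedding
#    space, yielding "auditory tokens".
W = rng.standard_normal((joint.shape[-1], D_LLM)) * 0.1
auditory_tokens = joint @ W  # (T, D_LLM)

# 4. Prepend the auditory tokens to the embedded text prompt; the LLM
#    then generates a response conditioned on both.
prompt_embeds = rng.standard_normal((5, D_LLM))
llm_input = np.concatenate([auditory_tokens, prompt_embeds], axis=0)
print(llm_input.shape)  # (105, 16)
```

The key design point is that everything downstream of the aligner lives in the LLM's own embedding space, so the frozen language model can treat audio as just another stretch of input tokens.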
Compared with traditional speech and audio processing tasks such as speech recognition and audio captioning, SALMONN achieves cognition-oriented audio perception by drawing on the LLM's common-sense knowledge and reasoning abilities, which greatly improves the model's versatility and the range of tasks it can handle. In addition, SALMONN can follow text instructions, and even spoken instructions, with relatively high accuracy. Since SALMONN is trained only with text instructions, following spoken instructions is itself a cross-modal emergent capability.
Here are some demos from SALMONN.
| Audio | Reply |
| --- | --- |
| asr.wav | |
| Audio captioning.wav | |
| Music.wav | |
| Emotion.wav | |
| asr_en2de.wav | |
| Keywords.flac | |
| Spoken query.wav | |
| Audio storytelling.wav | |
| Spoken audio query.wav | |
Project Address: