
Microsoft executives say GPT-4 will be released soon and may support multimodal operation across text, images, speech, and more

Andreas Braun, chief technology officer of Microsoft Germany, recently told the media: "We are about to launch GPT-4, and with it we will introduce multimodal models that offer completely different possibilities, such as video."

GPT-4, due for imminent release, is the latest version of the Generative Pre-trained Transformer (GPT) series created by OpenAI. These deep learning models use artificial neural networks to handle a wide range of complex natural language tasks, including article generation and code writing.

Built on the GPT-3.5 architecture, the chatbot ChatGPT took the world by storm shortly after its launch. In theory, GPT-4 will go a step further technically than ChatGPT.

In large language models, modality refers to the kinds of input a model can handle: text, speech, images, video, and other sources. A multimodal large language model can take in information from several of these input types and still function properly.
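As a minimal sketch of that idea (this is an illustrative data structure, not an actual OpenAI or Microsoft API), a multimodal prompt can be thought of as a sequence of typed inputs, where each item carries a modality tag alongside its payload:

```python
# Illustrative sketch only: a multimodal prompt represented as a
# sequence of (modality, payload) items that a model would consume.
from dataclasses import dataclass
from typing import Union

@dataclass
class ModalInput:
    modality: str               # "text", "image", "audio", or "video"
    payload: Union[str, bytes]  # text content or raw media bytes

prompt = [
    ModalInput("text", "What is happening in this clip?"),
    ModalInput("video", b"<raw video bytes>"),
]

# A multimodal model must handle every modality present in the prompt.
modalities = sorted({item.modality for item in prompt})
print(modalities)  # ['text', 'video']
```

The point of the tag is that a single-modal model would reject anything but `"text"`, whereas a multimodal model accepts the mixed sequence.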

(Image source: Pixabay)

Compared with the text-only, single-modal language model behind ChatGPT, a multimodal model greatly expands the possibilities for drawing information from images, videos, and other content.

Reportedly, the multimodal large language model GPT-4 may support four modalities: text, image, sound, and video. However, since the specific details of GPT-4 have not been officially announced, it is unclear whether Braun was describing GPT-4's own multimodality at the event or that of other models.

According to Holger Kenn, Microsoft Germany's director of business strategy, multimodal AI "can not only convert text into corresponding images, but also into music and video."

The media has confirmed that GPT-4 will be able to support basically any language. This means that users who ask questions in English may receive answers in Japanese.

This may sound strange: why would someone who asks a question in English want an answer in Japanese? The key is that the model lets knowledge flow across different languages.

That is, if the answer the questioner wants exists only in one language, the model can retrieve it there and automatically deliver it in the language in which the question was asked.

In addition to its multimodal capabilities, GPT-4 is also expected to respond faster than ChatGPT and to provide more user-friendly answers.

It is worth noting that ChatGPT is currently a web-based service with no mobile app, though OpenAI may be developing a mobile app powered by GPT-4.

At the same time, according to Braun, GPT-4 will open up new enterprise use cases for generative AI. For example, with GPT-4, voice calls can be transcribed into text, saving customer service staff and other workers the time of manually entering key information after answering a call.

Clemens Siebler, a senior AI specialist at Microsoft Germany, told the media: "This could save 500 hours of work per day for a Microsoft customer in the Netherlands that receives 30,000 calls a day."
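The two figures in that quote are consistent with roughly one minute saved per call, which a quick back-of-the-envelope calculation confirms:

```python
# Sanity check of the reported figures: 30,000 calls per day and
# 500 hours of work saved per day imply the per-call saving below.
calls_per_day = 30_000
hours_saved_per_day = 500

seconds_saved_per_call = hours_saved_per_day * 3600 / calls_per_day
print(seconds_saved_per_call)  # 60.0 seconds, i.e. about one minute per call
```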

He elaborated further: "The three most common use cases are answering questions about company knowledge that only employees can access, AI-assisted document processing, and semi-automating call centers by processing spoken language."

In addition, to improve the reliability of the AI it develops, Microsoft is reportedly also pursuing research on "confidence metrics."

What is the significance of this initiative?

Specifically, users typically apply AI to understand or query their own datasets, and the accuracy of such models is now very high. However, because these models generate their output text rather than retrieve it verbatim, the accuracy of what they produce still needs scrutiny, and their reliability must keep improving.

"We built a feedback loop around it that includes positive and negative feedback; it is an iterative process," Siebler said.
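Such a loop can be sketched in a few lines, under loudly stated assumptions: this is a hypothetical toy setup, not Microsoft's actual system. It assumes each generated answer carries a confidence score in [0, 1]; low-confidence answers are routed to human review, and the verdicts are logged as feedback for the next iteration.

```python
# Hypothetical confidence-metric feedback loop (toy sketch, not a real
# Microsoft or OpenAI system): answers below the confidence threshold
# go to a human reviewer instead of being sent automatically.
def route(confidence: float, threshold: float = 0.8) -> str:
    """Pick a delivery channel for an answer based on its confidence."""
    return "auto_reply" if confidence >= threshold else "human_review"

feedback_log = []  # collected verdicts feed the iterative improvement loop

for answer, conf in [("Paris", 0.95), ("maybe 42?", 0.40)]:
    feedback_log.append((answer, route(conf)))

print(feedback_log)
# [('Paris', 'auto_reply'), ('maybe 42?', 'human_review')]
```

The threshold value and score source are assumptions; the point is only the shape of the loop: score, route, collect feedback, iterate.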

Finally, a word on the relationship between Microsoft and OpenAI. Microsoft has been an OpenAI partner since 2019, initially investing $1 billion; after ChatGPT's success in early 2023, it announced that it would invest billions of dollars more over the next few years.

Even with the upcoming release of GPT-4, ChatGPT's popularity has not diminished. Last week, Microsoft also announced that it would integrate ChatGPT into its Azure cloud platform.

By that logic, GPT-4 may also be integrated into Microsoft products in the future, such as the Bing chatbot.

Today, more and more businesses are turning to AI to boost productivity and streamline workflows. The development of multimodal large neural networks is not only an important milestone for artificial intelligence; it will also push model builders to consider whether they want to build systems that help people live better lives, or tools used only to generate profit.

Therefore, in the long run, while the powerful potential of AI continues to be explored, more effort should also be put into its regulation.

Resources:

https://www.searchenginejournal.com/gpt-4-is-multimodal/481993/

https://www.heise.de/news/GPT-4-is-coming-next-week-and-it-will-be-multimodal-says-Microsoft-Germany-7540972.html

https://techmonitor.ai/technology/ai-and-automation/gpt-4-openai-microsoft-chatgpt

https://www.bigtechwire.com/2023/03/09/gpt-4-microsoft-germany-announces-release-date-of-fourth-generation-large-language-model/

https://www.livemint.com/

https://venturebeat.com/automation/unlocking-the-power-of-cloud-native-observability-to-transform-the-customer-experience/

https://economictimes.indiatimes.com/news/new-updates/openais-gpt-4-to-bring-multimodal-capabilities-with-ai-generated-videos-and-faster-responses-say-reports/articleshow/98579150.cms
