
Microsoft executives say GPT-4 will be released soon and may support multimodal operation across text, images, speech, and more

Andreas Braun, chief technology officer of Microsoft Germany, recently told the media: "We are about to launch GPT-4, and with it we will introduce multimodal models that offer completely different possibilities, such as video."

GPT-4, due for imminent release, is the latest version of the Generative Pre-trained Transformer (GPT) series created by OpenAI. These deep learning models use artificial neural networks to handle a wide range of complex natural language tasks, including article generation and code writing.

Built on the GPT-3.5 architecture, the chatbot ChatGPT took the world by storm shortly after its launch. In theory, GPT-4 will go a step further technically than ChatGPT.

In large language models, modality refers to the kinds of input a model can handle: text, speech, images, video, and other sources. A multimodal large language model can take in information from several of these input types and still function properly.
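As a minimal sketch of that idea (this is an illustrative data structure, not an actual OpenAI or Microsoft API), a multimodal prompt can be thought of as a sequence of typed inputs, where each item carries a modality tag alongside its payload:

```python
# Illustrative sketch only: a multimodal prompt represented as a
# sequence of (modality, payload) items that a model would consume.
from dataclasses import dataclass
from typing import Union

@dataclass
class ModalInput:
    modality: str               # "text", "image", "audio", or "video"
    payload: Union[str, bytes]  # text content or raw media bytes

prompt = [
    ModalInput("text", "What is happening in this clip?"),
    ModalInput("video", b"<raw video bytes>"),
]

# A multimodal model must handle every modality present in the prompt.
modalities = sorted({item.modality for item in prompt})
print(modalities)  # ['text', 'video']
```

The point of the tag is that a single-modal model would reject anything but `"text"`, whereas a multimodal model accepts the mixed sequence.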

(Image source: Pixabay)

Compared with the text-only, single-modal language model behind ChatGPT, a multimodal model greatly expands the possibilities for drawing information from images, videos, and other content.

Reportedly, the multimodal large language model GPT-4 may support four modalities: text, image, sound, and video. However, since the specific details of GPT-4 have not been officially announced, it is unclear whether Braun was describing GPT-4's own multimodality at the event or that of other models.

According to Holger Kenn, Microsoft Germany's director of business strategy, multimodal AI "can not only convert text into corresponding images, but also into music and video."

The media has confirmed that GPT-4 will be able to support basically any language. This means that users who ask questions in English may receive answers in Japanese.

This may sound strange: why would someone who asks a question in English want an answer in Japanese? The key is that the model lets knowledge flow across different languages.

That is, if the answer the questioner wants exists only in one language, the model can retrieve it there and automatically deliver it in the language in which the question was asked.

In addition to its multimodal capabilities, GPT-4 is also expected to respond faster than ChatGPT and to provide more user-friendly answers.

It is worth noting that ChatGPT is currently a web-based service with no mobile app, though OpenAI may be developing a mobile app powered by GPT-4.

At the same time, according to Braun, GPT-4 will open up new enterprise use cases for generative AI. For example, with GPT-4, voice calls can be transcribed into text, saving customer service staff and other workers the time of manually entering key information after answering a call.

Clemens Siebler, a senior AI specialist at Microsoft Germany, told the media: "This could save 500 hours of work per day for a Microsoft customer in the Netherlands that receives 30,000 calls a day."
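The two figures in that quote are consistent with roughly one minute saved per call, which a quick back-of-the-envelope calculation confirms:

```python
# Sanity check of the reported figures: 30,000 calls per day and
# 500 hours of work saved per day imply the per-call saving below.
calls_per_day = 30_000
hours_saved_per_day = 500

seconds_saved_per_call = hours_saved_per_day * 3600 / calls_per_day
print(seconds_saved_per_call)  # 60.0 seconds, i.e. about one minute per call
```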

He elaborated further: "The three most common use cases are answering questions about company knowledge that only employees can access, AI-assisted document processing, and semi-automating call centers by processing spoken language."

In addition, to improve the reliability of the AI it develops, Microsoft is reportedly also pursuing research on "confidence metrics."

What is the significance of this initiative?

Specifically, users typically apply AI to understand or query their own datasets, and the accuracy of such models is now very high. However, because these models generate their output text rather than retrieve it verbatim, the accuracy of what they produce still needs scrutiny, and their reliability must keep improving.

"We built a feedback loop around it that includes positive and negative feedback; it is an iterative process," Siebler said.
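Such a loop can be sketched in a few lines, under loudly stated assumptions: this is a hypothetical toy setup, not Microsoft's actual system. It assumes each generated answer carries a confidence score in [0, 1]; low-confidence answers are routed to human review, and the verdicts are logged as feedback for the next iteration.

```python
# Hypothetical confidence-metric feedback loop (toy sketch, not a real
# Microsoft or OpenAI system): answers below the confidence threshold
# go to a human reviewer instead of being sent automatically.
def route(confidence: float, threshold: float = 0.8) -> str:
    """Pick a delivery channel for an answer based on its confidence."""
    return "auto_reply" if confidence >= threshold else "human_review"

feedback_log = []  # collected verdicts feed the iterative improvement loop

for answer, conf in [("Paris", 0.95), ("maybe 42?", 0.40)]:
    feedback_log.append((answer, route(conf)))

print(feedback_log)
# [('Paris', 'auto_reply'), ('maybe 42?', 'human_review')]
```

The threshold value and score source are assumptions; the point is only the shape of the loop: score, route, collect feedback, iterate.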

Finally, a word on the relationship between Microsoft and OpenAI. Microsoft has been an OpenAI partner since 2019, initially investing $1 billion; after ChatGPT's success in early 2023, it announced that it would invest billions of dollars more over the next few years.

Even with the upcoming release of GPT-4, ChatGPT's popularity has not diminished. Last week, Microsoft also announced that it would integrate ChatGPT into its Azure cloud platform.

By that logic, GPT-4 may also be integrated into Microsoft products in the future, such as the Bing chatbot.

Today, more and more businesses are turning to AI to boost productivity and streamline workflows. The development of multimodal large neural networks is not only an important milestone for artificial intelligence; it will also push model builders to consider whether they want to build systems that help people live better lives, or tools used only to generate profit.

Therefore, in the long run, while the powerful potential of AI continues to be explored, more effort should also be put into its regulation.

Resources:

https://www.searchenginejournal.com/gpt-4-is-multimodal/481993/

https://www.heise.de/news/GPT-4-is-coming-next-week-and-it-will-be-multimodal-says-Microsoft-Germany-7540972.html

https://techmonitor.ai/technology/ai-and-automation/gpt-4-openai-microsoft-chatgpt

https://www.bigtechwire.com/2023/03/09/gpt-4-microsoft-germany-announces-release-date-of-fourth-generation-large-language-model/

https://www.livemint.com/

https://venturebeat.com/automation/unlocking-the-power-of-cloud-native-observability-to-transform-the-customer-experience/

https://economictimes.indiatimes.com/news/new-updates/openais-gpt-4-to-bring-multimodal-capabilities-with-ai-generated-videos-and-faster-responses-say-reports/articleshow/98579150.cms
