
OpenAI makes a big move! GPT-4o voice mode begins rolling out to some users, and emotional intelligence makes machine conversations more human


National Business Daily

2024-07-31 10:22 Published on the official account of Sichuan Daily Economic News

Edited by: Du Yu

On Tuesday (July 30), local time, OpenAI, the United States artificial intelligence (AI) research company, announced that it will immediately begin rolling out GPT-4o's voice mode to some ChatGPT Plus users.


Image source: OpenAI official website

OpenAI said that the video and screen-sharing features demonstrated during the Spring Update are not included in this Alpha version and will be rolled out at a later date.

ChatGPT's advanced voice mode differs from the previous voice mode. The original audio solution chained three separate models: one converted the user's speech to text, GPT-4 then processed the text prompt, and a third model converted ChatGPT's generated text back into speech.

GPT-4o is a multimodal model that can handle all of these tasks without the assistance of other models, which significantly reduces conversational latency.
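To make the contrast concrete, the snippet below is a minimal sketch of the kind of three-stage speech-to-text, language-model, text-to-speech pipeline described above, written against the OpenAI Python SDK. The model names, the voice choice, and the audio file paths are illustrative assumptions, not details from OpenAI's announcement.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stage 1: speech-to-text -- transcribe the user's spoken input into a text prompt
with open("user_audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Stage 2: language model -- a text-only model processes the transcribed prompt
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = completion.choices[0].message.content

# Stage 3: text-to-speech -- convert the generated reply back into audio
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.write_to_file("reply_audio.mp3")

Because each stage must wait for the previous one to finish, latency accumulates across the three hops; a single multimodal model such as GPT-4o collapses the pipeline into one step, which is the latency advantage the article describes.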

OpenAI also revealed that GPT-4o can sense emotional intonation in the user's voice, including sadness, excitement, or singing. Users in the Alpha group will receive an alert in ChatGPT as well as an email explaining how to use the feature.

OpenAI launched the new version of its large model, GPT-4o, in May this year and demonstrated the voice mode at the same time. The company had planned to gradually open the voice mode to users at the end of June, but ultimately decided to postpone the release to July. Voice mode will be available to all ChatGPT Plus users this fall.

"With a gradual rollout, we can closely monitor usage and continuously improve the model's capabilities and security based on actual feedback," OpenAI said on Tuesday. The company also revealed that it is still working on the video and screen sharing features that will be showcased during the May launch. There is no time set for these features to be rolled out.

As a result, the initial functionality of voice mode will be limited. For example, ChatGPT will not yet be able to use computer vision capabilities, which would allow the chatbot to give voice feedback on a user's dance moves through a smartphone camera.

The GPT-4o voice mode currently offers four preset voices, Juniper, Breeze, Cove, and Ember, which were produced in collaboration with paid voice actors.

Previously, a ChatGPT female voice called Sky was accused of sounding very similar to Hollywood star Scarlett Johansson. After receiving a letter from Johansson's lawyers, OpenAI suspended the use of the Sky voice.

OpenAI also said it has introduced new filters to ensure the software can detect and reject requests to generate music or other forms of copyrighted audio. For AI companies, staying out of legal battles over copyright has become a key concern.


Image source: Visual China-VCG31N2008743681

It is also worth noting that on June 21, OpenAI announced its acquisition of the startup Rockset, bringing in the company's talent and key technology.

OpenAI said in the announcement that AI has the opportunity to change the way individuals and organizations use their own data, and that this is why it acquired Rockset, a database company whose cutting-edge real-time analytics provide world-class data indexing and querying capabilities.

Rockset's key technology is "vector search." As more companies adopt AI-powered recommendation engines, voice assistants, chatbots, and the like, the use cases for this technology are growing broader.
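For readers unfamiliar with the term, the following is a generic sketch of what vector search does, not Rockset's actual API: items and queries are represented as embedding vectors, and search means ranking items by similarity to the query vector. The document names and vector values below are made up for illustration.

import numpy as np

# Toy embeddings: in a real system these come from an embedding model;
# the four-dimensional vectors here are invented for illustration only.
documents = {
    "voice assistant setup guide": np.array([0.9, 0.1, 0.3, 0.0]),
    "chatbot pricing page":        np.array([0.2, 0.8, 0.1, 0.4]),
    "recommendation engine docs":  np.array([0.1, 0.3, 0.9, 0.2]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors, independent of their magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def vector_search(query_vec: np.ndarray, top_k: int = 2):
    """Return the top_k documents whose embeddings are most similar to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# A query embedding that should land closest to the voice-assistant document.
query = np.array([0.85, 0.15, 0.25, 0.05])
print(vector_search(query))

Production systems of the kind described replace the in-memory dictionary with a purpose-built index so that similarity search stays fast over millions of vectors.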

National Business Daily, compiled from public information

National Business Daily
