
Disrupting the Future: How ChatGPT 4o is revolutionizing the rules of human-computer interaction

Author: Everyone is a Product Manager
Human-computer interaction (HCI) is one of the core areas of modern technology development. As technology advances, so do the forms of human-computer interaction: from early command-line interfaces, to graphical user interfaces (GUIs), to today's voice interaction and chatbots, interaction is becoming more natural and intuitive. As an early AI product manager, I have a strong interest in human-computer interaction, especially products built on natural language processing (NLP) and voice interaction, as well as product design and interaction design for intelligent assistants. Today, centered on the just-released ChatGPT 4o, I would like to discuss the future of human-computer interaction with you.

1. Design of multimodal interaction

Traditional voice assistants rely primarily on a single mode of input and output: voice. As technology has developed, however, multimodal interaction design has gradually become mainstream. This approach combines multiple input and output channels such as voice, vision, and touch, making human-computer interaction more natural and efficient. In early multimodal designs, for example, users could control smart home devices through voice commands while making more detailed settings through an app on a smartphone or tablet. This improved the user experience to a certain extent, but it still had many limitations.

2. ChatGPT 4o: truly realized multimodal human-computer interaction

Today, OpenAI has just released ChatGPT 4o, a genuine breakthrough in multimodal human-computer interaction. As a practitioner who hasn't written anything in a long time, I'm excited about this advance and can't wait to share my thoughts with you.

The ChatGPT 4o demo is amazing. Not only does it reply with virtually no delay, it can also be interrupted naturally by the user and handle multiple tasks at once. Specifically, ChatGPT 4o has the following notable features (a small API sketch follows the list):

  • Near-zero response delay: Users barely perceive any delay when interacting with ChatGPT 4o. It processes user input in real time and responds quickly.
  • Multitasking: ChatGPT 4o can handle multiple tasks at the same time, so users can switch between topics within a single conversation while it keeps the thread coherent.
  • Visual memory combined with language: ChatGPT 4o can combine visual and verbal input, remember what the user has previously shown or said, and cite that information when appropriate.
  • Sentiment analysis and TTS output: ChatGPT 4o can analyze the user's emotional state and deliver an appropriately emotional voice response via text-to-speech (TTS).
  • Delay-free access to device functions: ChatGPT 4o can call up the phone camera and the computer desktop without noticeable delay, enhancing the real-time feel and flexibility of the interaction.
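
To make this concrete, here is a minimal sketch of what a mixed text-and-image request might look like through the OpenAI Python SDK's chat completions endpoint, with streaming enabled to keep the perceived delay low. The model name "gpt-4o", the placeholder image URL, and the prompt are my own assumptions for illustration; this is not code from the launch demo.

```python
# Minimal sketch (assumptions: OpenAI Python SDK 1.x, model name "gpt-4o").
# One user turn mixes text and an image, and the reply is streamed so the
# user starts seeing output with minimal delay.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o",
    stream=True,  # emit tokens as they are generated instead of waiting
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is on this whiteboard, briefly?"},
                # Placeholder URL -- any publicly reachable image would do.
                {"type": "image_url", "image_url": {"url": "https://example.com/whiteboard.jpg"}},
            ],
        }
    ],
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)  # low-latency, token-by-token output
```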

3. New approaches to human-computer interaction design

Based on the multimodal capabilities ChatGPT 4o demonstrates, I believe future human-computer interaction design will break completely with past design concepts.

Here are three new design approaches that I expect to emerge:

1) Seamless conversation flow: In traditional human-computer interaction design, the interaction between a user and a system is usually divided into multiple independent sessions. Each time a new interaction begins, the system has to reload contextual information, which not only increases the burden on the system but also hurts the continuity and fluidity of the user experience. Future designs will focus on seamless conversation flow, so that users can switch freely between topics without starting a new session from scratch.

Applications that will be redefined include:
  • Smart assistants: Users can switch topics seamlessly in conversation, for example from asking about the weather to booking a restaurant, and the system remembers the user's earlier preferences and choices.
  • Voice customer service: Users can change their question or need mid-conversation, and the bot still maintains a coherent contextual understanding and response.
  • Education and training: In a virtual learning environment, students can ask new questions or change topics at any time without breaking the continuity of their learning progress or the system's responses.
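
To illustrate what a single continuous session means in practice, here is a rough sketch that keeps one long-lived message history across topic switches instead of opening a new conversation per topic. The helper function and the use of the chat completions API are my own assumptions, not a description of how OpenAI implements this.

```python
# Sketch (assumed names): one long-lived message list carries context across
# topic switches, so "book a table" can still see the earlier weather exchange.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a personal assistant."}]

def ask(user_text: str) -> str:
    """Append the user turn, get a reply, and keep both in the shared history."""
    messages.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

# The second question switches topic but reuses the same history,
# so preferences stated earlier remain visible to the model.
print(ask("What's the weather like in Hangzhou this evening?"))
print(ask("Then book me a quiet restaurant nearby for two at 7 pm."))
```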

2) Multimodal memory system: Future multimodal interaction design will place more emphasis on the system's ability to remember and understand. The system will be able to remember and interpret the user's multimodal input (e.g., voice, vision, touch) and intelligently use that memory in subsequent interactions, improving the personalization and accuracy of the interaction.

Applications that will be redefined include:
  • Personal assistants: The system can remember the user's daily habits and preferences, such as usual commute routes and favorite music genres, and offer personalized recommendations at the right moments.
  • Healthcare: The system can remember a patient's health data and medical history and provide more accurate diagnoses and personalized treatment recommendations.
  • Smart home: The system can remember the preferences and routines of every member of the household and automatically adjust lighting, temperature, music, and so on to create a comfortable living environment.
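
A toy version of such a memory layer might look like the sketch below: each observation is stored together with the modality it came from, and later turns can recall relevant items. All class and field names here are hypothetical, purely for illustration.

```python
# Toy multimodal memory store (hypothetical names, illustration only):
# every observation records which modality it came from, so later turns
# can recall "the photo the user showed earlier" alongside spoken preferences.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MemoryItem:
    modality: str          # "voice", "vision", "touch", ...
    content: str           # normalized description of the observation
    timestamp: datetime = field(default_factory=datetime.now)

class MultimodalMemory:
    def __init__(self) -> None:
        self.items: list[MemoryItem] = []

    def remember(self, modality: str, content: str) -> None:
        self.items.append(MemoryItem(modality, content))

    def recall(self, keyword: str) -> list[MemoryItem]:
        """Naive keyword lookup; a real system would use embeddings."""
        return [m for m in self.items if keyword.lower() in m.content.lower()]

memory = MultimodalMemory()
memory.remember("voice", "User prefers jazz on the evening commute")
memory.remember("vision", "Camera frame: living-room lights are still on")
for item in memory.recall("commute"):
    print(item.modality, "->", item.content)
```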

3) Emotion and context perception: As AI technology develops, future systems will pay more attention to understanding emotion and context. By jointly analyzing the user's verbal and non-verbal signals (such as intonation, facial expressions, and body posture), the system can provide warmer, more human service.

Applications that will be redefined include:
  • Mental health support: The system can identify the user's emotional state, provide timely psychological support and comfort, and even recommend professional counseling.
  • Early education and elderly companionship: The system can perceive a child's emotions and learning state, dynamically adjust the content and style of interaction, and improve learning and companionship outcomes.
  • E-commerce assistants: The system can analyze the user's sentiment and provide more personalized, empathetic service to improve customer satisfaction.
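
As a very rough sketch of the idea, the snippet below tags the user's utterance with a coarse emotion label and adjusts the system prompt before replying. The keyword rules and prompt wording are placeholders I made up; they are not how GPT-4o actually performs sentiment analysis.

```python
# Rough sketch (all names and rules are illustrative): pick a coarse emotion
# label for the user's text and steer the reply tone accordingly.
def detect_emotion(text: str) -> str:
    """Toy keyword classifier; a production system would use a trained model."""
    lowered = text.lower()
    if any(w in lowered for w in ("frustrated", "angry", "annoyed", "upset")):
        return "distressed"
    if any(w in lowered for w in ("great", "thanks", "awesome", "love")):
        return "positive"
    return "neutral"

TONE_PROMPTS = {
    "distressed": "Respond calmly, acknowledge the frustration, and offer concrete help.",
    "positive": "Respond warmly and match the user's upbeat tone.",
    "neutral": "Respond clearly and concisely.",
}

def build_system_prompt(user_text: str) -> str:
    emotion = detect_emotion(user_text)
    return f"You are a support assistant. {TONE_PROMPTS[emotion]}"

print(build_system_prompt("I'm really frustrated, the order never arrived."))
```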

4. Epilogue

Although ChatGPT 4o's advance is mainly an integration of existing technologies, achieving this level of interactive experience and fluency is far from easy. The original iPhone likewise integrated a variety of existing technologies, yet through excellent design and optimization it delivered a revolutionary user experience.

Before this, we had rarely seen multimodal interaction demonstrated this well, and OpenAI remains a big step ahead in this respect. I believe the technological wave sparked by this launch will give rise to a new generation of AI startups, and at the same time weed out companies that cannot keep up.

As an AI practitioner who has been deeply involved in human-computer interaction for many years, I will continue to pay attention to the latest developments in this field and share my insights and thoughts with you.

This article was originally published by @SingleWin-Win on Everyone is a Product Manager. Reproduction without permission is prohibited.

The title image is from Unsplash and is licensed under CC0.

The views in this article represent only the author's own; Everyone is a Product Manager provides information storage services only.
