AI Daily: Fudan and Baidu's new models can generate 1-hour long videos; The new version of ChatGPT for Windows is launched; Two new features have been added to NotebookLM

Welcome to the [AI Daily] column! Here's your guide to exploring the world of AI every day, and every day we present you with hot content in the field of AI, focusing on developers, helping you gain insight into technology trends and understand innovative AI product applications.

Fresh AI productsClick to learn: https://top.aibase.com/

1. For paid users! The new ChatGPT for Windows version is online: shortcut keys can summon AI assistants

OpenAI has launched an early version of the new ChatGPT Windows app to provide a convenient AI assistant experience for paying users. Users can summon ChatGPT by simply pressing the Alt + Space key combination, without having to open a web page every time. The app is currently only available to paid users, but plans to give free users a chance to experience it as well. Although the beta version of the app is not fully functional, OpenAI promises to continue to update to improve the user experience.

AI Daily: Fudan and Baidu's new models can generate 1-hour long videos; The new version of ChatGPT for Windows is launched; Two new features have been added to NotebookLM

【AiBase Summary:】

🌟 The ChatGPT Windows app is only open to paid users and supports a variety of paid account types.

💡 Press the Alt + Space key combination to easily summon ChatGPT for a conversation, which is convenient and fast.

🔧 The beta app is temporarily missing some complications, but will continue to be updated to improve the experience.

2、OpenAI重磅发布GPT-4O-Audio-Preview

OpenAI's latest GPT-4O-Audio-Preview model has demonstrated amazing capabilities in the field of speech processing, not only generating natural and smooth voice responses, but also with sentiment analysis and voice interaction functions, opening up new possibilities for human-computer interaction. The model flexibly supports multiple combinations of modes, and the pricing strategy reflects the complexity of audio processing. The launch will revolutionize customer service, education, entertainment, and assistive technology.

【AiBase Summary:】

🔊 The model has the ability to generate natural and smooth voice responses, and supports voice assistants and virtual customer service applications.

🎶 Ability to analyze audio sentiment, intonation, and pitch for affective computing and user experience analysis.

🗣 It supports voice-to-voice interaction and lays the foundation for a comprehensive voice interaction system.

Details: https://platform.openai.com/docs/guides/audio/quickstart

3、Google升级AI笔记和研究助手NotebookLM

Google has announced a major upgrade to NotebookLM that will enhance the audio overview feature and allow users to more precisely guide AI-generated conversations. Updates include custom audio overviews and background listening capabilities to improve the user experience. The commercial pilot program was launched, looking forward to a wider range of application scenarios.

【AiBase Summary:】

🔊 The audio overview function has been upgraded, and users can customize the content of the guided AI conversation.

🎙️ The background listening function has been added, allowing users to work and listen to audio at the same time.

💼 The commercial pilot program is launched, giving businesses early access to new features and support.

4. Fudan and Baidu have teamed up to create a new AI model, Hallo2, which can generate 4K ultra-high-definition + 1-hour ultra-long video!

The Hallo2AI model, jointly developed by Fudan University and Baidu, will revolutionize the generation of character animation, bringing revolutionary changes to filmmaking, virtual assistants, game development, and other fields. The model combines latent diffusion models, patch-drop data augmentation technology, Gaussian noise enhancement technology, VQGAN discrete codebook prediction technology and text prompt control mechanism, and performs well in generating high-quality, long-sequence character animation.

【AiBase Summary:】

⚙️ The Hallo2 model combines a number of innovative techniques, including patch-drop data augmentation, Gaussian noise augmentation, VQGAN discrete codebook prediction, and text prompt control mechanism.

🌟 Validated on multiple publicly available datasets, Hallo2 excels at generating high-quality, long-sequence character animations beyond existing methods.

🚀 The release of the Hallo2 model marks a new level of AI character animation generation technology, which will further optimize efficiency and explore more application fields in the future.

Details:https://fudan-generative-vision.github.io/hallo2/#/

5. Tesla Optimus robot re-evolution: autonomous navigation, stair climbing, and human interaction become reality

Tesla's latest release of the Optimus robot showcases impressive new capabilities, from autonomous navigation to interacting with humans, highlighting the rapid advancements in artificial intelligence and robotics. Optimus has shown great potential in terms of autonomous navigation capabilities, energy management autonomy, and increased load capacity.

【AiBase Summary:】

🤖 Autonomous navigation capability: Optimus can move freely through complex environments, and multiple robots can work together to optimize navigation efficiency.

🔋 Energy management autonomy: Optimus can automatically locate charging stations for autonomous charging, improving work continuity and efficiency.

🏋️ ♂️ Increased load capacity: Optimus is capable of handling battery trays weighing up to 11 kg, opening up new possibilities for industrial and logistics applications.

6. Google's personnel overhaul: The Gemini team was merged into DeepMind, and the search leadership changed dramatically

Google has recently undergone significant leadership changes and team restructuring, including the K&I team and the Gemini team. The succession of new leaders and the integration of teams will have a significant impact on the company's technology development and AI project collaboration.

【AiBase Summary:】

🌟 Nick Fox takes over as the new head of Google's K&I team and will continue to drive the growth of search, advertising, geography and commerce products.

🔧 Prabhakar Raghavan moves to the role of Chief Technology Officer at Google, where he is committed to providing direction and support for the company's technology development.

🤖 The integration of the Gemini team with Google's DeepMind aims to strengthen the collaboration between the application team and the Gemini model team.

7. Upload a piece of music to become a piano song in seconds! The AMT-APC algorithm generates a masterclass piano performance with one click

Recently, researchers at Musashino University's School of Data Science developed the AMT-APC algorithm, which combines AMT models and fine-tuning techniques to more accurately generate a piano performance version that is close to the original song. The algorithm breaks through the limitations of the existing auto-generated piano music technology, and improves the sound quality fidelity and expressiveness.

【AiBase Summary:】

⭐ The AMT-APC algorithm takes advantage of the AMT model to generate a piano performance version that is closer to the original piece through fine-tuning.

🎵 The core strategy includes pre-training and fine-tuning, enabling the AMT model to process longer pieces of music and generate piano performances that conform to the style of the original song.

🎹 Introduce the concept of style vectors, learn different playing styles, and improve the expressiveness and sound quality fidelity of generated piano music.

Link:https://misya11p.github.io/amt-apc/

8.ᨤ䉆 Ahiriᬱ Aisiriʹʹᬓᬓᬱᬓʳᬱʹʵᬱᬱ ʹᬱ ᬁʝs ʹʝs ʹ

苹果正致力于为iOS18、iPadOS18和macOS15添加新的Apple Intelligence功能,其中包括ChatGPT集成和图像生成。 ChatGPT将为Siri提供更先进的文本和图像生成能力,而Visual Intelligence则将为iPhone16用户提供相机控制按钮功能。 iOS18.1、iPadOS18.1和macOS Sequoia15.1预计将于10月28日发布,而iOS18.2、iPadOS18.2和macOS Sequoia15.2的测试版也将很快推出。

【AiBase Summary:】

🔍 Siri will integrate ChatGPT to provide more advanced text and image generation capabilities.

📸 The iPhone 16 will get the Visual Intelligence feature, which provides information about surrounding objects through the camera control buttons.

🚀 iOS18.2将支持Image Playground图像生成、Genmoji和Image Wand。

9. Only one billion parameters! AI image generation model Meissonic

Meissonic is an open-source AI model that generates high-quality images with only one billion parameters. It adopts a training method of parallel iterative optimization, which makes the image generation speed 99% faster than that of traditional models. Despite the small number of parameters, Meissonic outperformed larger models in several tests and was able to achieve untrained image patching and scaling.

【AiBase Summary:】

🌟 The compact design of Meissonic is suitable for both regular gaming PCs and future mobile devices.

⚡ Using the training method of parallel iterative optimization, Meissonic can generate images 99% faster than traditional models.

🏆 Despite the small number of parameters, Meissonic outperformed larger models in several tests and was able to achieve untrained image patching and scaling.

Details: https://huggingface.co/spaces/MeissonFlow/meissonic

10. Perplexity launched the internal knowledge search function, which allows enterprises to query internal and external data at the same time

Perplexity has launched a new feature, "Internal Knowledge Search," designed to improve business productivity and make it easier for users to access the information they need. Users upload self-selected files to avoid low-value information interfering with search and improve efficiency. The "Space" feature is added to support team file sharing and AI assistant customization.

【AiBase Summary:】

📁 Users can only upload their own files to avoid low-value information interfering with the search and improve efficiency.

🔍 Perplexity launched the "Internal Knowledge Search" function, which allows users to query internal and external data at the same time.

🤝 The "Space" function is added to support team file sharing and AI assistant customization.

11. Pony.ai, an autonomous driving company, plans to go to the United States with an IPO valuation of more than $8.5 billion

Pony.ai plans to go to the United States for an IPO with a valuation of more than $8.5 billion. Founded in 2016, the company focuses on autonomous driving solutions and has completed nine rounds of financing of more than $1 billion. Revenue, mainly from the Robotaxi business, increased by 86% year-on-year in the first half of 2024.

【AiBase Summary:】

🌍 Pony.ai plans to go to the United States for an IPO under the ticker symbol "PONY" at a valuation of more than $8.5 billion.

💰 Founded in 2016, the company has completed nine rounds of financing of more than $1 billion and is valued at $8.5 billion.

🚖 The Robotaxi business is the main source of revenue, with a year-on-year increase of 86% in the first half of 2024.

AI Daily: Fudan and Baidu's new models can generate 1-hour long videos; The new version of ChatGPT for Windows is launched; Two new features have been added to NotebookLM

Read on