
Google I/O 2024: Never Three Sentences Without AI

Author: TechNode

The annual I/O developer conference is Google's moment to flex its muscles, and in the era of artificial intelligence it is a stage not to be missed. In the early hours of this morning, this year's I/O conference opened at Google's headquarters in California. By one tally, "AI" was mentioned 121 times during the event (counting on-screen text, the real number is likely higher), underscoring how thoroughly Google is weaving AI into its core products: from the search engine to the mobile operating system to dedicated hardware, AI is everywhere.

In particular, the release of Gemini 1.5 Pro marks an important step forward for Google in handling large-scale data and improving the user experience. Google also launched a lighter-weight model, Gemini 1.5 Flash, and a further upgraded open model, Gemma 2, moves that demonstrate both Google's innovation in AI technology and its determination to push AI into widespread, everyday use.


The whole Gemini family gets an update

Gemini 1.5 Pro is the highlight of this year's event. Google has doubled its context window from 1 million tokens to 2 million tokens, an upgrade that greatly expands how much data the model can take in at once and lets it handle larger, more complex inputs comfortably. Google also announced that Gemini 1.5 Pro will be fully integrated into Workspace.
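To give a rough sense of scale, the sketch below estimates how many printed pages fit in the new window. The conversion factors (about 4 characters per English token, about 1,800 characters per page) are common rules of thumb, not figures from Google:

```python
# Back-of-the-envelope estimate of what a 2-million-token context window holds.
# Assumptions (not from Google's announcement): ~4 characters per English
# token and ~1,800 characters per printed page.

def pages_in_context(context_tokens: int,
                     chars_per_token: float = 4.0,
                     chars_per_page: int = 1800) -> int:
    """Approximate number of printed pages that fit in the context window."""
    return int(context_tokens * chars_per_token / chars_per_page)

old_capacity = pages_in_context(1_000_000)  # previous 1M-token window
new_capacity = pages_in_context(2_000_000)  # upgraded 2M-token window

print(old_capacity, new_capacity)  # 2222 4444
```

Under these assumptions, the jump from 1 million to 2 million tokens roughly doubles capacity, from around 2,200 to around 4,400 pages of text in a single prompt.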

The updated Gemini 1.5 Pro adds native audio understanding, system instructions, and a JSON output mode, and its computer-vision capabilities let it analyze images, audio, and video, giving it something approaching human-level visual perception: using deep neural networks, it can recognize objects, scenes, and people in images with high precision. Google also announced that Gemini 1.5 Pro will be available to developers worldwide.

At the same time, for use cases that demand faster responses at lower cost, Google introduced a lighter model, Gemini 1.5 Flash. Aimed at a broad developer base, it excels at snippet generation, chat applications, image and video captioning, and extracting data from long documents and tables.

It is also worth noting that Google is further upgrading its open model Gemma with Gemma 2. Its efficient design reportedly requires less than half the compute of comparable models, making it cheaper to run and easier for a wider range of users to deploy.

Google also announced a trip-planning feature for Gemini. It combines personal information with public travel data to help users book and plan vacation itineraries, including flights and hotels. According to Google, Gemini can surface specific details such as flight times and hotel bookings from a user's prompt and produce a suitable vacation plan in just a few seconds, a task that can take hours, days, or even weeks to do by hand.

Google says the new trip planning feature will be coming to the Gemini Advanced platform in the coming months.

Search engine upgrades

Google believes AI is the future of search, and to that end it is now taking the knife to its search engine.

Google is about to roll out "AI Overviews" to users in the U.S., with the rest of the world to follow: AI-generated summaries will appear at the top of search results. And that is just the beginning of how AI is changing search.

Liz Reid, Google's head of Search, who has spent the past few years working on AI across every part of search, said: "What we're seeing with generative AI is that Google can do the searching for you. It can take a lot of the hard work out of search, so you can focus on the parts where you want to get things done, or the parts of exploration you find exciting."

AI Overviews are designed to give users a concise answer to their query, along with links to resources for more information. Google uses its Gemini models to work out what you are asking, whether you type, speak, take a photo, or shoot a video. However, Reid says not every search needs that much AI, and not every search will get it: "If you just want to navigate to the URL, you can search for Walmart and then head to walmart.com. Adding AI isn't really beneficial." She thinks Gemini is most helpful in more complex situations, where you would otherwise need to do a lot of searching, or where a rough overview up front is useful.

For local searches, with Gemini, "we can do things like 'find the best yoga or pilates studios in Boston within a half-hour walk of Beacon Hill, rated over four stars.'" Perhaps, she continued, you also want to know which ones are most helpful for first-time visitors. For users, this could mean a whole new way of interacting with the internet: less typing, fewer tabs, more conversation with the search engine, and more efficient access to information.


Project Astra and Veo take aim at the competition

In response to GPT-4o, which OpenAI unveiled just yesterday, Google announced a competing effort of its own: Project Astra.

According to Google, Project Astra is an agent prototype built on Gemini. It processes information faster by continuously encoding video frames, combining the video and voice inputs into a single timeline of events, and caching that information for efficient recall. Google has also used its speech models to give the agents more natural voices with a wider range of intonation. The result is agents that better understand the context they operate in and respond quickly in conversation.
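The event-timeline-plus-cache idea described above can be sketched in a few lines. The names and structure here are purely illustrative, not Google's implementation: video frames and voice snippets are merged into one time-ordered log, and only the most recent entries are kept in a bounded cache for fast recall.

```python
# Illustrative sketch of an event timeline with a bounded cache, loosely
# modeled on the Project Astra description. Not Google's actual design.
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since session start
    modality: str     # "video" or "voice"
    summary: str      # stand-in for an encoded representation


class EventTimeline:
    def __init__(self, cache_size: int = 100):
        # Bounded cache: the oldest events are evicted automatically.
        self.events: deque = deque(maxlen=cache_size)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def recall(self, query: str) -> list:
        """Return cached events whose summary mentions the query term."""
        return [e for e in self.events if query.lower() in e.summary.lower()]


timeline = EventTimeline(cache_size=3)
timeline.add(Event(0.0, "video", "desk with red glasses"))
timeline.add(Event(1.5, "voice", "user asks about the code on screen"))
timeline.add(Event(3.0, "video", "whiteboard with a diagram"))
timeline.add(Event(4.2, "video", "window view of the street"))  # evicts t=0.0

print([e.timestamp for e in timeline.recall("whiteboard")])  # [3.0]
```

The bounded cache mirrors the trade-off the description implies: remembering recent context well enough to answer "where did I leave my glasses?"-style questions, without storing the whole session.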


In addition, to counter OpenAI's Sora, Google showed Veo, its AI video-generation model. Veo generates video from text, can create 1080p clips longer than 60 seconds, supports a variety of cinematic styles, and shows a better understanding of natural language.

Google says creators can steer Veo toward the desired visual effect using standard filmmaking terminology, such as "time-lapse" or "aerial landscape shot," reducing the time spent tweaking prompts. Veo also supports extending videos: if creators are not satisfied with a clip's length, they can have Veo extend it automatically or add prompts to generate a longer video.

Google has already opened trial access, and it plans to bring some Veo features to YouTube's short-video format in the future.

Android 15

There is no doubt that AI was Google's top priority at this conference, and the mobile operating system is no exception: Android 15 is infused with Gemini, including on-device features that will arrive soon.

The current Android 15 pre-release brings new features such as richer in-app camera controls, partial screen sharing, and loudness control, along with improvements to PDF handling, NFC, and satellite-connectivity support. The additions to Google's mobile operating system focus on productivity, user privacy and security, communication, and performance.

Beyond the Gemini integration, Android 15 adds a number of new features, such as Low Light Enhancement, a new auto-exposure mode. Unlike Night mode, which produces a still image by compositing multiple shots, Low Light Enhancement brightens the camera preview itself, so users can better frame their shots, or scan QR codes, in dim environments.

Currently, the Android 15 developer preview and beta are available only on certain Google Pixel devices, from the Pixel 6 through the Pixel 8 Pro, plus the Pixel Fold and Pixel Tablet.

The sixth-generation TPU arrives

The TPU (Tensor Processing Unit) is a chip Google custom-built for machine learning. The line began eleven years ago, and TPUs now sit behind many of Google's products and services, a major contribution to the building of Google's AI empire.

Google says the new Trillium TPU can train next-generation AI models faster while reducing latency and cost. Compared with the previous-generation TPU v5e, Trillium delivers 4.7x higher peak compute performance per chip, doubles high-bandwidth memory (HBM) capacity and bandwidth, and doubles chip-to-chip interconnect (ICI) bandwidth. It is also Google's most sustainable TPU to date, over 67% more energy efficient than its predecessor.
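A quick way to read the efficiency figure: "67% more energy efficient" means 1.67x the performance per watt, so a fixed workload should need roughly 60% of the energy it did on v5e. A minimal sketch of that arithmetic, using only the ratios Google stated (the function and its name are ours, not Google's):

```python
# Converting a "X% more energy efficient" claim into relative energy use
# for the same fixed workload. Only the 67% figure comes from Google.

def energy_for_same_work(efficiency_gain: float) -> float:
    """Relative energy for a fixed workload after an efficiency gain.

    efficiency_gain: fractional improvement in performance per watt,
    e.g. 0.67 for "67% more energy efficient".
    """
    return 1.0 / (1.0 + efficiency_gain)

# Trillium is stated to be >67% more efficient than TPU v5e:
print(round(energy_for_same_work(0.67), 2))  # 0.6
```

So, taking the claim at face value, the same training job should consume about 60% of the energy it would on the previous generation, and somewhat less given the "more than" qualifier.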

Epilogue

Just a day after its rival OpenAI, Google answered with a dense slate of AI products and services of its own. The event showcased not only Google's breakthroughs in AI technology but also its determination to integrate AI into every aspect of daily life, especially productivity.

Reflecting on the event, Nvidia senior research scientist Jim Fan argued that Google is getting one thing right: "they are finally starting to make a serious effort to integrate AI into the search box." Google's strongest moat, he added, is distribution: "Gemini doesn't have to be the best model to be the most widely used model in the world."
