It's another crazy week, and the whole world is "AI numb"

Geek Park

2024-05-19 13:51 Posted on the official account of Geek Park, Beijing

Is it because of the Labor Day holiday? AI companies around the world all seem to have picked the third week of May to release their latest products and technologies.

A frantic week!

After a long "see you on Monday" tease, OpenAI grabbed the spotlight with GPT-4o. At its own event 24 hours later, Google did not drop the ball: the Veo video model, Project Astra, and the new version of AI search all left a strong impression.

The week's two biggest events looked very different on the surface and agreed on only one thing: a super voice assistant like the one in the movie "Her" (GPT-4o and Project Astra). That also marks the key battleground for large models in 2024: the multimodal fusion technology behind GPT-4o and Astra.

On the other side of the ocean, a late-arriving ByteDance released its Doubao model family, and Tencent finally handed in its answer on "GPTs" and a large model assistant app.

Today, whether at a big company with a whole household to support or a startup with no baggage, product forms keep expanding: from chatbots to AI search, "GPTs", and multimodal voice assistants... The plays are getting more and more elaborate.

I don't know if you're numb or not, but we're happily numb anyway.

Monday, May 13

AI avatars/humanoid agents are evolving rapidly: Unitree releases Unitree G1 humanoid robot

Starting at ¥99,000, far below typical industry prices

Large language models have broken into the mainstream, and they have made humanoid robots capable of embodied intelligence popular along the way.

In August 2023, Unitree released the humanoid robot H1 with a pre-sale price of $90,000 (about 650,000 yuan). This week, Unitree launched a new version of the humanoid robot Unitree G1, which is more than 80% cheaper, starting at 99,000 yuan.

Compared to the first generation, the Unitree G1 is significantly more capable: it can open bottle caps, smash walnuts, toss food in a pan, run, twirl a stick, and fold itself up. In the product demonstration video released by Unitree, the body and legs can rotate nearly 360°, and the G1 uses its robotic arms to complete a series of tasks as flexibly as a human would.

Image source: Unitree Technology

Open source and closed source advance together: 01.AI releases the 100-billion-parameter Yi-Large model

Open source to build the ecosystem, closed source to explore the upper limit of AI

On the first anniversary of its founding, 01.AI (Zero One Everything) officially unveiled its closed-source, 100-billion-parameter Yi-Large model, which ranked first worldwide among large models by win rate on Stanford's latest AlpacaEval 2.0.

At the same time, the small and medium-sized open-source models released earlier, Yi-34B and Yi-9B/6B, were upgraded to the Yi-1.5 series, with each version achieving state-of-the-art performance for its size.

Yi large model API open platform | Image source: 01.AI

Tuesday, May 14

"Her" is really here: "GPT-4o" takes voice assistants to new heights

Is the multimodal fusion model just an engineering advancement?

OpenAI has released a new generation of its flagship model, GPT-4o, which allows people to talk to ChatGPT on their phones just like they would with Siri and other voice assistants. The difference is that ChatGPT's voice assistant has made a qualitative leap in comprehension, can also analyze and discuss the images or videos it sees, and can recognize the different emotions of the user when speaking.

With GPT-4o, ChatGPT can guide you through math problems by following your own reasoning and tell a bedtime story that adapts to your requests in real time. OpenAI says GPT-4o is about building a deeper and more natural understanding of audio, images, and text, another step toward its AGI goal.

OpenAI's release has also sparked wide discussion in AI circles. The industry generally sees two standout features in GPT-4o: 1) it cuts voice interaction latency to about 300 ms; 2) it is an end-to-end, natively multimodal large model.

P.S.: Some homework for observers: Will GPT-4o significantly improve ChatGPT's daily active users and user stickiness? With far more capable AI assistants, will the "hundred smart speakers war" of 2016 make a comeback? Will voice assistants like Siri become a must-win entry point?

Image source: OpenAI

Wednesday, May 15

There's not a single product that hasn't been reinvented by AI: Google is fully in the Gemini era

A Sora-style video model is still an optional question for tech giants, but multimodal fusion is a compulsory one for large model companies.

Having mentioned AI 121 times, the Google I/O 2024 developer conference unveiled a whole basket of releases, from search, Gmail, and TPUs to the voice assistant Astra and the multimodal video model Veo.

Three products are worth paying attention to:

Project Astra: a multimodal AI assistant. If the battleground in 2023 was the Copilot, in 2024 it evolves into the multimodal fusion agent. Behind this is a shift in technical path from the LLM (large language model) to "one network, multiple modalities" (a multimodal large model under a single framework), and ultimately toward artificial general intelligence.

A multimodal voice assistant talking to a user in real time | Image source: Google

Veo: Veo can generate videos from text, image, and video prompts, and is coming to YouTube soon to help creators quickly produce more professional-quality videos.

AI Search: Google showed how AI can be integrated further into search, enabling more complex forms of research and planning (e.g., generating a three-day vegetarian meal plan from a query).

Image source: Google Blackboard News

ByteDance's large model playbook: release nothing until ready, then release 9 models at once

Late on models, prolific on applications. What do you make of it?

ByteDance's self-developed Doubao large model family (formerly the Skylark large model) made its debut with 9 models. ByteDance said it settled on these 9 based on back-end model call volumes and demand, covering the strongest general-purpose model, a cost-effective option, and scenario-optimized models.

The inference price of the Doubao model became a highlight: its main model is priced at only 0.0008 yuan per 1,000 tokens in the enterprise market, meaning 0.8 li (a li is one-thousandth of a yuan) can process more than 1,500 Chinese characters.
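As a rough back-of-the-envelope sketch of what that price means in practice: the function name below is hypothetical, and the ratio of 1.5 Chinese characters per token is an assumption inferred from the figures quoted above (1,000 tokens for roughly 1,500 characters), not an official tokenizer statistic.

```python
# Enterprise price quoted in the article: 0.0008 yuan per 1,000 tokens.
PRICE_YUAN_PER_1K_TOKENS = 0.0008
# Assumption inferred from the article: 1,000 tokens cover about 1,500 Chinese characters.
CHARS_PER_TOKEN = 1.5


def estimate_cost_yuan(num_chars: int) -> float:
    """Estimate the inference cost in yuan for num_chars Chinese characters."""
    tokens = num_chars / CHARS_PER_TOKEN
    return tokens / 1000 * PRICE_YUAN_PER_1K_TOKENS


if __name__ == "__main__":
    for chars in (1_500, 1_000_000):
        cost = estimate_cost_yuan(chars)
        print(f"{chars:>9,} characters -> about {cost:.4f} yuan")
```

Under these assumptions, even a million characters of text would cost well under one yuan, which is why the price itself became the headline.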

Notably, ByteDance's launch event did not cover model parameters, training data, or corpora, and did not even give evaluation results for the Doubao models. Instead it went straight to dividing model capabilities vertically by scenario. ByteDance may be building user and data feedback loops in order to target scenarios and services more precisely, deciding the next move for each product or model based on feedback from different data chains.

In the past six months, ByteDance has launched AI applications covering almost every popular track, such as the assistant "Doubao", the AI application development platform "Coze", the interactive entertainment app "Cat Box", as well as Star Painting and Jimeng (Instant Dream).

Image source: ByteDance

The stealth player in the large model race: DeepSeek Chat passes its large model filing

Lower costs! I'm taking the lead!

No more than 5 companies in China have more than 10,000 GPUs, and High-Flyer, a quantitative fund managing more than 10 billion yuan, is one of them. By stockpiling GPUs it secured its admission ticket to the large model race early, almost by accident, but High-Flyer is serious about building large models.

Since January of this year, High-Flyer's DeepSeek model has frequently been used as a benchmark in open source community discussions. This month, High-Flyer open-sourced its second-generation MoE model, DeepSeek-V2, which features more parameters, stronger capabilities, and lower costs. With capability close to first-tier closed-source models, its inference cost is down to 1 yuan per million tokens, which is one-seventh the cost of Llama3 70B and one-seventieth that of GPT-4 Turbo. Moreover, DeepSeek-V2 is said to be profitable even at this price.

The release of DeepSeek-V2 set off a price war among large models, with Zhipu, ModelBest (Facewall), and ByteDance successively announcing cuts to model inference prices. Behind this is a series of advances in model architecture, systems, and engineering. Have you noticed that OpenAI's prices have also fallen by more than a factor of 10?

In any case, DeepSeek-V2 has now passed its regulatory filing and can be tried online. Just how strong is this stealth player?
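To make the quoted ratios concrete, here is a minimal sketch that backs out the per-million-token cost each comparison implies. The function name is hypothetical, and the derived figures are simply the article's ratios multiplied out; they are not official price lists for Llama3 70B or GPT-4 Turbo.

```python
# DeepSeek-V2 inference cost quoted in the article, in yuan per million tokens.
DEEPSEEK_V2_YUAN_PER_M = 1.0

# Ratios quoted in the article: DeepSeek-V2 costs 1/7 of Llama3 70B
# and 1/70 of GPT-4 Turbo, so the others cost 7x and 70x as much.
COST_MULTIPLE = {"Llama3 70B": 7, "GPT-4 Turbo": 70}


def implied_costs() -> dict:
    """Return the per-million-token cost implied by each quoted ratio."""
    costs = {"DeepSeek-V2": DEEPSEEK_V2_YUAN_PER_M}
    for model, multiple in COST_MULTIPLE.items():
        costs[model] = DEEPSEEK_V2_YUAN_PER_M * multiple
    return costs


if __name__ == "__main__":
    for model, cost in implied_costs().items():
        print(f"{model:<12} about {cost:.0f} yuan per million tokens")
```

Read this way, the same million tokens would cost roughly 7 yuan on Llama3 70B and 70 yuan on GPT-4 Turbo, if the article's ratios hold.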

Image source: DeepSeek

Thursday, May 16

Text-to-image, text-to-video: the DiT architecture is being widely embraced

Open source has great power

Tencent's Hunyuan text-to-image model has been released on Hugging Face and GitHub, including the model weights, inference code, and model algorithms, and is free for commercial use by enterprises and individual developers.

The Hunyuan text-to-image model is an open-source, natively Chinese model built on the DiT (Diffusion Models with Transformers) architecture, the same architecture and key technology behind Sora and Stable Diffusion 3: a diffusion model based on the Transformer architecture. In the past, diffusion models for visual generation were mainly based on the U-Net architecture, but as parameter counts grow, Transformer-based diffusion models show better scalability, which helps further improve model quality and efficiency.

Friday, May 17

"GPTs" and large model assistant apps: a must-have for big tech companies, and Tencent's versions are here

Already connected to more than 600 internal Tencent businesses and scenarios

This week, Tencent announced a series of progress in the development of large models and application products.

Tencent has upgraded its Hunyuan model and launched three versions with different trade-offs between quality and cost; more than 600 of its businesses have connected to the model.

At the tool layer, Tencent Cloud has released three PaaS toolchains, namely the Large Model Knowledge Engine, the Image Creation Engine, and the Video Creation Engine, to simplify the process of data access, model fine-tuning, and application development.

Notably, Tencent has finally launched its own "GPTs": Tencent Yuanqi, where users can create agents directly using Tencent's official plug-ins and knowledge bases. Once built, an agent can be distributed to QQ, WeChat customer service, Tencent Cloud, and other channels with one click. Tencent will also launch a new assistant app, "Tencent Yuanbao", based on the Hunyuan model at the end of the month.

The official website of Tencent Yuanqi is open for trial applications

Final thoughts:

This week, alongside the AI product and technology releases above, major AI companies were also making moves behind the scenes.

Nothing can stop the march of the Scaling Law:

Ilya Sutskever, co-founder and chief scientist of OpenAI, who led its superalignment work, announced on the social platform X that he is leaving the company. Jan Leike, one of the leaders of the superalignment team, then announced his own departure, posting that the team had been marginalized within the company and could not get the computing resources it needed for research.

AWS CEO Adam Selipsky is leaving, possibly because AWS missed the best window for AI investment and R&D.

Microsoft has announced that it will invest 4 billion euros in France, most of it concentrated in AI.

Musk's xAI is reportedly spending nearly $10 billion to rent AI servers from Oracle.

AI applications are expanding the imagination:

Anthropic, known for enterprise-level large models, has recruited Instagram's former CTO to work on product, possibly signaling a move into consumer apps.

Meta Platforms is developing AI-powered earphones with cameras that would let the device recognize objects in the physical world around the wearer.

Sam Altman was also recently revealed to be exploring an AI earphone with a camera together with former Apple design guru Jony Ive: "and soon you will have eyes in your ears too".

Microsoft Build official website | Image source: Microsoft

Next week, in the early morning of May 22, Beijing time, Microsoft, another major player in AI, will hold its Microsoft Build conference in Seattle in a hybrid format. The official website prominently displays the question "How will AI shape your future?", underscoring the theme of this year's conference.

Money never sleeps, and neither does AI.
