
What are some of the key models and applications released this week? |Smart Weekly Report (I)

Author: YiMagazine (Yicai)

Written by: Jiang Ruijie, Zhang Siyu, Lu Yanjun, Xu Tao, Yang Qiuqiu

Editor: Wu Yangyang

OpenAI's AI search product has been rumored for a week without appearing, and the latest reports say it will be released by next Tuesday, because Tuesday is Google's I/O developer conference. Microsoft will hold its Build developer conference on Thursday, which means that even if OpenAI doesn't release ChatGPT Search by then, next week will be full of confrontations.

Some of that edge was already on display this week. In the past 5 days, at least 4 noteworthy models were released around the world. One of them is AlphaFold 3 from DeepMind, which may still not match human scientists in accuracy but has far surpassed them in the scope of what it can predict. To put it simply: when this molecular-structure prediction model was first introduced in 2018, it could only predict two-dimensional contact maps of protein sequences; 3 years later, it could predict the three-dimensional structure of proteins; another two years on, in 2023, it could predict molecules beyond proteins, including RNA and small-molecule "ligands"; now, the latest model can not only predict the structures of these molecules but also dynamically predict the interactions between them. With this tool, the efficiency of new drug research and development will be greatly improved, a business worth hundreds of billions of dollars.

This brief history of a molecular-structure prediction model is another reminder about AI: the bubble exists, but innovation is far from stopping, and it is happening in every dimension. This week, a Chinese company called DeepSeek released a new model whose API, surprisingly, costs only about 1/100th as much as GPT-4's. Beyond cost, more and more companies are trying to apply AI to various industries: Altera, a game-agent company, is developing AI for game companies that can play games alongside players; AI finance company Daloopa is using AI to extract and organize data for analysts from financial reports and investor presentations; Lexion, a contract-automation company, lets people in legal, sales, IT, HR, and finance departments create professional documents and ask questions about their content in natural language; and Rad AI's new product can automatically identify patients whose health status needs follow-up and remind them by email, text message, or phone.

Of course, the faster the progress in technology and applications, the greater the norms and pressures from reality. This week, TikTok announced an AI auto-labeling feature to ensure that content it identifies as AI-generated video is tagged accordingly, becoming the first social media platform in the world to automatically label AI-generated content. OpenAI also released an AI detection tool that can identify whether an image was generated by its DALL·E 3 model.

Labeling of this kind is good news for the industry; regulation from governments is not necessarily so. This week, it was reported that the U.S. government is considering new measures to restrict the export of proprietary or closed-source AI models, with tentative plans to restrict China's access to advanced AI models, including ChatGPT, especially where AI can be used to design the proteins needed to make biological weapons, which is exactly what DeepMind's AlphaFold series released this week is capable of doing. If these models are restricted from export, teams working on AI4Science (AI for Science) could be affected.

As usual, due to length, this weekly report is divided into two parts: the first focuses on new models and new applications, and the second on new financings and company developments. The following is the first part.

Key Points

New model

AlphaFold 3 is released, and a $100 billion business comes with it;

Microsoft is training its own 500-billion-parameter model MAI-1, led by Inflection's founder;

Alibaba Cloud released Tongyi Qianwen 2.5, benchmarking against GPT-4 Turbo;

"DeepSeek" released a low-cost model DeepSeek-V2, and the API price is only 1/100 of GPT-4;

Hugging Face released an open-source robotics code library;

New applications

Google tries to bring Circle to Search to iPhone users too;

Grok AI summarizes news in X;

TikTok will automatically tag AI-generated content;

OpenAI releases AI detector.

New model

AlphaFold 3 is released, and a $100 billion business comes with it

On May 9, Google DeepMind and Isomorphic Labs released a new AI model for drug discovery, AlphaFold 3, which can accurately predict the structure and interactions of molecules such as proteins, DNA, RNA, and small molecule ligands (many drugs fall into this category).

AlphaFold 1 predicted 2D contact maps, AlphaFold 2 predicts 3D structures

In 2018, AlphaFold 1 was DeepMind's first attempt to predict the three-dimensional structure of proteins through deep learning, successfully predicting the structures of 25 out of 43 test proteins. However, its predictions were two-dimensional "contact maps": a matrix that tells researchers only which amino acids are close to each other, not exactly where they are or how far apart.

In 2021, DeepMind released AlphaFold 2, which no longer predicts contact maps but directly predicts the 3D structure of proteins, thanks to a completely new model architecture. One limitation of AlphaFold 2, however, was its inability to predict how two proteins interact with each other in the real world, a problem this release of AlphaFold 3 solves.

AlphaFold-latest can predict other molecules in addition to proteins

In October 2023, DeepMind released AlphaFold-latest. As Neocortex has reported, compared with AlphaFold 2 the new version not only improved the accuracy of protein structure prediction but could also predict the structures of ligands (molecules that bind to "receptor" proteins and change the way cells communicate), nucleic acids (DNA and RNA), and molecules containing post-translational modifications (PTMs).

AlphaFold 3 predicts not only molecular structure, but also its interactions

The main improvement in AlphaFold 3 is the ability to predict intermolecular interactions. After the release, DeepMind CEO Demis Hassabis told the media that "biology is a dynamic system, and biological properties are manifested through the interactions between different molecules in cells". This means that in genomics research, AlphaFold 3 can show how DNA or RNA fragments affect cell function through specific chemical changes, helping to precisely regulate gene activity and to prevent and treat diseases associated with disordered gene expression.

According to the paper, the upgrade in AlphaFold 3 is mainly due to the addition of a diffusion model, the architecture used by most current image-generation models, including Midjourney, Runway, and Sora.
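For readers unfamiliar with the technique, the core idea of a diffusion model can be shown with a toy one-dimensional sketch. This is only a conceptual illustration, not AlphaFold 3's actual architecture, which operates on 3-D atom coordinates with a trained neural denoiser; here the "denoiser" cheats by being handed the true value as an oracle:

```python
import math
import random

# Toy diffusion on a single scalar. The forward process mixes a data point
# toward pure noise; the reverse process starts from noise and denoises
# step by step. AlphaFold 3 and image models apply the same idea to
# high-dimensional data with a learned network instead of an oracle.

def forward_noise(x0: float, t: float) -> float:
    """Interpolate between the data point and Gaussian noise at time t in [0, 1]."""
    return math.sqrt(1 - t) * x0 + math.sqrt(t) * random.gauss(0, 1)

def reverse_denoise(steps: int, oracle_x0: float) -> float:
    """Start from pure noise and step back toward the data.

    A trained network would predict the noise component from x at each
    step; with a perfect oracle for the clean value we can solve for it.
    """
    x = random.gauss(0, 1)  # pure noise at t = 1
    for i in range(steps, 0, -1):
        t, t_prev = i / steps, (i - 1) / steps
        # infer the noise consistent with the current estimate and the oracle
        eps = (x - math.sqrt(1 - t) * oracle_x0) / math.sqrt(t)
        # deterministic update to the slightly less noisy time step
        x = math.sqrt(1 - t_prev) * oracle_x0 + math.sqrt(t_prev) * eps
    return x

random.seed(0)
x0 = 2.5                          # the "data point" to recover
noisy = forward_noise(x0, t=0.9)  # mostly noise when t is close to 1
sample = reverse_denoise(steps=50, oracle_x0=x0)
print(round(sample, 3))  # the final step lands exactly on the oracle value, 2.5
```

With a perfect oracle the reverse process recovers the data exactly; the hard part in practice is training the network that replaces the oracle.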

In an interview with Bloomberg, Hassabis said that by accelerating biological research, AlphaFold 3 could open up more than $100 billion in drug research and development. Neocortex has reported that DeepMind established the drug discovery company Isomorphic Labs in 2021 ("isomorphic", suggesting that information systems and biological systems may share a common structure), and on January 8 this year Isomorphic Labs announced strategic partnerships with the pharmaceutical giants Eli Lilly and Novartis to apply AI to discovering new drugs.

Reference Links:

https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/#future-cell-biology

Microsoft's self-developed 500-billion-parameter model MAI-1,

led by Inflection's founder

On May 6, it was reported that Microsoft was training a large language model called MAI-1 internally in an effort to compete with OpenAI and Google's strongest models.

According to reports, MAI-1 has 500 billion parameters, while OpenAI's flagship GPT-4 and Google's Gemini are estimated to have more than 1.5 trillion. MAI-1 is thus smaller than its competitors, but it is already the largest model Microsoft has developed in-house.


Since Microsoft is the largest external investor in OpenAI, its AI applications have long been built on GPT-4, and it has previously developed only some small models of its own, such as Phi-3-mini, released last month with just 3.8 billion parameters.

The development of MAI-1 is led by Mustafa Suleyman, who recently joined Microsoft and co-founded DeepMind and the AI startup Inflection. In March, as Neocortex reported, Microsoft paid Inflection $650 million and hired most of its employees, including Suleyman. Suleyman's background is more product-oriented than technical, and Microsoft announced at the time that he would form a new team called Microsoft AI to focus on advancing Copilot, Microsoft's AI assistant application, along with other consumer AI products and related research. The new model is reportedly not a direct successor to Inflection's model Pi, but it may be built on that company's training data.

Later this month, Microsoft will host its Build 2024 developer conference, where MAI-1 could be shown for the first time, though its exact purpose has not yet been determined.

Reference Links:

https://www.theinformation.com/articles/meet-mai-1-microsoft-readies-new-ai-model-to-compete-with-google-openai

Alibaba Cloud releases Tongyi Qianwen 2.5,

benchmarked against GPT-4 Turbo

On May 9, Alibaba Cloud released the Tongyi Qianwen 2.5 model, open-sourced the 110 billion parameter model Qwen1.5-110B, and announced that the "Tongyi Qianwen" App would be renamed "Tongyi".

Tongyi Qianwen 2.5 is benchmarked against GPT-4 Turbo

Tongyi Qianwen 1.0 and 2.0 were released last April and October, with parameter scales of 30 billion and 100 billion respectively. Alibaba Cloud did not announce the parameter scale of Tongyi Qianwen 2.5, but judging from the 110 billion parameters of Qwen1.5-110B, open-sourced the same day, Tongyi Qianwen 2.5 is at least that large.

According to Alibaba Cloud, Tongyi Qianwen 2.5 is benchmarked against GPT-4 Turbo and can process up to 10 million words and up to 100 documents at a time. Its specific capabilities are as follows:

  • Multiple file types supported: PDF, Word, Excel, Mobi, etc.;
  • Multi-format data analysis: in addition to documents, Tongyi can also understand tables and charts, and quickly summarize them;
  • Multi-scenario application: suitable for contracts, white papers, research reports, financial reports, etc.;
  • Easy to use and integrate: supports Markdown and JSON formats, easy to read and edit.

In addition, Zhou Jingren, CTO of Alibaba Cloud, introduced Tongyi's multimodal capabilities, such as audio and video comprehension, which are applied in the smart preview feature of Alibaba Cloud Drive and in New Oriental's AI classroom notes.

A month ago, SenseTime also claimed to benchmark against GPT-4 Turbo when it released its latest large model, "Ririxin 5.0".

Continuing on the open-source route

On the day Tongyi Qianwen 2.5 was released, Alibaba Cloud announced the open-sourcing of Qwen1.5-110B (110 billion parameters). This is the largest open-source model from Alibaba Cloud to date, and also the largest open-source model in China. Previously, the largest domestic open-source models were Alibaba Cloud's Qwen1.5-72B (72 billion parameters) and Shenzhen Yuanxiang Technology's XVERSE-65B (65 billion parameters).


Alibaba Cloud has released more than 10 open-source models to date.

According to Alibaba Cloud, the Qwen1.5-110B model surpasses Meta's Llama-3-70B (70 billion parameters) in benchmarks such as MMLU, TheoremQA, and GPQA. In addition, Tongyi has also open-sourced the visual understanding model Qwen-VL, the audio understanding model Qwen-Audio, the code model CodeQwen1.5-7B, and the mixture-of-experts model Qwen1.5-MoE.

Following Gemini and Claude, Tongyi Qianwen also divides its models by size

To meet the needs of users in different scenarios, Tongyi has launched 8 large language models with parameter sizes ranging from 500 million to 110 billion: small models (0.5B, 1.8B, 4B, 7B, and 14B) that can be deployed on phones, PCs, and other devices (similar to Gemini Nano and Claude Haiku); large models (72B and 110B) that support enterprise and scientific applications (similar to Gemini Ultra and Claude Opus); and medium sizes such as 32B that try to balance performance, efficiency, and memory footprint (similar to Gemini Pro and Claude Sonnet).

The enterprise side is the business focus, and the consumer app is renamed

According to Alibaba Cloud, Tongyi has served more than 90,000 enterprises through Alibaba Cloud and more than 2.2 million enterprises through DingTalk. Xiaomi's AI assistant "Xiao Ai" has cooperated with Tongyi Model in the fields of image generation and image understanding; Weibo, Zhongan Insurance, Perfect World Games and other companies have also announced access to the Tongyi model.

For enterprises, Alibaba Cloud released version 2.0 of its Bailian platform to provide enterprise-grade retrieval-augmented generation (RAG) services (Note: to understand what RAG is, see "OpenAI, Google, and Kimi are all becoming Perplexity; who is Perplexity?"), enhancing large models with enterprise data and providing dedicated knowledge bases and retrieval services. Alibaba Cloud's intelligent coding assistant Tongyi Lingma, which supports nearly 200 programming languages, has also launched an enterprise edition.

In terms of C-end business, Tongyi Qianwen App was fully upgraded and renamed "Tongyi App".

Reference Links:

https://mp.weixin.qq.com/s/hU5YDkjiAsAYl8h2akl14Q

DeepSeek releases a low-cost model,

the API price is only 1/100th of GPT-4's

On May 6, DeepSeek, an AI company under the quantitative fund High-Flyer, launched its second-generation mixture-of-experts (MoE) open-source model, DeepSeek-V2, with 236 billion total parameters, support for a 128K context window, and performance benchmarked against GPT-4-0613.

DeepSeek-V2's API is priced at 1 yuan per million input tokens and 2 yuan per million output tokens, while GPT-4 Turbo with a 128K context costs 72 yuan per million input tokens and 217 yuan per million output tokens; DeepSeek-V2's pricing is roughly 1/100 of GPT-4's.
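The gap is easy to check with a little arithmetic. Below is a minimal sketch using the prices quoted above; the 7M/3M input/output token split is an invented workload, not a figure from the article:

```python
# Cost comparison between DeepSeek-V2 and GPT-4 Turbo at the quoted
# prices (yuan per million tokens). The exact ratio depends on the
# input/output mix of the workload.

def cost_yuan(input_tokens: int, output_tokens: int,
              price_in: float, price_out: float) -> float:
    """Total cost in yuan for a given usage at per-million-token prices."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

DEEPSEEK_V2 = (1, 2)         # input, output (yuan per million tokens)
GPT4_TURBO_128K = (72, 217)

# Hypothetical workload: 7M input tokens, 3M output tokens
deepseek = cost_yuan(7_000_000, 3_000_000, *DEEPSEEK_V2)
gpt4 = cost_yuan(7_000_000, 3_000_000, *GPT4_TURBO_128K)
print(f"DeepSeek-V2: {deepseek:.0f} yuan")  # 7*1 + 3*2 = 13 yuan
print(f"GPT-4 Turbo: {gpt4:.0f} yuan")      # 7*72 + 3*217 = 1155 yuan
print(f"ratio: {gpt4 / deepseek:.1f}x")     # about 88.8x on this mix
```

On an output-heavy workload the ratio would be closer to the "roughly 1/100" figure, since the output price gap (2 vs. 217 yuan) is wider than the input gap.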

DeepSeek's parent company is a quantitative investment firm

DeepSeek's parent company is High-Flyer Quant, a hedge fund that uses AI to invest, founded in 2015 by Liang Wenfeng and Xu Jin. In 2021, High-Flyer's assets under management reached 100 billion yuan; by early 2023, they stood at around 60 billion yuan.

Most of High-Flyer's core team members come from Zhejiang University. Xu Jin holds a Ph.D. in signal and information processing from Zhejiang University and worked at Huawei's Shanghai Research Institute, among other companies, before starting the business. Liang Wenfeng also studied artificial intelligence at Zhejiang University.

In 2018, High-Flyer Quant began using machine learning, deep learning, and other techniques for portfolio optimization. In 2021, it built its own deep learning training platform, "Firefly No. 2", with an investment of 1 billion yuan and about 10,000 NVIDIA A100 chips. In May 2023, High-Flyer established DeepSeek as an independent research organization to enter the field of generative AI, with the goal of "exploring the nature of AGI".

In January this year, DeepSeek open-sourced its first MoE model, DeepSeekMoE, in three parameter sizes: 2 billion, 16 billion, and 145 billion.

Reference Links:

https://mp.weixin.qq.com/s/oJ3qdjE1KmcrC6NaMtdpqw

Hugging Face releases an open-source robotics code library

Hugging Face, long focused on software, has now entered the field of robotics. On May 6, Remi Cadene, Hugging Face's robotics project lead, announced the launch of the LeRobot open-source code library, describing it as doing for robotics what the Transformers library did for NLP (natural language processing).

Who is Remi Cadene?

Remi Cadene joined Hugging Face two months ago and began building a team in Paris, France, primarily recruiting embodied-robotics engineers. Previously, he worked as a scientist in Tesla's self-driving division and on the humanoid robot Optimus team. Cadene said he will build a truly open-source robotics project at Hugging Face because "the next step in AI development is to apply it to the physical world", and that the team "is doing community-driven work around robotic AI and making it open to everyone".

What is LeRobot?

LeRobot is a versatile library for sharing and visualizing data and for training state-of-the-art models. Users have access to a large number of pre-trained models to speed up their projects. In addition, LeRobot integrates with physics simulators, allowing developers without physical robot hardware to simulate and test AI models in a virtual environment.

Hugging Face said that open-sourcing LeRobot was a strategic decision to avoid concentrating power and innovation in the hands of a few companies. Hugging Face is a New York-based AI unicorn valued at about $4.5 billion; its main business is software, including an open-source AI model hub and the AI assistant product Hugging Chat Assistants.

Reference Links:

https://venturebeat.com/automation/hugging-face-launches-lerobot-open-source-robotics-code-library/

New applications

Google tries to bring Circle to Search to iPhone users too

On May 8, Minsang Choi, a design manager for Google Lens, shared on the social platform X a shortcut developed by the Google app's iOS team that lets users get a Circle to Search-like experience on the iPhone 15 Pro. Circle to Search is a visual search feature Google launched earlier this year that lets users search for whatever they select on the screen.

The feature was previously exclusive to Android, debuting on Samsung's first AI phone, the Galaxy S24. Now Google is trying to approximate it on Apple's iOS devices with Google Lens: iOS users can create a shortcut that runs Google Lens on a screenshot, then quickly copy text, translate, or perform a visual search, and add text for further queries.


However, the shortcut only supports searching the full screenshot; users can't circle or draw over just the part they're looking for. In addition, a Circle to Search-like feature for the Chrome browser is also being tested and may be rolled out later.

Apple is currently in talks with companies including OpenAI, Google, and Baidu about bringing their large models to Apple devices. At the same time, Apple is developing its own models, especially ones that can read the user's screen. Examples reported in Neocortex include ReALM (Reference Resolution As Language Modeling), which focuses on getting large models to understand the visual elements on a phone screen, and Ferret-UI, which can "understand" a phone's UI and perform corresponding tasks.

Reference Links:

https://9to5google.com/2024/05/07/google-lens-circle-to-search-iphone/

Grok AI summarizes the news on X

On May 3, the social platform X announced a new feature called "Stories" in the "For You" section. Powered by Grok, the model developed by Elon Musk's xAI, it summarizes current hot news and events for users. For now, the feature is only available to paid X Premium subscribers.

Musk said his idea is to use AI to fuse breaking news and user commentary into a real-time event summary, and then encourage users to learn more by chatting with Grok. Notably, rather than summarizing news reports, Grok aggregates information from users' posts on X, which may help it avoid complaints from news publishers.

Many search and browser products have also started using AI to summarize search results, including Google's generative AI search experience SGE, Microsoft's Bing, and the Arc browser.

Reference Links:

https://techcrunch.com/2024/05/03/x-launches-stories-on-x-delivering-news-summarized-by-grok-ai/

TikTok will automatically tag AI-generated content

On May 9, TikTok announced that it would launch an AI auto-labeling feature to ensure that content it identifies as AI-generated video is tagged accordingly; content produced with Adobe's Firefly tool, TikTok's own AI image generators, and OpenAI's DALL·E will be identified and labeled. This makes TikTok the first social media platform to automatically label certain AI-generated content. Big companies such as Google, Microsoft, Sony, and OpenAI are also exploring embedding such technology into their AI tools, and Meta said earlier this month that it would begin detecting AI-generated content from companies such as Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock and labeling it "AI-made".

Reference Links:

https://techcrunch.com/2024/05/09/tiktok-automatically-label-ai-generated-content-created-other-platforms/

OpenAI released an AI detector and plans to participate in the development of industry AI detection standards

On May 7, OpenAI announced a dedicated AI detection tool that can identify whether an image was generated by its DALL·E 3 model, with accuracy as high as 98%. On the same day, OpenAI also announced that it had joined the C2PA steering committee and plans to help set the C2PA standard. C2PA stands for the Coalition for Content Provenance and Authenticity, jointly formed by Adobe, Arm, Intel, Microsoft, and the data-verification platform Truepic.

OpenAI has already embedded C2PA metadata in images created or modified with DALL·E 3, and plans to include C2PA data in videos generated by its video model Sora once it is widely deployed. To keep the embedded C2PA data from being deleted or tampered with, OpenAI is developing new methods, including tamper-resistant watermarks and detection classifiers; the latter use AI to determine whether a piece of content was generated by AI.
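The logic of embedded provenance data can be illustrated with a much-simplified sketch. This is not C2PA itself (real C2PA manifests use certificate-based signatures and a standardized format); the field names and the HMAC scheme below are invented purely for illustration:

```python
import hashlib
import hmac
import json

# A toy provenance manifest: the point is that tampering with either the
# content or its metadata breaks verification.
SECRET_KEY = b"demo-signing-key"  # real systems use asymmetric signatures

def make_manifest(content: bytes, generator: str) -> dict:
    """Attach a claimed generator plus an HMAC over the content digest and claim."""
    digest = hashlib.sha256(content).hexdigest()
    claim = {"generator": generator, "sha256": digest}
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return claim

def verify(content: bytes, manifest: dict) -> bool:
    """Recompute the digest and signature; any tampering breaks the match."""
    claim = {"generator": manifest["generator"], "sha256": manifest["sha256"]}
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and hashlib.sha256(content).hexdigest() == manifest["sha256"])

image_bytes = b"\x89PNG...fake image data"
manifest = make_manifest(image_bytes, "dalle-3")
print(verify(image_bytes, manifest))         # True: untouched content
print(verify(image_bytes + b"x", manifest))  # False: content was edited
```

The weakness OpenAI describes applies here too: nothing stops someone from simply deleting the manifest, which is why watermarks and detection classifiers are pursued alongside metadata.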

OpenAI opened access to a first batch of external testers on May 7. The classifier handles common modifications well, such as compression, cropping, and saturation changes, but its accuracy drops on other kinds of edits, and also drops when an image was generated by other AI models.

Reference Links:

https://openai.com/index/understanding-the-source-of-what-we-see-and-hear-online/

-END-
