
Valuation soars: Mistral AI becomes Microsoft's second AI "pillar"

Produced by | Tiger Sniff Technology Group

Author | Du Yujun

Editor | Wang Yipeng

Header image | Visual China

It is a strong rival to, and potential replacement for, GPT-4, and it is also another weapon in Microsoft's arsenal. This artificial intelligence startup has shown enough strength to make observers exclaim that "Microsoft has won".

On February 26, Mistral AI, a Paris-based artificial intelligence company, released Mistral Large, its flagship text generation model. The model delivers top-tier reasoning capabilities and can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation.

On the same day, Microsoft announced a multi-year partnership with Mistral AI, stating that "Mistral AI is a pioneer, it is an innovator and trailblazer. Their commitment to nurturing the open source community and achieving superior performance aligns with Microsoft's commitment to developing trusted, scalable, and responsible AI solutions."

1. The rise of Mistral AI

Microsoft isn't the first giant to bet on Mistral AI.

Mistral AI was officially incorporated in May 2023 by alumni of Google DeepMind and Meta. Just a few weeks after its launch, in June 2023, Mistral AI closed a €105 million (about $113 million) seed round led by Lightspeed Venture Partners, sending the company's valuation to €240 million. Only half a year later, Mistral AI raised another €385 million (about $415 million) in a round led by Andreessen Horowitz (a16z), joined by Nvidia, Salesforce, BNP Paribas and many other well-known institutions. Then, in February 2024, Microsoft invested directly in Mistral AI.

From start-up to being favored by giants, Mistral AI only took a few months.

Backed by this funding, the company, which has only about 20 employees, has repeatedly demonstrated its technical strength.

In September 2023, Mistral 7B was released and was hailed at the time as the "strongest 7-billion-parameter open-source model".

Then, in December, with no press conference and no publicity build-up, Mistral AI quietly dropped a magnet link and released its first open-source MoE large model, Mixtral 8x7B. On the strength of that 87 GB torrent and the 8x7B MoE architecture, Mistral AI's valuation skyrocketed: within a few days the company was valued at roughly $2 billion, an eightfold increase over its founding valuation.


Mistral AI's open-source MoE large model, Mixtral 8x7B

Mistral Large, released on February 26, takes direct aim at GPT-4: it scores second only to GPT-4 on MMLU (an English evaluation dataset containing 57 multiple-choice Q&A tasks and currently a mainstream LLM benchmark), making it, by Mistral AI's account, the world's second-ranked model generally available through an API.


Comparison of GPT-4, Mistral Large (pre-trained), Claude 2, Gemini Pro 1.0, GPT 3.5, and LLaMA 2 70B on MMLU

Mistral Large has new features and benefits:

It is natively fluent in English, French, Spanish, German, and Italian, with a nuanced understanding of grammar and cultural context.

Its 32K-token context window allows precise recall of information from large documents.

Its precise instruction-following enables developers to design their own moderation policies; Mistral uses this to set up system-level moderation for Le Chat.

It is natively capable of function calling. This, along with the constrained-output mode implemented on La Plateforme, enables large-scale application development and technology-stack modernization (a minimal sketch of such a call follows below).
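To make the function-calling capability concrete, here is a minimal sketch of what such a request could look like against Mistral's chat-completions API. The endpoint URL, the model identifier "mistral-large-latest", the OpenAI-style "tools" schema, and the "get_weather" function are assumptions for illustration, not details taken from the article.

```python
# Minimal sketch: calling Mistral Large with a function/tool definition.
# Assumptions (not confirmed by the article): the chat-completions endpoint
# at api.mistral.ai, the model id "mistral-large-latest", and an
# OpenAI-style "tools" schema for function calling.
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"  # assumed endpoint
headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

payload = {
    "model": "mistral-large-latest",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "What is the weather in Paris today?"}
    ],
    # A hypothetical tool the model may choose to call instead of answering directly.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function name
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post(API_URL, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# If the model decided to call the tool, the call arguments arrive here
# instead of plain text content.
print(message.get("tool_calls") or message.get("content"))
```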

Today, Mistral AI is valued at more than 2 billion euros (about 15.62 billion yuan).

2. GPT-4's "Rival" and "Replacement"

According to Mistral AI's official website, Mistral Large's performance in knowledge and reasoning, multilingual ability, mathematics and coding approaches that of GPT-4, making it a strong competitor to GPT-4.

(1) Reasoning and knowledge

Mistral Large demonstrates strong reasoning skills. The following chart shows the performance of the pre-trained Mistral Large model on standard benchmarks.


Performance of market-leading LLM models on a range of common-sense, reasoning, and knowledge benchmarks: MMLU (Measuring Massive Multitask Language Understanding), HellaSwag (10-shot), WinoGrande (5-shot), ARC Challenge (5-shot), ARC Challenge (25-shot), TriviaQA (5-shot), and TruthfulQA

(2) Multilingual ability

Mistral Large has native multilingual capabilities. It significantly outperforms LLaMA 2 70B on the HellaSwag, ARC Challenge, and MMLU benchmarks in French, German, Spanish, and Italian.

Comparison of Mistral Large, Mixtral 8x7B, and LLaMA 2 70B in French, German, Spanish, and Italian on HellaSwag, ARC Challenge, and MMLU

(3) Mathematics and coding

Mistral Large exhibits top-notch performance on coding and math tasks. The table below reports performance on a range of popular benchmarks used to evaluate the coding and mathematical abilities of some of the top LLM models.


Performance of market-leading LLM models on popular coding and math benchmarks: HumanEval pass@1, MBPP pass@1, Math maj@4, GSM8K maj@8, and GSM8K maj@1

While GPT-4 remains ahead on raw performance, Mistral Large is cheaper to use. Currently, querying Mistral Large costs $8 per million input tokens and $24 per million output tokens. In AI terminology, tokens represent small chunks of words; for example, when processed by an AI model, the word "TechCrunch" would be split into two tokens, "Tech" and "Crunch".

By default, Mistral Large supports a context window of 32k tokens (typically over 20,000 English words). For comparison, GPT-4 Turbo has a 128k-token context window and currently costs $10 per million input tokens and $30 per million output tokens. On price alone, GPT-4 Turbo therefore costs 1.25 times as much as Mistral Large, positioning Mistral Large as a "replacement" for GPT-4 Turbo. For high-volume enterprise users, this is a significant saving.
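To see where the "1.25 times" figure and the enterprise savings come from, a quick back-of-the-envelope calculation using the per-million-token prices quoted above is enough; the workload size below is an arbitrary illustration.

```python
# Cost comparison using the per-million-token prices quoted above
# (Mistral Large: $8 in / $24 out; GPT-4 Turbo: $10 in / $30 out).
def monthly_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in USD for a given token volume at per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical workload: 500M input tokens and 100M output tokens per month.
workload = (500_000_000, 100_000_000)

mistral = monthly_cost(*workload, in_price=8, out_price=24)
gpt4_turbo = monthly_cost(*workload, in_price=10, out_price=30)

print(f"Mistral Large: ${mistral:,.0f}")      # $6,400
print(f"GPT-4 Turbo:   ${gpt4_turbo:,.0f}")   # $8,000
print(f"Price ratio:   {gpt4_turbo / mistral:.2f}x")  # 1.25x
```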


Cost comparison of Mistral Large with GPT-4 and related models

In addition to Mistral Large, the startup has also launched its own ChatGPT alternative, a new chat assistant service called Le Chat, currently in beta. The company also plans to launch a paid version of Le Chat for enterprise customers; in addition to centralized billing, enterprise customers will be able to define their own moderation mechanisms.

Not only that, Mistral AI's business model is looking more and more like OpenAI's. The company's models are no longer fully open-source as they were at its founding; instead, Mistral Large is offered through a paid API priced by usage. It is available through La Plateforme, an access point securely hosted on Mistral's infrastructure in Europe that lets developers build applications and services on top of the model, and it is also available on Azure AI, through Azure AI Studio and Azure Machine Learning.

3. Two-way empowerment with Microsoft

Mistral's state-of-the-art model resources will be hosted in the Microsoft cloud, making Mistral AI the second company in the world to offer a commercial AI model on Microsoft Azure.

Mistral AI's partnership with Microsoft focuses on three core areas:

Supercomputing infrastructure: Microsoft will support Mistral AI with Azure AI supercomputing infrastructure, delivering best-in-class performance and scale for AI training and inference workloads for Mistral AI's flagship models.

Expand to market: Microsoft and Mistral AI will make Mistral AI's advanced models available to customers through Azure AI Studio and Model-as-a-Service (MaaS) in the Azure Machine Learning model catalog. In addition to OpenAI models, the model catalog offers a wide selection of open-source and commercial models. Customers can use their Microsoft Azure Consumption Commitment (MACC) to purchase Mistral AI's models. Azure's AI-optimized infrastructure and enterprise-grade capabilities give Mistral AI additional opportunities to promote, sell, and distribute its models to Microsoft customers around the world.

AI research and development: Microsoft and Mistral AI will explore collaborations around training purpose-specific models for specific customers, including European public sector workloads.

In response, Arthur Mensch, CEO of Mistral AI, said that the partnership with Microsoft gives Mistral AI access to Azure, brings its innovative research and real-world applications to new customers around the world, and accelerates the development and deployment of next-generation large language models (LLMs). It also gives Mistral AI the chance to unlock new business opportunities, expand into global markets, and pursue ongoing research collaborations.

This is not only an important step toward the commercialization of Mistral AI, but also further proof of Microsoft's deepening footprint in the AI field. For Microsoft, an open partnership strategy with Mistral AI is a good way to keep Azure customers inside its product ecosystem. In addition, Microsoft's long-standing relationship with OpenAI has attracted scrutiny from antitrust regulators in the United States and Europe, and partnering with other large-model companies such as Mistral AI can undoubtedly "divert the firepower". At present, Microsoft is actively exploring cooperation with other AI model providers on its cloud computing platform; for example, Microsoft and Meta have partnered to offer the Llama large language models on Azure.

In fact, Mistral AI's product pipeline extends beyond Mistral Large.

Mistral AI's model products fall mainly into three categories: Mistral Small, Mistral Large, and Mistral Embed. Mistral Small benefits from the same innovations in RAG enablement and function calling as Mistral Large and provides cost-effective inference for low-latency workloads; Mistral Large provides top-tier reasoning for high-complexity tasks; and Mistral Embed is a semantic model for extracting embedding representations from text.
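As an illustration of the third category, here is a minimal sketch of extracting embeddings with Mistral Embed over the same public API. The endpoint URL and the model identifier "mistral-embed" are assumptions for illustration rather than details from the article.

```python
# Minimal sketch: extracting text embeddings with Mistral Embed.
# Assumptions (not stated in the article): the embeddings endpoint
# at api.mistral.ai and the model id "mistral-embed".
import os
import requests

API_URL = "https://api.mistral.ai/v1/embeddings"  # assumed endpoint
headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

payload = {
    "model": "mistral-embed",  # assumed model identifier
    "input": [
        "Mistral Large targets high-complexity reasoning tasks.",
        "Mistral Small targets low-latency, cost-sensitive workloads.",
    ],
}

resp = requests.post(API_URL, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]
# Each input string maps to one dense vector, usable for search or RAG retrieval.
print(len(vectors), len(vectors[0]))
```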
