9.4 billion! The largest M&A in the field of generative AI is born!

Source | Zhidongxi (智东西)

Author | Li Shuiqing

Editor | Xinyuan

The largest M&A deal in generative AI is here!

Zhidongxi reported on June 27 that, according to the Wall Street Journal, big data super unicorn Databricks has agreed to acquire generative AI startup MosaicML for US$1.3 billion (about 9.4 billion yuan), a deal that has drawn close attention from investment and AI circles at home and abroad.

MosaicML is an AI software company founded in San Francisco in 2021. It has just 62 employees and had raised only $64 million in prior funding. How can such a small AI company sell for nearly ten billion yuan? From the reports, MosaicML's founding team is helmed by the former Intel executive in charge of AI, and just this June it open-sourced MPT-30B, a 30-billion-parameter large language model, laying the groundwork for its billing as an "OpenAI challenger".

Databricks, also based in San Francisco, is acquiring MosaicML to help enterprise customers use proprietary data to build language models at a lower cost than GPT.

As AI sets off a new wave, tech giants such as Microsoft, OpenAI, and Google loom large, and China's "war of a hundred models" has entered its hardest stretch. Some domestic investors doubt the prospects of large-model startups: Zhu Xiaohu, a well-known investor and managing director of GSR Ventures, has bluntly said that ChatGPT is very unfriendly to startups and urged founders to give up for the next two or three years. The MosaicML acquisition may give the industry a new point of reference.

In an era when OpenAI and the tech giants dominate large models, where is the room for AI startups to grow? Why did MosaicML, a company barely two years old, sell for such a high price? What lessons does this hold for the domestic market? This article explores these questions in depth.

First, who is MosaicML? Just 15 researchers, founded by former Intel AI executives, with open-sourced large models

First, let's take a look at where MosaicML came from.

In scale, MosaicML is not large. According to official disclosures, it currently has 62 employees, of whom only 15 are researchers, with offices in San Francisco, New York, Palo Alto, and San Diego. It has raised $64 million to date, mainly from investors such as Lux Capital and DCVC.

But MosaicML's founding team is anything but ordinary. Naveen Rao, co-founder and CEO of MosaicML, was previously vice president and general manager of Intel's AI Products Group. Before that, Rao founded AI chip company Nervana, which Intel acquired in 2016 for $408 million. Hanlin Tang, MosaicML's CTO, is a former senior director of Intel's AI Lab. MosaicML is, in short, a startup founded by industry heavyweights.

MosaicML co-founder and CEO Naveen Rao (left) and CTO Hanlin Tang (right)

MosaicML has open-sourced its large language models for the market to judge. In May this year it open-sourced MPT-7B, a 7-billion-parameter large language model, followed in June by its second open-source model, MPT-30B. The company says that although MPT-30B has only 30 billion parameters, about 1/6 of GPT-3's 175 billion, it outperforms GPT-3 on inference tasks, runs more easily on local hardware, and is cheaper to deploy for inference.

Rao acknowledges that GPT-4 is superior in most respects, but MosaicML's model offers a longer context length, which enables unique use cases, such as generating an epilogue to the famous novel The Great Gatsby, at a lower cost.

According to MosaicML, the 30-billion-parameter size was chosen deliberately to map well onto GPU memory:

The model can be deployed on a single GPU, at 16-bit precision on an 80GB A100 or at 8-bit precision on a 40GB A100. The company says the model beats the more compute-hungry LLaMA and Falcon on many tasks, and Rao mentioned in an interview that MosaicML uses a technique called FlashAttention to make training and inference faster for users.
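The arithmetic behind those single-GPU figures is easy to check. The sketch below is our own back-of-the-envelope illustration, not MosaicML's: it counts only the memory needed to store the weights themselves, ignoring activations and other runtime overhead.

```python
# Back-of-the-envelope check: memory needed just to hold the weights of a
# 30-billion-parameter model at different precisions.

def weights_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 30e9  # MPT-30B

fp16_gb = weights_memory_gb(N_PARAMS, 16)  # 60.0 GB -> fits an 80GB A100
int8_gb = weights_memory_gb(N_PARAMS, 8)   # 30.0 GB -> fits a 40GB A100

print(f"16-bit: {fp16_gb:.0f} GB, fits 80GB A100: {fp16_gb <= 80}")
print(f" 8-bit: {int8_gb:.0f} GB, fits 40GB A100: {int8_gb <= 40}")
```

Since weights are only part of a model's memory footprint, these numbers are lower bounds; the headroom left on each card (20 GB and 10 GB respectively) is what absorbs activations and inference overhead.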

9.4 billion! The largest M&A in the field of generative AI is born!

At the same time, MPT-30B was trained on longer sequences than other models, up to 8,000 tokens, whereas GPT-3, LLaMA, and Falcon each handle only 2,000 tokens. Simply put, users can enter longer prompts, which may suit data-intensive enterprise applications better.
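As a rough sketch of what that difference means in practice (our own illustration, using an assumed average of about four characters per token for English text; real tokenizers vary), a prompt that overflows a 2,000-token window can still fit comfortably in an 8,000-token one:

```python
# Crude context-window check: approximate token count from character count
# using an assumed ~4 characters per token (a rule of thumb, not a tokenizer).

CHARS_PER_TOKEN = 4

def fits_in_context(prompt: str, context_tokens: int) -> bool:
    """Does the prompt (approximately) fit the model's context window?"""
    approx_tokens = len(prompt) / CHARS_PER_TOKEN
    return approx_tokens <= context_tokens

long_report = "x" * 20_000  # ~5,000 tokens, e.g. a lengthy enterprise document

print(fits_in_context(long_report, 8000))  # True  -> fits an 8k-token window
print(fits_in_context(long_report, 2000))  # False -> overflows a 2k-token window
```

In a real application the check would use the model's own tokenizer rather than a character heuristic, but the comparison illustrates why a 4x longer window matters for feeding whole documents into a prompt.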

Industries such as healthcare and banking could benefit from MosaicML's ability to interpret and aggregate large amounts of data. In the medical field, for example, the model could interpret laboratory results and build insight into a patient's medical history by analyzing various inputs. An open-source model is also better for keeping medical data secure; sending that data to OpenAI through an API poses data-security risks.

Rao says MosaicML can bring the cost of building a model down from tens of millions of dollars to hundreds of thousands.

However, MosaicML's claims are difficult to verify fully independently, because the three open-source large language model projects Rao refers to (MPT, LLaMA, and Falcon) have not yet been evaluated with authoritative benchmarks such as Stanford's HELM.

But what is certain is that MosaicML, led by this group of former Intel AI heavyweights, is trying to overtake OpenAI by targeting the limitations of the GPT models.

Second, a super unicorn makes its move: doubling down on open-source large models to take on OpenAI

MosaicML is not the only champion of open-source large language models; its acquirer Databricks is also a prominent advocate of the open-source approach.

Databricks, founded in 2013, commercializes Spark; it was co-founded by several creators of the famous Spark big data processing system at UC Berkeley's AMP Lab. Next to Microsoft, Google, and the other large vendors, Databricks really only counts as a startup. But it completed a $1.6 billion funding round in August 2021, becoming a super unicorn valued at $38 billion that year, exceeding OpenAI's current valuation.

In terms of revenue, Databricks has disclosed that its annual revenue exceeded $1 billion in 2022, giving it the financial footing to acquire MosaicML.

When it comes to AI, Databricks argues that open-source models are comparable to those offered by companies like OpenAI.

In April, Databricks unveiled an updated version of its open-source Dolly large language model, which answers customer queries based on data in Databricks' lakehouse. As ChatGPT took off, Databricks also leaned on its unified lakehouse platform, which lets data teams store and protect data, to support machine learning tooling. Databricks likewise integrates with popular AI frameworks such as TensorFlow, lowering the barrier for enterprises to build and deploy AI models.

Not everyone needs GPT-4 for every application. Ali Ghodsi, CEO of Databricks, said off-the-shelf models trained on internet data are usable but full of irrelevant information that can skew results, and data-privacy concerns with models built by external vendors are also alarming.

One of Databricks' core technologies, the lakehouse, manages data for AI applications, unifying data, analytics, and AI programming tools in one system. Once integrated into Databricks, MosaicML will operate as a separate service that helps companies build low-cost language models from proprietary data. Companies like Replit, which provides programming tools, already use Databricks as a data pipeline to feed information to MosaicML for training the code-generation models that serve their customers.

Databricks, a data-intelligence unicorn, is clearly trying to challenge the market dominance of Microsoft, OpenAI, Google, and other large players by absorbing large-model capabilities, offering the industry a new reference point.

However, some see the MosaicML acquisition as riding the large-model hype: Databricks' main business is the lakehouse, which uses Spark to process data across large clusters, so the value of bolting on large language models is not obvious. How Databricks will pay for the acquisition is also unclear.

Whether this acquisition truly proves MosaicML's commercial value, therefore, remains to be seen.

Third, opportunities for AI large-model startups: vertical industries, data security, and lower costs

Now that China's "war of a hundred models" has entered its hardest stretch, the MosaicML acquisition may also offer the domestic industry some new reference points.

Whatever Databricks' real intentions, the case reflects the foreign market's positive attitude toward large-model startups. MosaicML was founded only two years ago and has just 62 employees, yet the purchase price reached nearly 10 billion yuan, lending some confidence to domestic large-model entrepreneurship.

Recently, doubts about generative AI and large-model investment have surfaced in domestic investment circles. Meituan co-founder Wang Huiwen's departure due to illness raised concerns about the difficulty of AI entrepreneurship, and yesterday's WeChat Moments debate over ChatGPT between Cheetah Mobile CEO Fu Sheng and GSR Ventures managing director Zhu Xiaohu also drew attention.

Zhu believes ChatGPT is very unfriendly to startups and urged founders to give up for the next two or three years; Fu Sheng shot back that "half of Silicon Valley's startups are built around ChatGPT, yet our investors can still be so ignorant and fearless." Zhu Xiaohu replied in the comments that Fu Sheng was just arguing for argument's sake.

According to market research firm PitchBook Data, the global generative AI market is expected to reach $42.6 billion by the end of this year and $98.1 billion by 2026. Venture capital for generative AI startups increased from $4.8 billion for all of 2022 to $12.7 billion in the first five months of 2023, the report said.

It is worth noting that the vertical-industry large-model market is becoming an important opportunity, and dense proprietary data has become a key ingredient of success for AI large-model startups.

Larry Pickett, chief information and digital officer at biopharmaceutical services company Syneos Health, recently said that training a model on professional health data currently costs about $1 million to $2 million. Using smaller, open-source pre-trained models, rather than building on top of the entire dataset OpenAI owns, cuts that cost greatly. Enterprise technology leaders are under pressure to prepare data for AI models, making data and data-intelligence platforms both a pain point and an opportunity for entrepreneurs.

Vertical industries, data security, and lower costs, then, may all be important openings for AI startups to sidestep the giants and pursue commercial success.

Conclusion: Generative AI entrepreneurship is attracting capital, and startups should sidestep the giants

The US$1.3 billion generative AI acquisition offers a new reference point for AI entrepreneurship. Although MosaicML's age, scale, and headcount appear very limited, and its models have not yet caught up with GPT-4, it has still won high recognition from acquirer Databricks, validating its value at this stage.

Indeed, some think the value of Databricks integrating large language models is not clear enough and may be riding the large-model hype, so the lessons of this case still need time to be verified. In any event, the MosaicML story does point to key elements of AI entrepreneurship, namely vertical industries, data security, and lower costs, that are worth the industry's attention.