
What OpenAI and Google are most afraid of is an "open source smiley face"

"We don't have a moat, and neither does OpenAI."

In a recently leaked document, a researcher inside Google expressed this view: in the fierce AI race, even as Google and OpenAI chase each other, the real winner may be neither of them, because a third force is rising.

That force is the open source community, and it is the biggest rival of both Google and OpenAI.

At the top of that community sits Hugging Face. Often called the GitHub of AI, it provides a large number of high-quality open source models and tools, shares R&D results with the community, dramatically lowers the technical barrier to AI, and advances the "democratization" of AI.

One of its founders, Clément Delangue, has said publicly: "In NLP, or machine learning in general, the worst-case scenario is competing with the entire scientific and open source community. So rather than compete, we chose to empower the open source and scientific communities."

Founded in 2016, Hugging Face has raised five rounds of funding in just a few years, its valuation has soared to $2 billion, and its repositories have earned more than 98,000 stars on GitHub, placing it among the platform's most popular projects.

So what does this company actually do? How did it reach the top of the open source world? And what is its development model?

01 NLP opens the road to a comeback

Hugging Face is an AI startup with natural language processing (NLP) technology at its core.

It was founded in 2016 by French serial entrepreneur Clément Delangue (whose earlier projects include the note-taking platform VideoNot.es, the media monitoring platform mention, and the mobile development platform Moodstocks, later acquired by Google), together with Thomas Wolf and Julien Chaumond, and is headquartered in New York.


Two of the founders, Clément Delangue and Thomas Wolf, are natural language processing experts, and through their ongoing work on Hugging Face they have come to be regarded as pioneers of contemporary NLP.

Their original goal in founding Hugging Face was to bring young people an entertaining, open-domain chatbot, like the AI in the sci-fi film "Her," that could chat about everything from emotions and friendship to love and sports. You could gossip with it when bored, ask it questions, or have it generate funny pictures.

That is also where Hugging Face gets its name: the cute smiling emoji with open, hugging hands.

On March 9, 2017, the Hugging Face app officially launched on the iOS App Store. It attracted plenty of attention and raised $1.2 million in angel funding from investors including SV Angel and NBA star Kevin Durant.

To train the chatbot's natural language processing (NLP) capabilities, Hugging Face built a resource library of machine learning models and datasets, covering tasks such as detecting the sentiment of text messages, generating coherent responses, and recognizing different conversation topics.

At the same time, the team open-sourced part of this library on GitHub, hoping to draw development inspiration from community co-creation.

In 2018, with the app still lukewarm, Hugging Face began sharing its underlying code online for free. The move drew an immediate, positive response from researchers at well-known technology companies such as Google and Microsoft, who started using the code in AI applications, and the smiley emoji became known to AI developers at large.

Coincidentally, in the same year Google released BERT, a large pre-trained language model based on a bidirectional Transformer, kicking off an arms race in AI models.

In this environment, Hugging Face began offering AI model services, and soon entered its own golden age.

It first open-sourced a PyTorch implementation of BERT; it then consolidated the pre-trained NLP models it had been contributing and released the Transformers library.

The Transformers library provides thousands of pre-trained models supporting text classification, information extraction, question answering, summarization, translation, and text generation in more than 100 languages. With it, developers can easily use large NLP models such as BERT, GPT, XLNet, T5, and DistilBERT for tasks like text classification, summarization, generation, information extraction, and automatic question answering, saving enormous amounts of time and compute.
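For a sense of scale, the workflow described above usually amounts to a few lines of Python (a minimal sketch, assuming `transformers` is installed and a default checkpoint can be downloaded on first use):

```python
from transformers import pipeline  # pip install transformers

# pipeline() bundles tokenizer + model + post-processing behind one call;
# with no model named, it picks a default checkpoint for the task.
classifier = pipeline("sentiment-analysis")
result = classifier("Hugging Face makes NLP remarkably easy.")
print(result)

# The same one-line pattern covers the other tasks mentioned above, e.g.:
#   pipeline("summarization"), pipeline("translation_en_to_fr"),
#   pipeline("text-generation"), pipeline("question-answering")
```

Passing `model="..."` swaps in any checkpoint from the Hub without changing the surrounding code.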

In short, the Transformers library provides ready-to-use models, so enterprises do not have to develop them from scratch; as a result, companies have been adopting the library to bring models into their products and workflows.

The Transformers library quickly took off, becoming the fastest-growing AI project in GitHub's history.


Hugging Face's Star curve on Github, image from Lux Capital

Clément Delangue, one of the founders of Hugging Face, marveled: "We didn't think much of it when we released these things, and we were even surprised by the community's explosive growth."

With so many developers on board, Hugging Face naturally built its own community, the Hugging Face Hub. It also adjusted its product strategy: no longer limited to natural language processing, it began integrating other areas of machine learning, exploring new use cases, and building out a complete open source product matrix.

As of April 2023, Hugging Face hosted 166,894 trained models and 26,900 datasets, covering NLP, speech, biology, time series, computer vision, reinforcement learning, and other fields, forming a complete AI development ecosystem.

This has greatly lowered the barrier to research and application, making Hugging Face the most influential technology provider in the AI community.

Today these models serve tens of thousands of enterprises, helping researchers and practitioners build better models and integrate them into products and workflows, with users including well-known AI teams at Meta, Amazon, Microsoft, and Google.

Companies and products that use Hugging Face | Hugging Face

In the capital markets, Hugging Face is also favored.

In May 2022, the team closed a $100 million Series C round led by Lux Capital with participation from Sequoia Capital, at a valuation that soared to $2 billion.

Courted by capital, Hugging Face's founders stayed remarkably calm, saying they had turned down multiple "meaningful acquisition offers" and would not sell the business the way GitHub did. They also have a playful vision of the company's future: "We want to be the first company to go public with an emoji, not a three-letter ticker."

02 A GitHub for large AI models

Having won attention through open source, Hugging Face also puts special effort into community building, and the once-fledgling Hugging Face Hub has become a base camp for AI developers.

The Hugging Face Hub is a central place to explore, experiment, collaborate, and build with machine learning. Anyone can share and browse models, datasets, and more, and collaborate easily to build machine learning models together, which is why the Hub is also called the "home of machine learning."

It is both the product and the core of Hugging Face's commitment to open source, as the official website's tagline puts it: "The AI community building the future."


Hugging Face's developer page | Hugging Face

Hugging Face's founder has said publicly: "Our goal is to put natural language processing tools in more people's hands through the tools and the developer community, help them achieve their innovation goals, and make NLP technology easier to use and access."

He added that no company, including the tech giants, can "solve AI" on its own, and the only way to get there is community-centric sharing of knowledge and resources.

Therefore, the company is committed to building the largest open-source collection of models, datasets, demos, and metrics on the Hugging Face Hub to enable everyone to explore, experiment, collaborate, and build technologies using machine learning to achieve the goal of "democratizing" AI.

Currently, Hugging Face Hub offers more than 120,000 models, 20,000 datasets, and 50,000 demo applications, all of which are open source, public, and free.


API hosting business provided by Hugging Face | Hugging Face

The Hugging Face Hub is open to all machine learning models and is supported by open source libraries such as Transformers, Flair, Asteroid, ESPnet, and Pyannote, with the Transformers library at its core.

The Transformers library supports interoperability among PyTorch, TensorFlow, and JAX, giving developers the flexibility to use a different framework at each stage of a model's life cycle. Moreover, through the hosted Inference API, users can run inference and transfer learning directly on models and datasets hosted by Hugging Face. This combination of performance and ease of use has made the Transformers framework an industry leader and fundamentally changed how deep learning is developed in NLP.
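Under the hood, calling the hosted Inference API is an ordinary HTTPS POST. The sketch below builds such a request using only the standard library; the endpoint pattern reflects the commonly documented one, and the model ID and the `hf_xxx` token are illustrative placeholders:

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models"

def build_inference_request(model_id: str, text: str, token: str) -> urllib.request.Request:
    """Construct (but do not send) a POST request to the hosted Inference API."""
    return urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=json.dumps({"inputs": text}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )

req = build_inference_request(
    "distilbert-base-uncased-finetuned-sst-2-english",
    "I love open source!",
    "hf_xxx",  # placeholder; a real call needs a valid API token
)
print(req.full_url)

# To actually send it (requires a valid token and network access):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

The same request shape works for any hosted model; only the model ID in the URL and the JSON payload change per task.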


The Hugging Face Hub can be called the "GitHub" of the AI industry

In addition, the platform provides practical tools such as model versioning, test integration, sharing, and collaboration to help developers better manage and share models and datasets.

As a result, on the Hugging Face Hub any developer or engineering team can download state-of-the-art pre-trained models through a simple interface, call the inference APIs of thousands of models to handle common tasks across modalities such as natural language processing, computer vision, audio, and multimodal input, and build a machine-learning-powered application in minutes, skipping the time and resources needed to train a model from scratch.
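Part of what makes this frictionless is the Hub's predictable file layout: every file in a repository is addressable by a stable URL. A small sketch (the `resolve` URL pattern matches how the Hub serves raw files, though treat the exact form as an assumption rather than a documented guarantee):

```python
def hub_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file stored in a Hub repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# e.g. the config of the classic BERT checkpoint:
url = hub_file_url("bert-base-uncased", "config.json")
print(url)  # https://huggingface.co/bert-base-uncased/resolve/main/config.json
```

In practice, the `huggingface_hub` library's `hf_hub_download()` wraps this pattern and adds local caching.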

On top of that, they can create repositories under their own accounts to store and share trained models, datasets, and scripts, while exchanging ideas with a strong community and collaborating easily on ML workflows.

In short, the Hugging Face Hub gives researchers a platform to showcase the models they want to share, test other people's models, and dig into those models' internal architecture, advancing ML as a whole. AI once seemed out of reach for ordinary developers; after all, until recently only a handful of AI systems were freely available to the public.

To change that, Hugging Face decided to provide open source models and APIs through the community, taking on the complex, detailed work between AI research and application so that any practitioner can easily use these models and resources. In Hugging Face's own words, what it does is build a bridge between AI research and application.

In addition to providing convenience, Hugging Face actively takes steps to strengthen the security of the Hub, ensuring that users' code, models, and data are safe and secure.

For example, it equips models with model cards that document each model's limitations and biases, promoting responsible use and development, and it supports access controls on datasets, so that organizations and individuals can create private datasets for licensing or privacy reasons and handle access requests from other users themselves.
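Concretely, a model card's machine-readable part is a YAML front-matter block at the top of a repository's README.md; a hypothetical minimal example (all field values invented for illustration):

```yaml
# Hypothetical model-card metadata (YAML front matter in README.md);
# all values below are invented for illustration.
language: en
license: apache-2.0
tags:
  - text-classification
  - sentiment-analysis
datasets:
  - imdb
```

The free-text sections of the card (intended use, limitations, known biases) then follow in ordinary Markdown below the front matter.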

It is also worth mentioning that, to further "democratize" natural language processing technology, Hugging Face launched a free NLP course, the Hugging Face Course.

The course teaches natural language processing using libraries from the Hugging Face ecosystem (Transformers, Datasets, Tokenizers, and Accelerate). It is completely free and carries no ads.


Hugging Face uploads its natural language processing course videos to YouTube for free

In short, the Hugging Face Hub is like a GitHub for machine learning: a community-driven platform where developers can explore, innovate, and collaborate on ML models, datasets, and applications, accelerating AI by sharing knowledge and resources.

03 "Open Source" Drives "Business"

So the question is: how does an "open source" company whose product is a community platform make money?

First of all, "open source" is the right decision.

With the open source Transformers project, Hugging Face accumulated enormous influence, gathered developers into the huge Hugging Face Hub community, and earned the trust of customers and investors, which made its commercial transition a natural next step.

On this point, Sequoia Capital partner Pat Grady has said: "Their priority is adoption rather than monetization, and I think this is the right thing to do. They saw how the Transformer model could be applied beyond NLP, and they saw an opportunity to become GitHub not just for NLP, but for every area of machine learning."

Moreover, looking at startups over the past decade, the commercial viability of the open source model has been amply confirmed: fast-growing open source companies such as MongoDB, Elastic, and Confluent have all turned a profit and survived in the market.

One of the founders of Hugging Face, Clément, is convinced that "startups can empower open communities in a way that generates a thousand times more value than building a proprietary tool."

He has even said publicly: "Given the value of open source machine learning and its mainstream status, its usage is deferred revenue. Machine learning will become the default way to build technology, and Hugging Face will be the number one platform for it, generating billions of dollars in revenue."

Hugging Face therefore chose a commercialization path of "open source driving business" and began offering paid features in 2021.


Hugging Face's paid items | Hugging Face

At present, Hugging Face's revenue comes mainly from three categories:

Paid membership: better services and community experience, sold as subscriptions;

Data hosting: hourly hosting services at different tiers, priced by parameter requirements;

AI solution services: customized solutions for customers in NLP, computer vision, and other directions, earning technical service fees.

It is worth mentioning that since 2020, Hugging Face has been building customized natural language models for enterprises, and has launched personalized products for different types of developers, including AutoTrain, Inference API & Infinity, Private Hub, and Expert Support.

At present, more than 1,000 companies have become paying customers of Hugging Face, mainly large enterprises, including Intel, Qualcomm, Pfizer, Bloomberg and eBay.

In 2021, Hugging Face reached $10 million in revenue; judging from the numbers, its "open source leads business" strategy is working.

This also bears out the words of Hugging Face's CEO, Clément: "The company doesn't need to capture 100% of the value it creates; we monetize only 1% of that value, but even 1% is enough to make you a highly capitalized company."

In short, Hugging Face built influence through the open source community, then gradually expanded into SaaS products and enterprise services. This measured transformation has let it strike a good balance between open source and commercialization, which is an important reason for its success; the strategy has also made Hugging Face a unique presence in the AI community and an example for other AI startups.

However, the open source ecosystem has its own weaknesses: commercialization can easily damage the community environment in which it grew. Hugging Face's response is to keep firm control of its technology and maintain its open source ecosystem, while digging deeper into scientific research.

"Machine learning technology is still in its early stages of development, and the potential of the open source community is huge. In the next 5 to 10 years, we're going to see more open source machine learning companies rise."

So says Clément Delangue, CEO of Hugging Face.