
Hugging Face, the "open-source smiley," made OpenAI and Google tremble

Author: Cortana MentarloAI

It aims to build a GitHub for AI, and its valuation has soared to $2 billion in just a few years.

"We have no moat, and neither does OpenAI." So wrote a Google researcher in a recently leaked internal document. In his view, although Google and OpenAI are locked in a fierce race, the real winner of the AI competition may be neither of them, because a third force is on the rise.

That force is the open-source community, Google and OpenAI's real competitor.

The most influential player in the open-source community is undoubtedly Hugging Face. Often described as the GitHub of AI, it provides a wealth of high-quality open-source models and tools, shares the results of its R&D with the community, significantly lowers the technical barrier to AI, and advances the "democratization" of AI.

Clément Delangue, one of Hugging Face's co-founders, has said publicly: "In natural language processing or machine learning, the worst-case scenario is competing against the entire scientific community and the open-source community. So instead of competing, we chose to support the open-source and scientific communities."

Founded in 2016, Hugging Face has closed five consecutive funding rounds in just a few years and is now valued at $2 billion. Its Transformers repository has more than 98,000 stars on GitHub, placing it among the platform's most popular repositories.

So what exactly does this company do? How did it rise to the top of the open-source field? And what is its business model?

01

NLP opens the way to a comeback

Hugging Face is an AI startup with natural language processing (NLP) at its core.

The company was founded by French serial entrepreneur Clément Delangue, whose earlier ventures include VideoNot.es, Mention, and Moodstocks (later acquired by Google), together with Thomas Wolf and Julien Chaumond. Hugging Face is now headquartered in New York.


Of the founders, Clément Delangue and Thomas Wolf are experts in natural language processing, and in building Hugging Face they came to be regarded as pioneers of contemporary NLP.

Their original goal in founding Hugging Face was to create an entertaining, open-domain chatbot for young people, much like the AI in the sci-fi film "Her": one that could talk with people about all kinds of topics, such as the weather, friends, love, and sports. People could chat with it in their spare time, ask it questions, and even have it generate fun pictures.

That playful vision explains the company's name, which comes from the 🤗 "hugging face" emoji: a smiling face with open hands.


On March 9, 2017, the Hugging Face app officially launched on the iOS App Store. It quickly attracted attention, and the company raised $1.2 million in angel funding from SV Angel, NBA star Kevin Durant, and other investors.

To improve the chatbot's natural language processing (NLP) skills, Hugging Face built a library of machine learning models and datasets, which it used to train the bot to analyze sentiment, generate coherent responses, and understand different conversation topics.

The team also open-sourced part of this library on GitHub, hoping that user co-creation would spark new ideas for development.

By 2018, however, the app's growth was still lackluster, so the team decided to share its underlying code online for free. The move drew an immediate, positive response from researchers at tech giants such as Google and Microsoft, who began using the code to build AI applications, and it made Hugging Face's signature smiley emoji familiar to a large number of AI developers.

That same year, Google released BERT, a large pre-trained language model based on a bidirectional Transformer, setting off an era of intense competition among AI models.

Against this backdrop, Hugging Face began providing AI model services and soon entered its own "golden age".

It first open-sourced PyTorch-Pretrained-BERT, a PyTorch implementation of BERT, and then consolidated the pre-trained models it had contributed to the NLP field into a single release: the Transformers library.

The Transformers library provides thousands of pre-trained models supporting text classification, information extraction, question answering, summarization, translation, text generation, and more, in over 100 languages. With it, developers can easily use large NLP models such as BERT, GPT, XLNet, T5, and DistilBERT for a wide range of AI tasks, saving considerable time and computing resources.

In short, the Transformers library gives enterprises plug-and-play models that work without additional development. As a result, many enterprises began using it to bring these models into their products and workflows.

The library quickly took off and became the fastest-growing AI project on GitHub.

Hugging Face's GitHub star growth curve (image: Lux Capital)

Clément Delangue, co-founder of Hugging Face, recalled: "We didn't put much thought into it when we launched the product, and we were surprised by the explosive growth of the community."

With so many developers gathering around its tools, Hugging Face naturally built its own community, the Hugging Face Hub. At the same time, it adjusted its product strategy: rather than focusing solely on natural language processing, it began exploring other areas of machine learning and new application scenarios in order to build a comprehensive open-source product ecosystem.

By April 2023, Hugging Face hosted 166,894 trained models and 26,900 datasets spanning NLP, speech, biology, time series, computer vision, and reinforcement learning, forming a complete AI development ecosystem.

This lowers the barrier to entry for relevant research and applications, making Hugging Face the most influential technology provider in the AI community.

These models now serve tens of thousands of organizations, including well-known AI teams at Meta, Amazon, Microsoft, and Google, helping researchers and practitioners build better models and integrate them into products and workflows.

Hugging Face has also drawn strong interest from investors.

In May 2022, the company closed a $100 million Series C round led by Lux Capital, with participation from Sequoia Capital, bringing its valuation to $2 billion.

Despite this investor enthusiasm, the founders stayed level-headed: they turned down several "significant acquisition offers" and resolved not to sell the business the way GitHub was sold. They also have a playful ambition for the company's future: "We want to be the first public company to use an emoji as its ticker symbol instead of the traditional three letters."

02

A GitHub for large AI models

Having gained traction through open source, Hugging Face places special emphasis on community building, and the Hugging Face Hub it created has become a key gathering place for AI developers.

Hugging Face Hub is a centralized platform for people to explore, experiment, collaborate and develop machine learning technologies. Here, anyone can share and explore resources such as models and datasets, so that everyone can co-create and build machine learning models together. Hugging Face Hub is therefore known as the "home of machine learning."

The Hub is the product of Hugging Face's commitment to open source and embodies its core values. As the tagline on its website puts it: "The AI community building the future."

Hugging Face's founder has said publicly: "Hugging Face's goal is to make natural language processing tools accessible to more people and NLP technology more convenient and easier to use, by providing tools and a developer community that help people achieve their innovation goals."

He added: "No single company, including the tech giants, can 'solve AI' on its own. The only way to get there is to be community-centric and share knowledge and resources."

That is why the company is committed to building the largest open-source collection of models, datasets, demos, and metrics on the Hugging Face Hub, so that anyone can explore, experiment with, collaborate on, and build with machine learning, thereby "democratizing" AI.

Currently, Hugging Face Hub offers more than 120,000 models, 20,000 datasets, and 50,000 demo applications, all of which are open source, public, and free.

The Hugging Face Hub is open to all machine learning models and is supported by libraries such as Transformers, Flair, Asteroid, ESPnet, and pyannote, with Transformers at its core.

The Transformers library supports interoperability among PyTorch, TensorFlow, and JAX, giving users the flexibility to switch frameworks at any stage of a model's lifecycle. Through the Inference API, users can run inference directly on models hosted by Hugging Face, and the same models can serve as starting points for transfer learning. This combination of performance and ease of use has made Transformers an industry leader and has profoundly changed how deep learning is done in NLP.


In addition, the platform provides practical tools such as model versioning, integration testing, sharing, and collaboration to help developers manage and share models and datasets more efficiently.

As a result, any developer or engineering team on the Hugging Face Hub can quickly download and fine-tune state-of-the-art pre-trained models, use thousands of model inference APIs to handle common tasks across modalities (natural language processing, computer vision, audio, and multimodal), and build machine learning-powered applications in minutes, saving the time and resources of training models from scratch.

Developers can also create repositories under their own accounts to store and share trained models, datasets, and scripts, and engage with an active community to streamline their machine learning workflows.

In short, the Hugging Face Hub gives researchers a platform to showcase and share their own models, test other people's models, and dig into those models' internals, all of which drives machine learning forward. Previously, AI seemed out of reach for front-end developers, because the code for only a handful of AI systems was freely available to the public.

Hugging Face set out to change this by offering open-source models and APIs through its community, and by taking on the tedious, fiddly work involved in moving AI from research to application, so that every AI practitioner can easily use these research models and resources. In Hugging Face's own words, it bridges the gap between AI research and applications.

Hugging Face is also working to strengthen the Hub's security so that users can rely on their code, models, and data being safe.

For example, it attaches model cards to models in the library to inform users of each model's limitations and biases, encouraging responsible use and development. It has also added access controls to datasets, allowing organizations and individuals to create private datasets for permission or privacy reasons and to handle other users' access requests themselves.
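On the Hub, a model card lives in a model repository's README.md, which typically opens with a YAML metadata block (keys such as license and tags) followed by free-form documentation. As a minimal sketch, assuming a hypothetical helper render_model_card and made-up field values, the snippet below assembles such a card with a section recording the model's limitations:

```python
# Hypothetical helper (not part of any Hugging Face library): renders a minimal
# Hub-style model card as a README.md string.
def render_model_card(model_name: str, license_id: str, tags: list, limitations: str) -> str:
    """Return a YAML metadata block followed by Markdown documentation."""
    # Machine-readable metadata sits between the two '---' lines.
    metadata = ["---", f"license: {license_id}", "tags:"]
    metadata += [f"- {tag}" for tag in tags]
    metadata.append("---")
    # Human-readable body; the section heading here is an illustrative choice.
    body = [f"# {model_name}", "", "## Limitations and bias", "", limitations]
    return "\n".join(metadata + [""] + body) + "\n"


card = render_model_card(
    model_name="example-sentiment-model",  # illustrative name, not a real Hub model
    license_id="apache-2.0",
    tags=["text-classification"],
    limitations="Trained only on English product reviews; may not transfer to other domains.",
)
print(card)
```

The exact metadata schema and recommended sections are defined by Hugging Face's model card documentation; this sketch only shows the general shape, metadata first and a plain-language account of limits after it.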

Notably, to further "democratize" natural language processing, Hugging Face has also launched a free NLP course, the Hugging Face Course.

The course teaches natural language processing (NLP) using libraries from the Hugging Face ecosystem, including Hugging Face Transformers, Datasets, Tokenizers, and Accelerate. It is completely free, with no ads.

Overall, the Hugging Face Hub is essentially the GitHub of machine learning: a community-driven platform where developers continuously explore, innovate, and collaborate on machine learning models, datasets, and applications, accelerating AI progress by sharing knowledge and resources.

03

"Open Source" Drives "Business"

How a company built on an open community and open-source resources becomes profitable is a question worth exploring.

First, the decision to go open source proved correct. Through the open-source Transformers project, Hugging Face built enormous influence, grew a large developer community around the Hugging Face Hub, and won the trust of customers and investors, all of which smoothed its path to commercialization.

Pat Grady, a partner at Sequoia Capital, said: "They prioritized adoption over monetization, and I think that's the right decision. They saw applications of the Transformer model beyond NLP and an opportunity to become a GitHub-like platform, not only for NLP but for every area of machine learning."

Over the past decade, many startups have demonstrated the commercial viability of the open-source model. MongoDB, Elastic, and Confluent, for example, are fast-growing open-source companies that have achieved commercial success and a stable foothold in the market.

Clément, co-founder of Hugging Face, firmly believes that "startups can empower open communities in a way that generates far more value than they could by building proprietary tools."

He has stated publicly: "Given the value of open-source machine learning and its mainstream status, its usage equates to future revenue. Machine learning will become the default way to build technology, and Hugging Face will become the platform of choice for it, generating billions of dollars in revenue."

Hugging Face therefore chose an "open source drives commerce" strategy and began offering paid features in 2021.


Hugging Face currently generates revenue in three main ways:

  1. Paid membership: enhanced services and community features for paying members.
  2. Hosting: hourly-billed hosting services, with options for different configuration needs.
  3. AI solution services: currently the main product line, offering customers customized NLP, computer vision, and other solutions in exchange for technical service fees.

Since 2020, Hugging Face has been customizing natural language models for enterprises and has launched AutoTrain, Inference API & Infinity, Private Hub, Expert Support, and other products tailored to different types of developers.

At present, more than 1,000 companies have become paying customers of Hugging Face, including large enterprises such as Intel, Qualcomm, Pfizer, Bloomberg and eBay.

In 2021, Hugging Face generated $10 million in revenue, evidence that its "open source for business" strategy is working.

As Clément, Hugging Face's CEO, puts it: "Companies don't need to capture 100% of the value they create. They only need to monetize 1% of it, and even 1% is enough to make you a high-valuation company."

Overall, Hugging Face has leveraged the influence it built in the open-source community to expand step by step into SaaS products and enterprise services. This incremental transformation has let it strike a good balance between open source and commercialization, which is key to its success; it has secured Hugging Face's place in the AI field and offers a model for other AI startups.

Open-source ecosystems have their weaknesses, however: commercialization can disrupt communities that formed organically. To address this, Hugging Face has strengthened its technical control, maintained its own open-source ecosystem, and invested deeply in scientific research.

"Machine learning technology is still in its early stages, and the potential of open-source communities is huge. In the next 5 to 10 years, we will definitely see more open-source machine learning companies emerge."
