
The most powerful open-source large model lands in the middle of the night! Llama 3 is here, close to GPT-4, and Musk approves

Author: ifanr (爱范儿)

Right on cue, Meta has stormed onto the scene with the Llama 3 series of models, billed as "the most powerful open-source models ever".

Specifically, Meta has open-sourced two models of different scales, 8B and 70B.

  • Llama 3 8B: Basically as powerful as the largest Llama 2 70B.
  • Llama 3 70B: A first-tier AI model, comparable to Gemini 1.5 Pro and better than Claude 3 Sonnet.

And that's just Meta's hors d'oeuvre; the real feast is yet to come. In the coming months, Meta will launch a series of new models with multimodality, multilingual conversation, longer context windows, and more, among which the heavyweight 400B+ model is expected to arm-wrestle with Claude 3 Opus.

Llama 3 experience address: https://llama.meta.com/llama3/


Another GPT-4-class model arrives: Llama 3 joins the race

Compared with its predecessor Llama 2, Llama 3 moves up a full level.

Thanks to improvements in pre-training and post-training, the pre-trained and instruction-fine-tuned models released today are the strongest available at the 8B and 70B parameter scales. Optimizing the post-training process has also significantly reduced false refusal rates, improved alignment, and increased the diversity of model responses.

Zuckerberg has acknowledged publicly that coding was not a priority when optimizing Llama 2, since users rarely ask Meta AI coding-related questions in WhatsApp.

This time, Llama 3 delivers breakthrough improvements in reasoning, code generation, and instruction following, making it more flexible and easier to use.


Benchmark results show that Llama 3 8B scores well above Google's Gemma 7B and Mistral 7B Instruct on MMLU, GPQA, HumanEval, and other tests. In Zuckerberg's words, the smallest Llama 3 is basically as powerful as the largest Llama 2.

Llama 3 70B, meanwhile, ranks among the top tier of AI models: it beats Claude 3 Sonnet on overall performance and goes head-to-head with Gemini 1.5 Pro.

To accurately study model performance under benchmarking, Meta has also developed a new set of high-quality human evaluation datasets.

The evaluation set contains 1,800 prompts covering 12 key use cases: asking for advice, brainstorming, classification, closed-ended Q&A, coding, creative writing, extraction, inhabiting a character/persona, open-ended Q&A, reasoning, rewriting, and summarization.


To avoid Llama 3 overfitting to this evaluation set, Meta barred even its own modeling teams from accessing it. In head-to-head comparisons against Claude 3 Sonnet, Mistral Medium, and GPT-3.5, Llama 3 70B won by a landslide.

According to Meta's official introduction, Llama 3 adopts a relatively standard decoder-only Transformer architecture, with several key improvements over Llama 2:

  • Llama 3 uses a tokenizer with a 128K-token vocabulary, which encodes language more efficiently and yields a significant improvement in model performance.
  • Grouped Query Attention (GQA) is used in both the 8B and 70B models to improve inference efficiency.
  • The models are trained on sequences of 8,192 tokens, with a mask ensuring that self-attention does not cross document boundaries.
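Grouped-query attention is easy to see in a toy sketch: several query heads share one key/value head, which shrinks the KV cache at inference time. The shapes and head counts below are illustrative stand-ins, not Llama 3's actual configuration.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy GQA: n_q_heads query heads share n_kv_heads key/value heads."""
    seq_len, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads sharing each KV head

    q = (x @ wq).reshape(seq_len, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, n_kv_heads, head_dim)

    # Replicate each KV head so every query head in its group reads the
    # same K/V; only the n_kv_heads originals would live in a KV cache.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        scores = (q[:, h] @ k[:, h].T) / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
        out[:, h] = weights @ v[:, h]
    return out.reshape(seq_len, d_model)

rng = np.random.default_rng(0)
seq, d, n_q, n_kv = 16, 64, 8, 2  # toy sizes
head_dim = d // n_q
wq = rng.normal(size=(d, d)) * 0.1
wk = rng.normal(size=(d, n_kv * head_dim)) * 0.1  # K/V project to fewer heads
wv = rng.normal(size=(d, n_kv * head_dim)) * 0.1
x = rng.normal(size=(seq, d))
out = grouped_query_attention(x, wq, wk, wv, n_q, n_kv)
```

Note how the K and V projection matrices are four times smaller than the query projection here; that reduction is where GQA's inference savings come from.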

The quantity and quality of training data are key factors driving the emergence of large model capabilities in the next phase.

From the start, Meta set out to make Llama 3 the most powerful model possible, and it has invested heavily in pre-training data. Llama 3 is reportedly pre-trained on more than 15T tokens collected from public sources, seven times the dataset used for Llama 2, including four times as much code data.


With multilingual use in mind, more than 5% of the Llama 3 pre-training dataset consists of high-quality non-English data covering more than 30 languages, though Meta concedes that performance in these languages will lag somewhat behind English.

To ensure Llama 3 is trained on the highest-quality data, the Meta research team built a series of data-filtering pipelines, including heuristic filters, NSFW filters, semantic deduplication, and text classifiers that predict data quality.

Notably, the team found that previous generations of Llama were surprisingly good at identifying high-quality data, so they used Llama 2 to generate the training data for the text-quality classifiers behind Llama 3, genuinely realizing "AI training AI".

In addition to the quality of training, Llama 3 has also made a qualitative leap in training efficiency.

Meta revealed that in order to train the largest Llama 3 model, they combined three types of parallelization: data parallelization, model parallelization, and pipeline parallelization.

Training simultaneously on 16K GPUs, each GPU achieved over 400 TFLOPS of compute utilization. The research team ran training on two custom-built clusters of 24K GPUs each.
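Meta has not disclosed how the job was split across the three parallelism axes. As a rough sketch, here is how a hypothetical tensor/pipeline factorization of a 16K-GPU job determines the data-parallel degree, plus a back-of-the-envelope utilization figure assuming an H100 BF16 dense peak of roughly 989 TFLOPS (the factorization and peak number are assumptions, not Meta's figures).

```python
def data_parallel_degree(n_gpus, tensor_parallel, pipeline_parallel):
    """One model replica spans tensor_parallel * pipeline_parallel GPUs;
    whatever remains becomes the data-parallel replica count."""
    model_gpus = tensor_parallel * pipeline_parallel
    assert n_gpus % model_gpus == 0, "cluster must divide evenly"
    return n_gpus // model_gpus

# Hypothetical 16K-GPU factorization: 8-way tensor x 16-way pipeline.
dp = data_parallel_degree(16384, tensor_parallel=8, pipeline_parallel=16)

# Rough model-FLOPs utilization from the reported 400 TFLOPS per GPU,
# against an assumed ~989 TFLOPS BF16 dense peak for an H100.
achieved_tflops = 400
mfu = achieved_tflops / 989  # roughly 40%
```

The point of combining the three axes is that each attacks a different bottleneck: tensor parallelism splits individual layers, pipeline parallelism splits the layer stack, and data parallelism replicates the whole model across the remaining GPUs.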


To maximize GPU uptime, the research team developed an advanced new training stack that automates error detection, handling, and maintenance. In addition, Meta has dramatically improved hardware reliability and silent data corruption detection, and has developed new scalable storage systems to reduce checkpointing and rollback overhead.
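Meta's actual training stack is proprietary, but the checkpoint-and-rollback pattern such a stack automates can be sketched in miniature, with a simulated mid-run fault; all names and numbers here are illustrative.

```python
import copy

def train(total_steps, checkpoint_every, fail_at=None):
    """Toy training loop: snapshot state periodically; on a fault, roll
    back to the last snapshot instead of restarting from step 0."""
    state = {"step": 0, "loss": 100.0}
    snapshot = copy.deepcopy(state)
    failed_once = False
    while state["step"] < total_steps:
        try:
            if fail_at is not None and state["step"] == fail_at and not failed_once:
                failed_once = True
                raise RuntimeError("simulated hardware fault")
            state["step"] += 1
            state["loss"] *= 0.99  # pretend the loss improves each step
            if state["step"] % checkpoint_every == 0:
                snapshot = copy.deepcopy(state)  # persist a checkpoint
        except RuntimeError:
            # Rollback: lose at most `checkpoint_every` steps of work.
            state = copy.deepcopy(snapshot)
    return state

final = train(total_steps=100, checkpoint_every=10, fail_at=57)
```

Cheaper checkpointing directly shrinks the cost of each rollback, which is why Meta highlights the new storage systems alongside the fault-detection work.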

These improvements yielded an overall effective training time of more than 95%, making Llama 3's training roughly three times more efficient than its predecessor's.

For more technical details, please check out the official Meta blog: https://ai.meta.com/blog/meta-llama-3/

Open Source vs. Closed Source

As Meta's own flagship, Llama 3 is naturally being integrated into the Meta AI chatbot first.

Back at last year's Meta Connect 2023 conference, Zuckerberg officially announced the launch of Meta AI, and then quickly rolled it out to the United States, Australia, Canada, Singapore, South Africa, and other regions.

In a previous interview, Zuckerberg was even more confident in Meta AI powered by Llama 3, saying that it will be the smartest AI assistant that people can use for free.

I think it's going to shift from a chatbot-like format, where you just ask a question and it gives you an answer, to one where you can give it more complex tasks and it carries them out.

Meta AI web experience address: https://www.meta.ai/


Of course, if Meta AI is "not yet available in your country", you can fall back on the simplest way to use an open-source model: Hugging Face, the world's largest AI open-source community.

Experience address: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
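If you run the Instruct models yourself, they expect a specific chat format with special header tokens. In practice the tokenizer's `apply_chat_template` builds this for you; the hand-rolled sketch below just shows the wire format as documented at release (consult the official model card for the authoritative template).

```python
def llama3_chat_prompt(system, user):
    """Build a Llama 3 Instruct prompt by hand. Prefer
    tokenizer.apply_chat_template() in real code; this only
    illustrates the special-token layout."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_chat_prompt("You are a helpful assistant.", "What is Llama 3?")
```

Generation is stopped when the model emits `<|eot_id|>`, which is why serving configs register it as a stop token alongside the end-of-text token.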

Platforms such as Perplexity, Poe, and others were also quick to announce the integration of Llama 3 into platform services.


You can also try Llama 3 by calling the API of Replicate, an open-source-model hosting platform. Its usage pricing has already been published, so you can pay as you go.


Interestingly, before Meta's official announcement, sharp-eyed netizens spotted that Microsoft's Azure Marketplace had jumped the gun and listed the Llama 3 8B Instruct version. But as the news spread, the netizens who flocked to the link found only a "404" page.

The listing has since been restored: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/metagenai.meta-llama-3-8b-chat-offer?tab=overview

The arrival of Llama 3 is setting off a new storm of discussion on the social platform X.

Meta AI Chief Scientist and Turing Award winner Yann LeCun not only cheered the release of Llama 3 but also teased more releases in the coming months. Even Musk showed up in the comments, expressing his approval and anticipation with a terse "Not bad".


Jim Fan, a senior scientist at NVIDIA, has turned his attention to the upcoming Llama 3 400B+, which he believes will mark a watershed moment: open-source models drawing level with the top closed-source models.

From the benchmarks he shared, Llama 3 400B+ is nearly on par with Claude 3 Opus and the new GPT-4 Turbo. Although a gap remains, that is enough to secure it a place among the top large models.


Today coincides with the birthday of Andrew Ng, a professor at Stanford University and a top expert in AI, and the arrival of Llama 3 is undoubtedly the most special way to celebrate his birthday.


It is fair to say that today's open-source models are flourishing, with a hundred flowers blooming and a hundred schools of thought contending.


Earlier this year, Zuckerberg, with 350,000 GPUs in hand, described Meta's vision in firm terms in an interview with The Verge: to build AGI (Artificial General Intelligence).

In stark contrast to the not-so-open OpenAI, Meta has taken the open-source route toward the holy grail of AGI.

As Zuckerberg put it, Meta's firm commitment to open source has already paid off on this challenging journey:

I'm generally very inclined to think that open source is good for the community and for us because we benefit from innovation.

Over the past year, the AI community has debated the open-source versus closed-source route, a debate that goes beyond technical pros and cons to the core direction of AI's future. Even Musk has personally entered the fray, setting an example by open-sourcing Grok-1.

Not long ago, some argued that open-source models would fall further and further behind; the arrival of Llama 3 delivers a resounding rebuttal to that pessimism.

However, while Llama 3 has evened the score for open-source models, the debate over open versus closed source is far from over.

After all, GPT-4.5/5, quietly waiting in the wings, may settle the matter this summer with unrivaled performance.
