
The most powerful open-source large model lands in the middle of the night! Llama 3 is here, close to GPT-4, and Musk approves

Author: ifanr (爱范儿)

Right on cue, Meta has stormed onto the scene with the Llama 3 series of models, billed as "the most powerful open-source models ever".

Specifically, Meta has open-sourced two models of different scales, 8B and 70B.

  • Llama 3 8B: Basically as powerful as the largest Llama 2 70B.
  • Llama 3 70B: A first-tier AI model, comparable to Gemini 1.5 Pro and better than Claude 3 Sonnet.

And that's just Meta's hors d'oeuvre; the real feast is yet to come. In the coming months, Meta will launch a series of new models with multimodality, multilingual conversation, longer context windows, and more, among which the heavyweight 400B+ model is expected to arm-wrestle with Claude 3 Opus.

Llama 3 experience address: https://llama.meta.com/llama3/


Another GPT-4-class model arrives: Llama 3 joins the race

Compared with its predecessor Llama 2, Llama 3 moves up a full level.

Thanks to improvements in pre-training and post-training, the pre-trained and instruction-fine-tuned models released today are the strongest available at the 8B and 70B parameter scales. Optimizing the post-training process has also significantly reduced false refusal rates, improved alignment, and increased the diversity of model responses.

Zuckerberg has acknowledged publicly that coding was not a priority when optimizing Llama 2, since users rarely ask Meta AI coding-related questions in WhatsApp.

This time, Llama 3 delivers breakthrough improvements in reasoning, code generation, and instruction following, making it more flexible and easier to use.


Benchmark results show that Llama 3 8B scores well above Google's Gemma 7B and Mistral 7B Instruct on MMLU, GPQA, HumanEval, and other tests. In Zuckerberg's words, the smallest Llama 3 is basically as powerful as the largest Llama 2.

Llama 3 70B, meanwhile, ranks among the top tier of AI models: it beats Claude 3 Sonnet on overall performance and goes head-to-head with Gemini 1.5 Pro.

To accurately study model performance under benchmarking, Meta has also developed a new set of high-quality human evaluation datasets.

The evaluation set contains 1,800 prompts covering 12 key use cases: asking for advice, brainstorming, classification, closed-ended Q&A, coding, creative writing, extraction, inhabiting a character/persona, open-ended Q&A, reasoning, rewriting, and summarization.


To avoid Llama 3 overfitting to this evaluation set, Meta barred even its own modeling teams from accessing it. In head-to-head comparisons against Claude 3 Sonnet, Mistral Medium, and GPT-3.5, Llama 3 70B won by a landslide.

According to Meta's official introduction, Llama 3 adopts a relatively standard decoder-only Transformer architecture, with several key improvements over Llama 2:

  • Llama 3 uses a tokenizer with a 128K-token vocabulary, which encodes language more efficiently and yields a significant improvement in model performance.
  • Grouped Query Attention (GQA) is used in both the 8B and 70B models to improve inference efficiency.
  • The models are trained on sequences of 8,192 tokens, with a mask ensuring that self-attention does not cross document boundaries.
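Grouped-query attention is easy to see in a toy sketch: several query heads share one key/value head, which shrinks the KV cache at inference time. The shapes and head counts below are illustrative stand-ins, not Llama 3's actual configuration.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy GQA: n_q_heads query heads share n_kv_heads key/value heads."""
    seq_len, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads sharing each KV head

    q = (x @ wq).reshape(seq_len, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, n_kv_heads, head_dim)

    # Replicate each KV head so every query head in its group reads the
    # same K/V; only the n_kv_heads originals would live in a KV cache.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        scores = (q[:, h] @ k[:, h].T) / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
        out[:, h] = weights @ v[:, h]
    return out.reshape(seq_len, d_model)

rng = np.random.default_rng(0)
seq, d, n_q, n_kv = 16, 64, 8, 2  # toy sizes
head_dim = d // n_q
wq = rng.normal(size=(d, d)) * 0.1
wk = rng.normal(size=(d, n_kv * head_dim)) * 0.1  # K/V project to fewer heads
wv = rng.normal(size=(d, n_kv * head_dim)) * 0.1
x = rng.normal(size=(seq, d))
out = grouped_query_attention(x, wq, wk, wv, n_q, n_kv)
```

Note how the K and V projection matrices are four times smaller than the query projection here; that reduction is where GQA's inference savings come from.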

The quantity and quality of training data are key factors driving the emergence of large model capabilities in the next phase.

From the start, Meta set out to make Llama 3 the most powerful model possible, and it has invested heavily in pre-training data. Llama 3 is reportedly pre-trained on more than 15T tokens collected from public sources, seven times the dataset used for Llama 2, including four times as much code data.


With multilingual use in mind, more than 5% of the Llama 3 pre-training dataset consists of high-quality non-English data covering more than 30 languages, though Meta concedes that performance in these languages will lag somewhat behind English.

To ensure Llama 3 is trained on the highest-quality data, the Meta research team built a series of data-filtering pipelines, including heuristic filters, NSFW filters, semantic deduplication, and text classifiers that predict data quality.

Notably, the team found that previous generations of Llama were surprisingly good at identifying high-quality data, so they used Llama 2 to generate the training data for the text-quality classifiers behind Llama 3, genuinely realizing "AI training AI".

In addition to the quality of training, Llama 3 has also made a qualitative leap in training efficiency.

Meta revealed that in order to train the largest Llama 3 model, they combined three types of parallelization: data parallelization, model parallelization, and pipeline parallelization.

Training simultaneously on 16K GPUs, each GPU achieved over 400 TFLOPS of compute utilization. The research team ran training on two custom-built clusters of 24K GPUs each.
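Meta has not disclosed how the job was split across the three parallelism axes. As a rough sketch, here is how a hypothetical tensor/pipeline factorization of a 16K-GPU job determines the data-parallel degree, plus a back-of-the-envelope utilization figure assuming an H100 BF16 dense peak of roughly 989 TFLOPS (the factorization and peak number are assumptions, not Meta's figures).

```python
def data_parallel_degree(n_gpus, tensor_parallel, pipeline_parallel):
    """One model replica spans tensor_parallel * pipeline_parallel GPUs;
    whatever remains becomes the data-parallel replica count."""
    model_gpus = tensor_parallel * pipeline_parallel
    assert n_gpus % model_gpus == 0, "cluster must divide evenly"
    return n_gpus // model_gpus

# Hypothetical 16K-GPU factorization: 8-way tensor x 16-way pipeline.
dp = data_parallel_degree(16384, tensor_parallel=8, pipeline_parallel=16)

# Rough model-FLOPs utilization from the reported 400 TFLOPS per GPU,
# against an assumed ~989 TFLOPS BF16 dense peak for an H100.
achieved_tflops = 400
mfu = achieved_tflops / 989  # roughly 40%
```

The point of combining the three axes is that each attacks a different bottleneck: tensor parallelism splits individual layers, pipeline parallelism splits the layer stack, and data parallelism replicates the whole model across the remaining GPUs.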


To maximize GPU uptime, the research team developed an advanced new training stack that automates error detection, handling, and maintenance. In addition, Meta has dramatically improved hardware reliability and silent data corruption detection, and has developed new scalable storage systems to reduce checkpointing and rollback overhead.
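Meta's actual training stack is proprietary, but the checkpoint-and-rollback pattern such a stack automates can be sketched in miniature, with a simulated mid-run fault; all names and numbers here are illustrative.

```python
import copy

def train(total_steps, checkpoint_every, fail_at=None):
    """Toy training loop: snapshot state periodically; on a fault, roll
    back to the last snapshot instead of restarting from step 0."""
    state = {"step": 0, "loss": 100.0}
    snapshot = copy.deepcopy(state)
    failed_once = False
    while state["step"] < total_steps:
        try:
            if fail_at is not None and state["step"] == fail_at and not failed_once:
                failed_once = True
                raise RuntimeError("simulated hardware fault")
            state["step"] += 1
            state["loss"] *= 0.99  # pretend the loss improves each step
            if state["step"] % checkpoint_every == 0:
                snapshot = copy.deepcopy(state)  # persist a checkpoint
        except RuntimeError:
            # Rollback: lose at most `checkpoint_every` steps of work.
            state = copy.deepcopy(snapshot)
    return state

final = train(total_steps=100, checkpoint_every=10, fail_at=57)
```

Cheaper checkpointing directly shrinks the cost of each rollback, which is why Meta highlights the new storage systems alongside the fault-detection work.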

These improvements yielded an overall effective training time of more than 95%, making Llama 3's training roughly three times more efficient than its predecessor's.

For more technical details, please check out the official Meta blog: https://ai.meta.com/blog/meta-llama-3/

Open Source vs. Closed Source

As Meta's own flagship, Llama 3 is naturally being integrated into the Meta AI chatbot first.

Back at last year's Meta Connect 2023 conference, Zuckerberg officially announced the launch of Meta AI, and then quickly rolled it out to the United States, Australia, Canada, Singapore, South Africa, and other regions.

In a previous interview, Zuckerberg was even more confident in Meta AI powered by Llama 3, saying that it will be the smartest AI assistant that people can use for free.

I think it's going to shift from a chatbot-like format, where you just ask a question and it gives you an answer, to one where you can give it more complex tasks and it carries them out.

Meta AI web experience address: https://www.meta.ai/


Of course, if Meta AI is "not yet available in your country", you can fall back on the simplest way to use an open-source model: Hugging Face, the world's largest AI open-source community.

Experience address: https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct
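If you run the Instruct models yourself, they expect a specific chat format with special header tokens. In practice the tokenizer's `apply_chat_template` builds this for you; the hand-rolled sketch below just shows the wire format as documented at release (consult the official model card for the authoritative template).

```python
def llama3_chat_prompt(system, user):
    """Build a Llama 3 Instruct prompt by hand. Prefer
    tokenizer.apply_chat_template() in real code; this only
    illustrates the special-token layout."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # Trailing assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_chat_prompt("You are a helpful assistant.", "What is Llama 3?")
```

Generation is stopped when the model emits `<|eot_id|>`, which is why serving configs register it as a stop token alongside the end-of-text token.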

Platforms such as Perplexity, Poe, and others were also quick to announce the integration of Llama 3 into platform services.


You can also try Llama 3 by calling the API of Replicate, an open-source-model hosting platform. Its usage pricing has already been published, so you can pay as you go.


Interestingly, before Meta's official announcement, sharp-eyed netizens spotted that Microsoft's Azure Marketplace had jumped the gun and listed the Llama 3 8B Instruct version. But as the news spread, the netizens who flocked to the link found only a "404" page.

The listing has since been restored: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/metagenai.meta-llama-3-8b-chat-offer?tab=overview

The arrival of Llama 3 is setting off a new storm of discussion on the social platform X.

Meta AI Chief Scientist and Turing Award winner Yann LeCun not only cheered the release of Llama 3 but also teased more releases in the coming months. Even Musk showed up in the comments, expressing his approval and anticipation with a terse "Not bad".


Jim Fan, a senior scientist at NVIDIA, has turned his attention to the upcoming Llama 3 400B+, which he believes will mark a watershed moment: open-source models drawing level with the top closed-source models.

From the benchmarks he shared, Llama 3 400B+ is nearly on par with Claude 3 Opus and the new GPT-4 Turbo. Although a gap remains, that is enough to secure it a place among the top large models.


Today coincides with the birthday of Andrew Ng, a professor at Stanford University and a top expert in AI, and the arrival of Llama 3 is undoubtedly the most special way to celebrate his birthday.


It is fair to say that today's open-source models are flourishing, with a hundred flowers blooming and a hundred schools of thought contending.


Earlier this year, Zuckerberg, with 350,000 GPUs in hand, described Meta's vision in firm terms in an interview with The Verge: to build AGI (Artificial General Intelligence).

In stark contrast to the not-so-open OpenAI, Meta has taken the open-source route toward the holy grail of AGI.

As Zuckerberg put it, Meta's firm commitment to open source has already paid off on this challenging journey:

I'm generally very inclined to think that open source is good for the community and for us because we benefit from innovation.

Over the past year, the AI community has debated the open-source versus closed-source route, a debate that goes beyond technical pros and cons to the core direction of AI's future. Even Musk has personally entered the fray, setting an example by open-sourcing Grok-1.

Not long ago, some argued that open-source models would fall further and further behind; the arrival of Llama 3 delivers a resounding rebuttal to that pessimism.

However, while Llama 3 has evened the score for open-source models, the debate over open versus closed source is far from over.

After all, GPT-4.5/5, quietly waiting in the wings, may settle the matter this summer with unrivaled performance.
