
Meta releases Llama 3, calling it one of the best open models available

Author: cnBeta

Meta has released the latest additions to the Llama family of open generative AI models: Llama 3. Or, more precisely, the company has open-sourced two models in the new Llama 3 series, with the rest slated to arrive at an unspecified future date.

Meta claims that the new models, Llama 3 8B (with 8 billion parameters) and Llama 3 70B (with 70 billion parameters), represent a "significant leap in performance" over the previous generation of Llama models, Llama 2 7B and Llama 2 70B. (Parameters essentially define an AI model's ability to handle problems, such as analyzing and generating text; in general, a model with more parameters is more capable than one with fewer.) In fact, Meta says that, for their respective parameter counts, Llama 3 8B and Llama 3 70B, both trained on two custom-built clusters of 24,000 GPUs, are among the best-performing generative AI models available today.

That's quite a claim. So how does Meta back it up? The company points to the Llama 3 models' scores on popular AI benchmarks such as MMLU (which measures knowledge), ARC (which measures skill acquisition) and DROP (which tests a model's reasoning over chunks of text). As we've written before, the usefulness and validity of these benchmarks is up for debate. But for better or worse, they remain one of the few standardized ways AI players like Meta evaluate their models.

On at least nine benchmarks, Llama 3 8B outperforms other open models such as Mistral's Mistral 7B and Google's Gemma 7B, both of which contain 7 billion parameters: MMLU, ARC, DROP, GPQA (a set of biology-, physics- and chemistry-related questions), HumanEval (a code generation test), GSM-8K (math word problems), MATH (another math benchmark), AGIEval (a problem-solving test set) and BIG-Bench Hard (a commonsense reasoning assessment).

Now, Mistral 7B and Gemma 7B aren't exactly at the cutting edge (Mistral 7B was released last September), and in some of the benchmarks Meta cites, Llama 3 8B scores only a few percentage points higher than either. But Meta also claims that the larger Llama 3 model, Llama 3 70B, is competitive with flagship generative AI models, including Gemini 1.5 Pro, the latest addition to Google's Gemini family.

Image source: Meta

Llama 3 70B beat Gemini 1.5 Pro on three benchmarks: MMLU, HumanEval and GSM-8K. And while it can't match Anthropic's most capable model, Claude 3 Opus, Llama 3 70B outperformed the weakest model in the Claude 3 series, Claude 3 Sonnet, on all five benchmarks (MMLU, GPQA, HumanEval, GSM-8K and MATH).

Notably, Meta also developed its own test set covering use cases ranging from coding and creative writing to reasoning and summarization, and on this set Llama 3 70B came out ahead of Mistral's Mistral Medium model, OpenAI's GPT-3.5 and Claude Sonnet. Meta says it barred its modeling teams from accessing the set to maintain objectivity, but obviously, given that Meta devised the test itself, the results should be taken with a grain of salt.

In terms of quality, Meta says users of the new Llama models can expect greater "steerability," a lower likelihood of refusing to answer questions, and higher accuracy on trivia questions, questions related to history and STEM fields such as engineering and science, and general coding recommendations. That's thanks in part to a much larger dataset: a collection of 15 trillion tokens, or a mind-boggling 750,000,000,000 words, seven times the size of the Llama 2 training set.
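As a quick sanity check on those figures, dividing the quoted 15-trillion-token corpus by the stated 7x ratio should recover the size of Llama 2's training set, which Meta previously disclosed as roughly 2 trillion tokens:

```python
# Back-of-the-envelope check on the training-set figures quoted above.
llama3_tokens = 15e12  # 15 trillion tokens, per Meta
size_ratio = 7         # "seven times the size of the Llama 2 training set"

implied_llama2_tokens = llama3_tokens / size_ratio
print(f"Implied Llama 2 training set: {implied_llama2_tokens:.2e} tokens")
# consistent with the ~2 trillion tokens Meta disclosed for Llama 2
```

The two public figures line up, which lends some credibility to the 7x claim.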

Meta would not reveal where its training data came from beyond saying it was drawn from "open sources," that it contains four times as much code as the Llama 2 training dataset, and that 5% of the set is non-English data (spanning about 30 languages), included to improve performance in languages other than English. Meta also said it used synthetic data, i.e., AI-generated data, to create longer documents for the Llama 3 models to train on, a somewhat controversial approach due to its potential performance drawbacks.

"While the models we're releasing today are only fine-tuned for English outputs, the increased diversity of the data helps the models better recognize nuances and patterns and excel across a variety of tasks," Meta wrote in a blog post.

Many generative AI vendors see training data as a competitive advantage and are therefore tight-lipped about it and related information. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much. Recent reporting suggests that Meta, in its bid to keep pace with AI rivals, at one point used copyrighted e-books for AI training despite warnings from the company's own lawyers; authors, including comedian Sarah Silverman, have sued Meta and OpenAI, accusing the companies of using copyrighted data for training without authorization.

So what about toxicity and bias, two other common problems with generative AI models, including Llama 2? Does Llama 3 improve in those areas? Meta claims it does.

Meta says it has developed new data-filtering pipelines to improve the quality of its model training data, and has updated its pair of generative AI safety suites, Llama Guard and CybersecEval, to try to prevent misuse of and unwanted text generation from Llama 3 models and others. The company is also releasing a new tool, Code Shield, designed to detect code from generative AI models that could introduce security vulnerabilities.

Filtering isn't foolproof, though, and tools like Llama Guard, CybersecEval and Code Shield only go so far. We'll need to see how the Llama 3 models perform in the wild, including in academic testing on other benchmarks.

Meta says the Llama 3 models are available for download now, and they power Meta's Meta AI assistant on Facebook, Instagram, WhatsApp, Messenger and the web. They will soon be available in hosted form on a range of cloud platforms, including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM's WatsonX, Microsoft Azure, NVIDIA's NIM and Snowflake. Versions of the models optimized for AMD, AWS, Dell, Intel, NVIDIA and Qualcomm hardware will also be made available in the future.
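For readers who want to try the downloadable weights, the sketch below shows one way to run the instruction-tuned 8B model through the Hugging Face `transformers` library. The repo name `meta-llama/Meta-Llama-3-8B-Instruct` and the chat-template call are assumptions based on how Meta publishes its models on the Hub; access is gated behind Meta's license, and generation realistically requires a capable GPU.

```python
# Sketch: running Llama 3 8B Instruct via Hugging Face transformers.
# The model ID below is an assumption based on Meta's Hub releases;
# downloading it requires accepting Meta's license on huggingface.co.

def generate_reply(prompt: str,
                   model_id: str = "meta-llama/Meta-Llama-3-8B-Instruct",
                   max_new_tokens: int = 128) -> str:
    """Load the model (cached after the first call) and return one chat reply."""
    # Imported lazily so the sketch can be read without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:],
                            skip_special_tokens=True)
```

Once license access is granted, a call such as `generate_reply("Why is the sky blue?")` downloads the weights on first use and returns the model's answer as a string.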

And more powerful models are on the way. Meta says it is currently training Llama 3 models more than 400 billion parameters in size, models that will be able to "converse in multiple languages," take in more data, and understand images and other modalities as well as text, which would bring the Llama 3 series in line with open releases like Hugging Face's Idefics2.

"Our near-term goal is to make Llama 3 multilingual and multimodal, give it longer context, and continue to improve overall performance across core (large language model) capabilities such as reasoning and coding," Meta wrote in a blog post. "There's still a lot to do."
