
Musk officially announces Grok-1.5: 16 times the context of GPT-4, reasoning power beyond DBRX

Source | InfoQ

Authors | Li Zhongliang, Chu Xingjuan

Introduction: Musk open-sourced Grok on March 18, and Grok 1.5 is already on its way, bringing stronger coding and math capabilities, deeper contextual understanding (up to 128,000 tokens), and more accurate long-text retrieval. Musk being Musk, the pace is hard not to admire. Grok-1.5 will be available to developers on X in the coming days.

Introducing Grok-1.5

On March 28, local time, Musk released Grok-1.5, a new AI model with long-context support and improved reasoning capabilities. Grok-1.5, the latest version in the series, is expected to roll out to early testers and existing users on the X platform in the coming days. The Grok-1 model weights and network architecture, made public two weeks ago, represented the team's progress as of last November; since then, xAI says it has made significant advances in reasoning and problem solving.

Capabilities and Reasoning

One of the most significant improvements in Grok-1.5 is its stronger performance on coding and math-related tasks. In the team's experiments, Grok-1.5 achieved a 50.6% score on the MATH benchmark and a 90% score on the GSM8K benchmark, two math benchmarks covering a wide range of grade-school to high-school competition problems. In addition, Grok-1.5 scored 74.1% on the HumanEval benchmark, which evaluates code generation and problem-solving ability.


Long-Context Understanding

Another new feature in Grok-1.5 is a context window of up to 128K tokens, 16 times the context length of the previous version, which allows Grok to digest information from much larger documents.


In addition, the model can handle longer and more complex prompts, maintaining its instruction-following ability as the context window grows. In the Needle In A Haystack (NIAH) evaluation, Grok-1.5 demonstrated strong retrieval, achieving perfect recall of text embedded in contexts up to 128K tokens long. In raw context length alone, that is 16 times the 8K window of the original GPT-4.
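xAI has not published the details of its NIAH harness, so the following is only a minimal Python sketch of how such an evaluation is commonly built: a known "needle" fact is buried at varying depths inside filler text of varying lengths, and the model is scored on whether it retrieves it. The filler text, the needle, and the `model_answer` callable are placeholders, not xAI's actual setup.

```python
FILLER = "The grass is green. The sky is blue. The sun is warm. "
NEEDLE = "The magic number mentioned in this document is 7421."
QUESTION = "What is the magic number mentioned in this document?"

def build_prompt(context_chars: int, depth: float) -> str:
    """Bury NEEDLE at relative position `depth` (0.0 = start, 1.0 = end)
    inside roughly `context_chars` characters of filler text."""
    haystack = (FILLER * (context_chars // len(FILLER) + 1))[:context_chars]
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + "\n" + NEEDLE + "\n" + haystack[cut:]
    return f"{context}\n\nQuestion: {QUESTION}\nAnswer:"

def evaluate(model_answer, lengths, depths):
    """Sweep context lengths and needle depths; score 1 when the model's
    answer contains the planted fact, 0 otherwise."""
    results = {}
    for n in lengths:
        for d in depths:
            answer = model_answer(build_prompt(n, d))  # model under test
            results[(n, d)] = int("7421" in answer)
    return results
```

A perfect NIAH result, like the one xAI reports at 128K tokens, means every (length, depth) cell in this grid scores 1.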

So how was such a strong model trained? Let's take a look at Grok-1.5's infrastructure.


Running leading large language models (LLMs) on large GPU clusters requires robust and flexible infrastructure. Grok-1.5 was trained on a custom distributed training framework built with JAX, Rust, and Kubernetes; this stack lets the Grok team prototype designs with minimal effort and train new architectures at scale.
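xAI has not released this training stack, so here is only a rough, hypothetical illustration of the JAX side: a minimal data-parallel training step with `jax.pmap`, where each device computes gradients on its own shard of the batch and `pmean` averages them across devices. A toy linear model stands in for the real transformer.

```python
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model; a real LLM would be a transformer here.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    # Average gradients across all devices -- the core of data parallelism.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)

n = jax.local_device_count()
params = {"w": jnp.zeros((n, 4, 1)), "b": jnp.zeros((n, 1))}  # replicated per device
x = jnp.ones((n, 8, 4))  # one batch shard per device
y = jnp.ones((n, 8, 1))
params = train_step(params, x, y)
```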

The core challenge of training large models on big compute clusters is maximizing the reliability and uptime of training jobs. The Grok team's custom training coordinator automatically detects problematic nodes and ejects them from the training job. The team also optimized checkpointing, data loading, and training-job restarts to minimize unplanned downtime caused by failures.
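The coordinator's internals are not public either, so below is only a generic sketch of the restart-from-checkpoint pattern the team describes: checkpoints are written atomically at a fixed interval, and a restarted job resumes from the latest one rather than from scratch. The file path, the interval, and the `init_params`/`train_step`/`next_batch` stubs are assumptions for illustration.

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # assumed path, not xAI's actual layout
CHECKPOINT_EVERY = 100        # steps between checkpoints (assumption)

def init_params(): return {"w": 0.0}           # stub model state
def train_step(params, batch): return params   # stub optimizer update
def next_batch(step): return None              # stub data loader

def save_checkpoint(step, params):
    # Write to a temp file, then rename atomically, so a crash
    # mid-write never corrupts the latest good checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "params": params}, f)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint():
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            state = pickle.load(f)
        return state["step"], state["params"]
    return 0, init_params()  # fresh start when no checkpoint exists

def train(num_steps):
    step, params = load_checkpoint()  # resume after any unplanned restart
    while step < num_steps:
        params = train_step(params, next_batch(step))
        step += 1
        if step % CHECKPOINT_EVERY == 0:
            save_checkpoint(step, params)
```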

Grok-1.5 vs. DBRX, the "strongest" open-source large model

So far, the Grok team has not said whether Grok-1.5 will be open-sourced, but given Musk's lawsuit against OpenAI, it is widely speculated that it will be; otherwise he would risk being accused of not practicing what he preaches.

Competition in the open-source large-model market is fierce: Meta, Mistral, and others are already at the forefront, and the landscape is changing fast. On March 27, local time, the US data and AI company Databricks emerged as a dark horse, announcing that DBRX, a new general-purpose large model developed by its Mosaic Research team, would be open-sourced. After confirming the test results, Jonathan Frankle, the lead neural-network architect on the DBRX project, confidently told the team: "We have surpassed all existing models on the market." Some of the test results are shown in the chart below:

[Chart: DBRX benchmark results]

DBRX performed well on a number of key benchmarks, scoring 73.7% on MMLU (language understanding) and 70.1% on HumanEval (code generation).


DBRX also did well on mathematical problem solving, scoring 66.9% on the GSM8K test, and its HumanEval result even puts it ahead of specialized coding models such as CodeLLaMA-70B.

However, just a day later, Grok-1.5 was announced, and, as expected, it outperformed the "strongest" open-source large model, DBRX. Assuming no one gamed the benchmarks, Grok-1.5 leads on MMLU with 81.5%, wins HumanEval with 74.1%, and its staggering 90% on GSM8K far exceeds DBRX's 66.9%. On long text, too, Grok-1.5's 128K-token context window far exceeds DBRX's 32K.
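Pulling together the figures cited above:

Benchmark        Grok-1.5      DBRX
MMLU             81.5%         73.7%
HumanEval        74.1%         70.1%
GSM8K            90%           66.9%
Context window   128K tokens   32K tokens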

Of course, these are only benchmark results and do not fully reflect real-world performance, but doing well on the benchmarks is certainly an advantage.

Developers are eagerly anticipating Grok 1.5

Regarding the sudden release of Grok-1.5, one netizen said the benchmark chart was impressive: in information retrieval, its performance is comparable to Claude 3 Opus and GPT-4 Turbo, and they could not wait to give it a try.


Netizens' enthusiasm for Grok-1.5 is evident in the comments: "Awesome, this is really an exciting development!" With new features on the horizon, the excitement is palpable: "Can we get an overview of the web-based release timeline? I can't wait for it to come to Australia. Even with Grok 1.0, the Spanish-language support is already excellent!"


Some netizens, however, believe that unless Musk can deliver a tenfold advantage, it will be hard to win the open-source large-model race.

It is also worth noting that Musk has said the X platform will open Grok chatbot access to more users, in particular subscribers to the $8-per-month Premium plan. That price is significantly more economical than GPT-4 at $19.99 per month or Gemini Advanced at $28.99 per month.

Historically, xAI's Grok models have also differed from other generative AI models in their willingness to answer questions other models typically avoid, such as conspiracy theories and more controversial political topics. Bolder, freer.

Epilogue

GPT-4 has been with us for more than a year, Gemini 1.5 was unveiled a few months ago, and Claude 3 only a few weeks ago. DBRX, the open-source model released yesterday, claims to outperform all current large models, and today it has already been surpassed by Grok-1.5 on some benchmarks. Which model will take the lead next? No one knows, but there is no doubt that we are living through a golden age of AI development, and we are lucky to witness it.

Original article: "Musk officially announces Grok-1.5: 16 times GPT-4's context, reasoning beyond DBRX", Li Zhongliang, InfoQ selected articles.
