
Just hours after release, Microsoft instantly deleted its GPT-4-class open-source model!

Author: New Zhiyuan

Editor: Editorial Department

Some time ago, Microsoft announced and open-sourced its latest-generation large model, WizardLM-2, claiming performance comparable to GPT-4. Yet less than a day after launch, the model weights and announcement were all deleted, and the reason was......

Last week, Microsoft dropped WizardLM-2 out of nowhere: an open-source model claimed to rival GPT-4.

Unexpectedly, it was taken down just a few hours after release.

Some netizens suddenly noticed that WizardLM's model weights and announcement posts had all been deleted, gone from Microsoft's collections.


The GitHub project homepage now returns a 404.


Project Address: https://wizardlm.github.io/

The model weights, including those on Hugging Face, are also all gone...


The whole internet was baffled: where did WizardLM go?


As it turns out, Microsoft did this because the team had forgotten to "test" the model.


Subsequently, the Microsoft team came forward to apologize and explain: the original WizardLM had been released months earlier, so the team was not yet familiar with the new release process.

We accidentally missed an item required in the model release process: toxicity testing.

The second generation of Microsoft's WizardLM

In June last year, the original WizardLM, fine-tuned from LLaMA, was released, attracting a lot of attention from the open-source community.


Address: https://arxiv.org/pdf/2304.12244.pdf

Subsequently came WizardCoder, a code-focused version built on Code Llama and fine-tuned with Evol-Instruct.

The test results showed that WizardCoder's pass@1 on HumanEval reached a staggering 73.2%, surpassing the original GPT-4.


Fast forward to April 15, when Microsoft developers officially announced the new-generation WizardLM-2, this time fine-tuned from Mixtral 8x22B.

It comes in three parameter sizes: 8x22B, 70B, and 7B.


Most notably, the new models took a leading position on the MT-Bench benchmark.


Specifically, the largest version, WizardLM-2 8x22B, performs nearly on par with GPT-4 and Claude 3.

At the same parameter scale, the 70B version ranks first.

The 7B version is the fastest, even achieving performance comparable to leading models ten times its size.


The secret behind WizardLM 2's outstanding performance is Evol-Instruct, a novel training methodology developed by Microsoft.

Evol-Instruct uses large language models to iteratively rewrite an initial instruction set into increasingly complex variants. These evolved instruction data are then used to fine-tune the base model, significantly improving its ability to handle complex tasks.
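The idea can be sketched in a few lines. This is a toy illustration only: the real Evol-Instruct prompts an LLM to rewrite each instruction, whereas here hypothetical string templates stand in for the rewriter, and the function names are invented for this sketch.

```python
import random

# Hypothetical rewrite templates standing in for the LLM rewriter that
# the real Evol-Instruct uses ("in-depth evolution" prompts).
IN_DEPTH_OPS = [
    "Add one more constraint to: {inst}",
    "Require step-by-step reasoning for: {inst}",
    "Replace a common concept with a rarer one in: {inst}",
]

def evolve(instruction: str, rng: random.Random) -> str:
    # One evolution step: rewrite the instruction into a harder variant.
    template = rng.choice(IN_DEPTH_OPS)
    return template.format(inst=instruction)

def evol_instruct(seed_instructions, generations=3, seed=0):
    # Iteratively evolve the seed set; the union of all generations
    # becomes the instruction data used for fine-tuning.
    rng = random.Random(seed)
    pool = list(seed_instructions)
    current = list(seed_instructions)
    for _ in range(generations):
        current = [evolve(inst, rng) for inst in current]
        pool.extend(current)
    return pool

data = evol_instruct(["Write a sorting function in Python."], generations=2)
```

Each generation wraps the previous instruction in a new requirement, so later generations are strictly more demanding than the seed, which is the core of the method.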

Another is RLEIF (Reinforcement Learning from Evol-Instruct Feedback), a reinforcement-learning framework that also played an important role in the development of WizardLM 2.

WizardLM 2's training also employs the AI-Align-AI (AAA) approach, in which multiple leading large models guide and improve one another.

The AAA framework consists of two main components, namely "co-teaching" and "self-learning".

In the co-teaching phase, WizardLM and a variety of licensed open-source and proprietary state-of-the-art models conduct simulated chats, judge each other's quality, suggest improvements, and close skill gaps.


By communicating with each other and providing feedback, models can learn from their peers and refine their capabilities.

For self-teaching, WizardLM generates new evolved training data for supervised learning, and preference data for reinforcement learning, through active self-learning.

This self-learning mechanism allows the model to continuously improve performance by learning from the data and feedback it generates on its own.
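The shape of that self-teaching loop can be sketched as follows. Everything here is an assumption for illustration: the samplers, the self-judge, and the function names are toy stand-ins, not Microsoft's actual pipeline.

```python
def toy_generate(prompt: str, k: int = 3):
    # Stand-in sampler: a real system would sample k candidate answers
    # from the model itself.
    return [f"{prompt} -> answer v{i}" for i in range(k)]

def toy_score(answer: str) -> int:
    # Stand-in self-judge: here just answer length; in the described
    # setup the model scores its own candidates.
    return len(answer)

def self_teach(prompts):
    # Turn each prompt's best/worst candidates into a
    # (prompt, chosen, rejected) preference triple for RL fine-tuning.
    triples = []
    for p in prompts:
        candidates = sorted(toy_generate(p), key=toy_score)
        triples.append((p, candidates[-1], candidates[0]))
    return triples

triples = self_teach(["Explain recursion simply."])
```

The point is the data flow: the model's own generations, ranked by the model itself, become the supervision signal for the next round of training.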

In addition, the WizardLM 2 models were trained on this generated synthetic data.

In the researchers' view, training data for large models is drying up day by day; they believe that data carefully created by AI, and models supervised step by step by AI, will be the only path to more powerful AI.

As a result, they built a fully AI-driven synthetic training system to improve WizardLM-2.


Quick-fingered netizens had already downloaded the weights

However, many people had already downloaded the model weights before the repository was deleted.

Several users also tested on some additional benchmarks before the model was removed.


Fortunately, netizens who tested it were impressed by the 7B model and said it would be their preferred model for local assistant tasks.


Someone also ran a toxicity test and found that WizardLM-8x22B scored 98.33, compared with 89.46 for the base Mixtral 8x22B and 92.93 for Mixtral 8x7B-Instruct.

Higher scores are better, meaning WizardLM-8x22B holds up very well.
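The article does not specify which toxicity benchmark produced the 98.33 figure, but the general shape of such a test is simple: score each model response as safe or unsafe and report the percentage safe. The sketch below is purely illustrative; the marker-word "classifier" and all names are invented stand-ins for a real toxicity classifier.

```python
# Stand-in for a real toxicity classifier; the actual benchmark behind
# the 98.33 figure is not named in the article.
TOXIC_MARKERS = {"slur", "threat"}

def is_safe(response: str) -> bool:
    # Toy heuristic: a response is safe if it contains no marker word.
    return not (set(response.lower().split()) & TOXIC_MARKERS)

def toxicity_score(responses) -> float:
    # Percentage of safe responses; higher is better, matching the article.
    return 100.0 * sum(is_safe(r) for r in responses) / len(responses)

score = toxicity_score(["hello there", "that is a threat", "all good"])
```

A real evaluation would swap the marker set for a trained classifier or human judgments, but the reported number has this form: the fraction of prompts on which the model stays safe.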


Without a toxicity test, releasing the model was simply not an option.

Large models are notoriously prone to hallucination.

If WizardLM 2 were to output "toxic, biased, or incorrect" content in its responses, that would be bad news for the model.

In particular, such mistakes would attract attention across the internet, and Microsoft itself would face criticism, perhaps even investigation by the authorities.


Some netizens wondered: you can just update the metrics after the toxicity test, so why delete the entire repository and weights?

According to the Microsoft authors, under the latest internal regulations this was the only way to proceed.


Others said they want a model that hasn't been lobotomized.


However, developers will need to be patient: the Microsoft team has promised the model will go back online once testing is complete.
