Bai Jiao | From Aofeisi
QbitAI | Official account QbitAI
Suddenly, the entire large-model community is talking about the same thing.
A comparison table in a Microsoft paper, unremarkable at first glance, gave away a well-kept secret.
ChatGPT, the model that set off a global storm, is backed by a large model with only 20 billion parameters???
As soon as the paper was published, it drew wide attention both in China and abroad.
Many netizens still don't believe it: are you sure it's not a typo?
Some netizens remarked: no wonder OpenAI is so nervous about open source. Or perhaps this is groundwork for OpenAI to open-source a model itself.
Coincidentally, just a few days ago, a netizen spotted what appears to be a new GPT-4 model, copilot-gpt-4-2, in the GitHub Copilot API, with its knowledge updated through March 2023.
What does the paper say?
Beyond the accidental leak, the paper itself is worth a read: it presents the industry's first code generation with a diffusion model.
The research team envisioned a scenario like this:
If a developer were only allowed to modify the last line of code, how many attempts would it take to write a function from scratch?
Autoregressive models that generate code from natural language have a similar limitation: they cannot easily reconsider tokens they have already generated.
Microsoft researchers proposed CODEFUSION, an encoder-decoder architecture comprising an encoder, a decoder, a denoiser, and a classification head. The encoder maps the natural-language input into a continuous representation, which is then fed as an additional condition to the diffusion model, guiding the iterative removal of Gaussian noise.
To produce syntactically correct code, the denoised representation is passed to the decoder, which attends to the input encoding to yield code tokens; CODEFUSION is also pre-trained on code with a continuous paragraph denoising (CPD) task. A rough sketch of this pipeline follows.
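To make this concrete, here is a minimal PyTorch sketch of a CODEFUSION-style generation loop (the CPD pre-training step is omitted). All module sizes, the noise schedule, the conditioning scheme, and every name below are illustrative assumptions, not the paper's actual configuration.

```python
# A minimal, hypothetical sketch of a CODEFUSION-style generation loop.
# Sizes, schedule, and conditioning are assumptions, not the paper's setup.
import torch
import torch.nn as nn

class CodeFusionSketch(nn.Module):
    def __init__(self, vocab_size=1000, dim=128, steps=10):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(vocab_size, dim)
        # Encoder: maps the natural-language prompt to continuous states.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), 2)
        # Denoiser: predicts the clean latent from a noisy one, conditioned
        # on the prompt encoding via cross-attention.
        self.denoiser = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True), 2)
        self.time_embed = nn.Embedding(steps, dim)
        # Decoder + classification head: turn the final denoised latent
        # into discrete code tokens.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True), 2)
        self.head = nn.Linear(dim, vocab_size)

    @torch.no_grad()
    def generate(self, prompt_ids, code_len=32):
        memory = self.encoder(self.embed(prompt_ids))   # NL encoding
        x = torch.randn(prompt_ids.size(0), code_len, memory.size(-1))
        # Linear schedule for the retained-signal fraction (assumption).
        alpha_bar = torch.linspace(1.0, 0.0, self.steps + 1)
        for t in reversed(range(self.steps)):
            t_emb = self.time_embed(torch.tensor([t])).unsqueeze(1)
            x0_hat = self.denoiser(x + t_emb, memory)   # predict clean latent
            noise = torch.randn_like(x)
            # Re-noise the prediction to the next, less noisy level.
            x = alpha_bar[t].sqrt() * x0_hat + (1 - alpha_bar[t]).sqrt() * noise
        hidden = self.decoder(x, memory)                # attend to the prompt
        return self.head(hidden).argmax(-1)             # code token ids

model = CodeFusionSketch()
tokens = model.generate(torch.randint(0, 1000, (1, 8)))
print(tokens.shape)  # torch.Size([1, 32])
```

The design point to notice is that the denoiser refines the whole sequence of continuous token embeddings at every step, so the model can revise any position, unlike an autoregressive decoder that is locked into its earlier tokens.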
CODEFUSION was evaluated on code generation tasks in three languages: Python, Bash, and Excel conditional formatting (CF) rules.
The results show that CODEFUSION, with only 75 million parameters, performs close to GPT-3.5-turbo and its 20 billion parameters, while also generating more diverse code.
Compared with plain-text diffusion models, CODEFUSION generates more syntactically correct code; compared with autoregressive models, it generates more diverse candidate code.
Against state-of-the-art autoregressive systems (350M to 175B parameters), it matches them on top-1 accuracy and outperforms them on top-3 and top-5 accuracy, thanks to its better balance of diversity and quality.
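For context, top-k accuracy counts a task as solved if any of the model's k highest-ranked candidates is correct. Below is a minimal sketch assuming exact-match scoring against a single reference program; the paper's actual matching criterion may differ, and all names are hypothetical.

```python
# Minimal top-k accuracy sketch: exact-match against one reference per task.
def top_k_accuracy(candidates_per_task, references, k):
    """candidates_per_task: one confidence-ranked candidate list per task;
    references: the gold program for each task."""
    hits = sum(
        any(cand == ref for cand in cands[:k])
        for cands, ref in zip(candidates_per_task, references)
    )
    return hits / len(references)

# Toy example: top-1 misses on both tasks, but a correct candidate
# appears within the top 2, so top-2 accuracy is 1.0.
cands = [["x+1", "x + 1"], ["y*2", "y * 2"]]
refs = ["x + 1", "y * 2"]
print(top_k_accuracy(cands, refs, k=1))  # 0.0
print(top_k_accuracy(cands, refs, k=2))  # 1.0
```

This is also why diversity matters for the comparison above: a model whose top-5 candidates are near-duplicates gains little over its top-1, while a model with more diverse candidates gets several distinct chances to be right.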
What was meant as an ordinary performance comparison thus unexpectedly caused an uproar.
Some have even floated conspiracy theories: perhaps this is a deliberate "appetizer" for OpenAI going open source.
After all, many large models have caught up, and as early as May this year Reuters reported that OpenAI was preparing to open-source a new large language model.
One More Thing
It is worth mentioning that as early as February this year, a Forbes article already claimed that ChatGPT has only 20 billion parameters.
Its headline at the time: "Is Bigger Better? Why The ChatGPT Vs. GPT-3 Vs. GPT-4 'Battle' Is Just A Family Chat".
It just didn't attract much attention back then.
Reference Links:
[1]https://twitter.com/felix_red_panda/status/1718916631512949248
[2]https://x.com/teortaxesTex/status/1718972447024623898
[3]https://www.reddit.com/r/singularity/comments/17jrepb/microsoft_paper_claims_chatgpt_35_has_20_billion/
[4]https://www.zhihu.com/question/628395521
[5]https://www.reddit.com/r/ChatGPT/comments/17ht56t/new_leaks_about_upcoming_developments_with_openai/
[6]https://www.forbes.com/sites/forbestechcouncil/2023/02/17/is-bigger-better-why-the-chatgpt-vs-gpt-3-vs-gpt-4-battle-is-just-a-family-chat/amp/
— END —