#The Peak of Technology#

Editor: Editorial Office

OpenAI's GPT-4 has made a dazzling debut, and its multimodal capabilities are almost blindingly impressive. Jim Fan, a Stanford PhD and former student of Fei-Fei Li, says GPT-4's reasoning ability is strong enough for it to be admitted to Stanford on its own!

Sure enough, the only one who can beat yesterday's OpenAI is today's OpenAI.

OpenAI has just released GPT-4, a large multimodal model that accepts image and text input and produces text output.

It is billed as the most advanced AI system in history!

GPT-4 Is Crowned King! It aces image-based exam questions and could get into Stanford on its own

GPT-4 can not only see and understand images, it has also achieved near-perfect scores on major exams, including the GRE, and has swept a range of benchmarks with performance metrics off the charts.

OpenAI spent 6 months iteratively tuning GPT-4, using its adversarial testing program and lessons learned from ChatGPT, and obtained its best results ever on factuality, steerability, and more.

As everyone remembers, Microsoft and Google battled for three days in early February. On February 8, when Microsoft released its ChatGPT-style version of Bing, it said that Bing was "based on ChatGPT-like technology".

Today, the mystery has finally been solved: the large model behind it is GPT-4!

Geoffrey Hinton, one of the three Turing Award-winning pioneers of deep learning, marveled: "Caterpillars extract nutrients which are then converted into butterflies. People have extracted billions of nuggets of understanding, and GPT-4 is humanity's butterfly."

By the way, ChatGPT Plus subscribers get access first.

Near-perfect exam scores and a leap in performance

In casual conversation, the distinction between GPT-3.5 and GPT-4 is subtle. The difference emerges only when the complexity of the task reaches a sufficient threshold: GPT-4 is more reliable, more creative, and able to handle more nuanced instructions than GPT-3.5.

To understand the differences between the two models, OpenAI tested them on various benchmarks and on some simulated exams designed for humans.

Among these tests, GPT-4 came close to a full score on several:

USABO Semifinal 2020
GRE Writing

Take the US bar exam as an example: GPT-3.5 scores at around the 10th percentile, while GPT-4 scores at around the 90th. On the Biology Olympiad, the score jumped from GPT-3.5's 31st percentile to the 99th percentile.

OpenAI also evaluated GPT-4 on traditional benchmarks designed for machine learning models. According to the results, GPT-4 considerably outperforms existing large language models, as well as most SOTA models.

GPT-4 also performs well across languages: its accuracy in Chinese is about 80%, which is already better than GPT-3.5's performance in English.

Many existing ML benchmarks are written in English. To get an initial idea of GPT-4's capabilities in other languages, the researchers used Azure Translate to translate the MMLU benchmark, a set of 14,000 multiple-choice questions covering 57 topics, into multiple languages.
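To make the evaluation concrete, here is a minimal sketch of how multiple-choice accuracy could be scored per language. The function name, the record layout, and the toy answers are all hypothetical and are not OpenAI's evaluation code or data.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: fraction of questions answered correctly}.

    Each record is (language, predicted_choice, correct_choice).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, gold in records:
        total[language] += 1
        if predicted == gold:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical toy records, for illustration only (not real benchmark results).
records = [
    ("en", "A", "A"), ("en", "C", "C"), ("en", "B", "D"), ("en", "D", "D"),
    ("zh", "A", "A"), ("zh", "B", "B"), ("zh", "C", "A"), ("zh", "D", "B"),
]
scores = accuracy_by_language(records)
print(scores)
```

Comparing one model's score in a translated language with another model's score in English is then a simple lookup in dictionaries like `scores`.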

In 24 of the 26 languages tested, GPT-4 outperformed the English-language performance of GPT-3.5 and other large language models (Chinchilla, PaLM).

OpenAI says it has been using GPT-4 internally, so it is also paying attention to how well large language models work for content generation, sales, and programming. Staff also use it to help humans evaluate AI outputs.

Jim Fan, a former student of Fei-Fei Li and an AI scientist at NVIDIA, commented: "The strongest thing about GPT-4 is actually its reasoning ability. Its scores on the GRE, SAT, and law school exams are almost indistinguishable from those of human candidates. In other words, GPT-4 could be admitted to Stanford on its own."

(Jim Fan himself graduated from Stanford!)

Netizens: It's over. Now that GPT-4 is out, we humans are no longer needed...

Reading images is a piece of cake, and it gets memes better than netizens do

The highlight of this upgrade of GPT-4 is, of course, multi-modality.

GPT-4 can not only analyze and summarize charts and figures, it can even read memes and explain where the joke is and why it is funny. In this respect, it can instantly outdo many humans.

OpenAI says GPT-4 is more creative and collaborative than previous models. It can generate, edit, and iterate with users on creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user's writing style.

GPT-4 can take images as input and generate captions, classifications, and analyses. For example, give it a picture of ingredients and ask what can be made with them.

In addition, GPT-4 can process more than 25,000 words of text, which allows for long-form content creation, extended conversations, and document search and analysis.

GPT-4 surpasses ChatGPT in its advanced reasoning capabilities. Some examples follow.

Meme recognition

For example, show it a weird meme and ask what is funny about it.

Once GPT-4 receives it, it first analyzes the content of the images and then gives an answer.

For example, it is asked to analyze the following image panel by panel.

GPT-4 responded immediately: the "Lightning charging cable" in the picture looks like a large, outdated VGA connector plugged into a small, modern smartphone, which makes for a sharp contrast.

Show it another meme and ask GPT-4 where the joke is.

It replied fluently: the funny thing about this meme is that the text and the image do not match.

The caption says it is a photo of the Earth taken from space, but the picture is actually just a bunch of chicken nuggets arranged to look like a map.

GPT-4 can also read comics: why add layers to neural networks?

It hit the nail on the head: the cartoon satirizes the difference between statistical learning and neural networks in how they improve model performance.

Chart analysis

What is the average daily meat consumption in Georgia and Western Asia combined? Before giving an answer, please provide step-by-step reasoning.

Sure enough, GPT-4 clearly lists its problem-solving steps:

1. Determine the average daily meat consumption in Georgia.

2. Determine the average daily meat consumption in Western Asia.

3. Add the values from steps 1 and 2.

Solving physics problems

GPT-4 was asked to solve a physics problem from École Polytechnique on the principle of radiation detection with a bolometer. Notably, the problem is written in French.

GPT-4 begins to solve the problem: to answer question I.1.a, we need to express the temperature T(x) at each point of the conductive rod, identified by its abscissa x.

The rest of the solution is impressive throughout.

Think that's all GPT-4 can do?

OpenAI president Greg Brockman went online to give a live demo himself.

Most striking is GPT-4's strong understanding of code, which lets it generate code for you.

Greg sketched a rough layout on paper, took a photo, sent it to GPT-4, and asked it to write the web page code for that layout. It did.

In addition, if something goes wrong when the code runs, you can pass the error message, or even a screenshot of it, to GPT-4, and it will suggest a fix.

Netizens exclaimed: the GPT-4 launch event is teaching you how to replace programmers.

By the way, you can also file taxes with GPT-4. You know, every year Americans spend a lot of time and money on filing taxes.

Training process

Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document, using publicly available internet data as well as data licensed by OpenAI.

The data is a web-scale corpus that includes correct and incorrect solutions to math problems, weak and strong reasoning, and contradictory and consistent statements, and it represents a great variety of ideologies and ideas.

When prompted with a question, the base model can respond in a wide variety of ways, and the answer may be far from the user's intent.

Therefore, to align it with the user's intent, OpenAI fine-tunes the model's behavior using reinforcement learning from human feedback (RLHF).

However, the model's capabilities seem to come primarily from pre-training; RLHF does not improve exam scores (without active effort, it actually lowers them).

The base model needs prompt engineering even to know that it should answer a question, so the steering of the model comes mainly from the post-training process.

A major focus of the GPT-4 project was building a deep learning stack that scales predictably, because for training runs as large as GPT-4's, extensive model-specific tuning is not feasible.

To that end, the OpenAI team developed infrastructure and optimization methods that behave predictably across multiple scales.

To verify this scalability, the researchers accurately predicted GPT-4's final loss on an internal codebase (not part of the training set) in advance, by extrapolating from models trained with the same methodology but with 1/10,000 of the compute.
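The kind of extrapolation involved can be sketched as follows. This is only an illustration, assuming a power law with an irreducible-loss term, loss ≈ a · C^b + c, the form OpenAI's technical report describes. The data points, the constants, and the grid-scan fitting procedure are all hypothetical, and compute is normalized so that the full run equals 1.0.

```python
import numpy as np


def fit_power_law(compute, loss, floors):
    """Fit loss ≈ a * compute**b + c by scanning candidate floors c.

    For a fixed c, log(loss - c) is linear in log(compute), so an
    ordinary least-squares line yields log(a) and b.
    """
    compute = np.asarray(compute, dtype=float)
    loss = np.asarray(loss, dtype=float)
    best = None
    for c in floors:
        if c >= loss.min():
            continue  # log(loss - c) would be undefined
        slope, intercept = np.polyfit(np.log(compute), np.log(loss - c), 1)
        a, b = np.exp(intercept), slope
        err = np.sum((a * compute**b + c - loss) ** 2)
        if best is None or err < best[0]:
            best = (err, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict_loss(compute, a, b, c):
    """Evaluate the fitted power law at the given compute."""
    return a * np.asarray(compute, dtype=float) ** b + c


# Synthetic small-scale runs (hypothetical numbers, not OpenAI's data).
# Every run uses at most 1/10,000 of the full compute.
small_compute = np.array([1e-7, 1e-6, 1e-5, 1e-4])
observed_loss = predict_loss(small_compute, a=0.5, b=-0.1, c=1.2)

a, b, c = fit_power_law(small_compute, observed_loss,
                        floors=np.linspace(0.0, 2.0, 201))
predicted_full = float(predict_loss(1.0, a, b, c))
print(f"predicted loss at full compute: {predicted_full:.3f}")
```

Because the synthetic points are generated from an exact power law, the fit recovers it and the extrapolated value matches the generating curve. Real training runs are noisy, so in practice the quality of such a prediction depends on how well the assumed functional form holds.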

Now OpenAI can accurately predict the metric it optimizes during training, the loss, and it is extending this to other metrics. For example, it successfully predicted the pass rate on a subset of the HumanEval dataset by extrapolating from models with 1/1,000 of the compute.

Some capabilities remain hard to predict. For example, the Inverse Scaling Prize competition set out to find metrics that get worse as model compute increases, and the hindsight neglect task was one of the winners. GPT-4, however, reverses this trend.

OpenAI believes that accurately predicting future machine learning capabilities is an important part of safety, but that it does not receive enough attention.

OpenAI is now putting more effort into developing such methods and is calling on the industry to work together.
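For the HumanEval pass-rate extrapolation, OpenAI's technical report describes a power law in training compute of roughly the following form. The notation is a sketch: C is training compute, P is the subset of HumanEval problems, and α and k are positive constants fitted on the smaller models.

```latex
-\mathbb{E}_{P}\left[\log\bigl(\mathrm{pass\_rate}(C)\bigr)\right] = \alpha \cdot C^{-k}
```

Fitting α and k on runs with far less compute and evaluating the right-hand side at GPT-4's compute gives the predicted mean log pass rate.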

List of contributors

Along with the release of GPT-4, OpenAI also disclosed the project's organizational structure and the list of people involved.


Professor Chen Baoquan of Peking University said:

No matter how good a movie is, almost no one stays to the end of the closing credits. With this move, OpenAI is certainly not taking the usual path. There is no doubt that this will be not only the most widely read list of "cast members" (contributors) but also the most carefully studied. The most notable part is the detailed classification of contributions, which amounts to a rough outline of the departmental structure.

This very "bold" disclosure is far-reaching: it reflects the core philosophy behind OpenAI and, to some extent, signals the direction of future progress.

Resources:

https://openai.com/product/gpt-4
