Early this morning, OpenAI released its latest model, GPT-4.
According to what was disclosed at the launch event, this new generation is far stronger than the GPT-3.5 model behind the ChatGPT everyone has been using, and it has once again reshaped our editorial team's understanding of AI.
First, and most importantly, GPT-4 can accept input beyond plain text: it currently supports mixed input of text and images.
In the official example, the user uploaded a meme to GPT-4 and asked GPT-4 why this picture is funny:
GPT-4 described the picture's content in detail and with impressive accuracy, and thoughtfully explained why it is funny.

And that's not all: even for a very abstract meme, it can earnestly explain to you where the joke lies.

It's just that GPT-4 still can't quite pass the Turing test.
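We couldn't resist sketching what such a mixed text-and-image request might look like in code. To be clear, nothing below is an official API: the `image_url` content type and the field names are our own guesses at the eventual request shape, and no request is actually sent.

```python
# Sketch of a text + image chat request payload (no network call is made).
# The "image_url" content part is an assumed shape, not a documented API.

def build_meme_question(image_url: str, question: str) -> dict:
    """Assemble a hypothetical chat request mixing text and one image."""
    return {
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_meme_question(
    "https://example.com/meme.png",  # hypothetical image location
    "Why is this picture funny?",
)
```

The point is simply that text and image travel together as parts of one user message, so the model can reason about both at once.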
Of course, this capability goes far beyond interpreting memes; the possibilities it opens up are endless. For example:
In the official livestream in the early hours of this morning, GPT-4's developers demonstrated that it can recognize a hand-drawn sketch of a web page and write the front-end code for that page based on the sketch.
Hand-drawn web sketch, very abstract ▼
The web page given by GPT-4 and the code ▼
Although the website in this example is very simple, GPT-4's understanding and creativity are incredible: what matters is not whether it does this well, but that it can do it at all, which is a qualitative leap.
In fact, there are already companies working on real-world applications of this capability, planning to combine it with assistive services for the visually impaired.

That way, a blind user only needs to take a photo, and GPT-4 can immediately describe the object in front of them.
In terms of text Q&A, GPT-4 has also improved significantly, with the maximum input length increasing to around 25,000 words.
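That jump to roughly 25,000 words means whole articles can be pasted in at once. A quick way to sanity-check whether a text fits, sketched below; the 25,000 figure comes from the announcement, and counting whitespace-separated words is only a crude stand-in for the model's real token accounting:

```python
# Rough capacity check against GPT-4's reported ~25,000-word input limit.
# Word count is an approximation; actual limits are measured in tokens.

MAX_WORDS = 25_000

def fits_in_context(text: str, limit: int = MAX_WORDS) -> bool:
    """Return True if the whitespace-separated word count is within the limit."""
    return len(text.split()) <= limit
```

Anything over the limit would need to be trimmed or split before sending.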
Its answers in professional domains have improved in particular. In OpenAI's words, when a task is complex enough, "GPT-4 is more reliable, creative, and able to handle much more nuanced instructions" than the older version.
For example, on the Uniform Bar Exam, GPT-4 outscores about 90% of human test takers, while the old version outscored only about 10%.
On the GRE quantitative and verbal sections, GPT-4's scores are already at the level of students at Harvard, MIT, and Stanford.
Not only has its ability to answer questions become stronger, GPT-4 can also take on a persona.

For example, given the same question, ChatGPT will only answer it mechanically, while GPT-4 can, at your request, answer you in the Socratic style.
And there are many more ways to play with it.
But enough talk; it's better to try it for ourselves.

Even though it was already 2 a.m., our editorial team still shelled out the $20/month and had colleagues in the US help us upgrade overnight for a test run.

Unfortunately, OpenAI apparently judged that users are too good at abusing new features, so GPT-4's image input is temporarily closed to the public, with no word on when it will become available.

Still, we could put its clever brain through its paces with text.
First, we gave it one of those "Huawei and Alibaba interview questions" that went viral online, and it aced the challenge.

The old ChatGPT next door, by contrast, rather fumbled it.

As a "lawyer" that beat 90% of human bar-exam takers, GPT-4's reasoning ability should be formidable.

So we pulled out a classic case from the national judicial exam to see what it was really made of.
Q: B went to A's house for dinner and had his electric scooter stolen there, so B decided to steal someone else's scooter in return. At that moment, a drunk A came over, helped pick the lock, and helped B succeed. Investigation later revealed that the scooter B stole was in fact A's own. Does A's act count as theft?
Although both the old and new versions reach the correct conclusion, the old version's reasoning process is somewhat muddled.
On questions that call for more "creativity" or "thinking", such as:

Do you think the plan to "achieve global sustainable energy" presented at the recent Tesla investor day is feasible? Why or why not?

GPT-4's performance is even more surprising.

Although GPT-4's knowledge base is cut off at September 2021 and does not cover the investor day held two weeks earlier, its answer still sounded impressively well-grounded.

The old ChatGPT's answer, by contrast, was much weaker: disorganized, going around in circles, and without a single constructive point.
We then asked a question about industry thinking:

What do you think of the global carbon strategy, and will it succeed?

The old version could only skim the surface with general concepts, while GPT-4's answer was clearly broader in scope and deeper in thought: it listed ten points, was more detailed and better organized, and used more professional vocabulary. It answered the question almost perfectly.
The above is just our editorial team's own experimentation; in the hands of real experts, GPT-4 has delivered even more frightening results.

For example, building a Pong game in just 60 seconds, or a Snake game in 20.
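To show how compact such a game's core really is, here is our own minimal sketch of a Snake state update in Python. This is an illustration of the kind of code the model produces, not the actual code from those demos:

```python
# Toy Snake core logic: the snake is a list of (x, y) cells, head first.
# Our own minimal sketch, not the generated code from the demos.

def step(snake, direction, food, width, height):
    """Advance one tick. Returns (new_snake, ate, alive)."""
    dx, dy = direction
    head = (snake[0][0] + dx, snake[0][1] + dy)
    # The snake dies if the head leaves the grid or bites its own body.
    if not (0 <= head[0] < width and 0 <= head[1] < height) or head in snake:
        return snake, False, False
    ate = head == food
    body = snake if ate else snake[:-1]  # grow only after eating
    return [head] + body, ate, True
```

Wrap this in a loop that reads arrow keys and redraws the grid, and you have a playable game; the state update itself is a dozen lines.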
It is undeniable that GPT-4 is more than a little stronger than the old version. That said, our testing also turned up plenty of problems.

We gave it a set of high-school math competition problems, and the result was embarrassing: both the old and new versions got the very first multiple-choice question wrong.

Does that mean the students of Stanford and MIT are no match for..?
Because GPT-4 supports longer input text, we also tested its summarization ability.

This round exposed it even more.

What we fed it was a link to one of our earlier articles. In fact, GPT-4 has no internet access, so it is perfectly normal that it cannot summarize a URL.

Instead, it fabricated two article summaries out of thin air, neither of which had anything to do with the article we gave it.
Only after we corrected it twice, in an increasingly stern tone, did it admit the mistake.

And only when we later pasted the full text directly into the chat did GPT-4 show off its genuinely strong summarization ability.
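Pasting the full text worked because the article fit within the input window. For documents that don't fit, a common workaround is to split the text into chunks, summarize each chunk, then summarize the summaries. A sketch of the chunking step, with `chunk_text` as our own hypothetical helper and word counts as a rough proxy for the model's token limit:

```python
# Split long text into pieces small enough to summarize one at a time.
# chunk_text is an illustrative helper; limits here are word counts,
# a crude approximation of the model's real token-based limit.

def chunk_text(text: str, max_words: int = 2_000) -> list[str]:
    """Split text into pieces of at most max_words whitespace words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk would then be sent with a "summarize this" prompt, and the per-chunk summaries concatenated and summarized once more.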
To be honest, this knack for making things up surprised us, so we thought of another way to test it.
Two days ago, our colleague Xiao Hei Pang garbled the phrase "holding leverage (ba bing) over someone" into "holding a handful of garlic (ba suan)", which the editorial team mocked for ages. So we asked GPT-4: "What does it mean to hold a handful of garlic over someone?"

The result was wild: GPT-4 took the made-up phrase completely seriously, describing its "meaning" as casually as discussing the weather, citing chapter and verse while inventing everything, much like the fictitious Russian writer "Wozkisod" we once jokingly quoted in an article.

Mind you, in a real professional setting, the consequences of this kind of half-true, authority-citing nonsense could be serious. It is lying of the highest order.
Strangely, even the older ChatGPT would not dare fabricate sources so brazenly; how can the more advanced GPT-4 behave like this?

We suspect that because the new version is tuned to display "deeper thinking", GPT-4 tends to add embellishments of its own when answering many questions.
For all the holes we poked, on the whole the GPT-4 released this time, in basic capability, imagination, logic, and reasoning, is far stronger than before.

Only a few months after the old ChatGPT upended our understanding, they have already pulled out a new version. All we can say is: terrifying.
What's more terrifying is that GPT-4 may have been born much earlier than we thought: when OpenAI released the GPT-3.5-based ChatGPT, internal employees reportedly questioned why such an "ancient" version was being shipped.

And in fact we have long been in contact with GPT-4: it was officially announced today that New Bing has been powered by GPT-4 all along.
So, could GPT-5 also be just around the corner?

We are already looking forward to input in the form of video and audio, on top of text and images.