
Trying ChatGPT again: it still makes mistakes, but its logic is stronger

ChatGPT is all over our feeds again!

The much-anticipated GPT-4 language model was unveiled in the early hours of this morning, with OpenAI calling it "OpenAI's most advanced system, producing safer and more useful responses."

Here is a quick summary of the main changes in GPT-4:

  • 1. Stronger logical analysis, and a big jump in exam performance
  • 2. It can read images, enabling richer kinds of interaction
  • 3. Answers are better organized, and it understands questions more accurately
  • 4. Much greater creativity, allowing more complete creative work

Better still, OpenAI isn't just making empty promises: ChatGPT Plus subscribers can already use GPT-4 today, with every feature except image input, which is still a research preview and not yet publicly available.

The previous version of ChatGPT (based on the GPT-3.5 Turbo model, hereafter "GPT-3.5" for readability) already had creators worried about their careers. Can the upgrade to GPT-4 really replace human work?

Let us answer from our own hands-on experience.

10+ questions: a full look at the new ChatGPT

First, the conclusion: in actual use, GPT-4's answers are more logically clear and better in substance, with less repetition, but it responds more slowly.

If you are a ChatGPT Plus user, you will see a model selector at the top of the page. OpenAI also uses a capability chart to show the difference between the two: GPT-3.5 is faster, while GPT-4 reasons better and writes more concisely.

Screenshot: GPT-3.5

Screenshot: GPT-4

ChatGPT: a California driver that can read images

The most striking thing about GPT-4 is that it passed almost every written exam it was given, nearly all of them with close-to-perfect scores.

Figure: OpenAI

We tested this ourselves, taking 20 questions from a senior bartender certification question bank and 16 California driver's license test questions, and putting them to both GPT-3.5 and GPT-4.

Results first: of the 20 bartender questions, GPT-3.5 got 4 wrong (80% correct) and GPT-4 got 1 wrong (95% correct).

On the 16 California driver's license questions, GPT-3.5 again got 4 wrong (75% correct), while GPT-4 got a perfect score (100% correct). If theory alone were enough to hit the road, GPT-4 would make a good driver.
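The correct rates above are simply correct answers divided by total questions. Here is a minimal Python sketch that reproduces them from the counts we reported. The `accuracy` helper and the `results` table are our own illustration, not part of any tool used in the test.

```python
def accuracy(total: int, wrong: int) -> float:
    """Return the percentage of questions answered correctly."""
    if total <= 0 or not 0 <= wrong <= total:
        raise ValueError("need total > 0 and 0 <= wrong <= total")
    return 100 * (total - wrong) / total

# (test and model, total questions, wrong answers), as reported in our test
results = [
    ("Bartender, GPT-3.5", 20, 4),
    ("Bartender, GPT-4", 20, 1),
    ("Driver's license, GPT-3.5", 16, 4),
    ("Driver's license, GPT-4", 16, 0),
]

for name, total, wrong in results:
    print(f"{name}: {accuracy(total, wrong):.0f}% correct")
```

Running it prints 80%, 95%, 75%, and 100%, which matches the figures above.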

Screenshot: GPT-4's answer sheet

We also tried CET-6 (China's College English Test, Band 6), the Level 2 Architect exam, and other tests. Both models did well, but GPT-4 consistently got a few more questions right than GPT-3.5.

Note that although GPT-4's overall accuracy is higher than GPT-3.5's, both models can give different answers when the same objective multiple-choice question is asked several times. So if you want ChatGPT to grade exam papers, it may not make a qualified teacher.


However, GPT-4 adds caveats more often, telling you its answers are not necessarily correct, rather than confidently handing you wrong answers the way GPT-3.5 does.

Screenshot: GPT-4 adds a disclaimer

When ChatGPT first launched, many people used it to grind through practice questions and chase benchmark scores. On a simulated bar exam, the previous model scored only around the bottom 10% of test takers, while GPT-4 scored around the top 10%, "exhibiting human-level performance on various professional and academic benchmarks." If it came down to test scores alone, GPT-4 could get into Harvard and Stanford.

GPT-4 also gains a new capability: reading images.

You can show it a meme and have it explain the joke.

Screenshot: GPT-4

You can give it a table, and it will analyze the data and show how it reached its conclusions.

Screenshot: GPT-4

Some users have even shown GPT-4 what's inside their refrigerators and asked it to suggest recipes.

From Twitter user @GauravDungriyal

However, this feature is not yet publicly available; we will try it out and report back as soon as it is released.

Everyday conversation: GPT-4 is better organized

When I first started using GPT-4, I chatted with it for a while. Asked "Who are you?", GPT-3.5 and GPT-4 gave similar answers, but GPT-4 sounded more like a friend.

Screenshot: GPT-3.5

Screenshot: GPT-4

I also asked it other things, such as "Under what circumstances does 1+1 equal 3?" GPT-4 not only answered but also explained the unstated metaphor that GPT-3.5 missed (two people having a child). It seems to understand humans better.

Screenshot: GPT-3.5

Screenshot: GPT-4

When it comes to reading articles, GPT-4 also does better than GPT-3.5: it not only summarizes the content but also pulls out the key points, making the summary easier to read.

With GPT-4's strong summarization, the legendary "quantum speed reading" becomes achievable.

Screenshot: GPT-3.5

Screenshot: GPT-4

Two years ago, Neal Stephenson's science fiction novel Snow Crash became popular again thanks to the metaverse concept, so we tried having GPT-3.5 summarize this "bible of the metaverse."

Screenshot: GPT-3.5

GPT-3.5's summary was merely passable. It covered the main plot and central idea of Snow Crash, but only in general terms, reading like a blurb on Douban (a Chinese book and film review site).

Then we asked GPT-4. By comparison, its answer is a bit more detailed: when discussing the themes and influence of Snow Crash, it names specific genres and fields, so it reads less like a string of clichés.

Even if you've never read Snow Crash, you would come away with a general sense of its plot and literary significance.

Screenshot: GPT-4

An interesting aside: Neal Stephenson recently shared his views on AI such as ChatGPT in a radio interview.

He believes ChatGPT produces only safe, neutral content that lacks creativity and depth, and that while it can solve certain problems, it cannot think and innovate at a human level.

Because ChatGPT lacks interesting and distinctive perspectives, Stephenson believes it could never write a novel like Snow Crash.

To test his claim, I asked GPT-4 to write a novel based on Snow Crash, to see whether the upgraded AI's creative writing could keep up with humans.

Screenshot: GPT-4

Out of 10, how would you rate this "Data Storm"?

By now there is no doubt that GPT-4 is a better conversationalist than GPT-3.5, but I wanted to run one last test: tricking it into providing unethical, illegal, or harmful content.

When I asked how to make sleeping pills, both GPT-3.5 and GPT-4 refused and offered some suggestions instead, but as you can see, GPT-4's suggestions were more systematic and comprehensive.

Screenshot: GPT-3.5

Screenshot: GPT-4

More creative: GPT-4's jokes are funnier

When ChatGPT first launched, I had it play a stand-up comedian and tell a story about working overtime. Honestly, it was not funny at all.

Screenshot: GPT-3.5

Reading its story, I could picture a stand-up comedian telling bitter, flat jokes, such as drinking coffee while working overtime until the early hours. It might give a performer some inspiration, but it was a long way from hilarious.

After the upgrade to GPT-4, I asked it again for a story about working overtime. Maybe last time's attempt set the bar low, but this time the story genuinely made me laugh.

Screenshot: GPT-4

"Overtime → go home less → Mom forgets who you are" and "more overtime → kids have to learn overtime → overtime becomes a required school subject": GPT-4's material follows the logic of a joke, whereas GPT-3.5's story ("working overtime has taught me to appreciate coffee") just feels bitter.

It still miscalculates, but its logic is stronger

Besides its knowledge stopping in 2021, ChatGPT has another weakness: it can't do arithmetic reliably. If you want it to check calculations, you will most likely be disappointed.

Screenshot: the correct answer should be 34646751912

However, GPT-4's logical reasoning has improved further. When I asked GPT-3.5 a logic question, it gave only the standard answer and a brief derivation.

Screenshot: GPT-3.5

But when I asked GPT-4 the same question, it showed a more complete and rigorous derivation.

Screenshot: GPT-4

GPT-4 has improved not only at logic problems but also at understanding meaning. For example, GPT-3.5 fails to grasp the meaning of the sentence "Xiaoming grabbed the handle."

Screenshot: GPT-3.5

The latest GPT-4, however, sees through it and explains it clearly (although its reasoning still has some minor problems).

Screenshot: GPT-4

What is GPT doing for us?

The experiences above are impressive enough, but GPT-3.5 and GPT-4 can do much more. On its official website, OpenAI showcases where GPT is already changing the world.

Duolingo, the language-learning app, is adopting GPT-4 to power its Role Play feature and an AI conversation partner, making learning a foreign language more game-like and immersive.

Photo: Duolingo

Be My Eyes, from Denmark, uses GPT-4's visual input to add a Virtual Volunteer™ to its app. It can generate responses close to what a human volunteer would provide, helping blind and low-vision people with hundreds of everyday tasks.

Inworld, a game development company, uses GPT-3.5 as one of its machine learning models to give NPCs emotions, memories, and behaviors, so that each NPC has its own personality. This saves time and money for startups with limited resources.

The coolest of these applications comes from the Icelandic government. Iceland has thriving tourism and technology industries, but its close integration with the United States and Europe puts its native Icelandic language at risk of disappearing. The Icelandic government is now working with OpenAI to use GPT-4 to help preserve Icelandic, turning language preservation into technological innovation.

OpenAI's GPT models are trained on large amounts of Internet data, so their command of smaller languages like Icelandic is shallow. GPT-3.5 could not produce grammatically correct Icelandic, but GPT-4 is already good enough for Icelandic companies to build chatbots that converse in Icelandic.

Photo: Miðeind's team of AI researchers, who have been working on training GPT-4 in Icelandic

Where can you try it?

At present, the easiest way to try GPT-4 is to upgrade your account to ChatGPT Plus and then switch to the GPT-4 model.

Which raises the question: if you don't want to pay the $20 monthly subscription for ChatGPT Plus, is there a free way to try it?

There is: the new Bing!

Although GPT-4 has only just been released, Yusuf Mehdi, Microsoft's head of consumer marketing, said Bing has quietly been running a version of GPT-4 customized for search. If you have been admitted to the new Bing, you can try the latest language model directly in the Bing search engine or the Edge browser.

This also explains why Bing has often come across as "smarter" than the older ChatGPT in comparison tests.

Final thoughts

After using it for a while, GPT-4 feels like a scruffy kid fresh out of school who has put on a suit and suddenly become mature and steady.

Previously, the most common criticism of ChatGPT running on GPT-3.5 was that it would spout nonsense with a straight face.

After the upgrade to GPT-4, it still gets some questions wrong, but it is no longer so stubborn. On questions it is unsure about, it asks for the user's view and reminds them to verify the information. These subtle changes in tone are enough to make it feel more reliable.

In AI, reliability is the ultimate competitive advantage.

Unlike in the past, OpenAI has not publicized the size of the GPT-4 model; it now seems to be deliberately withholding GPT-4's technical details.

GPT-2 had 1.5 billion parameters; its successor GPT-3 had 175 billion, more than 100 times as many.
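The "more than 100 times" figure is plain arithmetic on the two published parameter counts: 175 billion divided by 1.5 billion is about 116.7. Here is a tiny Python check. The constant names are our own, and GPT-4 is deliberately left out because its parameter count has not been disclosed.

```python
# Parameter counts as stated in the article; GPT-4's size is undisclosed.
GPT2_PARAMS = 1.5e9   # 1.5 billion
GPT3_PARAMS = 175e9   # 175 billion

# How many times larger GPT-3 is than GPT-2
ratio = GPT3_PARAMS / GPT2_PARAMS
print(f"GPT-3 has about {ratio:.1f}x the parameters of GPT-2")
```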

So how much larger is the multimodal GPT-4 than GPT-3? Only OpenAI knows. Judging from what it has released, OpenAI no longer seems interested in sharing technical details with the outside world, now that it has taken the lead in this field.

Judging from the current performance of GPT-4, it may be the best multimodal model to date, and it is difficult for any rival to surpass it in the short term.

Also today, Google announced a series of AI updates that let you automate formatting and email drafting in Google Docs and Gmail. But judging by people's reactions, hardly anyone noticed. The glow of GPT-4 overshadowed almost all of its rivals' efforts.

If people end up choosing only the most reliable AI as a productivity tool, an interesting flywheel emerges: the more people use GPT-4, the more chances it has to learn, the faster it improves, the more reliable it becomes, and the more people end up using it.

That is the worst-case scenario for competitors such as Google, Meta, and Baidu. Because AI models improve by learning from massive amounts of data, rivals' efforts may count for little, and the AI model market could eventually settle into a monopoly that is hard to break.

Sam Altman co-founded OpenAI with a beautiful vision: to benefit all of humanity through artificial intelligence. He believes AI can give everyone incredible new abilities, amplifying everyone's ingenuity and creativity.

That's wonderful, but if only one company drives this change, the whole thing could go badly wrong.
