
Read GPT-4: The full text of the 10,000-word talk by the author of Microsoft's viral paper "Sparks of AGI"


Sébastien Bubeck, first author of Microsoft's viral paper "Sparks of AGI," gave a campus talk called "First Contact" just three hours before the paper was posted to arXiv, covering the Microsoft Research team's early work with GPT-4.

The full video of this talk was recently posted on Sébastien Bubeck's YouTube channel, where the most upvoted comment reads:

"This speech will go down in history. It's incredible to live in such crazy times. "


The following is a full translation of this historic GPT-4 talk, shared here with you.


You know, this is a very unlikely talk for me to give, but that's the way it is: the world is changing, and I'm adjusting my research to that change.

Today, I'm going to introduce a rather mysterious theme, "first contact." This story is really about the last few months at Microsoft, where I had early access to GPT-4 because we were integrating it into the new Bing.

Of course, in my work on it I didn't just do the product part, which was genuinely interesting; we also did some scientific research around it, or tried to. Doing science on these large models is difficult, and I'm going to tell you about that scientific work and the journey of discovery we have been on over the past few months.


So, the real title of this talk is "Sparks of Artificial General Intelligence."

Over the past few months, our collective evaluation of GPT-4 has suggested that we are seeing something akin to artificial general intelligence. My goal in this talk is to try to convince you that things really have changed with the arrival of GPT-4.

This work was done with many great colleagues at MSR. I want to mention in particular Varun Chandrasekaran; Ronen Eldan (whom I think many of you know very well), who recently joined us; Johannes Gehrke; Eric Horvitz; Ece Kamar; Peter Lee; and Yin Tat Lee and Yuanzhi Li, who are also members of my team. I have also worked with Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang, who has now joined us full-time.


First of all, I would like to give credit where it is due and provide some clarifications. I think that's very important.

First, the GPT-4 model we worked with is entirely the creation of OpenAI; I had nothing to do with building it. We had purely black-box access to it. OpenAI has created a truly amazing artifact, a tool that is about to change the world, and they deserve full credit. I want to stress this point again.

The second important thing is that our experiments were done on an early version of the model. The version described in their published paper and announcements is multimodal, while the version we had access to was not: it supported only text input and text output. What's more, after our experiments they made further modifications to the neural network. Because of those modifications, the answers you get may differ if you try some of the prompts I'm going to show; in particular, you may not get answers as good as the ones I show. The reason is that they further fine-tuned the model for safety, which they explain very clearly in their technical report. They tweaked it to make it safer in some sense. So this is an important clarification.

Now, any scientist in this room may worry that you will not be able to replicate what I'm about to show you, and yes, you will not be able to replicate it. That said, in this particular case I don't think reproducibility is a big issue, because I'm not going to give you any quantitative data. There will be no benchmarks in my presentation. This is about a qualitative leap, not gaining 10 points on this benchmark and 20 points on that benchmark. That is a different matter.

What I'm trying to convince you of is that there is some kind of intelligence in this system, and that we should call it an intelligent system. We're going to talk about what I mean by intelligence, and by the end of the talk you'll see that it's a matter of judgment: there is no clear definition, and whether this is a new type of intelligence is itself debatable. In any case, I'll try to make the argument.


Now, when I say these things, I think many of you may have strong reactions; in particular, you may feel that something is wrong, that it definitely is not intelligent, that it doesn't even have an internal representation, and so on. I would like to caution against that view. It's what you might see online or even in newspapers: it's just copy-paste, there is no internal representation, it's just statistics. How can it be intelligent? It doesn't even have a world model.

This talk is not intended to refute all these points, but I still want to say that we need to be careful with such claims: a trillion-parameter model lives in a very high-dimensional space that is very, very hard for us humans to reason about.

You can do a lot with a trillion parameters. So when people say it doesn't have a world model, it's not as simple as that. It is entirely possible for such a system to build an internal representation of the world and operate on it at every level and stage of processing. What I'm saying here is just a couple of sentences to help you think about this.

From my perspective, we shouldn't think of these neural networks as learning simple facts like "Paris is the capital of France." They are doing much more, like learning operators and algorithms. Internally, the model doesn't just retrieve information; it builds internal representations that allow it to concisely reproduce the data it has seen. So you really shouldn't think of it as a pattern matcher merely trying to predict the next word. Yes, it is trained to predict the next word, but what emerges from that training is much more than a statistical pattern matcher.

I think we really need to think of it as learning algorithms, and in my opinion there is no tool in learning theory for this kind of learning. It's very different from what we're used to, and I think understanding it will be very interesting, but that's not the point of this talk; it's not something I can do here, and I don't yet know how.


So right now, many of you may be thinking: wait, these things can't have common sense; they don't understand the real world; they only experience reality through text on the internet; they don't know what it's like to hold a hot cup of coffee.

Well, what we're going to do in this talk is look at lots of examples and see what happens. Here is one, and you'll see many similar examples that seem a little silly, but the silliness has a purpose: to go beyond what's on the web, to really try to get past mere memorization.

Here's a simple puzzle we posed: I have a book, 9 eggs, a laptop, a bottle, and a nail; please tell me how to stack them on top of each other in a stable manner. I don't think this question has come up on the internet; it's a very strange question. Here's ChatGPT's answer: it will be very difficult to stack all these items on top of each other; first place the bottle on a flat surface and carefully balance the nail on the bottle. Well, that doesn't start very well. Then: place the eggs on the nail. OK, this method does not work. So ChatGPT does not show common sense here. Any doubter will happily say: look, I was right all along, these things don't understand anything, they have no representation of the world, no common sense. I win.

So, let's see what GPT-4 does. One possible way to stack them: first place the book on a flat surface, arrange the 9 eggs in a 3-by-3 square on top of it, leaving some space between them; the eggs form a second layer and distribute the weight evenly; then you can place the laptop on top, and so on. At least on this very simple question, it shows some common sense.

Of course, the literature is full of examples of dramatic failures of these models on common-sense problems. We tried all of them, and GPT-4 performed well on all of them. So let's agree, for the time being, that it has a certain amount of common sense.


The next question is: OK, it understands that eggs are fragile and that weight needs to be distributed evenly; I admit that. But what about theory of mind? That requires more complex processing.

Of course, you might say, it doesn't really understand human motivations, emotions, and so on; that's beyond its capabilities. It's a hot topic: there is a paper asking whether theory of mind spontaneously emerges in large language models, and a follow-up paper arguing that the models fail completely if small changes are made to the tasks. There's also a very interesting paper by Josh Tenenbaum's team arguing that language and thought are two very different things.

You'll notice that I also mentioned interpretability and explainability. I won't dwell on this too much, but it's an important point. I'm going to try to convince you that GPT-4 does have a theory of mind, and not only that: I think this will change the subfield of machine-learning interpretability, because once these models understand humans, they will also be able to explain decisions in a way that you can understand.

Of course, I know what everyone is thinking: fine, it will explain itself, but does that really reflect its inner workings? Again, I don't want this talk to be all about that topic, but I think there will be a lot of experimentation on it.

Also, all of this will appear in a paper on arXiv tonight; the posting happens to coincide with this presentation. You can see all the details there in about three hours. I'm really excited about that, so do check it out to learn more.


Well, let me try to convince you that GPT-4 has a theory of mind. I'll take an example from Tomer Ullman's paper.

"In one room there were John, Mark, a cat, a box and a basket. John put the cat in the basket and left the room. When John wasn't there, Mark took the cat out of the basket and put it in the box. In the end, they all came back, what were they thinking? ”

The theory-of-mind answer is simple: the person who put the cat in the basket, and who doesn't know it was moved, will still think the cat is in the basket. Notice how much is going on here: the model has to maintain an internal representation, and as it reads the text it has to update where the cat is. So let's see how GPT-4 does. "Interesting puzzle... John thinks the cat is still in the basket, because that's where he left it." Yes, that's right. "Mark thinks the cat is in the box, because he moved it there." Yes, that's right. Oh, and it also tells us what the cat thinks (audience laughs): the cat thinks these people are weird; why do they keep moving me around?

Okay, so this is the kind of thing that surprised me again and again. I'm not saying this question is particularly profound, but take a moment to think about it; it's interesting.

Okay, let's move on. I don't want the whole talk to revolve around common sense and theory of mind. Let's grant that GPT-4 has both; you still wouldn't easily call it intelligent, right?

I mean, intelligence isn't just that, and the answer here isn't easy to pin down. I want to be very clear: if we start talking about intelligence, the first thing we have to do is find a definition we can refer to. I don't want to give my own definition here; people have been working on this for decades, if not longer.


You can argue that humans have been thinking about intelligence for a very long time, so what I'm going to do is take a consensus definition published by 52 psychologists in 1994. In the 1990s there was a heated debate about the meaning of IQ tests, and this group of psychologists gave a definition of intelligence. We can argue and disagree about various parts of it, but this is the definition I will refer to.

The definition is that intelligence is a very general mental capability that includes reasoning, planning, problem solving, abstract thinking, comprehension of complex ideas, and learning quickly and from experience. OK, that's six abilities, and what we're going to do next is try to measure GPT-4 on these six axes, to see where it fails and where it succeeds. Our assessment is as follows:

I say with great confidence that GPT-4 can reason, and with equally great confidence that GPT-4 cannot plan.

Planning is a very delicate and complex issue, which we will discuss at the end of the presentation, because GPT-4 may give you the impression of planning: there are many problems where you might naively think planning is needed, but which actually admit a linear solution. In algorithm design, some problems at first glance make you think you need to look ten steps ahead, but if you are a little smarter about the algorithm, there is a linear solution.

GPT-4 solves those problems. It can solve a great many problems, and we will see that it can think abstractly and, of course, understand complex ideas.

The last point is a delicate one: learning quickly and from experience. GPT-4 is a fixed language model that does not update over time. For GPT-4, every day is a new day and every session is a new session, so there is no learning across sessions. But within a session, you can teach it new concepts it has never seen before, and it can understand them and then work with them. So there is a certain in-session learning ability, but of course there is no long-term memory.

Now, let me say right away: whether you call this intelligence depends a little on you. Some may think that planning is the essence of human intelligence, and that animals can do everything else; what really differentiates us is the ability to plan. If that's your view, then GPT-4 is not intelligent. Another view is that the whole point of intelligence lies in being able to acquire new skills. If that's how you think about intelligence, GPT-4 is not intelligent either. But if you care about problem solving, abstract thinking, understanding complex ideas, and reasoning about new material, then I think you have to call GPT-4 intelligent.


So how did we arrive at this assessment?

The point is that you can't make this assessment through benchmarks; that would be meaningless here. Moreover, we do not know what data GPT-4 was trained on. My working hypothesis is that it has seen all the digitized data humanity has produced. I'm not saying that's correct, but it's my working assumption.

So I assume GPT-4 has probably seen everything online, in particular any benchmark that already exists. So we can't benchmark it. Instead, the approach we took is rooted in psychology: rather than following the methodology of machine learning, we follow the methodology of psychology.

The way we test intelligence is by giving it creative tasks that go beyond anything it could have seen, tasks that really require thinking in innovative ways, and testing across a wide range of domains.

The domains we chose in the paper include vision, theory of mind, coding, mathematics, tool use, and privacy and harm detection, all of which are very important. We could also have chosen other fields such as medicine, law, physics, and chemistry. The point is that GPT-4's intelligence is general: it performs equally well across all of these areas.


So, let's start this evaluation journey with an example of a creative task. When anyone first gets access to GPT, the first thing they do is ask it to write a poem, and that's what we did when we first got access to GPT-4. As a mathematician, I asked it to prove that there are infinitely many primes, in rhyme. Let's first see how ChatGPT does. It gives a proof, but without rhyme, not as requested. Of course, there are many, many proofs that there are infinitely many primes. Giving a proof is fine, but it doesn't complete the task.

Let's see what GPT-4 does with the same request: can you write such a proof?

GPT-4: "Yes, I think I can. This may require an ingenious plan. I will start with Euclid's proof, which shows that prime numbers are not so isolated. I assume we have a finite list of prime numbers and there are no omissions. Multiply them all together and add one to show cleverness...."

(Author's note: GPT-4's answer, quoted here in the original English, does rhyme.)

At this point, back at the end of September, I remember thinking: "What the hell is going on here?" It was really unbelievable.

But the point is, you can't stop there. You can't test intelligence by asking one question, listening to the answer, and leaving it at that. You have to keep going, whether the answer is right or wrong. When you test a student, you interact with the student. Sometimes students make mistakes, and you can't just say, "Ah, you really don't understand anything; let me stop here." No, you try to guide the student. That's what we're going to do: throughout the presentation we'll keep asking follow-up questions, and if GPT-4 goes off track, we'll help it a little.


Let's see how to proceed.

Again, what we need to do is be creative, think outside the box, and keep asking questions.

I asked for an illustration of this proof. But it is not a visual proof, so if I asked you to draw a proof of the infinitude of primes, you might come up with something, but it's not at all clear what to draw.

Moreover, GPT-4 cannot output images, so how is it supposed to draw? In this prompt I mentioned the SVG format, but I could even leave SVG aside and just say, "Can you draw an illustration?" It would answer: "Here is an SVG image."

If you don't know the SVG (Scalable Vector Graphics) format, it's essentially a bunch of code. GPT-4's answer is just such a bunch of code, and if you save it in an HTML file, you get an image like this.

The drawing, while not stunning, does convey the essence of the proof. You can see the finite list of primes: 2, 3, 5, 7, 11, and so on. Now we multiply them all together into a new number n and add one, "just to be clever," as the poem says. This new n+1 leads us to the prime we are looking for.
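If you want to reproduce that rendering step yourself, here is a minimal sketch of saving SVG code into an HTML file that any browser can display (the SVG content below is my own toy stand-in, not GPT-4's actual output):

    # Wrap SVG code in an HTML page so a browser can render it.
    # The SVG here is a toy placeholder, not GPT-4's actual output.
    svg_code = """
    <svg xmlns="http://www.w3.org/2000/svg" width="240" height="120">
      <text x="10" y="30">Primes: 2, 3, 5, 7, 11, ...</text>
      <circle cx="190" cy="80" r="25" fill="none" stroke="black"/>
    </svg>
    """

    with open("proof_illustration.html", "w") as f:
        f.write(f"<!DOCTYPE html><html><body>{svg_code}</body></html>")
    # Open proof_illustration.html in a browser to see the image.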


That was just a warm-up. Let's go further and dig deeper into GPT-4's visual capabilities. Here I want to tell you about the strange case of the unicorn, which is also my favorite example.

Let me show you the prompt: draw a unicorn in TikZ. Many in this audience draw figures in LaTeX using TikZ, and I personally wasted a lot of time fighting TikZ during and even after my PhD. Drawing anything in TikZ is painful; drawing a unicorn in TikZ would take me, I estimate, two days. Moreover, I'm quite sure no one on the internet has ever asked this question or drawn a unicorn in TikZ. Who would waste time doing that? It simply makes no sense.

That said, we can't be satisfied just because we believe it's not on the internet. We have to dig deeper, and we will, don't worry. But let me first show you the unicorn GPT-4 drew. When I saw it, I was personally shocked, because it really understood the concept of a unicorn, knew its key elements, and was able to draw this very abstract unicorn. To make the gap between GPT-4 and previous versions clear, here is the unicorn from the earlier ChatGPT:


So that's the gap between ChatGPT and GPT-4. In this example you can see how much progress GPT-4 has made. If you tried an earlier version and weren't satisfied, I suggest you don't stop there: try GPT-4 and see what surprises it can give you.


Okay, of course you might still say that this isn't so remarkable. But one thing we're going to see is that GPT-4 is smart enough to use tools. So you can respond to it and say: hey, you know what, I don't like your drawing very much; can you improve it? I've heard of these diffusion models (author's note: the core model behind today's AI image generation); maybe you can use one of them. And it will say: OK, please go to this diffusion-model website, paste in my picture, and ask it to improve it, and you will get this.

Well, that's GPT-4's unicorn when it is allowed tools. You can see where this might go. Again, I don't want to stop there; let's keep exploring.


In this case, how are we going to explore further?

The next thing I did was take the generated TikZ code and remove all the comments. One feature of GPT-4 is that the code it generates is very easy for humans to read, which is kind of interesting for a machine: it adds lots of comments and really guides you through its thinking.

So I deleted all of that, so that nothing says "draw a unicorn" and there is no mention of unicorns anywhere in the code. And to rule out the possibility that it is somehow copying this from the web, I randomly perturbed all the coordinates so nothing matches. Then I deleted the horn.

In a new GPT-4 session, I gave the code back and said: this TikZ code is supposed to draw a unicorn, but the horn is missing; can you add it back? To do that, it has to genuinely understand the code, and that's what happens: it is able to find the head. You have to appreciate that this is not a simple problem; the head is just a few ovals among other shapes (the head and the mane; by the way, it is not very good at drawing the mane). But it really was able to locate the head and put the horn in the right place.
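To make the procedure concrete, here is a rough Python sketch of that preprocessing (my own reconstruction, not the paper's actual tooling): strip the comments and jitter the coordinates.

    # Strip TikZ comments and randomly perturb numeric coordinates, so the code
    # carries no hint of "unicorn" and cannot be matched verbatim against the web.
    import random
    import re

    def scrub_tikz(code: str, jitter: float = 0.05) -> str:
        # Remove comments: everything from an unescaped % to the end of the line.
        code = re.sub(r"(?<!\\)%.*", "", code)

        # Replace every number with a slightly perturbed value.
        def perturb(match: re.Match) -> str:
            return f"{float(match.group()) + random.uniform(-jitter, jitter):.3f}"

        return re.sub(r"-?\d+(?:\.\d+)?", perturb, code)

    print(scrub_tikz(r"\draw (0,1) ellipse (0.5 and 0.25); % the head"))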


Well, I don't want to dwell too long on this unicorn example, but another very striking thing is that OpenAI kept training GPT-4 in the months after we got access to it in September. As they kept training it, I kept asking for my unicorn in TikZ to see what would happen. And it kept improving.

Of course, the best examples are on my computer; maybe we can look at them later. It kept improving for a while, but eventually, once GPT-4 started being trained for more safety, the quality of the unicorn began to decline. So if you go home tonight and ask GPT-4 to draw a unicorn in TikZ, you might not get very good results; it will be closer to ChatGPT.

As silly as it sounds, we often use this unicorn benchmark to measure intelligence: how good is your unicorn? When safety is being tuned, we watch whether the unicorn gets better, or whether, if the safety training is pushed too far, the model starts treating the task as too dangerous to do. This is very useful.


I'm going to speed up, because I still have a lot to show you. You might still say that this visual ability is of little use. Actually, it's very, very useful, and the reason is that GPT-4 is intelligent and understands you.

You can almost equate intelligence with understanding. Understanding means that it follows your instructions: if you ask it to do something, it does it the way you asked.

Let me show you what that means. People are not convinced that diffusion models are intelligent; I think there is evidence of intelligence there too, but people remain skeptical, because a diffusion model cannot accurately handle the relative positions of objects. If you ask it to put a coffee cup next to a car, it may place the cup randomly; it doesn't really understand the scene. In this example, the prompt asks for a spoon above a cup, and you can see that it puts the spoon in the cup. So the result is not good.


Now let me show you the power of understanding. I'll ask a very weird question, but one that could be useful in real life. I asked GPT-4 to draw a screenshot of a 3D city-building game: a river flowing from left to right, a desert with a pyramid below the river, a city with many tall buildings above the river, and four buttons at the bottom of the screen, colored green, blue, brown, and red. The request is arbitrary, but maybe I'm making a video game and that's the effect I want.

If I give this task to a diffusion model, here is the result: it looks nice, but it's not what I asked for. First, a hallucinated map appears in the upper-left corner that I didn't ask for, along with things that look like health icons. The four buttons became two colorful buttons. It did do some work, but it didn't fully understand what I was asking for.


If you give GPT-4 the same task, you get this result, which is exactly what was asked, because it understands your instructions and executes them accurately.


Of course, you might say this doesn't look great, but you don't have to stop there: you can use it as a sketch and feed it into a diffusion model, and then you get this. It works very well and is exactly what you wanted. So I think this opens up a lot of possibilities.


Now, let's move on from this drawing ability, which in reality is really a coding ability: after all, drawing in SVG or TikZ is just one aspect of writing code.

So let's look at coding. Let's see what happens when you use a co-coding tool like GitHub Copilot, especially now that your copilot is smart enough to understand what you mean.

Let's see how it answers a rather tricky request: "Write a 3D game in HTML and JavaScript with the following elements: there are three spherical avatars; the player controls one of them with the keyboard; there is an enemy trying to catch the player; and there is a defender trying to protect the player by standing between the enemy and the player."

You can see that the defender has to be, in some sense, an AI itself. In addition, there are randomly generated obstacles.


First I gave ChatGPT this task; its answer is shown above.

First of all, and this is already surprising, it gave me about 50 lines of code that compile into a game. It's a game I can play: the player can move the green ball. But the red ball doesn't move; I guess the blue ball was supposed to be the defender, but it doesn't move either; and it isn't 3D. It did some work, but it didn't really understand what I wanted and didn't follow my instructions precisely.


GPT-4, on the other hand, produces a real game, and it's a lot of fun. You move the dark blue ball; the red ball moves toward it; and the light blue ball is the defender, trying to stand between the red ball and yours. In this video I'm controlling the dark blue ball, and as you can see, the defender does a great job of blocking the red ball.

So, for us, coding has entered a new phase. Codex and GitHub Copilot do code completion, which you can think of as autocomplete, providing short snippets of code.

ChatGPT has gone a step further and can write about 50 lines of code for you.

But GPT-4 can write 500 to 1000 lines of code that are ready to use immediately, without meta-prompts or anything else, all out of the box. So I think that's what a co-coding tool can now achieve.


Here I show two animations: on the left, code generated by ChatGPT; on the right, code generated by GPT-4. Look closely and you'll see that GPT-4's code reads much more like an expert's.

The key point of this slide, however, is that both videos were themselves generated by GPT-4. All I did was have GPT-4 write a Python script that takes a text file as input and outputs a video of the text with a continuous-motion effect. It would certainly have taken me a long time to make these videos myself. Who in this room could write a Python script to generate a video like this in a matter of hours? Probably only a few people. This is where GPT-4 is powerful: it unlocks so much creativity.
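For illustration, a script of roughly the kind described might look like this (a hedged sketch of my own, not the code GPT-4 actually produced; it assumes an input file named script.txt and ffmpeg installed):

    # Read lines from a text file and render a video in which they scroll
    # smoothly across the screen: a continuous-motion text animation.
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    with open("script.txt") as f:  # hypothetical input file
        lines = f.read().splitlines()

    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.axis("off")
    texts = [ax.text(1.1, 0.9 - 0.12 * i, line, fontsize=14)
             for i, line in enumerate(lines)]

    def update(frame):
        # Shift every line slightly to the left on each frame.
        for t in texts:
            x, y = t.get_position()
            t.set_position((x - 0.012, y))
        return texts

    anim = FuncAnimation(fig, update, frames=150, interval=40)
    anim.save("out.mp4", writer="ffmpeg")  # requires ffmpeg on the PATH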


Let me go quickly over this slide. In mock interviews, GPT-4 passed the coding interviews of Amazon and Google, though not Microsoft's (audience laughs).

In fact, not only did it pass, but in this particular case it completed the two-hour interview task in just 3 minutes and 59 seconds, and most of that time was the copy-pasting between the Playground and the mock-interview site. So we can say that GPT-4 has reached a superhuman level on this kind of coding task.


Now let's talk about GPT-4's tool use, and then the math. Although GPT-4 is very powerful, it still has many weaknesses: it has no memory, it makes arithmetic mistakes, and it answers some questions incorrectly. However, it is smart enough to use external tools to work around these weaknesses.


The point is, GPT-4 is smart enough to use the tools. You can tell it: "Hey, you know what? You can use a search engine, you can use a calculator, you can use this API (say, a character function); you have access to all of these things. Use them if you need them."

So when someone asks, "Who is the President of the United States?", instead of answering directly it outputs "search", telling you it needs to look this information up. Asked for the square root of a number, it answers "calc". Asked what a certain letter of a word is, it outputs "character, 13". Notably, even though I never told it what argument the character function takes after the comma, it figured out the correct call on its own.
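The pattern is easy to sketch. Below is a toy dispatch loop of my own (the tool names follow the talk; the model call is a stub, not any real API):

    # Toy version of the tool-use loop: the "model" emits a tool invocation
    # instead of an answer, and a dispatcher executes the chosen tool.
    import math

    def fake_model(question: str) -> str:
        # Stand-in for GPT-4: decides which tool the question needs.
        if "president" in question.lower():
            return "search: current President of the United States"
        if "square root" in question.lower():
            return "calc: math.sqrt(2)"
        return "character: 13"

    def dispatch(reply: str) -> str:
        tool, _, arg = reply.partition(": ")
        if tool == "search":
            return f"[would query a search engine for {arg!r}]"
        if tool == "calc":
            return str(eval(arg, {"math": math}))  # toy calculator only
        if tool == "character":
            return f"[would return character #{arg} of the word]"
        return reply

    print(dispatch(fake_model("Who is the President of the United States?")))
    print(dispatch(fake_model("What is the square root of 2?")))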


That may not look so impressive yet, but it can do much more complex tool orchestration.

For example, you can give it access to your calendar and email. What I'm going to show you on this slide is 100% real, except that I executed the steps manually; you can easily imagine automating the whole process.

My request: please schedule dinner with Joe and Luke at Contoso this week. It responded with "calendar.get_events.week" so that it could look up my events for the week.

It then drafted an email to Joe asking which night he was free. I fed the answers back to it: Joe said Tuesday and Wednesday evenings work; Luke said any day Monday through Thursday is fine. And my calendar shows I'm already booked on Monday and Tuesday.

It then reasoned over this information and concluded that Wednesday is the right day. So it emailed Joe, added the event to my calendar, and sent a reservation to the restaurant, all automatically. Then it replied to me that dinner at Contoso was booked for 6 o'clock.
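Stripped of the email round-trips, the reasoning step reduces to a small set intersection, something like this (a sketch using the anecdote's data; the calendar and email APIs themselves are not shown):

    # Intersect the guests' availability, then remove my own busy days.
    joe = {"Tue", "Wed"}
    luke = {"Mon", "Tue", "Wed", "Thu"}
    my_busy = {"Mon", "Tue"}

    options = (joe & luke) - my_busy
    print(options)  # {'Wed'} -> book dinner on Wednesday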


Next, let me briefly cover the mathematics, since it's a topic of great interest to many of us.

First, I'll say up front that it's not very good at math, so don't worry, we still have work to do. But its performance is interesting, and we can talk with it the way we would with a student.

We've had master's-level conversations with it, and it holds up pretty well at that level. But what I'm going to show you is a middle-school-level problem. The problem: in each year, the number of rabbits is first multiplied by a factor a, and on the last day of the year b rabbits are adopted. Suppose there are x rabbits on the first day of the first year; after exactly three years there are 27x - 26 rabbits. Find the values of a and b. A typical middle-school math problem.

GPT4 replied, "We can set the number of rabbits in the first year to y, then y=ax-b." After two years, the number of rabbits is ay-b, i.e. a(ax-b)-b. After three years, the number of rabbits is this equation, and we know that this equation is equal to 27x-26, so we can solve for a=3, b=2, and that's the answer we're looking for. "

So far, so good.


Again, you can't judge whether a student understands from one question alone. We move on and try more abstract variations to see whether it really understands what it's doing.

Here, I don't just want to ask a question with a clear answer; I want to see whether it really understands the concept and can manipulate the question itself.

So I point out that this problem can be reformulated in terms of function composition, and ask: can you give that compositional formulation? It replies: "Yes. We can say that we are looking for values a and b such that if f(x) = ax - b, then f(f(f(x))) = 27x - 26."

That's pretty good; it looks like it understands. Let's keep going.


"Suppose now f will be a quadratic polynomial. So, a polynomial with a nonzero x-squared coefficient, in this case, can you find such a function f? ”

As a human, you would reason: I want the triple composition to equal a linear function, a degree-one polynomial. But composing a quadratic with itself three times gives a degree-eight polynomial, and a degree-eight polynomial cannot equal a degree-one polynomial, so no such function exists. It's a very simple question; let's see how GPT-4 handles it.
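In symbols, the whole argument is one line:

    deg(f ∘ f ∘ f) = (deg f)³ = 2³ = 8 ≠ 1 = deg(27x − 26)

so no quadratic f can satisfy the equation.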

GPT-4: "If f is a quadratic polynomial, then f(x) can be written in this form." Then give this equation... ”

GPT-4 starts to get a little lost: it tries to write out the triple composition explicitly, produces a lot of terms, this equation and that equation, starts writing down degree-eight expansions, but doesn't arrive at an answer. We didn't stop there; we said: "Wait, maybe you can draw a conclusion here without doing the calculation; maybe you don't need to write everything out, unlike before."

Then, it says, "Ah, well, one thing I can notice is that if f is a quadratic polynomial, then the combination of cubic successives will be an eighth-degree polynomial, so there is no such function." ”

So here you can see the situation is subtle: does it understand, or does it not? I'm not sure, I'm really not sure.


Now for some weird things: the arithmetic is still a little shaky. I must say I don't fully understand why, but there are things on this slide I do understand.

Look at this prompt: 7 times 4 plus 8 times 8. Now, off the top of my head I don't know the value, but 8 times 8 is a bit more than 60 and 7 times 4 is a bit less than 30, so the total should certainly be below 100.

But the answer it gives is 120, which is wrong. The point is, it doesn't stop there; it goes on to explain why it thinks the answer is 120. It works through 7 times 4 plus 8 times 8, i.e. 28 plus 64, and arrives at the correct answer, 92.

"Wait a minute, what happened? You said at the beginning that the answer is 120, is it 120 or 92? ”

GPT4: "Oh, that's a typo, sorry." (Audience laughs)

All right...

There are a lot of insights packed into this slide, from which we can really understand much of what's going on.

First, regarding the answer 120: you have to understand that the model must use an internal computation to do this addition, which is slightly harder than answering instantly. Why does it answer instantly anyway? Because when you write down an expression like this, the most likely token to appear after the equals sign is a number, so it emits a number.

It tried to output whatever was most likely to appear after the equals sign, and failed. But the next most likely continuation is for a person to explain their reasoning and their answer. So it tries to explain its answer, and crucially, in doing so it arrives at a different answer.

You have to understand why this is remarkable: as far as I know, this is an attention-based Transformer. When it writes 7 times 4 plus 8 times 8 for the second time, its attention is strongly drawn to the answer 120, which is now part of its context. For all it knows, you might be telling it: hey, you know what, 7 times 4 plus 8 times 8 is 120; take that as part of the prompt. Despite the initial mistake, the fact that it can still reach the right answer shows it has been trained enough to overcome errors in its own prompt, which is a very, very powerful property.

Of course, its claim that it was a typo is also very interesting, because it was clearly not a typo; this leads to many interesting topics, like hallucination. I want to leave time for questions, so I won't say more about this slide, but you really should think deeply about it; it contains a lot of information.


The last slide before the conclusion is about the fact that it cannot really plan.

Of course, I have been amazed at how many tasks that I thought required real planning turn out not to. But let me give an example; let's continue with the 7 times 4 plus 8 times 8 problem.

"Okay, now that you have an equation equal to 92, let me ask an interesting question: Can you modify an integer to the left of the equation so that the answer becomes 106?"

As a human, your reasoning goes like this: OK, I want to get 106 on the right, so I need to add 14, and I can modify one number on the left. I look to the left, see the 7, and have a little "aha" moment: 14 is 7 times 2. So if the 7 gets multiplied by two more, I need to turn the 4 into a 6. That's the answer: turn the 4 into a 6. As simple as this insight is, it comes out of a kind of planning: thinking ahead about what needs to happen.
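For reference, the target modification can also be verified mechanically (a sketch of mine, not part of the talk): a brute-force scan confirms that changing the 4 to a 6 is the only single-integer fix.

    # Brute-force check: which single-integer change to 7*4 + 8*8 yields 106?
    terms = [7, 4, 8, 8]  # value = terms[0]*terms[1] + terms[2]*terms[3]

    def value(t):
        return t[0] * t[1] + t[2] * t[3]

    for i in range(len(terms)):
        for new in range(1, 21):  # try small replacement integers
            trial = list(terms)
            trial[i] = new
            if trial != terms and value(trial) == 106:
                print(f"replace {terms[i]} at position {i} with {new}")  # 4 -> 6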


However, GPT-4 cannot do this, because it is a next-word prediction machine.

So it says: there are some possible ways to achieve this goal, and then: "I can modify one integer; I'll change the 7 to a 9, and then 9 times 4 plus 8 times 8 equals 106." Wait: if I change the 7 to a 9, that's 36 plus 64, which is 100, not 106. It then tries to explain why its change is right: 9 times 4 plus 8 times 8 equals 36 plus 64, which is correct so far, but then it asserts the total is 106 again.

So you can see that here it is not strong enough to overcome its initial mistake. What this tells me is that with more training it might correct such errors. With more training, perhaps it would learn that even though the most likely token after such a question is a number, the best way to answer is to reason first.

So what I'm saying with this silly example is that more training could give us much more than we already have. What we have right now is already remarkable, but it's far from everything this technology can do. There is so much more to discover on the horizon.


So, let's summarize: is GPT-4 intelligent?

Does this really matter? This is a very important question.

Again, whether GPT-4 is intelligent depends on your definition of intelligence, and I won't impose a judgment. By my own definition, yes, it is intelligent. If for you intelligence requires memory and the ability to learn in real time, then it's not intelligent.

It can't think several steps ahead; it can't really plan. If that's your definition of intelligence, then it's not intelligent. But on the other hand, I think some of the examples I've shown are very impressive and, perhaps more importantly, practical.

In my team, we use GPT-4 every day; it has become part of our workflow. That fact alone shows that, whether or not you call it intelligent, it is going to change the world.

Also, I would say that this may be an opportunity to rethink intelligence.

Although we have studied psychology and intelligence for decades, until now we had only one example of intelligence: the natural intelligence that evolution gave us. Now we have a new process that produces something that appears to be intelligent.

Now that we have a different example, perhaps we can dig deeper into the nature of intelligence. The outcome of that study might be: "Yes, this new thing shouldn't be called intelligence, because it lacks a certain property." That is a very possible conclusion. But more importantly, I think we can learn a great deal from it.

GPT-4 is definitely not the end, but the beginning. It is the first model to show a real spark of intelligence, and there is more to discover on the horizon.

So, what conclusions should we draw from this as universities, society, and human beings?

I mean, this is a very real problem we need to face. Here I really want to say: as a society, to get a grip on this, we have to move beyond debating whether it's just copy-paste. We have to drop that debate; the train has left the station. If we keep dwelling on it, we will miss the questions that really matter. So I think it's important to move forward.

Finally, I would also like to say that GPT-4 can do much more than what I have shown here.

It can do data analysis: give it data and it will analyze it for you. It can act as a privacy detector. Its knowledge of medicine and law is astonishing.

Here I'd like to recommend a book from Microsoft Research that I helped with. The lead author is Peter Lee, together with Carey Goldberg and Isaac "Zak" Kohane of Harvard University. The book is about how GPT-4 can be used in healthcare, titled The AI Revolution in Medicine. It's a very complex subject, and I don't even want to say more, because I can't do it justice in one sentence. But GPT-4's knowledge of healthcare will give it a significant impact there, and hopefully that impact will be positive. Of course, we need to think deeply about this.

In addition, GPT-4 can play games and act as a game environment. It understands music, which is again surprising, because it has never heard music, and yet it understands it. It can also do file management, and much more.

Okay, I'll end here. Thank you.
