
Popular Science on Large Language Models: Emergence

Author: Everyone Is a Product Manager
When people talk about large language models, one prominent ability comes to mind: emergence. So what is emergent ability, and how should we understand it? This article analyzes the phenomenon in the hope of giving readers a popular-science introduction to emergence in language models.
I adore simple pleasures. They are the last refuge of the complex. - Oscar Wilde

When large language models come up, one outstanding ability is mentioned again and again: emergence. So what is emergent ability? Readers who want real insight into it are encouraged to read this article carefully.

Let's start with the definition of emergence:

When a system exhibits properties that differ markedly from what a simple sum of its constituent individuals would show, that system-level behavior is called "emergent behavior".

The keywords in this definition are: system, individual, property, simple sum, difference.

From the definition alone, the meaning of emergence still seems hazy, like a flower seen through fog. Some people explain emergent ability with the phrase "quantitative change leads to qualitative change." That is true in principle, but it does little to make emergence clear.

Emergence is a central concept in complexity science. As far as I know, I am sorry to say that with humanity's current level of knowledge, emergence cannot yet be explained quantitatively.

Complexity science is itself complicated (why else would it be called that?), and emergent phenomena are everywhere: they appear in large numbers in information science, neuroscience, ecology, economics, sociology, and many other fields of research.

So how can we understand emergence?

Since we cannot analyze it quantitatively, we can only analyze it qualitatively. Deduction does not work, so let's try induction.

First, the emergence of bee colonies

A bee is a creature with a very simple nervous system.

Karl von Frisch, a renowned zoologist and Nobel laureate, discovered that bees exchange information with one another through a dance known as the "figure-eight dance" (the waggle dance). When a bee finds food outside, such as a large field of flowers in full bloom, it flies back to the hive and performs an energetic dance for its companions. The path of the dance resembles the numeral "8": a waggling run back and forth, followed by a turn. Through features of the dance such as its duration, the bee tells its companions the direction and distance of the food. For example, the longer it waggles, the farther away the food is.

Even more remarkably, the other bees decode this information and then find the flowers at the location described. This is one of nature's wonderful algorithms: no single bee is highly intelligent, but through this specific form of communication the whole colony displays powerful "collective intelligence".

Bees not only dance; they also regulate the temperature of the hive. To raise their brood, bees must keep the small hive at a suitable temperature. When the hive is too cold, the bees huddle tightly together and beat their wings vigorously to raise the temperature. When the hive is too hot, the bees spread out and fan their wings to cool it.

Interestingly, the critical temperature at which each bee starts to warm or cool the hive depends on its genetic makeup. Genetically similar bees feel cold below the same point and gather to "huddle for warmth." Likewise, above that point they feel too hot, spread out, and fan their wings to cool the hive.

To understand this phenomenon, we cannot think of a bee colony as a mere swarm of individuals. A bee colony is a complex system, and each bee plays a part in keeping the system stable. Although individual bees behave differently, by coordinating with one another they achieve the group goal of keeping the hive temperature within the optimal range.
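The threshold behavior described above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function name bees_heating, the colony lists, and every threshold value are hypothetical and not measured data. Each bee is modeled as starting to warm the hive once the temperature falls below its own threshold.

```python
def bees_heating(temperature, thresholds):
    """Count the bees that start warming the hive at this temperature.

    A bee warms the hive when the temperature is below its own threshold.
    """
    return sum(1 for threshold in thresholds if temperature < threshold)


# Hypothetical thresholds in tenths of a degree Celsius (illustrative only).
identical_colony = [340] * 10              # every bee reacts at 34.0
varied_colony = list(range(331, 351, 2))   # 33.1, 33.3, ..., 34.9

for temperature in (345, 339, 330):
    print(temperature,
          bees_heating(temperature, identical_colony),
          bees_heating(temperature, varied_colony))
```

In this toy model the colony with identical thresholds switches from no bees heating to all bees heating at a single temperature, while the colony with varied thresholds recruits bees gradually as the hive cools.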

This self-organizing collective intelligence is remarkable. A single bee cannot withstand a cold wind, but bees gathered together can withstand the threat of changing temperatures.

Second, the emergence of ant colonies

There is another creature in nature that is very simple as an individual but very capable as a group: the ant.

An individual ant's behavior appears to be almost purely reflexive, driven almost entirely by its surroundings, yet ants are more than simple "mobile machines." In fact, most of an ant's behavior can be described by a few simple rules. For example:

  • Grip the target firmly with the mandibles;
  • Travel in the direction in which the pheromone concentration rises or falls (pheromones are the scents ants use to encode information, such as "food this way" or "fight this way");
  • Tell by smell whether a companion is alive (dead ants give off a characteristic chemical).
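The pheromone rule in the list above can be sketched in a few lines. This is a minimal illustration, not a model of real ants: the one-dimensional trail, the pheromone values, and the function name step_toward_pheromone are all assumptions made for the example.

```python
def step_toward_pheromone(position, pheromone):
    """Move one cell toward the neighbouring cell with more pheromone.

    The ant stays where it is if neither neighbour has a higher level.
    """
    left = pheromone[position - 1] if position > 0 else float("-inf")
    right = (pheromone[position + 1]
             if position < len(pheromone) - 1 else float("-inf"))
    if max(left, right) <= pheromone[position]:
        return position  # already at a local maximum
    return position - 1 if left > right else position + 1


# Hypothetical pheromone levels along a trail; the food is at index 4.
trail = [0, 1, 2, 4, 8, 4, 2]

position = 0
path = [position]
for _ in range(6):
    position = step_toward_pheromone(position, trail)
    path.append(position)
print(path)
```

The ant uses only the values in its immediate neighbourhood, yet it ends up at the peak of the trail without any knowledge of where the food is.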

Once ants encounter a situation these rules do not cover, they are in serious danger. Outside the conditions the rules cover, most ants, worker ants in particular, struggle to survive for more than a few weeks.

However, it is precisely by relying on these simple rules of behavior that an ant colony displays remarkable intelligence. Each ant acts like a tiny decision-making unit; by coordinating and cooperating, the ants merge into an efficient whole that can complete very complex tasks, such as building huge nests and hunting cooperatively. The behavior of the individual members and their interactions determine the behavior of the whole colony. As a colony, however, ants show far more flexibility than any individual member. A colony can sense and respond to food, predators, floods, and many other phenomena over a wide geographical area. It can extend its territory over long distances and change its surroundings in ways that benefit the colony. And a colony generally outlives its individual members by orders of magnitude.

This collective intelligence built on simple rules has made ants, tiny insects, one of the most successful social species on Earth. They apply basic rules flexibly to a changing environment, have been shaped by a long evolutionary process, and have spread widely across the planet.

If we look at a single bee or ant and analyze its body structure and behavior, we could never imagine that a bee or ant colony is capable of the complex group behavior described above. In other words, the colony has a collective intelligence that cannot be obtained by simply adding up individual bees or ants, and this collective intelligence is an emergent ability.

The emergence of the Game of Life

Conway's Game of Life: in this game, the system evolves in synchronized steps on a two-dimensional grid in which each cell is either alive or dead and has eight neighbors. Its rules are as follows:

  • A "dead" cell with exactly three "living" neighbors is "resurrected" and becomes a living cell at the next step; otherwise it stays dead.
  • A living cell with two or three living neighbors "survives" to the next step; otherwise it dies (either of "loneliness" or of "overcrowding").

Overall, a cell with an intermediate number of living neighbors lives on or comes to life (positive feedback), while too many or too few living neighbors lead to death (negative feedback).

From these simple rules, different initial states produce a series of global patterns that are striking in both space and time. These global patterns emerge from nothing more than a set of simple micro-level rules.

For example, a glider in the Game of Life is a particular arrangement of living cells on the grid. At each successive time step (shown from left to right in the original figure), the set of living cells changes according to the simple, local rules of the game. After four time steps the initial arrangement of living cells reappears, shifted by exactly one cell toward the lower right. If left undisturbed, the structure keeps "gliding" across the entire space.
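The glider's behavior can be checked with a short sketch. The code is a minimal illustration, not taken from the article: it stores the living cells as a set of (x, y) coordinates with y growing downward, a representation assumed here for brevity, and the names life_step and glider are hypothetical.

```python
from collections import Counter


def life_step(live):
    """Apply one step of Conway's rules to a set of living (x, y) cells."""
    # Count, for every cell, how many of its eight neighbours are alive.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly three neighbours; survival with two or three.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


# A glider: .#. / ..# / ### (x is the column, y is the row).
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

cells = glider
for _ in range(4):
    cells = life_step(cells)

shifted = {(x + 1, y + 1) for (x, y) in glider}
print(cells == shifted)
```

Nothing in life_step mentions movement; the diagonal drift appears only when the local rules are applied repeatedly to this particular pattern.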


There are many more fascinating Game of Life patterns. Interested readers may wish to look them up and experience the charm of emergence in the Game of Life for themselves.

Third, exploring simple emergent phenomena

We can also set up some even simpler rules and observe the emergent behavior they produce.

Table 8.1 is a rule table that maps every possible input state to an output state. Its first row (state 0) says that if a subject and both of its neighbors took action 0 in the previous period, the subject will again take action 0 in the next period. The next row (state 1) says that if the subject and its left neighbor took action 0 and the right neighbor took action 1, the subject will take action 1, and so on.


Arrange 20 numbers in a ring, that is, connect them end to end so that each number has a left and a right neighbor. Each number's next state is then determined by its neighbors' states and its own current state.

This simple rule leads to some interesting system behavior. As Table 8.2 shows, a consistent macro structure, downward-pointing triangles of 0s, emerges throughout the chart. The scale of these triangles goes far beyond the scale of the behavioral rule: although each individual's behavior is determined solely by the actions observed at three positions, the resulting triangles span far more than three positions (for example, the triangle formed at step 12 has a base spanning 13 of the 20 positions).
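A ring of this kind is easy to simulate. The sketch below is an illustration under stated assumptions: Table 8.1 is not reproduced in this text, so the rule used here (rule number 90 in the usual numbering of eight-row rule tables) is only an example that agrees with the two rows quoted above (state 0 gives 0, state 1 gives 1). The starting row and the names make_rule and step are likewise hypothetical, and the printed rows are not claimed to match Table 8.2.

```python
def make_rule(number):
    """Build the eight-row rule table encoded by an integer from 0 to 255.

    The row index is left * 4 + centre * 2 + right, so state 1 means
    left = 0, centre = 0, right = 1.
    """
    return {
        (left, centre, right): (number >> (left * 4 + centre * 2 + right)) & 1
        for left in (0, 1)
        for centre in (0, 1)
        for right in (0, 1)
    }


def step(cells, rule):
    """Update every cell on the ring from its neighbours and itself."""
    n = len(cells)
    # cells[i - 1] wraps around at i == 0, which closes the ring.
    return [rule[(cells[i - 1], cells[i], cells[(i + 1) % n])]
            for i in range(n)]


RULE_NUMBER = 90  # assumption: stands in for the unseen Table 8.1
rule = make_rule(RULE_NUMBER)

cells = [0] * 20
cells[10] = 1  # assumed starting row: a single 1 on a ring of 20
for _ in range(12):
    print("".join(str(c) for c in cells))
    cells = step(cells, rule)
```

Each cell looks at only three positions, yet the printed rows contain patterns that are many cells wide, which is the point the text makes about the triangles.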


This is reminiscent of Adam Smith's invisible hand, in which the actions of the subjects in the system appear to be coordinated by some invisible force, creating patterns beyond any individual intention.

Fourth, the emergence of language models

The development of large language models has not been entirely smooth.

Looking back at the first ten years of deep learning, improvements in model performance came mainly from changes in network architecture. Because language models follow a scaling law under which "performance increases only linearly as model size grows exponentially," researchers found that even the largest GPT-3 models, when prompted, performed no better than well fine-tuned small models. At the same time, a larger network greatly increases the amount of training data required and the cost of training and inference.
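The quoted scaling relationship, exponential growth in size for linear growth in performance, amounts to a logarithmic curve. The sketch below is a toy illustration only: the function toy_performance and its constants are invented for the example and are not fitted to any real model.

```python
import math


def toy_performance(parameters, base=10.0, gain_per_decade=5.0):
    """Toy log-linear curve: a fixed gain for every tenfold size increase."""
    return base + gain_per_decade * math.log10(parameters)


for parameters in (10**8, 10**9, 10**10, 10**11):
    print(parameters, toy_performance(parameters))
```

Under this assumed curve, each tenfold increase in parameters buys the same fixed improvement, which is why scaling up looked like a poor investment at the time.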

Therefore, there seemed to be no need to take the risk of investing vast resources in training a "behemoth".


However, as neural network design techniques matured, it became difficult to achieve significant performance gains simply by optimizing the network architecture. In recent years, as computing power has improved and datasets have grown, researchers have turned their attention to scaling up the models themselves. The experimental results showed that once model size reaches a certain "critical mass", performance improves far faster than proportionally, a qualitative change brought about by quantitative change. In short, when the number of parameters in a model exceeds a certain threshold, the model suddenly exhibits capabilities far beyond those of small models. This has led to the boom in large-scale pre-trained language models, especially in natural language processing.


How quickly have the parameter counts of large language models grown? Let's look at figures compiled by internet users. GPT-4, the most powerful large language model, is said to have more than one trillion parameters, and parameter counts have grown more than 100-fold in just four or five years.
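As a rough arithmetic check on the figures quoted above, a 100-fold increase over four or five years can be converted into an average yearly growth factor. The helper name annual_growth_factor is hypothetical, and the calculation assumes steady compound growth, which real model releases need not follow.

```python
def annual_growth_factor(total_factor, years):
    """Average yearly multiplier that compounds to total_factor."""
    return total_factor ** (1.0 / years)


for years in (4, 5):
    print(years, round(annual_growth_factor(100, years), 2))
```

Under that assumption, the quoted growth corresponds to parameter counts multiplying by roughly 2.5 to 3.2 every year.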


Why are large language models so powerful? The essential reason lies in their enormous number of parameters. Each neural network unit follows simple, describable rules of operation, but when a huge number of units are connected together, capabilities emerge that the individual units and layers do not have.
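A classic small-scale illustration of this point is the XOR function: a single threshold unit cannot compute it, but three connected units can. The sketch below is a simplification under stated assumptions: the units in real language models use different activation functions and learned weights, and the names unit and xor and the hand-picked weights are hypothetical.

```python
def unit(inputs, weights, bias):
    """One threshold unit: output 1 if the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def xor(a, b):
    """XOR built from three connected units with hand-picked weights."""
    h_or = unit((a, b), (1, 1), -0.5)    # fires if at least one input is 1
    h_and = unit((a, b), (1, 1), -1.5)   # fires only if both inputs are 1
    return unit((h_or, h_and), (1, -1), -0.5)  # "or" but not "and"


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

Each unit only compares a weighted sum with zero; the ability to compute XOR belongs to the small network, not to any single unit in it.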

So why has the scale of language models soared, and why have they attracted so much attention from industry and from society at large?

The author believes one reason is this: language is one of humanity's most basic symbol systems and one of the main ways people transmit and exchange information. Language is not only a tool for communication but also the basis of cognition. It drives cognitive development and change, shaping how people perceive and understand themselves, society, and the world. Language can also make people aware of differences in their own cognition, which in turn affects how they use language.

Many studies suggest that language is the basis of human understanding of the world. For example, psycholinguists and neurolinguists have found that the brain mechanisms of language comprehension and production draw on basic cognitive processes and neural networks that are also used in tasks unrelated to language, such as visual perception and decision making. In addition, developmental psychologists and cognitive scientists have found that infants and toddlers come to understand the world through language, even though they do not yet rely on language to think and perceive as adults do.

Therefore, it is no surprise that large language models, a disruptive technology built on language, the basic way humans perceive the world, have received widespread attention and have great application prospects.

Summary

Emergence is everywhere. The remarkable properties that emerge in living organisms, social organizations, science and technology, culture, and civilization make up the world around us.

By following very simple principles in a clever and intricate way, we can wire together a few kinds of simple modular components (resistors, capacitors, inductors, and transistors) and produce a complex product of seemingly miraculous power that performs difficult tasks at lightning speed: the electronic computer.

Behind conscious perception lies delicate and complex brain activity involving billions of neurons, and awareness emerges only after about half a second. Consciousness is a phenomenon of the whole system, not merely the sum of the brain's neural pathways and neurons.

Large language models, built on information science and brain science, have given rise to human-like intelligence. As with other emergent phenomena, humanity's current knowledge makes it hard to explain how large language models work, but that does not stop us from observing, summarizing, and applying them. After all, we do not understand why the brain gives us intelligence, yet we can still use that intelligence to solve problems.

I hope this article helps readers understand large language models. Thank you for reading!

Columnist

Always product Wang (WeChat public account: apmdogy), columnist at Everyone Is a Product Manager. A logic-minded product manager committed to combining scientific thinking with product management methodology. Focuses on artificial intelligence and education; skilled in product incubation, requirements discovery, project management, process management, and other product skills.

This article was originally published on Everyone Is a Product Manager; reproduction without permission is prohibited.

The title image is from Unsplash and is licensed under CC0.

The views in this article represent only the author. The Everyone Is a Product Manager platform only provides information storage services.
