
Geoffrey Hinton, the "Father of Deep Learning": GLOM, an architecture meant to replicate human creative thinking, integrates philosophy into engineering design. It sounds philosophically sound. But will it succeed?

Author: DeepTech

Last November, Geoffrey Hinton, the computer scientist and cognitive psychologist known as the "father of deep learning," made a prediction. After half a century of experimentation, some of it hugely successful, Hinton has gained a deeper understanding of how the brain works and how its workings might be replicated in a computer.

During the COVID-19 pandemic, Hinton has been working in isolation from his home office in Toronto. "It's the best idea I have at the moment," he said. If the idea pans out, it may inspire the next generation of artificial neural networks.

The design of these mathematical computing systems, the core technology of today's artificial intelligence, is inspired by the neurons and synapses of the brain. As Hinton puts it, his "basic motivation" is curiosity. The practical motivation, however, and the most desirable outcome, is to design more reliable and trustworthy AI systems.

Hinton, a Google researcher and co-founder of the Vector Institute for Artificial Intelligence, wrote up his idea intermittently and announced on Twitter in late February that he had posted a 44-page paper on the arXiv preprint server.

In a disclaimer, Hinton wrote: "This paper does not describe a working system, but rather a system still in conception." He calls it "GLOM," a name derived from the slang "glom together," as in to gather or agglomerate.

Hinton believes that GLOM could model human perception in machines, providing a new way to process and represent visual information in a neural network. Technically, the system works with collections of similar vectors.

Vectors are fundamental to neural networks: a vector is simply an array of numbers that encodes information. The simplest example is the x, y, z coordinates of a point; three numbers specify its position in three-dimensional space.

A six-dimensional vector adds three more pieces of information, perhaps the red, green, and blue values of the point's color. In a neural network, vectors with thousands of dimensions can represent an entire image or passage of text. Hinton argues that, when dealing with even higher-dimensional problems, our brain activity involves "big vectors of neural activity."
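To make the idea concrete, here is a minimal sketch in plain Python of the vectors described above; the numeric values are invented purely for illustration:

```python
# A point in 3-D space: three numbers fix its position.
point = [1.0, 2.0, 3.0]          # x, y, z

# A 6-D vector: the same position plus the point's color.
colored_point = [1.0, 2.0, 3.0,  # x, y, z
                 0.8, 0.2, 0.1]  # red, green, blue

# Neural networks use the same idea at far higher dimension:
# a vector with thousands of entries can stand for a whole
# image or text (4096 is an arbitrary example size).
embedding = [0.0] * 4096
```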

By analogy, Hinton likens the aggregation of similar vectors to the echo chamber effect, which amplifies similar ideas. "For politics and society, echo chambers are a complete disaster," Hinton said. "But for neural networks, they are a great thing."

He calls the network-level counterpart of the echo chamber "islands of identical vectors," or, more plainly, "islands of agreement": when vectors agree about the essential information, they point in the same direction.

Essentially, GLOM also pursues the elusive goal of modeling intuition. Hinton believes intuition is crucial to perception, and he defines it as the human facility for making analogies effortlessly. From childhood onward, we make sense of the world through analogical reasoning, mapping one object, idea, or concept onto another, or, as Hinton puts it, mapping one big vector onto another.

"Similarity between big vectors explains how neural networks do intuitive analogical reasoning," he said. More broadly, intuition captures the ineffable way the human brain generates ideas. Hinton himself works by intuition, with analogy guiding his science; his theories about how the brain works are themselves products of intuition. "I've always been unwavering," Hinton said.
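One standard way to quantify "two vectors pointing in the same direction," and thus a crude stand-in for the analogical similarity Hinton describes, is cosine similarity. This measure is an illustration of the general idea, not a rule prescribed by the paper:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors "agree" almost perfectly...
print(cosine_similarity([1, 2, 3], [2, 4, 6]))   # ~1.0
# ...while orthogonal vectors do not agree at all.
print(cosine_similarity([1, 0, 0], [0, 1, 0]))   # 0.0
```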

Hinton hopes GLOM could be one of several breakthroughs needed before AI can solve problems with truly human flexibility: thinking like humans, making sense of things never seen before, drawing on similarities from past experience, playing with ideas, generalizing, extrapolating, and understanding.

"If neural networks were more like humans, then at least they could make mistakes the way we do, and we could understand what exactly confuses them," Hinton said.

However, GLOM is just an idea for now. "It's vaporware," Hinton said. He acknowledges that the acronym also fits "Geoffrey's Last Original Model." At the very least, it is Hinton's latest.

<h1 class="pgc-h-arrow-right" data-track="31" > Creative thinking</h1>

Hinton's passion for artificial neural networks, kindled when he was in his mid-twenties, dates back to the early 1970s. By 1986 he had made major strides in the field: while the earliest networks consisted of only a few layers of input and output neurons, Hinton and his colleagues developed techniques for training deeper, multilayer networks. It then took 26 years for computing power and data capacity to catch up and exploit that deep architecture.

In 2012, Hinton, who made his name with breakthroughs in deep learning, worked with two of his students to develop a multilayer neural network that could be trained to recognize objects in a huge image dataset.

The network iteratively improved its classifications, learning to identify objects such as mites, mushrooms, scooters, and Madagascar cats with unexpected precision.

Deep learning sparked the latest AI revolution and transformed the entire field of computer vision. Hinton believes that deep learning can almost completely replicate human intelligence.

Despite rapid progress in the field, major challenges remain. When faced with unfamiliar datasets or environments, neural networks prove relatively fragile and inflexible.

Self-driving cars and text generators are impressive, but they can also go badly wrong. AI-powered vision systems are easily confused: a system that recognizes a coffee cup from the side may fail to recognize it from above if it was never trained on that view, and with a few manipulated pixels a panda can be mistaken for an ostrich, or even a school bus.

GLOM addresses two of the hardest challenges for visual perception systems: representing a whole scene in terms of objects and their natural parts, and recognizing objects from new viewpoints. (GLOM's focus is vision, but Hinton hopes the idea can be applied to language as well.)

Take Hinton's face as an example: it amounts to his tired but lively eyes, mouth, and ears, and a prominent nose, topped by mostly gray hair. Given that prominent nose, he is easy to recognize even from a viewpoint seen for the first time.

In Hinton's view, these two factors, the relationship of parts to wholes and viewpoint, are central to how the human visual system works. "If GLOM works," he says, "it will perceive things in a far more human-like way than today's neural networks do."

For computers, however, integrating parts into wholes is a hard problem, because parts can be ambiguous: a circle might be an eye, a donut, or a wheel.

As Hinton explains, the first generation of AI vision systems identified objects mainly through geometry: the spatial relationships among parts, and between parts and wholes.

Second-generation systems instead rely on deep learning, training neural networks on large amounts of data. With GLOM, Hinton combines the strengths of both approaches.

Gary Marcus, founder and CEO of Robust.AI and a well-known critic of heavy reliance on deep learning, said, "I love that humility."

Marcus praised Hinton's willingness to challenge the approach that brought him fame and to acknowledge that it isn't quite working. "It's a brave idea," he said. "And saying 'I'm going to try creative thinking' is a very good corrective."

<h1 class="pgc-h-arrow-right" data-track="31" > The architecture of GLOM</h1>

In building GLOM, Hinton tried to model some of the mental shortcuts humans use to perceive the world: intuitive strategies, or heuristics. Nick Frosst is a computer scientist at a Toronto language startup who has also worked with Hinton at Google Brain.

Frosst said, "GLOM, and much of Geoffrey's work, comes down to studying the heuristics humans have, building neural networks that can learn those heuristics, and then showing that the networks get better at vision as a result."

In visual perception, one such strategy is to analyze an object's parts, such as individual facial features, in order to understand the whole. If you see a particular nose, you may recognize it as part of Hinton's face; that is a part-whole hierarchy.

To build a better visual system, Hinton says, "I have a strong intuition that we need to use part-whole hierarchies." The human brain understands how parts compose a whole by building a "parse tree."

A parse tree is a branching diagram that shows the hierarchical relationships among a whole, its parts, and their sub-parts. The face sits at the top of the parse tree, while the eyes, nose, ears, and mouth form the branches below.
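As a sketch of the data structure being described, a parse tree can be written as a nested dictionary, with the whole at the root, parts below it, and sub-parts below those; the particular parts chosen here are illustrative:

```python
# A minimal parse tree for a face: each key is a node,
# and each nested dict holds that node's parts.
face_parse_tree = {
    "face": {
        "eyes":  {"left eye": {}, "right eye": {}},
        "nose":  {"nostrils": {}},
        "mouth": {"lips": {}},
        "ears":  {},
    }
}

def depth(tree):
    """Number of levels in the hierarchy (whole -> part -> sub-part)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())
```

Here `depth(face_parse_tree)` is 3: the face, its parts, and their sub-parts.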

One of Hinton's main goals for GLOM is to reproduce parse trees inside a neural network, a feature that would set GLOM apart from previous networks. Technically, this is hard to build.

Frosst said, "The reason it's so hard is that every image gets its own unique parse tree, and we want the neural network to do the same. It is difficult to get a neural network, which has a fixed structure, to build a new static structure, a tree, for every new image it sees."

Hinton has tried many approaches over the years; GLOM is a refined version of his 2017 attempt, capsule networks, combined with other recent advances in the field.

The GLOM architecture is conceived roughly as follows: the image of interest (say, a photograph of Hinton's face) is divided into a grid. Each region of the grid is a "location" on the image; one location might contain the iris of an eye, another the tip of the nose.

Each location in the network has about five layers, or levels. Level by level, the system makes a prediction, represented as a vector, about what is there. At a lower level, the vector for the nose-tip location might predict: "I'm part of a nose!" At the next level up, building a more coherent representation of what it sees, the vector might predict: "I'm part of a face seen in profile!"

But then the question arises: do neighboring vectors at the same level agree? When they agree, the vectors point in the same direction, toward the same conclusion: "Yes, we both belong to the same nose." Or, further up the parse tree: "Yes, we both belong to the same face."

Seeking consensus about what an object ultimately is, GLOM's vectors are repeatedly averaged, location by location and level by level, with the vectors of neighboring locations and with the predicted vectors from the levels above and below.

However, Hinton says, the network does not average indiscriminately with nearby vectors. It averages selectively, with neighboring predictions that are already similar.
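That selective averaging can be sketched as a similarity-weighted update, in the spirit of attention. The softmax weighting below is an illustrative choice, not the exact rule in Hinton's paper:

```python
import math

def dot(a, b):
    """Dot product, used here as a crude agreement score."""
    return sum(x * y for x, y in zip(a, b))

def selective_average(vectors):
    """One update step: each vector moves toward the others,
    but weights each neighbor by how much they already agree
    (a softmax over dot products), so like attracts like."""
    updated = []
    for v in vectors:
        weights = [math.exp(dot(v, u)) for u in vectors]
        total = sum(weights)
        updated.append([
            sum(w * u[i] for w, u in zip(weights, vectors)) / total
            for i in range(len(v))
        ])
    return updated

# Two roughly agreeing vectors pull together, while the dissimilar
# one is largely ignored by them: islands of agreement can form.
vecs = [[1.0, 0.1], [0.9, 0.2], [-1.0, -0.1]]
vecs = selective_average(vecs)
```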

He added, "This phenomenon is very familiar in America, where it's called an echo chamber. You only accept the opinions of people who already agree with you, so an echo chamber forms in which everyone holds the same view. GLOM actively exploits this phenomenon." The analogous structures in Hinton's system are those "islands of agreement."

Frosst said, "Imagine a group of people in a room loudly discussing slight variations of the same idea, or think of them as vectors pointing in roughly the same direction. After a while, the ideas converge into one, and everyone feels the idea getting stronger because it has been confirmed by the people around them." That is how GLOM's vectors reinforce and amplify their collective predictions about an image.

GLOM uses these islands of agreeing vectors to realize parse trees in a neural network. Whereas some recent neural networks use agreement among vectors for activation, GLOM uses agreement for representation: to build up representations of things in the network.

For example, when several vectors all agree that they are part of the nose, that small cluster of identical vectors collectively represents the nose in the network's parse tree of the face. Another small cluster might represent the mouth, while the large cluster at the top of the tree represents the emergent conclusion: the image as a whole is Hinton's face.

Hinton explains, "The way a parse tree is represented here is that the object level is one big island; the parts of the object are smaller islands; their sub-parts are smaller islands still, and so on down."
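Reading the islands back out of a set of vectors can be sketched as a simple grouping by direction. This greedy rule and its threshold are an illustration, not the mechanism in the paper:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def islands(vectors, threshold=0.95):
    """Greedily group vectors whose direction nearly matches a
    group's first member; each group is one 'island of agreement'."""
    groups = []
    for v in vectors:
        for g in groups:
            if cosine(v, g[0]) >= threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups

# Three nose-ish directions and two mouth-ish directions form two
# islands; the larger island stands for the bigger node in the tree.
vecs = [[1, 0], [0.99, 0.05], [1, 0.02], [0, 1], [0.03, 1]]
groups = islands(vecs)
```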


In a diagram from Hinton's GLOM paper, islands of identical vectors (arrows of the same color) at each level represent a parse tree (Source: Hinton)

Yoshua Bengio, a computer scientist at the University of Montreal and an old friend and colleague of Hinton's, said that if GLOM can solve the engineering challenge of representing parse trees in a neural network, it would be a great achievement, and one essential for making neural networks work properly.

He added, "Geoffrey has made many significant predictions in his career, and many of them have been validated. So I pay attention to them, especially when he is as confident as he is about GLOM right now."

Hinton's conviction rests not only on the echo chamber analogy but also on mathematical and biological analogies, which inspired and justified GLOM's engineering design decisions.

Sue Becker, a computational cognitive neuroscientist at McMaster University and a former student of Hinton's, said, "Geoffrey is an extraordinarily unusual thinker, able to develop theories by drawing on complex mathematical concepts and combining them with biological constraints. Researchers confined to mathematical theory or to neurobiology alone find it far harder to unravel the challenging puzzle of how humans and machines learn and think."

<h1 class="pgc-h-arrow-right" data-track="31" > Integrating philosophy into engineering</h1>

So far, Hinton's new idea has been well received, especially in some world-famous echo chambers. "On Twitter, I got a lot of likes," he said. And a YouTube tutorial has already laid claim to the term "MeGLOMania."

Hinton is the first to acknowledge that, for now, GLOM is little more than philosophical musing. He spent a year studying philosophy as an undergraduate before switching to experimental psychology.

"If an idea sounds good philosophically, then it really is good," he said. "How could a philosophical idea that sounds like garbage turn out to be true? It wouldn't pass the philosophical test."

By contrast, he adds, "a lot of scientific ideas sound like complete garbage," yet turn out to work remarkably well; neural networks, for example.

<h1 class="pgc-h-arrow-right" data-track="31" >GLOM's design sounds philosophically sound. But will it succeed?</h1>

Chris Williams, a machine learning professor in the School of Informatics at the University of Edinburgh, hopes GLOM will lead to a series of great innovations.

However, he says, "What distinguishes AI from philosophy is that we can use computers to test such theories." Experiments like these make it possible to expose an idea's flaws and to fix them. He added: "While I believe the work is promising, I don't think we yet have enough evidence to assess the idea's true significance."

At Google Research in Toronto, some of Hinton's colleagues are running early-stage experiments with GLOM. Laura Culp, a software engineer who implements novel neural network architectures, is using computer simulations to test whether GLOM can produce what Hinton calls islands of agreement in understanding the parts and whole of an object, even when the input parts are ambiguous.

In the experiments, those parts are 10 ellipses, ovals of different sizes, that can be arranged to form either a face or a sheep.


Figure | The 10 ellipses fed into the GLOM test model, which can form either a sheep or a face (Source: Laura Culp)

By randomly feeding in one ellipse or another, Culp says, the model can predict whether that ellipse belongs to a face or a sheep, and whether it is the sheep's leg or its head; even when it encounters interference, the model can correct itself.

The next step is to establish a baseline showing whether a standard deep learning network gets confused by the same task. So far, GLOM is heavily supervised: Culp generates and labels the data, prompting the model to correct its predictions over time until it gets them right. The unsupervised version, Hinton says, would be called GLUM: "It's just a joke."

At this initial stage, it is too early to draw any firm conclusions, and Culp is waiting for more data. Still, the results so far have impressed Hinton.

"A simple version of GLOM can look at 10 ellipses and recognize a face or a sheep from the spatial relationships among them," he says. "That's tricky, because a single ellipse says nothing about which object it belongs to, or which part of that object it is."

Overall, Hinton is pleased with the feedback. "I just wanted to get the idea out to the community, so that anyone who's interested can try it out, or try some sub-combination of these ideas," he says. "After that, philosophy turns into science."