
Renormalization Group Meets Machine Learning: A Multi-scale Perspective on the Intrinsic Unity of Complex Systems


It is precisely because "we cannot see structures that are too small, and we cannot take in structures that are too large" that we need the renormalization group: a method that repeatedly highlights the important features of a system while erasing the unimportant ones. In the end we find that perhaps the whole world is composed of a finite number of islands, and every system belongs to one of them and nothing else. In this article, we introduce renormalization group theory starting from the renormalization of the Ising model, then systematically survey a line of work combining the renormalization group with machine learning, and finally discuss cutting-edge progress in multi-scale dynamical modeling that follows the same path as the renormalization group into non-equilibrium dynamical systems, including causal emergence theory, eigen microstate theory, and world models in reinforcement learning.

Written by | Tao Ruyi

Edited by | Liang Jin

The renormalization group plays a very important role in physics, especially in particle physics and statistical physics. Professor You Yizhuang of the University of California, San Diego (known to Chinese readers as "E University") once summarized the usage scenario of the renormalization group as: we cannot see structures that are too small, and we cannot take in structures that are too large. Therefore, we use the renormalization group to truncate or "coarsen" the description of the system.

As for what a renormalization group is, this article uses my own learning path as an anchor to share my understanding of it. For now, think of it as essentially a set of pictures describing the dynamics of the system's parameters in scale space. At the end we will return to Professor You's description. So, let us start from the topic of dynamics.

01

What is a renormalization group

First, let's formalize a system. A dynamical system can be described by an equation of the form:

dx/dt = f(x, θ) (1)

where x is the state variable of the system, θ the parameters, and the form of f is usually fixed for a given problem. For example, the differential equation of a spring oscillator is

m d²x/dt² = −kx, i.e., dx/dt = v, dv/dt = −(k/m) x, with θ = (k, m).

Of course, we write things in dynamical form here because dynamical systems are more familiar to those of us who study complex systems. If the system we want to model is an equilibrium system, we can instead describe it by a probability distribution p(x, θ). For convenience, we will focus on the dynamical case, which is more common for complex systems; this does not hinder our understanding of the theory as a whole. In short, we can now describe a system in formal mathematical terms.
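To make equation (1) concrete, here is a minimal numerical sketch of the spring oscillator written as a first-order system with θ = (k, m). The semi-implicit Euler scheme, the step size, and the unit parameters are my own illustrative choices, not anything prescribed by the text:

```python
import math

def integrate_spring(x, v, k, m, dt, n_steps):
    """Integrate m*d2x/dt2 = -k*x as the first-order system
    dx/dt = v, dv/dt = -(k/m)*x, using semi-implicit (symplectic)
    Euler, which keeps the oscillator's energy from blowing up."""
    for _ in range(n_steps):
        v += -(k / m) * x * dt   # velocity from the current position
        x += v * dt              # position from the updated velocity
    return x, v

# Over one full period T = 2*pi*sqrt(m/k) the oscillator should return
# to its initial state (x, v) = (1, 0), up to discretization error.
dt = 1e-3
steps = round(2 * math.pi / dt)
x_end, v_end = integrate_spring(1.0, 0.0, 1.0, 1.0, dt, steps)
```

Here the "system" is the pair (x, v) and the "parameters" are (k, m); changing θ changes the flow f without changing its form.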

Next we introduce the concept of scale. What does that mean? Introducing the perspective of scale is like looking at the system with eyes of different resolution: at a lower resolution the system we see is "blurrier" and much detail is lost, but we can still see it moving. At this new scale we can again describe the system with a new set of dynamical equations, which can be written, for example, as

dy/dt = f′(y, θ′)  (for equilibrium systems, p′(y, θ′))  (2)

In this new description, the variables of the system can be different, the parameters can be different, and even the equation can take a different form (from f to f′). In fact, the traditional approach in each field is to work at one particular scale of the system. We are very good at any single scale: we have developed many mature tools for analyzing ordinary and partial differential equations. Linear dynamics aside, nonlinear dynamics also has a wealth of analysis tools such as flows, chaos, attractors, and fractals (recommended book: Steven Strogatz's Nonlinear Dynamics and Chaos).
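As a tiny taste of those single-scale tools, the sketch below (plain Python; the parameter values and iteration counts are illustrative choices of mine) uses the logistic map to show two of them: convergence to a stable fixed point, and the sensitive dependence on initial conditions that signals chaos:

```python
def logistic(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)

def iterate(x, r, n):
    for _ in range(n):
        x = logistic(x, r)
    return x

# r = 2.5: the dynamics settle onto the stable fixed point
# x* = 1 - 1/r = 0.6 (|f'(x*)| = 0.5 < 1, so it attracts).
x_fixed = iterate(0.2, 2.5, 1000)

# r = 4.0: chaos -- two trajectories started 1e-9 apart separate
# roughly as 2^n until the difference saturates at order one.
x, y, max_sep = 0.2, 0.2 + 1e-9, 0.0
for _ in range(60):
    x, y = logistic(x, 4.0), logistic(y, 4.0)
    max_sep = max(max_sep, abs(x - y))
```

Fixed points, attractors, and Lyapunov-style separation are exactly the kind of within-scale analysis the renormalization group will later reuse, only in the space of parameters rather than states.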

Physicists, however, are a strange species and always like to do things differently. In the process of coarse-graining systems they invented the renormalization group. The purpose of this tool is to answer the following question: what happens to the system as we keep coarse-graining it; that is, what relationship does the system bear between different scales?

Continuing in the language above, the renormalization group models the dynamics of the system's parameters across scales, namely

dθ/dl = g(θ, γ) (3)

There are several key points in the formula above. First, the independent variable has changed from time t to the scale l. Second, the dependent variable is θ, the parameters of the original system. In other words, the equation models the system's parameters as a function of scale. Of course, the premise of this equation is that the dynamical forms f and f′ at different scales can be written as substantially identical (the phrase "substantially identical" is subtle; we will come back to it below). This assumption may seem strong, but it is actually very reasonable for many physical systems.

Let's take the classical equilibrium Ising model as an example (for a basic introduction, see the Swarma encyclopedia entry on the Ising model). Under Kadanoff block coarse-graining, the parameter that changes is the coupling coefficient K of the system (which can also be understood as a temperature, to which it is directly related). We then obtain the classic picture of Ising real-space renormalization (Figure 1). The premise here is that the Ising model at every scale is still an Ising model and does not become something else, such as a Potts model (whose variables are not binary). Under this premise, equation (3) describes a completely new dynamical system, one that captures the multi-scale structure of the original system. By analyzing it with dynamical-systems methods, we can obtain a series of quantitative laws of the system with respect to scale.
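The Kadanoff block step itself can be sketched in a few lines. The following is a minimal illustration (plain Python, majority rule on 3×3 blocks; the lattice size and the two flipped "defect" spins are arbitrary choices of mine), not a full renormalization calculation, which would also require re-estimating the coupling K at the new scale:

```python
def majority_coarse_grain(spins, b=3):
    """Kadanoff block-spin step: replace each b x b block of +/-1 spins
    by the sign of the block sum (majority rule; odd b avoids ties)."""
    n = len(spins)
    assert n % b == 0 and b % 2 == 1
    m = n // b
    coarse = [[0] * m for _ in range(m)]
    for I in range(m):
        for J in range(m):
            s = sum(spins[I * b + i][J * b + j]
                    for i in range(b) for j in range(b))
            coarse[I][J] = 1 if s > 0 else -1
    return coarse

# A mostly-up 6x6 configuration with two thermal "defects"
# renormalizes to an all-up 2x2 configuration: the local
# fluctuations are erased, the large-scale order survives.
config = [[1] * 6 for _ in range(6)]
config[0][0] = config[3][4] = -1
coarse = majority_coarse_grain(config)
```

Iterating this map on sampled configurations at different temperatures is exactly what produces the rows of Figure 1.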


Figure 1. Real-space renormalization of the 2D Ising model with zero external field. Each column corresponds to a different reduced temperature; the rows show the original system and the system after one and two renormalization steps.

Since equation (3) defines a dynamical system, it can be represented by a phase diagram; Figure 2 shows the renormalization flow diagrams for the 1D and 2D Ising models. In 1D there is no non-trivial fixed point, while in 2D there is one. A fixed point in parameter space means that the system's parameters do not change as the scale changes. This is actually a rather strange phenomenon: it means that coarse-graining does not essentially change the basic rules of interaction between the elements of the system. That is only possible in two scenarios:

  • the correlations between the elements of the system are zero, i.e., there is no interaction; or
  • the correlations are infinitely long-ranged, so that however much we coarse-grain, we erase only inconsequential local information.

Therefore, a (non-trivial) fixed point of the renormalization equation corresponds to the phase transition point of a critical system. K in Figure 2 is the coupling coefficient of the system, a temperature-like parameter that determines the strength of the interactions. With this method we can also directly calculate the system's critical exponents. With this visual picture in hand, interested readers can dig further into the mathematical details of the renormalization group.
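The 1D case can even be checked exactly. Decimating every other spin of the 1D Ising chain gives the well-known exact recursion tanh K′ = tanh² K; the sketch below iterates it and shows the flow running to the trivial fixed point K = 0, matching the statement above that 1D has no non-trivial fixed point (the starting coupling and step count are arbitrary illustrative choices):

```python
import math

def decimate(K):
    """One exact decimation step for the 1D Ising chain (trace out
    every other spin, blocking factor b = 2): tanh K' = tanh(K)**2."""
    return math.atanh(math.tanh(K) ** 2)

def flow(K, n_steps=20):
    """Record the renormalization trajectory K, K', K'', ..."""
    traj = [K]
    for _ in range(n_steps):
        traj.append(decimate(traj[-1]))
    return traj

# Start from a strong coupling: the flow still collapses toward K = 0,
# i.e., the only fixed points are the trivial K = 0 and K = infinity.
traj = flow(2.0)
```

In 2D, by contrast, the analogous recursion has an unstable fixed point at finite K, which is the critical point.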

Recommended Materials:

Swarma Encyclopedia: Renormalization of the Ising model (https://wiki.swarma.org/)

Books: Complexity and Criticality; 《边缘奇迹:相变与临界现象》 (Miracles at the Edge: Phase Transitions and Critical Phenomena)


Figure 2. Left: renormalization phase diagrams (a) and (b) of the 1D Ising model. Right: renormalization phase diagrams (a) and (b) of the 2D Ising model.

There is another detail that is easy to overlook but crucial: we mentioned that the renormalization group assumes the dynamics f and f′ at different scales are basically the same. In the Ising example, the microscopic coupling is a nearest-neighbor interaction, but after one renormalization step higher-order interactions appear, so the change in the coupling strength K is not merely a numerical change; a more accurate description involves an expansion of the parameter space's dimensionality. The phase diagram on the right of Figure 2 is therefore a dimensionality reduction of a higher-dimensional dynamical system. In other words, the coupling strength K should really be an (in principle infinite-dimensional) vector of interaction constants, K = (K1, K2, K3, ...), where K1 is the nearest-neighbor coupling, K2 the next-nearest-neighbor one, K3 the third-nearest-neighbor one, and so on. The energy function looks like this:

H = −K1 Σ⟨i,j⟩ s_i s_j − K2 Σ⟨⟨i,j⟩⟩ s_i s_j − K3 Σ⟨⟨⟨i,j⟩⟩⟩ s_i s_j − ⋯

The Ising model we usually discuss has all coefficients equal to 0 except K1, so a more accurate picture of the renormalization of the original system is Figure 3, which plots three dimensions of the coupling coefficient; each renormalization operation moves the system through this space.


Figure 3. Schematic of the three-dimensional parameter space. The Ising model we usually speak of is the case where the K2 and K3 dimensions are zero. The positions of the three fixed points in the 3D space are marked. The one-dimensional phase diagram of Figure 2 is in fact a projection of this space onto the K1 dimension. The gray surface is the critical surface, on which the correlation length is infinite.

When we use the renormalization group to ignore the details of systems, we are pleasantly surprised to find that although the various kinds of systems in nature differ enormously, they exhibit only a limited number of types of renormalization behavior. For example, the phase-transition behavior of a ferromagnet and the gas-liquid transition of water agree remarkably well. Based on this finding, systems are classified by the behavior they exhibit under renormalization; this is the concept of the universality class. The proposal of universality classes produced a qualitative leap in our understanding of critical phase transitions. This brings us back to the sentence at the beginning: because "we cannot see structures that are too small, and we cannot take in structures that are too large", we use the renormalization group to repeatedly highlight the important features of a system and erase the unimportant ones, and in the end we find that the whole world consists of a limited number of islands, and every system belongs to one of them and nothing else.

However, when we want to use renormalization group theory to analyze specific systems, there are still hurdles. For instance, we need to design a suitable renormalization strategy, and we often do not know what principles this design should follow; it depends heavily on the experience, and even the inspiration, of scientists. Another question is whether we can invent methods that automate the entire computation, so that machines can compute universality classes automatically, freeing scientists to think about more important problems. As a result, a number of methods combining data-driven approaches with renormalization group theory have emerged.

02

Renormalization Group and Machine Learning

Renormalization Group for Machine Learning

The combination of machine learning and the renormalization group is a cutting-edge but long-standing area of interest; it has been discussed at least since PCA was connected to RG [1]. Moreover, the deep architecture of neural networks resembles renormalization in form: the renormalization group extracts the key features of a system through repeated coarse-graining, each layer of a deep network likewise extracts features, and the features at different layers carry a sense of scale.

This connection was first pointed out explicitly in [2], which constructed a neural network architecture based on the Restricted Boltzmann Machine (hereinafter RBM) and established an exact analytical mapping between Kadanoff block coarse-graining of the Ising model and the network, showing that deep learning algorithms may indeed extract features from data in a pattern similar to a renormalization flow. This is a great insight into how deep learning works.

[1] Bradde, Serena, and William Bialek. "Pca meets rg." Journal of statistical physics 167 (2017): 462-475.

[2] Mehta, Pankaj, and David J. Schwab. An exact mapping between the variational renormalization group and deep learning. arXiv preprint arXiv:1410.3831 (2014).

Reference [3] is a direct continuation of the previous article, albeit with six years in between. It also uses the RBM structure to reproduce the renormalization flow of the Ising model, and even numerically extracts various critical exponents, making the correspondence between renormalization and machine learning clearer and more concrete.

[3] Koch, Ellen De Mello, Robert De Mello Koch, and Ling Cheng. Is deep learning a renormalization group flow?. IEEE Access 8 (2020): 106487-106505.

Complementary to this exploration is another line of literature [4, 5], which directly examines the properties of trained RBMs, discovers a so-called RG flow of RBMs, and concludes that its stable fixed point is a non-trivial critical point. This picture is the opposite of the Ising one (the Ising critical point is an unstable fixed point of the RG flow), and the principle behind it is well worth further investigation.

[4] Iso S, Shiba S, Yokoo S. Scale-invariant feature extraction of neural network and renormalization group flow. Phys Rev E. 2018; 97(5):1-16. doi:10.1103/PhysRevE.97.053304,

[5] Funai SS, Giataganas D. Thermodynamics and feature extraction by machine learning. Phys Rev Res. 2020; 2(3):1-11. doi:10.1103/PhysRevResearch.2.033415

Research of this kind is very interesting; at its core is the question of how to better understand the representations of neural networks from the perspective of statistical physics. However, such explorations are often limited to one specific network architecture, the RBM, because that framework is almost tailor-made to correspond to statistical physics. At the scientific level this is not particularly essential, either for solving concrete problems or for advancing statistical physics itself. Physicists have played with the RBM as a toy again and again; despite attempts to compute critical exponents, the RBM's structure has kept it from further application to more complex systems, so this work remains some distance from reality.

In fact, besides the traditional RBM, CNNs, tensor networks, and even various generative models all have the potential to be combined with RG. What we need to answer is rather what practical applications this combination can have, or how it can enhance the capabilities of neural networks. This leads to another type of work, introduced below, which uses RG theory as prior knowledge for neural network design and can genuinely be used to solve practical problems, or genuinely help physical theory.

Machine Learning for Renormalization Group

The classic article of this kind is the information-theoretic renormalization group work published in Nature Physics in 2018 [6]. Its motivation is to renormalize a system by constructing coarse-graining strategies automatically, in a data-driven manner. The means to this end is to require that, after coarse-graining, the macroscopic variables have maximal mutual information with the "environment variables" of the original system, without any other prior knowledge. The "environment variables" here can be understood as the variables that do not take part in the renormalization of the current local real-space block. The macroscopic variables learned by the neural network then turn out to be precisely the relevant variables. This is consistent with the RG picture, and the critical exponents can also be obtained this way.

[6] Koch-Janusz M, Ringel Z. Mutual information, neural networks and the renormalization group. Nat Phys. 2018; 14(6):578-582. doi:10.1038/s41567-018-0081-4
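The objective in this scheme can be illustrated with a toy computation. The sketch below (plain Python; not the paper's actual neural-network optimization) compares two candidate coarse-grained variables for a block of three spins by their estimated mutual information with an "environment" spin. The environment's 0.9 coupling to the block majority and the plug-in histogram estimator are illustrative assumptions of mine; the point is only that the majority-rule variable retains more relevant information than a single spin from the block:

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Plug-in (histogram) estimate of I(A;B) in bits from (a, b) samples."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(c / n * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
               for (a, b), c in p_ab.items())

random.seed(0)
samples = []
for _ in range(20000):
    block = tuple(random.choice((-1, 1)) for _ in range(3))
    maj = 1 if sum(block) > 0 else -1
    env = maj if random.random() < 0.9 else -maj  # environment tracks the majority
    samples.append((block, maj, env))

# Candidate coarse variables: the majority spin vs. one raw spin.
mi_majority = mutual_information([(maj, env) for _, maj, env in samples])
mi_one_spin = mutual_information([(blk[0], env) for blk, _, env in samples])
```

A data-driven RG scheme in this spirit would search over coarse-graining maps and keep the one maximizing such a mutual-information score.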

Subsequent work published in PRX [7] rigorously derived the analytical form of this information-theoretic coarse-graining at the theoretical level, providing a very valuable principle for constructing renormalization strategies, so that physicists can hope to escape the limitation of designing coarse-graining rules by prior knowledge or sheer inspiration.

[7] Lenggenhager PM, Gökmen DE, Ringel Z, Huber SD, Koch-Janusz M. Optimal renormalization group transformation from information theory. Phys Rev X. 2020; 10(1):1-27. doi:10.1103/PhysRevX.10.011037

Going further, Wang Lei and You Yizhuang proposed the principle of minimal holographic mutual information [8]: the mutual information among the discarded degrees of freedom is minimized, which likewise preserves the relevant information of the system and is essentially an extension of the environment-based mutual-information maximization above. Even more elegantly, they introduced invertible neural networks, which promote the renormalization process to a true group operation rather than the traditional semigroup operation. The learned network then constitutes a generative model: it can extract key variables in the manner of traditional renormalization, and it can also run in reverse ("inverse renormalization") to resample configurations at the original scale. The benefit of this modeling is that each layer of the network's representation can be given an actual physical meaning. Follow-up work applies this framework to practical tasks, where it is more explainable than traditional methods [9]; no further theoretical analysis has been carried out so far.

[8] Hu HY, Li SH, Wang L, You YZ. Machine learning holographic mapping by neural network renormalization group. Phys Rev Res. 2020; 2(2):23369. doi:10.1103/PhysRevResearch.2.023369

[9] Sheshmani A, You Y zhuang, Fu W, Azizi A. Categorical representation learning and RG flow operators for algorithmic classifiers. Mach Learn Sci Technol. 2023;4:20.

In addition, Professor You Yizhuang's team has an interesting work [10] in which they designed a self-training framework that can automatically discover a system's universality class given only its symmetries, without any specific simulation data. The basic idea is to build a "fine-grained" model and a "coarse-grained" model and train the coarse-grained model to generate configurations as similar as possible to the fine-grained ones, which simulates how the system changes under renormalization; a third model then acts as a learner of the renormalization equation, learning the dynamics of the two models' parameters. Putting the three models together, the renormalization group equation of the symmetric system can be modeled and the corresponding critical exponents extracted.

[10] Hou W, You YZ. Machine Learning Renormalization Group for Statistical Physics. arXiv Prepr. Published online 2023:1-13. http://arxiv.org/abs/2306.11054

03

Multiscale modeling of non-equilibrium systems

So far, because renormalization group theory has such an excellent body of analysis for equilibrium systems, our discussion has mainly concerned the renormalization of equilibrium models. This is partly because for non-equilibrium systems the complexity of the problem rises by more than one level, and there is still no analytical technique good enough to yield a unified theory comparable to the equilibrium one. Complex systems, however, are more naturally studied through their dynamics, which are often non-equilibrium. Moreover, besides the lack of a complete basic theory for non-equilibrium analysis, there is no thoroughly studied classical system like the Ising model to serve as a toy model for scientists to play with, so renormalization group work there is not as rich as for equilibrium systems, let alone data-driven renormalization modeling.

Multiscale dynamical modeling, however, is flourishing in other areas. It has long been recognized that information at different scales plays different roles in both the prediction and the control of dynamical systems: small-scale, high-frequency information helps short-term prediction, while large-scale, low-frequency information is key to long-term modeling. From reduced-order models (ROM), to the equation-free method (EFM), to the current cutting-edge theory of causal emergence, these methods all try, each on its own principles, to reduce or simplify a dynamical system.

Data-driven dimensionality reduction of dynamical systems has also blossomed in recent years. The eigen microstate method developed by Professor Chen Xiaosong's team starts from data, decomposes the observations of a physical system by singular value decomposition, and finds clear physical meaning in the resulting modes [11]; it can not only handle critical phase transitions in classical equilibrium systems but has also made breakthroughs on many complex systems (including collective behavior, turbulence, climate, finance, and quantum systems).

[11] Hu GK, Liu T, Liu MX, Chen W, Chen XS. Condensation of eigen microstate in statistical ensemble and phase transition. Sci China Physics, Mech Astron. 2019; 62(9). doi:10.1007/s11433-018-9353-x

A series of works [12, 13] combine data-driven methods with the Koopman operator (which linearizes a nonlinear dynamical system but is hard to compute exactly) to achieve dynamic mode decomposition or latent-space learning. Inspired by the equation-free approach, machine learning dimensionality-reduction methods (such as VAEs) are used to reduce the system's variables and learn the dynamics directly in the latent space, called the effective dynamics, so as to predict the system better [14].

[12] Khazaei H. A data–driven approximation of the koopman operator: extending dynamic mode decomposition. AIMS. 2016; X(0):1-33.

[13] Lusch B, Kutz JN, Brunton SL. Deep learning for universal linear embeddings of nonlinear dynamics. Nat Commun. 2018; 9(1). doi:10.1038/s41467-018-07210-0

[14] Vlachas PR, Arampatzis G, Uhler C, Koumoutsakos P. Multiscale simulations of complex systems by learning their effective dynamics. Nat Mach Intell. 2022; 4(4):359-366. doi:10.1038/s42256-022-00464-w
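As a concrete instance of the dynamic-mode-decomposition idea mentioned above, here is a minimal sketch (NumPy; the matrix `A_true` and the trajectory are made up for illustration): from snapshot pairs of a linear system we fit the best one-step linear operator by least squares and read off its spectrum, which is exact DMD in its simplest form:

```python
import numpy as np

# Toy linear dynamics x_{t+1} = A_true @ x_t; DMD should recover A_true
# (and hence its eigenvalues 0.9 and 0.8) from snapshot data alone.
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
rng = np.random.default_rng(0)
X = [rng.standard_normal(2)]
for _ in range(50):
    X.append(A_true @ X[-1])
X = np.array(X).T                # shape (2, 51): columns are snapshots
X0, X1 = X[:, :-1], X[:, 1:]     # paired snapshots (x_t, x_{t+1})

# Least-squares fit of the one-step linear operator: A = X1 X0^+.
A_dmd = X1 @ np.linalg.pinv(X0)
eigvals = np.linalg.eigvals(A_dmd)
```

For a genuinely nonlinear system one first lifts the state into observables (or a learned latent space) and applies the same least-squares step there, which is where the Koopman viewpoint and the neural approaches of [13, 14] come in.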

In addition, model-based learning in reinforcement learning has opened up, from its own worldview, a new but very similar idea [15]: represent the environment the agent interacts with by a low-dimensional model, so as to make prediction and control tasks more efficient. This too has become a very cutting-edge topic in that field.


Figure 4. The information we use for prediction is not the whole system; a simplified representation suffices.

[15] Ha D. World Models. Published online 2018.

In conclusion, as more and more data become available and the systems to be handled grow more complex, it has become clear that introducing a multi-scale perspective into dynamical modeling matters, both for computational efficiency and for obtaining the key low-dimensional variables for the task at hand; such methods are gradually becoming mainstream for solving practical complex problems. These fields, which at first glance have nothing to do with the renormalization group, have arrived at a similar problem by different routes. Precisely because they lack strong theoretical constraints, these methods sometimes look "simple and crude" in use, yet they have magically penetrated various fields and become an important idea for solving their problems. The AI+Science Reading Club offers plenty of relevant material for study and reference, and the upcoming fifth season of the Causal Emergence Reading Club will present more cutting-edge work.

04

Summary

From the perspective of dynamics, this article has introduced the basic idea of the renormalization group method as pictorially as possible, and in the second part surveyed cutting-edge work combining machine learning with the renormalization group. Such work is full of ingenuity and genuinely offers a path toward broader and more intelligent application of renormalization group theory. In the third part I listed, from a rather coarse perspective, work on multi-scale modeling of dynamics, showing the diversity and excitement of this field; this class of problems is essentially an exploration of multi-scale approaches to non-equilibrium dynamical systems, and it does provide important inspiration for solving practical problems. There is one more area, not covered here, that is also very important for multi-scale modeling of complex systems: the renormalization of complex networks. That line of research is relatively independent and focuses on representing a large complex network by a smaller one in order to tame expensive computations on large networks; it, too, is a promising field.

However, whether for multi-scale dynamical modeling or for the renormalization of complex networks, despite the attention of scientists there is as yet no unified theory like equilibrium renormalization theory to help us truly understand the inherent unity of complex systems. Still, we may believe that in the near future we, too, will discover where the islands of these complex systems lie.

About the Author

Tao Ruyi is a Ph.D. candidate in systems science at Beijing Normal University. His research interests include complex system modeling, with a focus on multi-scale dynamics, scaling laws, renormalization, and deep learning.

This article is reprinted with permission from the WeChat public account "Jizhi Club".
