
Bayesian Deep Learning: A framework for unifying deep learning and probabilistic graphical models


Author | Wang Hao

Compiled by | Victor

Advances in artificial intelligence (AI) show that significant performance gains can be achieved by building deep, multi-layered networks and learning from large amounts of data. But these advances have come mostly in perceptual tasks; to go beyond perception, the traditional AI paradigm needs to be extended.

On April 9, Wang Hao, an assistant professor in the Department of Computer Science at Rutgers University, presented at the AI TIME Young Scientists - AI 2000 Scholars Forum a Bayesian probabilistic framework that can unify deep learning and probabilistic graphical models, and with them the perception and reasoning tasks of AI.

According to the talk, the framework has two modules: a depth module, represented by a probabilistic deep model, and a graph module, that is, a probabilistic graphical model. The depth module handles high-dimensional signals, while the graph module handles the inference part of the task.

The following is the full text of the talk, edited by AI Technology Review without changing the original meaning:

Today I would like to share our work on Bayesian deep learning. The theme is a probabilistic framework we have been studying, which we hope can unify deep learning and probabilistic graphical models, and with them the perception and reasoning tasks of AI.

As we all know, AI powered by deep learning has a certain ability to see (it can recognize objects), to read (it can understand text), and to hear (it can recognize speech). But it still largely lacks the ability to think.

"Thinking" corresponds to the task of inference, specifically its ability to handle complex relationships, including conditional probability relationships or causal relationships.

Deep learning is well suited to perceptual tasks, but "thinking" involves higher-level intelligence, such as decision-making, data analysis, and logical reasoning. Probabilistic graphical models have the advantage in inference tasks because they represent complex relationships between variables very naturally.

[Figure: a classic probabilistic graphical model, the sprinkler network]

The figure above shows a classic example. The task is to infer the probability that the grass is wet given whether the sprinkler is on and what the weather is outside, and conversely to infer the weather from the observation that the grass is wet. The disadvantage of probabilistic graphical models is that they cannot efficiently process high-dimensional data.
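To make this concrete, here is a minimal sketch of exact inference in such a sprinkler network by enumerating the hidden variable. The conditional probability tables are illustrative numbers, not values from the talk:

```python
# Minimal sketch of exact inference in the classic sprinkler network.
# All probabilities below are made up for illustration.
P_rain = 0.2
P_sprinkler_given_rain = {True: 0.01, False: 0.4}
P_wet = {  # P(grass wet | sprinkler, rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def p_wet_and_rain(rain):
    """Sum out the sprinkler: P(wet, rain) = sum_s P(rain) P(s | rain) P(wet | s, rain)."""
    p_r = P_rain if rain else 1 - P_rain
    total = 0.0
    for s in (True, False):
        p_s = P_sprinkler_given_rain[rain] if s else 1 - P_sprinkler_given_rain[rain]
        total += p_r * p_s * P_wet[(s, rain)]
    return total

# Infer the weather back from the wet grass: P(rain | wet) by Bayes' rule.
joint = {r: p_wet_and_rain(r) for r in (True, False)}
print(joint[True] / (joint[True] + joint[False]))  # P(rain | grass is wet) ~ 0.358
```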


To sum up, deep learning is good at perceptual tasks but not at reasoning and inference, while probabilistic graphical models are good at inference tasks but not at perceptual ones.

Unfortunately, in real life these two kinds of tasks usually appear together and interact with each other. We therefore hope to unify deep learning and probabilistic graphical models in a single framework and get the best of both worlds.


The framework we propose is Bayesian deep learning. It has two modules: a depth module, represented by a probabilistic deep model, and a graph module, that is, a probabilistic graphical model. The depth module handles high-dimensional signals, while the graph module handles the inference part of the task.

It is worth mentioning that the graph module is inherently probabilistic, so to ensure the two can be fused, the deep model is made probabilistic as well. The model can be trained with classical algorithms such as maximum a posteriori (MAP) estimation, Markov chain Monte Carlo (MCMC), and variational inference (VI).
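As a reminder, these three schemes can be summarized in their standard textbook forms (not specific to this framework):

```latex
% Posterior over parameters \theta given data D:
p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)
% MAP: a point estimate maximizing the posterior
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\ \log p(D \mid \theta) + \log p(\theta)
% VI: fit a tractable q to the posterior by minimizing the KL divergence
q^{*} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid D)\big)
% MCMC: instead draw samples \theta^{(1)}, \ldots, \theta^{(T)} \sim p(\theta \mid D)
```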


To give a concrete example: in medical diagnosis, the depth module can be thought of as the doctor reading the patient's medical images, while the graph module is the doctor's reasoning, in the head, about the disease based on those images. From the doctor's point of view, the physiological signals in the medical images are the basis for reasoning, and in turn strong reasoning ability deepens the understanding of the images.


By extension, in a movie recommender system the depth module can be thought of as the understanding of a movie's plot, actors, and so on, while the graph module models user preferences and the similarities among users and movies. Here too, understanding the video content and modeling "preferences" are complementary.


Turning to the model details, we divide the variables of the probabilistic graphical model into three categories: deep variables, which belong to the depth module and are assumed to be generated from relatively simple probability distributions; graph variables, which belong to the graph module, are not directly connected to the depth module, and are assumed to come from relatively complex distributions; and hinge variables, which belong to both modules and form the interface between them.

Here's how the framework works in practice.

Recommender systems

The basic assumption of a recommender system is that the user's preferences for some movies are known, and we want to predict the user's preferences for other movies.


Users' preferences for movies can be written as a rating matrix. This matrix is very sparse, and modeling it directly yields very low accuracy. Recommender systems therefore rely on additional information, such as the movie's plot, director, and cast, as auxiliary content for modeling.

To model the content information and distill useful features from it, there are three options: hand-crafted features, features automatically learned by deep learning, and features adaptively learned by deep learning. The adaptive approach clearly achieves the best results.

Unfortunately, the i.i.d. (independent and identically distributed) assumption inherent in deep learning is fatal for recommender systems, because assuming there is no correlation between users is obviously wrong.


To resolve this difficulty, we proposed collaborative deep learning (CDL), which upgrades the modeling from "independent" to "non-independent". Building the model poses two challenges:

1. How to find an effective probabilistic deep model to serve as the depth module. We hope this model is compatible with the graph module while matching the performance of its non-probabilistic counterpart.

2. How to connect the depth module to the graph module for effective joint modeling.


Let's look at the first challenge. The autoencoder is a very simple deep learning model, generally used to extract features in an unsupervised manner; the output of its middle layer serves as the representation of the text. Note, however, that this middle-layer representation is deterministic rather than probabilistic, so it is incompatible with the graph module and cannot work with it.


We propose a probabilistic autoencoder; the difference is that the middle-layer output changes from a "fixed vector" to a "Gaussian distribution". The probabilistic autoencoder can degenerate into a standard autoencoder, so the latter is a special case of the former.
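A minimal sketch of such an encoder in PyTorch, assuming a bag-of-words text input; the layer sizes are illustrative and this is not the paper's exact architecture. Driving the predicted variance to zero recovers the deterministic autoencoder:

```python
import torch
import torch.nn as nn

class ProbabilisticAutoencoder(nn.Module):
    """Sketch: the encoder outputs a Gaussian (mean, log-variance) over the
    latent representation instead of a single deterministic vector."""
    def __init__(self, in_dim=8000, hidden=200, latent=50):  # illustrative sizes
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)       # mean of the latent Gaussian
        self.logvar = nn.Linear(hidden, latent)   # log-variance of the latent Gaussian
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterized sample
        return self.decoder(z), mu, logvar
```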


How do we relate the depth module to the graph module? First, the latent vector of item j is drawn from a Gaussian distribution:

Then, the latent vector of user i is drawn from another Gaussian distribution:

Based on these two latent vectors, the rating of user i for item j is sampled from yet another Gaussian distribution, whose mean is the inner product of the two latent vectors.
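The equations on the slides are not reproduced here; written out, following the collaborative deep learning paper cited in the references (with $f_e$ the probabilistic encoder, $X_j$ the content of item $j$, $I_K$ the identity matrix, and $\lambda_u$, $\lambda_v$, $C_{ij}$ precision hyperparameters), the generative process is roughly:

```latex
v_j \sim \mathcal{N}\!\left(f_e(X_j),\ \lambda_v^{-1} I_K\right)    % item latent vector, centered at the encoder output
u_i \sim \mathcal{N}\!\left(0,\ \lambda_u^{-1} I_K\right)           % user latent vector
R_{ij} \sim \mathcal{N}\!\left(u_i^{\top} v_j,\ C_{ij}^{-1}\right)  % rating, mean = inner product
```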

[Figure: the CDL graphical model; the blue box marks the graph module]

The blue box in the figure above represents the graph module. It defines the conditional probability relationships among items, users, ratings, and so on. Once these conditional probability relationships are in place, the latent vectors of users and items can be inferred from the observed ratings, and unknown ratings can be predicted from the inner product of the latent vectors.

[Figure: the complete CDL model]

The figure above is a diagram of the entire model, where λ denotes the hyperparameters controlling the variances of the Gaussian distributions. To measure the model's effectiveness, we used three datasets: citeulike-a, citeulike-t, and Netflix. For citeulike, the title and abstract of each paper serve as the content information; for Netflix, the movie plot synopsis does.

The experimental results are shown in the figure below: on the Recall@M metric, our method greatly exceeds the baseline models. The sparser the rating matrix, the larger our model's performance gain, because a sparser matrix makes the model rely more on the content information and the representations extracted from it.

[Figure: Recall@M results on citeulike-a, citeulike-t, and Netflix]
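For clarity, the Recall@M metric can be computed per user as follows (a simple sketch with our own variable names):

```python
def recall_at_m(ranked_items, held_out_items, M):
    """Recall@M for one user: the fraction of the user's held-out liked items
    that appear among the top-M recommended items."""
    top_m = set(ranked_items[:M])
    return sum(item in top_m for item in held_out_items) / len(held_out_items)

# Example: 2 of the user's 3 held-out items appear in the top-4 recommendations.
print(recall_at_m([5, 9, 2, 7, 1], held_out_items=[9, 7, 3], M=4))  # 0.666...
```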

Improvements in recommender system performance can boost corporate profits: according to a McKinsey & Company survey, 35 percent of Amazon's sales are generated by its recommender system. This means that every 1% improvement in the recommender system would bring roughly $620 million in additional sales.


To summarize what we have covered so far: we proposed the probabilistic deep model as the depth module of the Bayesian deep learning framework, with the non-probabilistic deep model as a special case of it; and we proposed a hierarchical Bayesian model for deep recommender systems, with experiments showing that it greatly improves recommendation performance.

Other applications


Given a graph, we know its edges and the content of its nodes. If the graph is a social network, its edges represent friendship relations between users, and the node content is the pictures or text users post on the platform. The graph can equally represent papers and their titles, abstracts, citations, and so on.


Our task is to learn representations of the nodes that capture both the content information and the structure of the graph.

The solution is a probabilistic autoencoder designed under the Bayesian deep learning framework. The depth module is responsible for processing the content of each node, since deep learning has the advantage in processing high-dimensional information; the graph module handles the relationships between nodes, such as citation networks and the complex relations in knowledge graphs.
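As an illustration of the idea (not the exact model from the papers in the references), one can train the probabilistic autoencoder sketched above jointly with a link term that pulls connected nodes' latent vectors together:

```python
import torch
import torch.nn.functional as F

# Joint objective sketch: the depth module reconstructs node content, while a
# link term encourages connected nodes to have similar latent vectors. A real
# model would also sample negative (non-)edges; that is omitted here.
def joint_loss(model, X, edges):
    """X: node-content matrix; edges: list of (i, j) pairs connected in the graph."""
    recon, mu, logvar = model(X)          # reuse the ProbabilisticAutoencoder above
    content_loss = F.mse_loss(recon, X)   # depth module: reconstruct node content
    src = [i for i, _ in edges]
    dst = [j for _, j in edges]
    link_logits = (mu[src] * mu[dst]).sum(-1)  # inner product of latent vectors
    link_loss = F.binary_cross_entropy_with_logits(
        link_logits, torch.ones(len(edges)))   # graph module: edges should score high
    return content_loss + link_loss
```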


In the medical field, we focus on medication monitoring. The task scenario is this: a small radar at home emits signals, and from the signals reflected off the patient the model should determine whether the patient takes medication on time and whether the medication steps are performed in the correct order. The difficulty is that the medication procedure is very complex and its steps need to be disentangled.

Following the Bayesian deep learning framework, the depth module processes the very high-dimensional signal information, and the graph module models the medical domain knowledge.

It is worth mentioning that even for the same model, different applications may call for different ways of learning the parameters: for example, the parameters can be learned as point estimates using MAP, or as full distributions using Bayesian methods.

For deep neural networks, once we have a distribution over the parameters, a lot becomes possible, such as estimating the uncertainty of predictions. Moreover, with a parameter distribution we can obtain robust predictions even when data is insufficient. The model also becomes more powerful, since a Bayesian model is effectively an average over infinitely many sampled models.
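As a small illustration of that last point, assume (purely for illustration) a Gaussian posterior over the weights of a linear model; averaging predictions over sampled weights gives the Bayesian model average, and their spread estimates the uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)
W_mean = rng.normal(size=(1, 4))   # assumed posterior mean (illustrative)
W_std = 0.1                        # assumed posterior standard deviation (illustrative)
x = rng.normal(size=4)             # one input example

# Sample many models from the weight distribution and predict with each.
preds = np.array([((W_mean + W_std * rng.normal(size=W_mean.shape)) @ x).item()
                  for _ in range(1000)])
print(preds.mean(), preds.std())   # Bayesian model average and its uncertainty
```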

The lightweight Bayesian learning method introduced below can be used with any deep learning model or deep neural network.


First, the goals: the method should be efficient, trainable by backpropagation while "discarding" the sampling process, and intuitively well-founded.

Our key idea is to treat the neurons and parameters of a neural network as distributions, rather than as mere points or vectors in a high-dimensional space, while still allowing the network to propagate forward and backward during learning. Because the distributions are expressed in terms of their natural parameters, the method is named NPN (natural-parameter networks).
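To give a flavor of how this works, here is a minimal sketch of moment propagation through a single linear layer with Gaussian weights and activations, in the spirit of NPN. It is a simplification (the full method also propagates through nonlinearities and trains the natural parameters); the variance formula is standard probability for independent variables:

```python
import numpy as np

def gaussian_linear(a_m, a_s, W_m, W_s):
    """Propagate (mean, variance) through a linear layer, no sampling needed.
    For independent w and x: E[wx] = E[w]E[x] and
    Var(wx) = Var(w)Var(x) + Var(w)E[x]^2 + E[w]^2 Var(x)."""
    o_m = W_m @ a_m
    o_s = W_s @ a_s + W_s @ (a_m ** 2) + (W_m ** 2) @ a_s
    return o_m, o_s

rng = np.random.default_rng(0)
a_m, a_s = rng.normal(size=4), np.full(4, 0.1)             # input mean and variance
W_m, W_s = rng.normal(size=(3, 4)), np.full((3, 4), 0.05)  # weight mean and variance
print(gaussian_linear(a_m, a_s, W_m, W_s))                 # output mean and variance
```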

References:

A survey on Bayesian deep learning. Hao Wang, Dit-Yan Yeung. ACM Computing Surveys (CSUR), 2020.

Towards Bayesian deep learning: a framework and some existing methods. Hao Wang, Dit-Yan Yeung. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

Collaborative deep learning for recommender systems. Hao Wang, Naiyan Wang, Dit-Yan Yeung. Twenty-First ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015.

Collaborative recurrent autoencoder: recommend while learning to fill in the blanks. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Thirtieth Annual Conference on Neural Information Processing Systems (NIPS), 2016.

Natural parameter networks: a class of probabilistic neural networks. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Thirtieth Annual Conference on Neural Information Processing Systems (NIPS), 2016.

Relational stacked denoising autoencoder for tag recommendation. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015.

Relational deep learning: a deep latent variable model for link prediction. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017.

Bidirectional inference networks: a class of deep Bayesian networks for health profiling. Hao Wang, Chengzhi Mao, Hao He, Mingmin Zhao, Tommi S. Jaakkola, Dina Katabi. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

Deep learning for precipitation nowcasting: a benchmark and a new model. Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo. Thirty-First Annual Conference on Neural Information Processing Systems (NIPS), 2017.

Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo. Twenty-Ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015.

Continuously indexed domain adaptation. Hao Wang*, Hao He*, Dina Katabi. Thirty-Seventh International Conference on Machine Learning (ICML), 2020.

Deep graph random process for relational-thinking-based speech recognition. Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang. Thirty-Seventh International Conference on Machine Learning (ICML), 2020.

STRODE: stochastic boundary ordinary differential equation. Hengguan Huang, Hongfu Liu, Hao Wang, Chang Xiao, Ye Wang. Thirty-Eighth International Conference on Machine Learning (ICML), 2021.

Delving into deep imbalanced regression. Yuzhe Yang, Kaiwen Zha, Yingcong Chen, Hao Wang, Dina Katabi. Thirty-Eighth International Conference on Machine Learning (ICML), 2021.

Adversarial attacks are reversible with natural supervision. Chengzhi Mao, Mia Chiquier, Hao Wang, Junfeng Yang, Carl Vondrick. International Conference on Computer Vision (ICCV), 2021.

Assessment of medication self-administration using artificial intelligence. Mingmin Zhao*, Kreshnik Hoti*, Hao Wang, Aniruddh Raghu, Dina Katabi. Nature Medicine, 2021.


