
Bayesian Deep Learning: A framework for unifying deep learning and probabilistic graphical models


Author | Wang Hao

Compiled by | Victor

Advances in artificial intelligence (AI) show that significant performance gains can be achieved by building deep, multi-layered networks and learning from large amounts of data. But these advances have come mostly in perceptual tasks; to go beyond perception, the traditional AI paradigm needs to be extended.

On April 9, Wang Hao, an assistant professor in the Department of Computer Science at Rutgers University, presented at the AI TIME Young Scientists - AI 2000 Scholars Forum a Bayesian probabilistic framework that can unify deep learning and probabilistic graphical models, and with them the perception and reasoning tasks of AI.

According to the talk, the framework has two modules: a depth module, represented by a probabilistic deep model, and a graph module, that is, a probabilistic graphical model. The depth module handles high-dimensional signals, while the graph module handles the inference part of the task.

The following is the full text of the talk, edited by AI Technology Review without changing the original meaning:

Today I would like to share our work on Bayesian deep learning. The theme is a probabilistic framework we have been studying, which we hope can unify deep learning and probabilistic graphical models, and with them the perception and reasoning tasks of AI.

As we all know, AI powered by deep learning has a certain ability to see (it can recognize objects), to read (it can understand text), and to hear (it can recognize speech). But it still largely lacks the ability to think.

"Thinking" corresponds to the task of inference, specifically its ability to handle complex relationships, including conditional probability relationships or causal relationships.

Deep learning is well suited to perceptual tasks, but "thinking" involves higher-level intelligence, such as decision-making, data analysis, and logical reasoning. Probabilistic graphical models have the advantage in inference tasks because they represent complex relationships between variables very naturally.

[Figure: a classic probabilistic graphical model, the sprinkler network]

The figure above shows a classic example. The task is to infer the probability that the grass is wet given whether the sprinkler is on and what the weather is outside, and conversely to infer the weather from the observation that the grass is wet. The disadvantage of probabilistic graphical models is that they cannot efficiently process high-dimensional data.
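To make this concrete, here is a minimal sketch of exact inference in such a sprinkler network by enumerating the hidden variable. The conditional probability tables are illustrative numbers, not values from the talk:

```python
# Minimal sketch of exact inference in the classic sprinkler network.
# All probabilities below are made up for illustration.
P_rain = 0.2
P_sprinkler_given_rain = {True: 0.01, False: 0.4}
P_wet = {  # P(grass wet | sprinkler, rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def p_wet_and_rain(rain):
    """Sum out the sprinkler: P(wet, rain) = sum_s P(rain) P(s | rain) P(wet | s, rain)."""
    p_r = P_rain if rain else 1 - P_rain
    total = 0.0
    for s in (True, False):
        p_s = P_sprinkler_given_rain[rain] if s else 1 - P_sprinkler_given_rain[rain]
        total += p_r * p_s * P_wet[(s, rain)]
    return total

# Infer the weather back from the wet grass: P(rain | wet) by Bayes' rule.
joint = {r: p_wet_and_rain(r) for r in (True, False)}
print(joint[True] / (joint[True] + joint[False]))  # P(rain | grass is wet) ~ 0.358
```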


To sum up, deep learning is good at perceptual tasks but not at reasoning and inference, while probabilistic graphical models are good at inference tasks but not at perceptual ones.

Unfortunately, in real life these two kinds of tasks usually appear together and interact with each other. We therefore hope to unify deep learning and probabilistic graphical models in a single framework and get the best of both worlds.


The framework we propose is Bayesian deep learning. It has two modules: a depth module, represented by a probabilistic deep model, and a graph module, that is, a probabilistic graphical model. The depth module handles high-dimensional signals, while the graph module handles the inference part of the task.

It is worth mentioning that the graph module is inherently probabilistic, so to ensure the two can be fused, the deep model is made probabilistic as well. The model can be trained with classical algorithms such as maximum a posteriori (MAP) estimation, Markov chain Monte Carlo (MCMC), and variational inference (VI).
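As a reminder, these three schemes can be summarized in their standard textbook forms (not specific to this framework):

```latex
% Posterior over parameters \theta given data D:
p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)
% MAP: a point estimate maximizing the posterior
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\ \log p(D \mid \theta) + \log p(\theta)
% VI: fit a tractable q to the posterior by minimizing the KL divergence
q^{*} = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid D)\big)
% MCMC: instead draw samples \theta^{(1)}, \ldots, \theta^{(T)} \sim p(\theta \mid D)
```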


To give a concrete example: in medical diagnosis, the depth module can be thought of as the doctor reading the patient's medical images, while the graph module is the doctor's reasoning, in the head, about the disease based on those images. From the doctor's point of view, the physiological signals in the medical images are the basis for reasoning, and in turn strong reasoning ability deepens the understanding of the images.


By extension, in a movie recommender system the depth module can be thought of as the understanding of a movie's plot, actors, and so on, while the graph module models user preferences and the similarities among users and movies. Here too, understanding the video content and modeling "preferences" are complementary.


Turning to the model details, we divide the variables of the probabilistic graphical model into three categories: deep variables, which belong to the depth module and are assumed to be generated from relatively simple probability distributions; graph variables, which belong to the graph module, are not directly connected to the depth module, and are assumed to come from relatively complex distributions; and hinge variables, which belong to both modules and form the interface between them.

Here's how the framework works in practice.

Recommender systems

The basic assumption of a recommender system is that the user's preferences for some movies are known, and we want to predict the user's preferences for other movies.


Users' preferences for movies can be written as a rating matrix. This matrix is very sparse, and modeling it directly yields very low accuracy. Recommender systems therefore rely on additional information, such as the movie's plot, director, and cast, as auxiliary content for modeling.

To model the content information and distill useful features from it, there are three options: hand-crafted features, features automatically learned by deep learning, and features adaptively learned by deep learning. The adaptive approach clearly achieves the best results.

Unfortunately, the i.i.d. (independent and identically distributed) assumption inherent in deep learning is fatal for recommender systems, because assuming there is no correlation between users is obviously wrong.


To resolve this difficulty, we proposed collaborative deep learning (CDL), which upgrades the modeling from "independent" to "non-independent". Building the model poses two challenges:

1. How to find an effective probabilistic deep model to serve as the depth module. We hope this model is compatible with the graph module while matching the performance of its non-probabilistic counterpart.

2. How to connect the depth module to the graph module for effective joint modeling.


Let's look at the first challenge. The autoencoder is a very simple deep learning model, generally used to extract features in an unsupervised manner; the output of its middle layer serves as the representation of the text. Note, however, that this middle-layer representation is deterministic rather than probabilistic, so it is incompatible with the graph module and cannot work with it.


We propose a probabilistic autoencoder; the difference is that the middle-layer output changes from a "fixed vector" to a "Gaussian distribution". The probabilistic autoencoder can degenerate into a standard autoencoder, so the latter is a special case of the former.
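A minimal sketch of such an encoder in PyTorch, assuming a bag-of-words text input; the layer sizes are illustrative and this is not the paper's exact architecture. Driving the predicted variance to zero recovers the deterministic autoencoder:

```python
import torch
import torch.nn as nn

class ProbabilisticAutoencoder(nn.Module):
    """Sketch: the encoder outputs a Gaussian (mean, log-variance) over the
    latent representation instead of a single deterministic vector."""
    def __init__(self, in_dim=8000, hidden=200, latent=50):  # illustrative sizes
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)       # mean of the latent Gaussian
        self.logvar = nn.Linear(hidden, latent)   # log-variance of the latent Gaussian
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterized sample
        return self.decoder(z), mu, logvar
```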


How do we relate the depth module to the graph module? First, the latent vector of item j is drawn from a Gaussian distribution:

Then, the latent vector of user i is drawn from another Gaussian distribution:

Based on these two latent vectors, the rating of user i for item j is sampled from yet another Gaussian distribution, whose mean is the inner product of the two latent vectors.
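The equations on the slides are not reproduced here; written out, following the collaborative deep learning paper cited in the references (with $f_e$ the probabilistic encoder, $X_j$ the content of item $j$, $I_K$ the identity matrix, and $\lambda_u$, $\lambda_v$, $C_{ij}$ precision hyperparameters), the generative process is roughly:

```latex
v_j \sim \mathcal{N}\!\left(f_e(X_j),\ \lambda_v^{-1} I_K\right)    % item latent vector, centered at the encoder output
u_i \sim \mathcal{N}\!\left(0,\ \lambda_u^{-1} I_K\right)           % user latent vector
R_{ij} \sim \mathcal{N}\!\left(u_i^{\top} v_j,\ C_{ij}^{-1}\right)  % rating, mean = inner product
```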

[Figure: the CDL graphical model; the blue box marks the graph module]

The blue box in the figure above represents the graph module. It defines the conditional probability relationships among items, users, ratings, and so on. Once these conditional probability relationships are in place, the latent vectors of users and items can be inferred from the observed ratings, and unknown ratings can be predicted from the inner product of the latent vectors.

[Figure: the complete CDL model]

The figure above is a diagram of the entire model, where λ denotes the hyperparameters controlling the variances of the Gaussian distributions. To measure the model's effectiveness, we used three datasets: citeulike-a, citeulike-t, and Netflix. For citeulike, the title and abstract of each paper serve as the content information; for Netflix, the movie plot synopsis does.

The experimental results are shown in the figure below: on the Recall@M metric, our method greatly exceeds the baseline models. The sparser the rating matrix, the larger our model's performance gain, because a sparser matrix makes the model rely more on the content information and the representations extracted from it.

[Figure: Recall@M results on citeulike-a, citeulike-t, and Netflix]
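For clarity, the Recall@M metric can be computed per user as follows (a simple sketch with our own variable names):

```python
def recall_at_m(ranked_items, held_out_items, M):
    """Recall@M for one user: the fraction of the user's held-out liked items
    that appear among the top-M recommended items."""
    top_m = set(ranked_items[:M])
    return sum(item in top_m for item in held_out_items) / len(held_out_items)

# Example: 2 of the user's 3 held-out items appear in the top-4 recommendations.
print(recall_at_m([5, 9, 2, 7, 1], held_out_items=[9, 7, 3], M=4))  # 0.666...
```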

Improvements in recommender system performance can boost corporate profits: according to a McKinsey & Company survey, 35 percent of Amazon's sales are generated by its recommender system. This means that every 1% improvement in the recommender system would bring roughly $620 million in additional sales.


To summarize what we have covered so far: we proposed the probabilistic deep model as the depth module of the Bayesian deep learning framework, with the non-probabilistic deep model as a special case of it; and we proposed a hierarchical Bayesian model for deep recommender systems, with experiments showing that it greatly improves recommendation performance.

Other applications


Given a graph, we know its edges and the content of its nodes. If the graph is a social network, its edges represent friendship relations between users, and the node content is the pictures or text users post on the platform. The graph can equally represent papers and their titles, abstracts, citations, and so on.


Our task is to learn representations of the nodes that capture both the content information and the structure of the graph.

The solution is a probabilistic autoencoder designed under the Bayesian deep learning framework. The depth module is responsible for processing the content of each node, since deep learning has the advantage in processing high-dimensional information; the graph module handles the relationships between nodes, such as citation networks and the complex relations in knowledge graphs.
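As an illustration of the idea (not the exact model from the papers in the references), one can train the probabilistic autoencoder sketched above jointly with a link term that pulls connected nodes' latent vectors together:

```python
import torch
import torch.nn.functional as F

# Joint objective sketch: the depth module reconstructs node content, while a
# link term encourages connected nodes to have similar latent vectors. A real
# model would also sample negative (non-)edges; that is omitted here.
def joint_loss(model, X, edges):
    """X: node-content matrix; edges: list of (i, j) pairs connected in the graph."""
    recon, mu, logvar = model(X)          # reuse the ProbabilisticAutoencoder above
    content_loss = F.mse_loss(recon, X)   # depth module: reconstruct node content
    src = [i for i, _ in edges]
    dst = [j for _, j in edges]
    link_logits = (mu[src] * mu[dst]).sum(-1)  # inner product of latent vectors
    link_loss = F.binary_cross_entropy_with_logits(
        link_logits, torch.ones(len(edges)))   # graph module: edges should score high
    return content_loss + link_loss
```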


In the medical field, we focus on medication monitoring. The task scenario is this: a small radar at home emits signals, and from the signals reflected off the patient the model should determine whether the patient takes medication on time and whether the medication steps are performed in the correct order. The difficulty is that the medication procedure is very complex and its steps need to be disentangled.

Following the Bayesian deep learning framework, the depth module processes the very high-dimensional signal information, and the graph module models the medical domain knowledge.

It is worth mentioning that even for the same model, different applications may call for different ways of learning the parameters: for example, the parameters can be learned as point estimates using MAP, or as full distributions using Bayesian methods.

For deep neural networks, once we have a distribution over the parameters, a lot becomes possible, such as estimating the uncertainty of predictions. Moreover, with a parameter distribution we can obtain robust predictions even when data is insufficient. The model also becomes more powerful, since a Bayesian model is effectively an average over infinitely many sampled models.
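As a small illustration of that last point, assume (purely for illustration) a Gaussian posterior over the weights of a linear model; averaging predictions over sampled weights gives the Bayesian model average, and their spread estimates the uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)
W_mean = rng.normal(size=(1, 4))   # assumed posterior mean (illustrative)
W_std = 0.1                        # assumed posterior standard deviation (illustrative)
x = rng.normal(size=4)             # one input example

# Sample many models from the weight distribution and predict with each.
preds = np.array([((W_mean + W_std * rng.normal(size=W_mean.shape)) @ x).item()
                  for _ in range(1000)])
print(preds.mean(), preds.std())   # Bayesian model average and its uncertainty
```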

The lightweight Bayesian learning method introduced below can be used with any deep learning model or deep neural network.


First, the goals: the method should be efficient, trainable by backpropagation while "discarding" the sampling process, and intuitively well-founded.

Our key idea is to treat the neurons and parameters of a neural network as distributions, rather than as mere points or vectors in a high-dimensional space, while still allowing the network to propagate forward and backward during learning. Because the distributions are expressed in terms of their natural parameters, the method is named NPN (natural-parameter networks).
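To give a flavor of how this works, here is a minimal sketch of moment propagation through a single linear layer with Gaussian weights and activations, in the spirit of NPN. It is a simplification (the full method also propagates through nonlinearities and trains the natural parameters); the variance formula is standard probability for independent variables:

```python
import numpy as np

def gaussian_linear(a_m, a_s, W_m, W_s):
    """Propagate (mean, variance) through a linear layer, no sampling needed.
    For independent w and x: E[wx] = E[w]E[x] and
    Var(wx) = Var(w)Var(x) + Var(w)E[x]^2 + E[w]^2 Var(x)."""
    o_m = W_m @ a_m
    o_s = W_s @ a_s + W_s @ (a_m ** 2) + (W_m ** 2) @ a_s
    return o_m, o_s

rng = np.random.default_rng(0)
a_m, a_s = rng.normal(size=4), np.full(4, 0.1)             # input mean and variance
W_m, W_s = rng.normal(size=(3, 4)), np.full((3, 4), 0.05)  # weight mean and variance
print(gaussian_linear(a_m, a_s, W_m, W_s))                 # output mean and variance
```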

References:

A survey on Bayesian deep learning. Hao Wang, Dit-Yan Yeung. ACM Computing Surveys (CSUR), 2020.

Towards Bayesian deep learning: a framework and some existing methods. Hao Wang, Dit-Yan Yeung. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2016.

Collaborative deep learning for recommender systems. Hao Wang, Naiyan Wang, Dit-Yan Yeung. Twenty-First ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015.

Collaborative recurrent autoencoder: recommend while learning to fill in the blanks. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Thirtieth Annual Conference on Neural Information Processing Systems (NIPS), 2016.

Natural parameter networks: a class of probabilistic neural networks. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Thirtieth Annual Conference on Neural Information Processing Systems (NIPS), 2016.

Relational stacked denoising autoencoder for tag recommendation. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015.

Relational deep learning: a deep latent variable model for link prediction. Hao Wang, Xingjian Shi, Dit-Yan Yeung. Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017.

Bidirectional inference networks: a class of deep Bayesian networks for health profiling. Hao Wang, Chengzhi Mao, Hao He, Mingmin Zhao, Tommi S. Jaakkola, Dina Katabi. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

Deep learning for precipitation nowcasting: a benchmark and a new model. Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo. Thirty-First Annual Conference on Neural Information Processing Systems (NIPS), 2017.

Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo. Twenty-Ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015.

Continuously indexed domain adaptation. Hao Wang*, Hao He*, Dina Katabi. Thirty-Seventh International Conference on Machine Learning (ICML), 2020.

Deep graph random process for relational-thinking-based speech recognition. Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang. Thirty-Seventh International Conference on Machine Learning (ICML), 2020.

STRODE: stochastic boundary ordinary differential equation. Hengguan Huang, Hongfu Liu, Hao Wang, Chang Xiao, Ye Wang. Thirty-Eighth International Conference on Machine Learning (ICML), 2021.

Delving into deep imbalanced regression. Yuzhe Yang, Kaiwen Zha, Yingcong Chen, Hao Wang, Dina Katabi. Thirty-Eighth International Conference on Machine Learning (ICML), 2021.

Adversarial attacks are reversible with natural supervision. Chengzhi Mao, Mia Chiquier, Hao Wang, Junfeng Yang, Carl Vondrick. International Conference on Computer Vision (ICCV), 2021.

Assessment of medication self-administration using artificial intelligence. Mingmin Zhao*, Kreshnik Hoti*, Hao Wang, Aniruddh Raghu, Dina Katabi. Nature Medicine, 2021.


