This article introduces work accepted at NeurIPS 2021 [1] that explores a brand-new problem under a common setting. The authors propose a learning method based on the graph structure formed by the relationships between features and samples: the embeddings of known features are used to extrapolate representations for new features, so the model can generalize to data containing new features without retraining.
The problem definition and method apply to many concrete applications.

Paper title: Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach
Authors: Qitian Wu, Chenxiao Yang, Junchi Yan (Shanghai Jiao Tong University)
Paper link: https://arxiv.org/pdf/2110.04514.pdf
1. Motivation: definition and importance of the feature extrapolation problem
Most current machine learning tasks assume that the training data shares a feature space with the test data. In practice, however, a trained model often has to interact with an open environment in which new features emerge at test time. For example, a recommender system may train a model on user features such as age and occupation; when the company later releases a new application and collects new user data, it must make decisions using new user features.
The figure below gives an intuitive illustration. We consider the case where the feature dimension of the training data is inconsistent with that of the test data (the latter extends the former). If we directly transfer the trained neural network to the test set, the neurons corresponding to the new feature dimensions are untrained and the network's test performance degrades sharply; yet retraining a neural network on the dataset containing the new features costs substantial computational resources. This paper proposes a new learning method based on the graph structure formed by the relationships between features and samples: it uses the embeddings of known features to extrapolate representations for new features, so that the model can generalize to data containing new features without retraining.
We represent the i-th data sample as a binary indicator vector x_i ∈ {0,1}^d, where each of the d features is one-hot encoded (a common representation for discrete features; continuous features can first be discretized and then encoded as one-hot vectors). The open-world feature extrapolation problem is then defined as follows:
Given training data D_tr = {(x_i, y_i)} with x_i ∈ {0,1}^d, we need to train a classifier that generalizes to test data D_te with inputs x' ∈ {0,1}^{d'}, where d' ≥ d. Note that here we assume:
1. the training feature space is contained in the test feature space, i.e. the d training features form a subset of the d' test features;
2. the training set and the test set share the same output space.
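As a minimal sketch of this input representation (all numbers here, including the bin edges, are hypothetical): continuous features can be binned and one-hot encoded, and open-world test inputs simply carry extra feature dimensions.

```python
import numpy as np

# Discretize a continuous feature (age) into bins, then one-hot encode it.
ages = np.array([23, 37, 58, 41])
bins = np.array([30, 50])            # two cut points -> three buckets
bucket = np.digitize(ages, bins)     # bucket index for each sample
one_hot = np.eye(3)[bucket]          # one row per sample, one-hot over buckets

# In the open-world setting, test inputs have d' >= d feature dimensions.
d_train, d_test = 5, 8
x_train = np.zeros(d_train)          # a training sample over d features
x_test = np.zeros(d_test)            # a test sample over d' features
```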
2. Important observations
Solving this problem directly is difficult because the training phase knows nothing about the additional features in the test data (neither their number nor their distribution is visible). However, this paper makes two important observations that lead to a reasonable solution.
First, a neural network classifier can be decomposed into two cascaded submodules: a feature embedding layer and classifier layers (as shown in Figure (a) below). The embedding layer can be viewed as a dictionary of feature embeddings, one row per feature: it looks up the embedding of every nonzero feature in the input vector, then sum-pools all retrieved embeddings to obtain the intermediate hidden vector fed into the subsequent classifier layers. The sum-pooling operation here is permutation-invariant over feature dimensions [2]; that is, exchanging the positions of the input elements leaves the output unchanged.
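A minimal NumPy sketch of this embedding-then-sum-pooling step (the matrix sizes are hypothetical), which also verifies the permutation invariance: permuting the feature dimensions together with the rows of the embedding dictionary leaves the pooled hidden vector unchanged.

```python
import numpy as np

def embed_and_pool(x, E):
    """Sum-pool the embeddings of all nonzero features in x.

    x : (d,) binary multi-hot input vector
    E : (d, h) embedding dictionary, one row per feature
    Returns the (h,) hidden vector fed to the classifier layers.
    """
    return x @ E  # equals the sum of E[j] over all j with x[j] == 1

rng = np.random.default_rng(0)
E = rng.normal(size=(6, 4))          # 6 features, hidden size 4
x = np.array([1, 0, 1, 0, 0, 1])     # sample containing features 0, 2, 5

h = embed_and_pool(x, E)

# Permutation invariance over feature dimensions: shuffling the features
# and the corresponding embedding rows together does not change the output.
perm = rng.permutation(6)
assert np.allclose(h, embed_and_pool(x[perm], E[perm]))
```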
Second, if we stack all the input data (e.g. the training samples) into a matrix (of dimension number-of-samples × number-of-features), it defines a sample-feature bipartite graph: each sample and each feature is a node, and an edge records a sample's membership relation to a feature (i.e. whether the sample contains that feature).
Combining these two points, we find that if we treat all input data (or a batch of data) as such a graph and feed it to the neural network, then thanks to the permutation invariance of the embedding layer, the network can flexibly handle input graphs with any number of feature nodes. This means we can suitably modify a neural network to handle the expansion of the feature space.
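Concretely, the stacked input matrix is itself the biadjacency matrix of the sample-feature bipartite graph. A toy sketch (with a hypothetical 3×3 binary matrix):

```python
import numpy as np

# X[i, j] = 1 iff sample i contains feature j, so X is exactly the
# biadjacency matrix of the sample-feature bipartite graph.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
n, d = X.shape

# Edge list of the bipartite graph: one edge per (sample, feature) membership.
edges = [(i, j) for i in range(n) for j in range(d) if X[i, j]]
```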
3. Methods
The figure below shows the model framework proposed in this paper for feature extrapolation (illustrating the model's feed-forward process). The framework contains the input data representation, a high-level graph neural network (GNN) model, and a low-level backbone model. The GNN performs message passing on the sample-feature bipartite graph and infers embeddings for new features by aggregating information; this process is analogous to how the human brain grasps new concepts by relating them to familiar ones. The backbone is an ordinary classifier, except that the parameters of its embedding layer are replaced by the output of the GNN.
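The message-passing step can be sketched as follows. This is a minimal mean-aggregation layer written for illustration only, not the paper's exact GNN architecture, and all sizes are hypothetical; new features start from zero embeddings and receive information from observed features through the samples they share.

```python
import numpy as np

def gnn_layer(X, E_feat):
    """One round of mean-aggregation message passing on the bipartite graph.

    X      : (n, d) binary sample-feature matrix (the graph's biadjacency)
    E_feat : (d, h) current feature embeddings
    Returns updated (d, h) feature embeddings.
    """
    deg_s = X.sum(axis=1, keepdims=True).clip(min=1)  # sample degrees
    deg_f = X.sum(axis=0)[:, None].clip(min=1)        # feature degrees
    H_sample = (X @ E_feat) / deg_s                   # features -> samples
    return (X.T @ H_sample) / deg_f                   # samples -> features

rng = np.random.default_rng(0)
n, d_old, d_new, h = 8, 5, 3, 4
X = rng.integers(0, 2, size=(n, d_old + d_new)).astype(float)

# New features start from zero vectors; message passing fills them in
# using the trained embeddings of the observed features.
E = np.vstack([rng.normal(size=(d_old, h)), np.zeros((d_new, h))])
E_new = gnn_layer(X, E)
```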
The figure below shows the two training strategies proposed for this model. In Figure (a), we use self-supervised training: each iteration masks some features, then uses the remaining features to infer the embeddings of the masked ones. In Figure (b), we use inductive training: each iteration samples a subset of the training set's features and produces predictions using only this subset. In addition, the GNN and the backbone are updated asynchronously, i.e. the GNN is updated once after every k updates of the backbone.
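The feature-masking idea in strategy (a) can be sketched as follows (mask ratio and matrix sizes are hypothetical): a random subset of feature columns is hidden during training so that they play the role of unseen test-time features.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10, 6)).astype(float)  # toy sample-feature matrix

mask_ratio = 0.3
masked = rng.random(X.shape[1]) < mask_ratio  # features treated as "new"
X_observed = X.copy()
X_observed[:, masked] = 0.0                   # the model only sees the rest
```

During training, the GNN would then be asked to reconstruct the embeddings of the masked columns from the observed ones.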
4. Theoretical analysis
We give a theoretical analysis of the proposed training method, mainly bounding the expected gap, over the randomness of the algorithm, between the empirical risk (the model's prediction error on an observed sample set) and the expected risk (its prediction error over the underlying data distribution). The theorem shows that the upper bound on this generalization gap depends mainly on the input feature dimension d and the number of feature combinations M that the sampling algorithm can produce.
5. Experimental results
On multi-class/binary classification datasets, we adopt the following evaluation protocol: randomly split the data samples into train/val/test sets with ratio 6:2:2, then randomly select a subset of observed features from all features; the model is trained on the training data with only the observed features, and accuracy (multi-class) or ROC-AUC (binary classification) is computed on the test data with all features. We compare the following methods:
1. Base-NN: trained and tested with only the observed features;
2. Oracle-NN: trained and tested with all features;
3. Average-NN / KNN-NN / Pooling-NN: infer the embeddings of new features by average-pooling all feature embeddings / aggregating the embeddings of the k nearest features / aggregating neighboring feature embeddings with a mean-pooling GCN, respectively;
4. INL-NN: first trained to saturation on the training data with only observed features, then updated locally on the new features.
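The split-and-select protocol above can be sketched as follows (sizes and the observed-feature ratio are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10

# Random 6:2:2 train/val/test split over samples.
idx = rng.permutation(n)
train_idx, val_idx, test_idx = np.split(idx, [int(0.6 * n), int(0.8 * n)])

# Randomly choose the subset of features observed during training;
# the remaining features appear only at test time.
ratio = 0.5
observed = rng.choice(d, size=int(ratio * d), replace=False)
```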
On six small datasets, with different proportions of observed features (from 30% to 80%), the comparison results are as follows (FATE in the figure is the method proposed in this paper).
In addition, we experimented on large-scale advertisement datasets (millions of samples and features). Here we use a dynamic temporal split: all samples are sorted chronologically and divided into 10 parts; the first part is the training set, the second the validation set, and parts three through ten the test sets. This split naturally introduces features into the test set that never appear during training, with a roughly 1:1 ratio of new to old features. Using DNN and DeepFM [3] as backbone models and ROC-AUC as the evaluation metric, the results are as follows: our method achieves the best predictive performance.
More experimental results, such as scalability tests (the model's training time and memory consumption grow linearly with the number of features and the sample batch size), ablation studies, and feature visualizations, can be found in our paper.
6. More explanations: why graph learning can help solve extrapolation problems
In fact, once the input data is represented as a graph over samples and features, the graph structure yields both sample-feature and feature-feature relations. The feature-feature relation is mediated by samples acting as intermediate nodes, i.e. it is second-order adjacency information on the graph. The graph thus provides a natural bridge between observed and unobserved features: after training we obtain embeddings for the observed features, and for the new features introduced at test time we can pass messages along the graph structure, letting the graph neural network compute the embeddings of the new features from those of the observed features, thereby achieving feature extrapolation.
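This second-order relation can be read off the data matrix itself: (XᵀX)[j, k] counts the samples containing both feature j and feature k. A toy sketch with a hypothetical 3×3 matrix:

```python
import numpy as np

# Feature-feature relations arrive through samples as intermediaries:
# (X^T X)[j, k] counts samples containing both feature j and feature k,
# i.e. second-order adjacency on the sample-feature bipartite graph.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)
cooccur = X.T @ X
```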
7. Future outlook and summary
The biggest contribution of this paper is to define a brand-new problem setting, the extrapolation of the feature space, and to show that a neural backbone model can be combined with a graph neural network to handle new features emerging in the testing phase. Since the focus of this paper is to open a new direction under a rather general setting, the research questions studied here can be further expanded in the future. Moreover, the problem and solution studied in this paper can also be applied to many other areas and scenarios, such as handling new users in recommender systems [4] (cold-start users or users with few samples).
References
[1] Q. Wu, C. Yang, and J. Yan. Towards open-world feature extrapolation: An inductive graph learning approach. In NeurIPS 2021.
[2] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov, and A. J. Smola. Deep sets. In NeurIPS 2017.
[3] H. Guo, R. Tang, Y. Ye, Z. Li, and X. He. DeepFM: A factorization-machine based neural network for CTR prediction. In IJCAI 2017.
[4] Q. Wu, H. Zhang, X. Gao, J. Yan, and H. Zha. Towards open-world recommendation: An inductive model-based collaborative filtering approach. In ICML 2021.
This article is from the WeChat public account PaperWeekly.
Author: Qitian Wu
Illustration by Polina Orlova from icons8