
Why is it important to pay attention to implicit feedback? What are the solutions?

Author: ResysChina

What is implicit feedback?

The core data a recommender system needs is user feedback. A recommender system without feedback is crippled from the start; we will not deal with that case here.

Because without feedback:

there is no label data for continuous optimization;

there is no real data for evaluating effectiveness.

In a word: no feedback data means no data loop, no transparency, and a product whose metabolism is broken.

Note that the user feedback we are talking about is not the "feedback" users give when they call customer service, but their preference for items.

Feedback data comes in many forms: clicking to view, favoriting, watching (or listening) to the end, adding to cart, paying, rating, writing a review...

There are two forms of feedback: implicit feedback and explicit feedback.

Among the feedback types listed above, rating is explicit feedback: the user knows they are expressing an attitude and states it clearly, in a quantified way. The other kinds are behavior users leave naturally while using the product; the user's purpose in leaving this data is not to tell us their preferences, but we can "guess" those preferences from it. That is implicit feedback.

Compared with explicit feedback, implicit feedback has the following characteristics:

its volume is much larger than that of explicit feedback, so the matrix is denser and more stable;

it arises naturally from user behavior, so it often reflects user attitudes more realistically and comprehensively;

it is closer to product metrics such as per-capita watch time (video) or click-through rate (news).

Xavier Amatriain (VP of Engineering at Quora) also made the case for valuing implicit feedback at ACM RecSys 2016; see the earlier post "Ten Lessons from a Recommender System Veteran".

Is there anything wrong with only implicit feedback?

Implicit feedback sounds great, so what are the problems with using it? Let me elaborate.

Looking back at history, the million-dollar Netflix Prize competition spawned many excellent algorithms, in particular pseudo-SVD matrix factorization algorithms that perform very well at predicting ratings. But when you actually go to build a recommender system, a small embarrassment appears: the product has no rating feature, hence no rating data. Very frustrating: so many fine hammers and no nail to be found, so surely the nail is to blame. There have even been blunders, such as a recommendation-algorithm contest whose released data assigned a rating of 4 to every movie a user "liked", forcibly hammering implicit feedback into rating data: keeping the ready-made hammer and karate-chopping a nail into existence for it.

Matrix factorization algorithms designed for rating data resemble the regression problem in supervised learning, whereas implicit feedback data is usually 0/1 binary. Ideally it would be treated as a classification problem, "predict whether the user will produce a certain implicit feedback on this item", and a classification problem needs negative samples; for implicit feedback, the negatives are usually not that obvious. This is problem number one.

Recommendation scenarios are mostly recommendation lists, where ranking is what matters. Whether it is a "guess you like" blended list, item-to-item related recommendation, or a feed stream, we care about the ordering more than about the prediction error on any single item. So using RMSE to evaluate an implicit-feedback recommendation model feels somewhat off the mark. This is problem number two.

Explicit feedback is so scarce, and matrix factorization algorithms such as SVD were designed for explicit feedback, so is fabricating rating data the only way? No. Human ingenuity is boundless: wherever the implicit-feedback problem exists, solutions keep emerging.

There are two main matrix factorization approaches for implicit feedback:

Pointwise: still predict the user's preference for each single item, then rank items by predicted preference.

Pairwise: for the same user, directly model which of any two items should be ranked ahead of the other.

The pointwise class is represented by the paper Collaborative Filtering for Implicit Feedback Datasets [1]; an excellent implementation is the Python package implicit, which the demo below uses.

There are three ways to construct negative samples (a sketch of the first follows the list):

randomly sample from the missing values of the matrix, i.e. treat items the user has not consumed as negatives;

from the popular items, sample ones the user has given no feedback on;

model with a one-class classifier (such as scikit-learn's OneClassSVM) to identify negatives among the missing values (the one-class model actually treats them as outliers).
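A minimal sketch of the first strategy; n_items and user_items are illustrative stand-ins for however the interaction data is stored:

```python
import numpy as np

def sample_negatives(user_items, n_items, n_neg, seed=0):
    """Randomly treat items the user never consumed as negatives."""
    rng = np.random.default_rng(seed)
    consumed = set(user_items)          # item ids the user has feedback on
    negatives = []
    while len(negatives) < n_neg:
        j = int(rng.integers(n_items))  # uniform over the whole catalog
        if j not in consumed:
            negatives.append(j)
    return negatives
```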

The pairwise class is represented by the paper BPR: Bayesian Personalized Ranking from Implicit Feedback [2]; an excellent implementation is the Python package lightfm, also used in the demo below.

The idea of the pairwise class is to solve the ordering of items more directly. It makes the following assumption when constructing samples:

An item a user has given feedback on (whether explicit or implicit) has been consumed by the user, and is therefore more likely to be preferred than an item without feedback.

The training samples for this method are no longer single items but item pairs; it is, after all, a pairwise algorithm.

A positive sample is a pair in which the item with feedback is ranked ahead of the item without feedback.

A negative sample is the reverse pair: the item without feedback ranked ahead of the item with feedback.

Pairs in which both items have feedback, or in which neither has feedback, do not participate in sample construction (see the sketch below).
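A sketch of that construction, assuming feedback maps each user to the set of item ids they have feedback on (the names are illustrative):

```python
import numpy as np

def bpr_triples(feedback, n_items, seed=0):
    """Yield (user, positive, negative) triples: the item with feedback
    should be ranked ahead of the sampled item without feedback."""
    rng = np.random.default_rng(seed)
    for user, consumed in feedback.items():
        for pos in consumed:
            neg = int(rng.integers(n_items))
            while neg in consumed:      # re-draw until feedback-free
                neg = int(rng.integers(n_items))
            yield user, pos, neg
```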

A side note: why does Quora use a single-machine version internally? Xavier Amatriain, Quora's VP of Engineering, has said as much: do not reach for distributed computing lightly; single-machine computation is often faster [3].

Below we use a public dataset to demonstrate how to make recommendations from this kind of implicit feedback data and how to evaluate the results.

We use Last.fm's open record of users' listening history as the demo dataset [4]. A summary of this dataset:

File name: usersha1-artmbid-artname-plays.tsv

File contents:

359,347 users

107,373 artists without a MusicBrainz ID (MBID) and 186,642 with one

17,559,530 (user, artist) listening relationships with play counts

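A minimal pandas sketch of loading the file and reproducing the summary above (the column names are mine):

```python
import pandas as pd

cols = ['user', 'artist_mbid', 'artist_name', 'plays']
plays = pd.read_csv('usersha1-artmbid-artname-plays.tsv', sep='\t', names=cols)

print(plays['user'].nunique(), 'users')
print(plays['artist_name'].nunique(), 'artist names')
print(len(plays), 'listening records')
```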

Split the data into training and test sets; note that test items are randomly sampled from each user's listening history:

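A sketch of such a per-user random split, assuming the plays dataframe from the loading sketch; the 20% test fraction is my assumption:

```python
import numpy as np

def split_per_user(df, test_frac=0.2, seed=42):
    """Hold out a random fraction of each user's listening history."""
    rng = np.random.default_rng(seed)
    in_test = df.groupby('user')['artist_name'].transform(
        lambda g: rng.random(len(g)) < test_frac)
    return df[~in_test], df[in_test]

train_df, test_df = split_per_user(plays)
```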

We train models with the pointwise method and the pairwise method respectively.

BPR model (implemented using lightfm)

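A sketch of training BPR with lightfm; train and test stand for scipy.sparse user-item interaction matrices built from the split, and the hyperparameters are illustrative, not the author's:

```python
from lightfm import LightFM
from lightfm.evaluation import auc_score

bpr = LightFM(loss='bpr', no_components=64)
bpr.fit(train, epochs=30, num_threads=4)

# auc_score returns one AUC per test user
aucs = auc_score(bpr, test, train_interactions=train, num_threads=4)
print('AUC mean %.4f, std %.4f' % (aucs.mean(), aucs.std()))
```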

WALS model (implemented using implicit)

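A sketch of training weighted ALS with implicit; the hyperparameters are illustrative, and which matrix orientation fit expects depends on the implicit version:

```python
from implicit.als import AlternatingLeastSquares

# play counts act as the confidence weights of Hu et al. [1]
als = AlternatingLeastSquares(factors=64, regularization=0.01, iterations=15)
# older implicit releases expect an item-user matrix here,
# newer ones a user-item matrix; check your version's docs
als.fit(train_matrix)
```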

Because we care about ranking quality, we use AUC as the evaluation metric. Unlike the global AUC people usually look at, here we care about each individual user's ranking, so we compute AUC user by user and report the mean and standard deviation.

lightfm ships with its own AUC evaluation, while for implicit we have to implement it ourselves. First, construct negative samples specifically for evaluating the ALS model: from the popular items, sample items the user has given no feedback on, using weighted sampling in which the more popular an item is, the more likely it is to be drawn.

Start by implementing weighted sampling with the exponential distribution.

There are many ways to implement weighted sampling; one of the simplest and soundest uses the exponential distribution. Weighted sampling has many other uses too, for example: 1. randomly selecting K items for recall, weighted by the user's tag weights; 2. letting the recommendation output vary a bit instead of strictly following score order.
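A sketch of the exponential-distribution trick: draw one exponential variate per item with rate proportional to its weight, then keep the k smallest keys; those k items form a weighted sample without replacement:

```python
import numpy as np

def weighted_sample(items, weights, k, seed=0):
    """Weighted sampling without replacement via exponential keys."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    keys = rng.exponential(scale=1.0 / weights)  # rate w -> scale 1/w
    chosen = np.argsort(keys)[:k]                # smallest keys win
    return [items[i] for i in chosen]
```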

Then count the popular items.

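A sketch of that count on the plays dataframe, summing play counts per artist:

```python
# total plays per artist, most popular first
popularity = (plays.groupby('artist_name')['plays']
                   .sum()
                   .sort_values(ascending=False))
```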

Then compute the AUC by generating test samples for each test user:

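A sketch of the per-user AUC: score each test user's held-out positives against the popularity-sampled negatives using the ALS factors, then hand both to scikit-learn; pos_items and neg_items are assumed arrays of item indices:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def user_auc(model, user_id, pos_items, neg_items):
    """AUC of held-out positives vs. sampled negatives for one user."""
    u = model.user_factors[user_id]
    items = np.concatenate([pos_items, neg_items])
    scores = model.item_factors[items] @ u
    labels = np.concatenate([np.ones(len(pos_items)),
                             np.zeros(len(neg_items))])
    return roc_auc_score(labels, scores)
```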

Putting the training and evaluation of ALS together:

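Gluing the earlier sketches together; test_users, positives and negatives are hypothetical containers holding each test user's held-out items and sampled negatives:

```python
als.fit(train_matrix)
aucs = [user_auc(als, u, positives[u], negatives[u]) for u in test_users]
print('AUC mean %.4f, std %.4f' % (np.mean(aucs), np.std(aucs)))
```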

Running both models on the same data, the BPR model comes out slightly ahead at optimizing AUC.

Well, at this point we have some understanding of how to recommend with implicit feedback: besides traditional pointwise matrix factorization, there are pairwise methods such as BPR that optimize AUC directly. The next question naturally pops into your mind and mine: the model is trained, so how do we use it?

Here is a simple recommendation interface: given a user id, it lists the artists the user has listened to and outputs a list of recommended artists with scores, so we can eyeball the effect (sketched below).

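A sketch of such an interface wrapping the trained ALS model; it scores directly with the factor matrices, so it does not depend on any particular implicit API version, and all the names are mine rather than the author's:

```python
import numpy as np

class Recommender:
    def __init__(self, model, user_items, names):
        self.model = model            # trained AlternatingLeastSquares
        self.user_items = user_items  # user-item CSR matrix of plays
        self.names = names            # item index -> artist name

    def recommend(self, user_id, n=20):
        """Personalized list, filtering artists already heard."""
        u = self.model.user_factors[user_id]
        scores = self.model.item_factors @ u
        scores[self.user_items[user_id].indices] = -np.inf
        top = np.argsort(-scores)[:n]
        return [(self.names[i], float(scores[i])) for i in top]

    def similar_items(self, item_id, n=10):
        """Related artists by cosine similarity of item factors."""
        F = self.model.item_factors
        v = F[item_id]
        sims = (F @ v) / (np.linalg.norm(F, axis=1) * np.linalg.norm(v) + 1e-12)
        top = np.argsort(-sims)[:n]
        return [(self.names[i], float(sims[i])) for i in top]
```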

Two simple interfaces are implemented: recommend and similar_items. The former gives a user personalized recommendations; the latter serves related-item recommendations. Let's quickly test both:


Recommending for user 173031:

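With the hypothetical wrapper above (user_item_matrix and artist_names are assumed to come from the data-prep step):

```python
rec = Recommender(als, user_item_matrix, artist_names)
print(rec.recommend(173031))
```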

The user's listening history prints as follows:

[('周杰倫', 2122.0), ('陳奕迅', 1181.0), ('glenn gould', 673.0), ('楊丞琳', 536.0), ('wolfgang amadeus mozart', 492.0), ('aly & aj', 452.0), ('secondhand serenade', 425.0), ('avril lavigne', 424.0), ('s.h.e', 417.0), ('johann sebastian bach', 304.0), ('張敬軒', 291.0), ('kevin kern', 223.0), ('倉木麻衣', 223.0), ('張韶涵', 203.0), ('mitsuko uchida', 197.0), ('鄧麗欣', 194.0), ('x japan', 193.0), ('academy st. martins in the fields', 180.0), ('garnet crow', 160.0), ('michelle branch', 153.0), ('troy and gabriella', 150.0), ('jesse mccartney', 148.0), ('céline dion', 144.0), ('gil shaham and goran sollscher', 143.0), ('hide', 136.0), ('westlife', 136.0), ('孫燕姿', 128.0), ('jason mraz', 126.0), ('andy mckee', 123.0), ('oku hanako', 122.0), ('high school musical 2', 113.0), ('zard', 113.0), ('sara bareilles', 101.0), ('backstreet boys', 99.0), ('the corrs', 99.0), ('kelly sweet', 98.0), ('三枝夕夏 in db', 97.0), ('glay', 92.0), ('david garrett', 90.0), ('タイナカサチ', 87.0), ('james blunt', 84.0), ("the st. philips boy's choir", 84.0), ('david archuleta', 82.0), ('linkin park', 81.0), ('coldplay', 69.0), ("b'z", 68.0), ('nelly furtado', 65.0), ('sigiswald kuijken', 62.0)]

As you can see, this user's listening interests are broad, spanning Hong Kong/Taiwan, Japanese, and Western artists.

Now see which artists the BPR model recommends for this user:

{'item': , 'items': [('josh groban', 2.541165828704834), ('michael w. smith', 2.5229959487915039), ('hillsong', 2.4939250946044922), ('宇多田ヒカル', 2.3995833396911621), ('hayley westenra', 2.369992733001709), ('angela aki', 2.3347458839416504), ('casting crowns', 2.3013358116149902), ('hillsong united', 2.2782249450683594), ('boa', 2.2732632160186768), ('steven curtis chapman', 2.2700684070587158), ('rebecca st. james', 2.2616958618164062), ('kokia', 2.2322888374328613), ('barlowgirl', 2.2148218154907227), ('do as infinity', 2.213282585144043), ('f.i.r.', 2.1844463348388672), ('corrinne may', 2.1044127941131592), ('chris tomlin', 2.0857734680175781), ('celtic woman', 2.0772864818572998), ('depapepe', 2.0735812187194824), ('ayaka', 2.0611968040466309)], 'user': }

Mostly Western artists, some Japanese ones, and also Taiwan's f.i.r.

The ALS model's output is as follows:

{'item': , 'items': [('taylor swift', 1.186539712906344), ('frédéric chopin', 1.1863112931343351), ('colbie caillat', 1.0978262200222491), ('jonas brothers', 1.0811056577548976), ('bruno coulais', 1.0621494330528296), ('boa', 1.0488158944365353), ('bryan adams', 1.0277970165423405), ('daniel powter', 1.0220579888574337), ('yann tiersen', 1.0143708728515772), ('f.i.r.', 1.012305501265341), ('yiruma', 1.003527371248728), ('hilary duff', 0.99238201965409678), ('mandy moore', 0.97962069749414749), ('natasha bedingfield', 0.97652199359550906), ('simple plan', 0.95801904189552334), ('daughtry', 0.95771285384471316), ("b'z", 0.93582329901255945), ('mariah carey', 0.93090378201808766), ('angela aki', 0.92184519678959209), ('claude debussy', 0.9190989429349703)], 'user': }

Again mostly Western artists, and again Taiwan's f.i.r.

Let's take a look at the recommended similar singers:

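Again with the hypothetical wrapper, asking for artists similar to 周杰倫 (Jay Chou); jay_chou_id stands for his item index:

```python
print(rec.similar_items(jay_chou_id, n=10))
```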
{'item': [('Jay Chou', 1)], 'items': [('Jay Chou', 0.99999999999999989), ('Wang Lihong', 0.88290674032689487), ('Tao Zhe', 0.85771472510708613), ('Nanquan Mama', 0.85302933173781792), ('Eason Chan', 0.85081279073917004), ('Lin Junjie', 0.8493482632159628), ('Sun Yanzi', 0.83399343224087041), ('Zhang Huimei', 0.82906432515041484), ('Fang Datong', 0.8248567578398334), ('Mayday', 0.82292313630632996)], 'user': }

Similar-item recommendation may look simple, but it is one of the most common applications of traditional recommender systems. Here similarity is computed on the dense factor vectors produced by matrix factorization, rather than on the sparse vectors of the traditional neighborhood model. For computing neighbors over dense vectors there are handy tools as well: this demo implicitly uses another Python package called annoy [5], open-sourced by Spotify. Besides annoy there are the similar kgraph [6] and nmslib [7]. All of them compute approximate nearest neighbors quickly and are worth considering for your own recommender system.
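A sketch of building an annoy index over the ALS item factors; the tree count is a tunable assumption:

```python
from annoy import AnnoyIndex

dim = als.item_factors.shape[1]
index = AnnoyIndex(dim, 'angular')   # angular distance ~ cosine
for i, vec in enumerate(als.item_factors):
    index.add_item(i, vec)
index.build(10)                      # more trees: better recall, bigger index

neighbors = index.get_nns_by_item(0, 10)  # 10 approximate neighbors of item 0
```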

In addition, lightfm can also embed user and item attribute data, which to some extent provides a cold-start strategy; fate permitting, a later post will introduce it.

Nothing left to write, so let's close with a freestyle:

yo, yo, check, check
follow implicit feedback
recommending won't tire you out from now on
throw away mean squared error and bear-hug AUC

Note: that was a "single rhyme x3".

The full code for this article is here:

https://github.com/xingwudao/learning-to-rank-with-implicit-matrix

[1] http://yifanhu.net/pub/cf.pdf

[2] https://arxiv.org/abs/1205.2618

[3] https://www.slideshare.net/xamat/recsys-2016-tutorial-lessons-learned-from-building-reallife-recommender-systems

[4] https://www.upf.edu/web/mtg/lastfm360k

[5] https://github.com/spotify/annoy

[6] https://github.com/aaalgo/kgraph

[7] https://github.com/searchivarius/nmslib

Author: Chen Kaijiang (@xingwudao). A longer bio would probably go unread; if you are interested, see the earlier posts on this public account.

★ Readers of this article also read: "Ten Lessons from a Recommender System Veteran"

Guess what I want to do after writing all this? Recruit, of course: [Back-end development engineer]: Want to become an AI engineer but have nobody to show you the way? No problem: we offer a chance to practice, a free ride onto the bus, and we grow and learn together. All we ask is a solid programming foundation, smarts, drive, and passion; you are not required to know AI or big-data buzzwords beforehand. Come and learn by doing! (Java or Python both work.) Resumes: [email protected]