Graph databases
Common use cases
- Social scenarios: friend recommendation, social-relationship analysis (followers/following), e.g. through how few people can you reach Trump?
- Knowledge graph: untangle a confusing web of relationships and quickly find what you need
- Recommendation scenarios: personalized recommendations for retail, e-commerce, content, etc.; mine user needs and improve the user experience
- Navigation / urban planning: shortest-path queries
- Finance: asset-transaction graphs
- Node profiling: device profiles, user profiles, etc.
Basic concepts
- Node (V): People, movies, recipes, ingredients
- Node property (VP): the person's age, height, weight, and occupation
- Relationship (E): the relationship between a person and a movie (liked, favorited, rated, etc.)
- Relationship property (EP): the score a person gives when rating a movie
- Label: classifies nodes and relationships, e.g. user, device
Graph databases handle this kind of highly connected data better than relational databases
Entities: user --- recipe
Common recommendation logic
- item ==> item
- Recommend similar recipes based on content (ingredients, nutrition, labels, etc.)
- user ==> item ==> user
- Find similar users based on shared cooking records
- user ==> item ==> user ==> item
- Recommend the recipes that the most similar users usually make
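As a minimal sketch of the i2i pattern above, using hypothetical ingredient data (all names invented for illustration):

```python
# Content-based i2i: rank recipes by how many ingredients they share
# with a given recipe. Data here is hypothetical sample data.

recipes = {
    "chicken stew with shiitake": {"chicken", "shiitake", "ginger"},
    "chicken stew with mushrooms": {"chicken", "mushroom", "ginger"},
    "braised pork": {"pork", "soy sauce", "sugar"},
}

def similar_by_ingredients(name: str, top: int = 2) -> list:
    """Return up to `top` other recipes, most shared ingredients first."""
    base = recipes[name]
    ranked = sorted(
        (r for r in recipes if r != name),
        key=lambda r: -len(recipes[r] & base))
    return ranked[:top]

print(similar_by_ingredients("chicken stew with shiitake"))
# ['chicken stew with mushrooms', 'braised pork']
```

The same ranking-by-overlap idea extends to recipe labels or nutrition facts by swapping the ingredient sets for other feature sets.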
Commonly used recommendation algorithms in graph databases
- Content filtering
- Collaborative filtering
- Cosine similarity
- Pearson similarity
- Neighborhood-based recommendation
Content recommendations
As the name suggests, recommendations based on content
Content-based recommendation over similar recipe ingredients: i2i
Chicken stew with mushrooms and chicken stew with shiitake mushrooms share the same ingredients
So you can rank and recommend recipes that share ingredients with chicken stew with shiitake mushrooms
Recipe labels, nutrition facts, and so on can also serve as the basis for recommendations
Jaccard u2i2u
Formula: J(A, B) = |A ∩ B| / |A ∪ B|
Recommend similar users based on having made the same recipes
Union: all recipes made by either user
Intersection: recipes both users have made
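The union/intersection logic can be sketched as follows (hypothetical recipe sets; `jaccard` is an illustrative name, not an API from the article):

```python
def jaccard(made_a: set, made_b: set) -> float:
    """|A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = made_a | made_b              # all recipes either user has made
    if not union:
        return 0.0
    return len(made_a & made_b) / len(union)

user_a = {"chicken stew", "braised pork", "fried rice"}
user_b = {"chicken stew", "fried rice", "hot pot"}
print(jaccard(user_a, user_b))  # 2 shared out of 4 total -> 0.5
```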
Cosine similarity u2i2u
Formula: cos(A, B) = (A · B) / (‖A‖ · ‖B‖)
Sample data:
Compute the cosine similarity between user A and user B
Full-set cosine: (the correct cosine similarity)
Intersection cosine: (cosine similarity with some values missing)
Because graph-database traversal logic naturally joins users on their shared recipes, part of the values the cosine calculation needs are dropped
Accuracy issues
What about users A and C?
They have also made the same recipes
Expand the recipe sample
In other words, the discrepancy mainly affects new users who have made only a few recipes, i.e. when the sets are small. As the recipe sample grows and users make more of the same recipes, the gap between the two variants shrinks: the more recipes a user has made, the more accurate the recommendation, and for frequent cooks the deviation is reduced.
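The gap between the two variants can be seen in a small sketch (hypothetical make counts; all names are illustrative):

```python
import math

def cosine(xs, ys):
    """Plain cosine similarity over two equal-length vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    na = math.sqrt(sum(x * x for x in xs))
    nb = math.sqrt(sum(y * y for y in ys))
    return dot / (na * nb) if na and nb else 0.0

a = {"r1": 5, "r2": 3}   # user A's make counts per recipe (hypothetical)
b = {"r1": 4, "r3": 2}   # user B's make counts per recipe

# Full-set cosine: missing recipes count as 0 (the "correct" variant).
keys = sorted(a.keys() | b.keys())
full = cosine([a.get(k, 0) for k in keys], [b.get(k, 0) for k in keys])

# Intersection cosine: only recipes both users made, which is what a
# graph traversal over shared recipes naturally yields; the zero terms
# vanish from the norms, inflating the score.
shared = sorted(a.keys() & b.keys())
inter = cosine([a[k] for k in shared], [b[k] for k in shared])

print(full, inter)  # inter is 1.0 here: only one shared dimension remains
```

With more shared recipes per user pair, `full` and `inter` converge, matching the observation above.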
Pearson similarity u2i2u
Formula: r(X, Y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (√Σᵢ (xᵢ − x̄)² · √Σᵢ (yᵢ − ȳ)²)
Sample data:
Compute the Pearson similarity between user A and user B
Missing values are filled with the user's average make count
Full-set Pearson:
Intersection Pearson:
Comparison with cosine similarity:
The two formulas differ only in the expectation used for missing values (zero vs. the average); each can be derived from the other
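The relationship between the two formulas can be made concrete: Pearson similarity is cosine similarity applied to mean-centered vectors (hypothetical data; names are illustrative):

```python
import math

def cosine(xs, ys):
    dot = sum(x * y for x, y in zip(xs, ys))
    na = math.sqrt(sum(x * x for x in xs))
    nb = math.sqrt(sum(y * y for y in ys))
    return dot / (na * nb) if na and nb else 0.0

def pearson(xs, ys):
    # Subtract each user's mean, then take the cosine: the "null
    # expectation" shifts from zero to the user's average make count.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return cosine([x - mx for x in xs], [y - my for y in ys])

a = [5, 3, 0, 1]   # user A's make counts (hypothetical)
b = [4, 0, 2, 1]   # user B's make counts
print(pearson(a, b))
```

Centering makes Pearson robust to users who simply cook more overall, at the cost of the extra passes over the data noted in the performance comparison below.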
Neighborhood-based recommendation u2i2u2i
Recommend recipes based on the most similar users (using cosine similarity as the metric)
Similarity between users:
User A ---> User B--->0.617
User A ---> User C--->0.954
User A ---> User D--->0.796
==> Take the two most similar users: user C and user D
==> 红烧肉 (braised pork) = 8 × 0.954 + 0 × 0.796 = 7.632
==> 辣椒炒肉 (chili fried pork) = 1 × 0.954 + 1 × 0.796 = 1.75
==> Sort by weighted sum; the recipes with the highest weighted sum come first
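The weighted-sum step above can be sketched as follows, reproducing the numbers from the worked example (make counts for users C and D are hypothetical, chosen to match the arithmetic):

```python
# Weighted-sum scoring over the top-K most similar users.
similar = {"C": 0.954, "D": 0.796}       # top-2 neighbors of user A

# Each neighbor's make counts for the candidate recipes.
made = {
    "C": {"braised pork": 8, "chili fried pork": 1},
    "D": {"braised pork": 0, "chili fried pork": 1},
}

scores = {}
for user, sim in similar.items():
    for recipe, count in made[user].items():
        scores[recipe] = scores.get(recipe, 0.0) + sim * count

# Sort by weighted sum, highest first.
for recipe, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(recipe, round(score, 3))
# braised pork 7.632
# chili fried pork 1.75
```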
What we expect from a good recommendation:
- Distinctive, personalized recommendations
- Accurate results
- Fast performance
In real recommendation scenarios, functionality, quality, and performance interact and constrain one another; a compromise is often needed
1. Choosing a similarity algorithm
Performance: Cosine similarity >> Pearson similarity
Accuracy: Cosine similarity < Pearson similarity
In the initial phase on Alibaba Cloud GDB, we use cosine similarity as the similarity coefficient between users
2. Data volume is too large, full computation too slow (data filtering)
When recommending recipes based on the most similar users,
adopt a data-filtering funnel:
10,000 / 100,000 / 1,000,000 ==> find the users who have made the most of the same recipes
1,000 / 10,000 ==> collaborative filtering
10 / 100 ==> weighted sum
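The three-stage funnel can be sketched like this (all data and stage sizes here are hypothetical placeholders for the article's counts):

```python
import math
import random

random.seed(0)

# Hypothetical data: 1,000 users, each with make counts for 8 of 50 recipes.
recipes = [f"r{i}" for i in range(50)]
users = {f"u{i}": {r: random.randint(1, 3) for r in random.sample(recipes, 8)}
         for i in range(1000)}
target = users["u0"]

def cosine(a: dict, b: dict) -> float:
    keys = a.keys() | b.keys()
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: of all users, keep the 100 with the most co-made recipes.
stage1 = sorted((u for u in users if u != "u0"),
                key=lambda u: -len(users[u].keys() & target.keys()))[:100]

# Stage 2: collaborative filtering; keep the 10 most similar survivors.
stage2 = sorted(stage1, key=lambda u: -cosine(users[u], target))[:10]

# Stage 3: weighted sum over the top neighbors' recipes.
scores = {}
for u in stage2:
    sim = cosine(users[u], target)
    for r, c in users[u].items():
        if r not in target:              # recommend only unseen recipes
            scores[r] = scores.get(r, 0.0) + sim * c

top = sorted(scores, key=lambda r: -scores[r])[:5]
print(top)
```

Each stage runs a cheap filter over a large set so that the expensive computation only ever sees a small one.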
3. Cold-data issues
4. Timeout truncation
After a timeout, fall back to hot data to fill out the results
Author: xiaospace
Source: WeChat public account "Joyoung technical team"
Source: https://mp.weixin.qq.com/s/cq5GNDeT495xVwdAUTavvQ