How to build a knowledge graph using Neo4J and Transformer

2022-03-25 15:10:26

Author | Walid Amamou

Translated by | Hirakawa

Planning | Ling Min

This article was originally published at Towards Data Science.

How to build a knowledge graph using Neo4J and Transformer

Image courtesy of the author: Knowledge Graph in Neo4j

Introduction

In this article, I'll show how to create a knowledge graph based on job descriptions using an optimized, converter-based named entity recognition (NER) and spaCy relational extraction model. The methods presented here can be applied to any other field, such as biomedicine, finance, healthcare, etc.

Here are the steps we're going to take:

Load the optimized converter NER and spaCy relationship extraction model in Google Colab;

Create a Neo4j Sandbox and add entities and relationships;

Query the graph to find the positions that match your target resume the most, find the three most popular skills and the ones with the highest co-occurrence rates.

To learn more about how to use UBIAI to generate training data and optimize NER and relational extraction models, check out the following articles.

UBIAI: Easy-to-use NLP application text annotations

How to use the BERT converter to train a federated entity and relationship extract classifier with spaCy3

How to optimize the BERT converter using spaCy3

The job description dataset is available from Kaggle.

At the end of this article, we can create a knowledge graph like the one shown below.

Image courtesy of the author: Knowledge Graph of Job Description

Named entities and relationship extraction

First, we load the dependencies of the NER and the relational model, as well as the previously optimized NER model itself, to extract skills, qualifications, specializations, and years of service:

Load the job dataset from which we want to extract entities and relationships:

To extract entities from a job collection:

Before we feed an entity to a relational extraction model, we can look at some of the extracted entities:

We are now ready to predict relationships; first load the relationship extraction model, be sure to change the directory to rel_component/scripts so that you can access all the necessary scripts for the relationship model.

Neo4J

Now we can load the job dataset and extract it into the Neo4j database.

First, we create an empty Neo4j Sandbox and add the connection information as follows:

Next, we add documents, entities, and relationships to the knowledge graph. Note that we need to extract the integer age from the name of the entity EXPERIENCE and store it as a property.

Now let's get into the fun part. We can start the knowledge graph and run the query. Let's run a query to find out which position best matches your target resume:

Public entities in the results displayed in tabular form:

Diagrams displayed in visual form:

Image courtesy of the author: Based on Best Match positions

Although this dataset has only 29 job descriptions, the approach presented here can be applied to large-scale datasets with thousands of jobs. With just a few lines of code, we can instantly extract the work that matches the most of our target resume.

Let's find out the skills we need the most:

And skills that require a maximum number of years of service:

Web development and technical support require the highest number of years of work, followed by security settings.

Finally, let's look up the skill pairs with the highest co-occurrence rate:

Summary

In this article, we describe how to extract models using converter-based NER and spaCy relationships to create knowledge graphs with Neo4j. In addition to information extraction, the topology of the graph can also be used as input to other machine learning models.

Combining NLP with the graph database Neo4j can accelerate information discovery in many fields, with more significant effects in medical and biomedical applications.

https://medium.com/m/global-identity?

How to build a knowledge graph using Neo4J and Transformer

Read on

Evolving Perception Architecture: Transformer's Application in Automated Driving

The Cassia Software Institute has made progress in the analysis of the construction dependence of Python programs

What is the Event Knowledge Graph? The Institute of Computing, Chinese Academy of Sciences, released a review paper on the "Overview of Event Knowledge Graph"

The parameter volume is reduced by 85%, and the performance exceeds viT across the board: the new image classification method ViR

The world's strongest artificial intelligence list refreshed The "Chinese team" occupies three of the top five

The "Chinese team" refreshed the list of the world's strongest artificial intelligence, accounting for three of the top five

Professor Wang Jie of the University of Science and Technology of China: Knowledge graph reasoning technology based on representation learning

The Chinese team refreshed the knowledge map "World Cup", but when I clicked on the championship team, I was stunned

Use Transformer to do line work, really fragrant!

Magi Search has paused to provide a structured search experience based on a knowledge graph

Mille-feuille Transformer is here! Multilingual machine translation standards refresh multiple SOTA levels

AI Cracks Ancient Text on Nature Cover: Fix Missing Text, Precise Geographic Location and Writing Time

Will Transformer dominate the AI space? It is too early to draw conclusions

With only one picture + camera position, AI can brain supplement the surrounding environment| CVPR2022

Is there still true or false open source? Neo4j sued 3 companies for falsely advertising open source in favor

LSTM is not "dead" yet!

He Kaiming's team's new work: only use ordinary ViT, do not do layered design can also do the target detection

No training required, the auto-expanding visual transformer is coming

Transformer papers cite more than 40,000, and the two authors left Google to start a business