TheFuzz: an open-source Python library for fuzzy string matching

Author: The Lost Little Bookboy's Note

Brief introduction

TheFuzz is an open-source Python library for fuzzy string matching and similarity scoring. It is useful when processing text data for tasks such as spelling correction, string similarity calculation, and fuzzy search.

Basic principle

The basic principle of TheFuzz is to score the similarity between strings using several algorithms. The most commonly used one is the Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one string into the other. The smaller the distance relative to the lengths of the two strings, the more similar they are.
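To make the definition above concrete, here is a minimal pure-Python sketch of the Levenshtein distance using the classic dynamic-programming table, kept to two rows. The function name levenshtein is chosen for this illustration only; it is not part of TheFuzz, which relies on optimized implementations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Turning "kitten" into "sitting" takes two substitutions and one insertion, hence a distance of 3.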

In TheFuzz, the fuzz.ratio() function computes the similarity of two strings and returns an integer from 0 to 100, where 100 means the strings are identical. The fuzz.partial_ratio() function computes a partial similarity: it compares the shorter string with the best-matching part of the longer one, so extra text around a match is not penalized. In addition, fuzz.token_sort_ratio() splits both strings into words and sorts them before comparing, so word order does not matter, and fuzz.token_set_ratio() compares the sets of words the two strings contain, so repeated words do not matter either.

Installation

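To install the library, use pip. The examples in this article use the library's original package name, fuzzywuzzy; the project has since been renamed TheFuzz and is published on PyPI as thefuzz, which provides the same fuzz module. Open a terminal (command prompt) and run the following command: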

pip install fuzzywuzzy python-Levenshtein
           

Basic use

Once the library is installed, let's look at a few code examples. The first one computes the plain similarity of two strings with fuzz.ratio():

from fuzzywuzzy import fuzz

str1 = "hello world"
str2 = "hello python"

similarity = fuzz.ratio(str1, str2)
print(similarity)
           

The output is 61, which means that str1 and str2 are 61% similar.
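Where does 61 come from? The sketch below reproduces the number with Python's standard difflib module, which fuzzywuzzy falls back to when python-Levenshtein is not installed. The helper name ratio_sketch is hypothetical and the code only approximates what the library does internally: the score is twice the number of matched characters divided by the combined length of both strings.

```python
from difflib import SequenceMatcher


def ratio_sketch(s1: str, s2: str) -> int:
    """Approximate fuzz.ratio(): 100 * 2 * matched / (len(s1) + len(s2))."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0  # assumption of this sketch: two empty strings score 0
    matcher = SequenceMatcher(None, s1, s2)
    # Each matching block is a run of characters common to both strings.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return round(100 * 2 * matched / total)


# "hello " (6 characters) and "o" (1 character) match: 2 * 7 / 23 = 0.609
print(ratio_sketch("hello world", "hello python"))  # 61
```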

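from fuzzywuzzy import fuzz

# The same two words, in a different order
str1 = "hello world"
str2 = "world hello"

# partial_ratio() scores the best partial alignment of the two strings
partial_similarity = fuzz.partial_ratio(str1, str2)
print(partial_similarity)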
           

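Running the above code prints 45, which means that the partial similarity of the two strings is 45%. The exact score can vary with the installed version and with whether python-Levenshtein is present, so a different number on your machine is not necessarily an error.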
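The idea behind a partial ratio can be sketched as a sliding window: slide the shorter string along the longer one and keep the best score. This is a simplified illustration built on difflib, with the hypothetical helper names ratio_sketch and partial_ratio_sketch; the library's own alignment strategy is more refined and may return different scores.

```python
from difflib import SequenceMatcher


def ratio_sketch(s1: str, s2: str) -> int:
    """Plain similarity from 0 to 100, based on difflib."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against every same-length window of the longer one."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0  # assumption of this sketch: an empty string matches nothing
    width = len(shorter)
    return max(
        ratio_sketch(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(partial_ratio_sketch("world", "hello world"))        # 100
print(partial_ratio_sketch("hello world", "world hello"))  # 45
```

When both strings have the same length there is only one window, the whole string, so the sketch reduces to a plain ratio. That is why swapping the two words gives a low score here even though the strings contain the same words.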

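from fuzzywuzzy import fuzz

# The same two words, in a different order
str1 = "hello world"
str2 = "world hello"

# token_sort_ratio() sorts the words of each string before comparing
token_sort_similarity = fuzz.token_sort_ratio(str1, str2)
# token_set_ratio() compares the sets of words in each string
token_set_similarity = fuzz.token_set_ratio(str1, str2)

print(token_sort_similarity)
print(token_set_similarity)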
           

Running the above code prints 100 and 100. Once word order is ignored, the two strings contain exactly the same words, so both the token sort similarity and the token set similarity are 100%.
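The two token-based scores can be approximated in a few lines. The sketch below uses difflib and the hypothetical helper names ratio_sketch, token_sort_ratio_sketch and token_set_ratio_sketch. It only lowercases and splits on whitespace, whereas the library also cleans up punctuation, so treat it as an illustration of the idea rather than a drop-in replacement.

```python
from difflib import SequenceMatcher


def ratio_sketch(s1: str, s2: str) -> int:
    """Plain similarity from 0 to 100, based on difflib."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Sort the words of each string, then compare the rebuilt strings."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return ratio_sketch(sorted1, sorted2)


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    """Compare the shared words with each string's shared-plus-unique words."""
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    combined1 = (common + " " + " ".join(sorted(tokens1 - tokens2))).strip()
    combined2 = (common + " " + " ".join(sorted(tokens2 - tokens1))).strip()
    return max(
        ratio_sketch(common, combined1),
        ratio_sketch(common, combined2),
        ratio_sketch(combined1, combined2),
    )


print(token_sort_ratio_sketch("hello world", "world hello"))  # 100
print(token_set_ratio_sketch("hello world", "world hello"))   # 100
```

The difference between the two shows up when one string has extra words: for "hello world" and "hello world again", the token set sketch still returns 100 because all of the first string's words appear in the second, while the token sort sketch returns a lower score.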

References

https://github.com/seatgeek/thefuzz
https://github.com/seatgeek/fuzzywuzzy