Text Processing Tools: 9 Practical Cases for Python Regular Expressions

Processing and analyzing text data is a crucial skill in software development and data engineering. Whether the task is data cleaning, information extraction, or log analysis, regular expressions play an indispensable role. They are a powerful tool for searching, replacing, and parsing text efficiently with a concise pattern-matching language. Although regular expressions may look intimidating at first, once mastered they greatly improve productivity and make complex text processing tasks far more manageable.

However, knowing the syntax and basic rules is not enough to truly master regular expressions. Practice is what builds real understanding, and concrete examples are the best way to see their power and learn to use them flexibly. Below are nine examples of Python regular expressions that take you step by step from basic string matching to more complex text parsing, each designed to help you apply this skill when working with text.

Example 1: Verify your email address

Verifying the validity of an email address is a classic use case for regular expressions. Here's an example program:

import re


def val_email(email):
    # local part @ domain . top-level domain of at least 2 letters
    pattern = r"^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$"
    if re.match(pattern, email):
        print("Valid email")
    else:
        print("Invalid email!!")


val_email(email="[email protected]")
val_email(email="snb/smartnotebook.tech")
val_email(email="[email protected]")

In this example, the regular expression pattern describes the format of a valid email address, and the match() function of Python's re module checks whether the email string matches that pattern.
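re.match() only anchors at the start of the string, so an end anchor such as $ is what rejects trailing characters. The short sketch below compares re.match(), re.search(), and re.fullmatch() (the latter available since Python 3.4); the pattern and inputs are illustrative and not from the original article.

```python
import re

pattern = r"[a-z]+"  # one or more lowercase letters, no anchors

# match(): must match at position 0, but may stop before the end
print(re.match(pattern, "abc123"))           # a Match object covering "abc"
# fullmatch(): the whole string must match, like wrapping the pattern in ^...$
print(re.fullmatch(pattern, "abc123"))       # None
# search(): the match may start anywhere in the string
print(re.search(pattern, "123abc").group())  # abc
print(re.match(pattern, "123abc"))           # None
```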

In this regular pattern, there are a few key points:

[] defines a character class that matches a single character. For example, [a-zA-Z0-9] matches any one digit from 0 to 9, uppercase letter from A to Z, or lowercase letter from a to z.
^ denotes the beginning of the string. In this example, it ensures that the text must start with [a-zA-Z0-9].
$ denotes the end of the string (it also matches just before a trailing newline).
\ escapes special characters. In the example, \. matches a literal dot, whereas an unescaped . would match any character.
The {n,m} syntax matches n to m repetitions of the preceding element. The pattern uses {2,}, which means the preceding [a-zA-Z] must repeat at least 2 times. That's why "[email protected]" is recognized as an invalid email address.
+ matches 1 or more repetitions of the preceding element. For example, ab+ matches an "a" followed by one or more "b"s.
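Each element in the list above can be checked in isolation with re.fullmatch(), which requires the whole string to match. The short strings below are illustrative inputs chosen for this sketch, not addresses from the article.

```python
import re

# Character class: exactly one character from the listed ranges
print(bool(re.fullmatch(r"[a-zA-Z0-9]", "Q")))   # True
print(bool(re.fullmatch(r"[a-zA-Z0-9]", "_")))   # False: underscore is not in the class

# Escaped dot matches only a literal ".", an unescaped dot matches any character
print(bool(re.fullmatch(r"a\.b", "a-b")))        # False
print(bool(re.fullmatch(r"a.b", "a-b")))         # True

# {2,}: at least two repetitions of the preceding item
print(bool(re.fullmatch(r"[a-zA-Z]{2,}", "t")))     # False
print(bool(re.fullmatch(r"[a-zA-Z]{2,}", "tech")))  # True

# +: one or more repetitions, so "a" alone does not match ab+
print(bool(re.fullmatch(r"ab+", "a")))     # False
print(bool(re.fullmatch(r"ab+", "abbb")))  # True
```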

This classic example shows some of the basic regular expression syntax in Python. Python's re module offers many more useful tricks beyond these.

Example 2: Extracting numbers from strings

The most straightforward way to find particular characters in a long piece of text is to iterate over every character with a for loop and pick out the ones you need. With regular expressions, you don't need an explicit loop at all: a pattern acts as a filter on its own.

import re

def extract_numbers(text):
    pattern = r"\d+"
    return re.findall(pattern, text)


print(extract_numbers("There are over 1000 views of Snb's articles."))

As you can see above, re.findall() takes a pattern and a string and returns a list of all non-overlapping matches, here ['1000']. In a regular expression, \d matches a single digit, so \d+ matches a run of one or more digits.
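To see why the + matters, compare \d (one digit per match) with \d+ (a whole run of digits). Because findall() returns strings, converting to int is a separate step, and finditer() can be used when match positions are also needed. The input string is illustrative.

```python
import re

text = "Order 66 shipped 3 items on day 12"  # illustrative input

print(re.findall(r"\d", text))    # ['6', '6', '3', '1', '2']: one digit per match
print(re.findall(r"\d+", text))   # ['66', '3', '12']: each run of digits is one match

# findall() returns strings; convert explicitly if numbers are needed
numbers = [int(n) for n in re.findall(r"\d+", text)]
print(numbers)                    # [66, 3, 12]

# finditer() yields Match objects, which also carry start/end positions
for m in re.finditer(r"\d+", text):
    print(m.group(), m.span())
```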

Example 3: Verify your phone number

The following example also leverages \d to check for valid phone numbers:

import re


def is_valid_phone_number(phone_number):
    pattern = r"^\d{3}-\d{4}-\d{4}$"
    return bool(re.match(pattern, phone_number))


print(is_valid_phone_number("137-1234-5678"))
print(is_valid_phone_number("13712345678"))

Besides \d, the pattern uses ^, $, and {n} (exactly n repetitions) so that the entire string must consist of 3, 4, and 4 digits separated by hyphens. The first call therefore prints True and the second prints False.
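A possible variant, assuming you also want to accept the unhyphenated form that the pattern above rejects, makes each hyphen optional with ?. The name is_valid_phone_loose is hypothetical, and the 3-4-4 layout is taken from the example above rather than from any numbering standard. The last two lines show a subtlety: $ also matches just before a trailing newline, while re.fullmatch() does not.

```python
import re

# Hypothetical variant: each hyphen is optional (-?), digit counts unchanged
LOOSE_PHONE = re.compile(r"^\d{3}-?\d{4}-?\d{4}$")

def is_valid_phone_loose(phone_number):
    return bool(LOOSE_PHONE.match(phone_number))

print(is_valid_phone_loose("137-1234-5678"))  # True
print(is_valid_phone_loose("13712345678"))    # True
print(is_valid_phone_loose("137-12345678"))   # True: the two hyphens are independent
print(is_valid_phone_loose("137-1234-567"))   # False: last group has only 3 digits

# $ also matches just before a trailing newline; fullmatch() does not
print(is_valid_phone_loose("13712345678\n"))                        # True
print(bool(re.fullmatch(r"\d{3}-?\d{4}-?\d{4}", "13712345678\n")))  # False
```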

Example 4: Divide text into words

Splitting long texts into separate words is another common need in everyday programming. With the help of the split() function of the re module, we can easily accomplish this task:

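import re

# Split on runs of whitespace
print(re.split(r'\s+', 'a b   c'))          # ['a', 'b', 'c']

# Split on runs of whitespace and/or commas
print(re.split(r'[\s,]+', 'a,b, c  d'))     # ['a', 'b', 'c', 'd']

# Split on runs of whitespace, commas, and/or semicolons
print(re.split(r'[\s,;]+', 'a,b;; c  d'))   # ['a', 'b', 'c', 'd']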

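As shown in the code above, \s matches any whitespace character (space, tab, newline, and so on), and a character class such as [\s,;] lets you split on several kinds of separator at once. The + treats a run of consecutive separators as a single split point.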
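Two behaviors of re.split() are worth knowing, shown here on illustrative inputs: a separator at the very start of the string produces an empty first element, and wrapping the pattern in a capturing group keeps the separators in the result. The maxsplit keyword limits how many splits are made.

```python
import re

# Leading separator -> empty string as the first element
print(re.split(r"[\s,;]+", ",a b"))              # ['', 'a', 'b']

# Capturing group -> separators are included in the output list
print(re.split(r"([,;])", "a,b;c"))              # ['a', ',', 'b', ';', 'c']

# maxsplit limits the number of splits; the remainder stays intact
print(re.split(r"\s+", "a b c d", maxsplit=2))   # ['a', 'b', 'c d']

# Filter out empty strings if they are not wanted
print([w for w in re.split(r"[\s,;]+", ",a b;") if w])  # ['a', 'b']
```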

Example 5: Use regular expressions to find and replace text

After finding particular text with a regular expression, we often need to replace it with a new string. The sub() function in the re module makes this straightforward:

import re


text = """SmartNotebook is a modern, 
          enterprise-grade notebook designed 
          for data analysis/data science platform."""
pattern = r"book"
replacement = "Book"


new_text = re.sub(pattern, replacement, text)
print(new_text)

As shown above, sub() takes three arguments: the pattern, the replacement, and the original text. It replaces every match (so both "SmartNotebook" and "notebook" change) and returns a new string, leaving the original text unchanged.
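A few related options of sub(), shown on illustrative strings: the count argument limits the number of replacements, \1-style backreferences reuse captured groups, a function can compute each replacement from its Match object, and subn() also reports how many substitutions were made.

```python
import re

text = "notebook, workbook, book"  # illustrative input

print(re.sub(r"book", "Book", text))           # noteBook, workBook, Book
print(re.sub(r"book", "Book", text, count=1))  # noteBook, workbook, book

# Backreferences: swap the two captured parts of "HH:MM"
print(re.sub(r"(\d\d):(\d\d)", r"\2:\1", "18:05"))  # 05:18

# A function receives each Match object and returns the replacement string
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22"))  # a2b44

# subn() returns a (new_string, number_of_substitutions) tuple
print(re.subn(r"book", "Book", text))  # ('noteBook, workBook, Book', 3)
```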

Example 6: Precompile a regular expression in Python

When matching strings with regular expressions in Python, there are typically two steps:

Compile the regular expression.
Use the compiled regular expression to match strings.

If a regular expression is reused many times, compiling it again on every use wastes work. To avoid this, Python lets you precompile a pattern once with re.compile() and reuse the compiled object in later matches. Note that the re module also caches recently used patterns internally, so the speedup over the module-level functions is often modest; a named, precompiled pattern is also easier to reuse and read.

import re
re_numbers = re.compile(r'^\d+$')
print(re_numbers.match('123'))
print(re_numbers.match('SmartNotebook'))

The example above shows how to use the re module's compile() function to precompile a regular expression and use it later. The first call returns a Match object; when the string doesn't match, as in the second call, match() returns None.
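A compiled pattern object exposes the same methods as the module-level functions. The sketch below uses an unanchored pattern and illustrative inputs, and also shows the difference between match() and search() on a compiled object, plus flags being fixed at compile time.

```python
import re

digits = re.compile(r"\d+")  # unanchored: no ^ or $

print(digits.match("abc123"))           # None: match() only tries position 0
print(digits.search("abc123").group())  # 123: search() scans the whole string
print(digits.findall("a1b22c333"))      # ['1', '22', '333']
print(digits.sub("#", "a1b22"))         # a#b#
print(digits.pattern)                   # the source pattern string: \d+

# Flags are fixed when the pattern is compiled
book = re.compile(r"book", re.IGNORECASE)
print(book.findall("Book, BOOK, book"))  # ['Book', 'BOOK', 'book']
```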

Example 7: Extracting and manipulating sub-content of text

The group() method of a match object returns one or more subgroups of the match. It comes in handy for extracting different parts of text.

For example, the following code extracts the two parts of a time string in "HH:MM" format:

import re


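time = '18:05'
# Group 1: hour 00-23; group 2: minute 00-59
matched = re.match(r'^([0-1][0-9]|2[0-3])\:([0-5][0-9])$', time)


print(matched.groups())  # ('18', '05'): all subgroups as a tuple
print(matched.group())   # 18:05, the whole match
print(matched.group(0))  # 18:05, same as group()
print(matched.group(1))  # 18
print(matched.group(2))  # 05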

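As shown above, groups() returns a tuple of all subgroups, and group() or group(0) returns the entire match, which here is the whole string because the pattern is anchored with ^ and $. group(1) and group(2) return the first and second parenthesized parts, the hour and the minute.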
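Because match() returns None when the input does not fit the pattern, calling .groups() on the result directly raises AttributeError for an invalid time. A minimal sketch of a defensive wrapper follows; the name parse_time and the choice to return integers are assumptions for illustration. span() additionally reports where each group sits in the string.

```python
import re

TIME_RE = re.compile(r"^([0-1][0-9]|2[0-3]):([0-5][0-9])$")

def parse_time(text):
    """Hypothetical helper: return (hour, minute) as ints, or None if invalid."""
    matched = TIME_RE.match(text)
    if matched is None:
        return None
    hour, minute = matched.groups()  # both are strings, e.g. ('18', '05')
    return int(hour), int(minute)

print(parse_time("18:05"))  # (18, 5)
print(parse_time("24:00"))  # None: 2[0-3] rejects hour 24
print(parse_time("7:05"))   # None: the hour must have two digits

m = TIME_RE.match("18:05")
print(m.span(1), m.span(2))  # (0, 2) (3, 5): start/end of each group
```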

Example 8: Use named groups to extract sub-content

When a pattern has many subgroups, referring to them by number makes the code hard to follow. Python therefore supports named groups, which let you capture specific parts of a matching string by name instead of by position. This makes the code easier to read and maintain. Here's an example:

import re


text = "SmartNotebook, age 2"
pattern = r"(?P<name>\w+),\sage\s(?P<age>\d+)"
match = re.search(pattern, text)
print(match.group("name"))  
print(match.group("age"))

As shown above, the key syntax is (?P<name>...), which assigns a name to the group; you can then pass that name to group() to extract the corresponding content.
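Named groups remain numbered as well, and groupdict() returns all named groups at once. Inside the same pattern, (?P=name) refers back to whatever a named group captured. The sketch reuses the article's string plus one illustrative sentence for the backreference.

```python
import re

text = "SmartNotebook, age 2"
match = re.search(r"(?P<name>\w+),\sage\s(?P<age>\d+)", text)

print(match.groupdict())  # {'name': 'SmartNotebook', 'age': '2'}
print(match.group(1))     # SmartNotebook: named groups still have numbers

# (?P=word) must repeat exactly what the group named "word" matched
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
print(doubled.search("this is is a test").group("word"))  # is
```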

Example 9: Use the VERBOSE flag to make regular expressions more readable

In complex scenarios, regular expressions can grow long and hard to understand, so a way to lay them out more clearly is needed. For this, you can use the re.VERBOSE flag.

import re


text = "SmartNotebook, [email protected], 198-2133-7583"
pattern = r"""
    (?P<name>\w+),\s
    (?P<email>\w+@\w+\.\w+),\s
    (?P<phone>\d{3}-\d{4}-\d{4})
"""


match = re.search(pattern, text, re.VERBOSE)
if match:
    print(match.group("name"))
    print(match.group("email"))
    print(match.group("phone"))

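As shown above, you can split a long regex across multiple lines to improve readability. With the re.VERBOSE flag, whitespace in the pattern is ignored (except inside a character class or when escaped), and a # outside a character class starts a comment, so the multi-line pattern is matched exactly as if it were written on one line.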
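Since # starts a comment in VERBOSE mode, each line of the pattern can carry its own explanation. The sketch below uses the same structure as the pattern above with inline comments, checks that it behaves like the equivalent single-line pattern, and shows that without the flag the same source fails to match because its newlines and spaces become literal. The email address is a hypothetical sample.

```python
import re

# Same structure as the pattern above, with inline comments (VERBOSE only)
pattern = re.compile(r"""
    (?P<name>\w+),\s               # a word, then a comma and one whitespace char
    (?P<email>\w+@\w+\.\w+),\s     # simplified email: word@word.word
    (?P<phone>\d{3}-\d{4}-\d{4})   # phone in 3-4-4 digit format
""", re.VERBOSE)

# The equivalent single-line pattern, written without VERBOSE
compact = re.compile(r"(?P<name>\w+),\s(?P<email>\w+@\w+\.\w+),\s(?P<phone>\d{3}-\d{4}-\d{4})")

sample = "SmartNotebook, snb@example.com, 198-2133-7583"  # hypothetical email address
m1 = pattern.search(sample)
m2 = compact.search(sample)
print(m1.groupdict())
print(m1.groupdict() == m2.groupdict())  # True

# Without the flag, whitespace and comments in the source are matched literally
print(re.search(pattern.pattern, sample))  # None
```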

In software development and data processing, the processing and analysis of text data is a crucial skill. Regular expressions play an integral role here, searching, replacing, and parsing text efficiently through a concise pattern-matching language. Mastering them boosts productivity and makes complex text processing tasks manageable. The nine examples above, from verifying email addresses to extracting and manipulating sub-content of text, show how to put this powerful tool to work.
