Processing and analyzing text data is a crucial skill in software development and data processing. Whether the task is data cleaning, information extraction, or log analysis, regular expressions play an indispensable role. They are a powerful tool for searching, replacing, and parsing text with a concise pattern-matching language. Although regular expressions may seem complex at first, mastering them greatly improves productivity and makes difficult text processing tasks manageable.
However, truly mastering regular expressions takes more than understanding their syntax and basic rules. Only by working through concrete examples can we appreciate their power and learn to use them flexibly. This article walks through nine examples of Python regular expressions, step by step. From basic string matching to more complex text parsing, each example is designed to help you practice this important skill on real text.
Example 1: Validating an email address
Verifying the validity of an email address is a classic use case for regular expressions. Here's an example program:
import re
def val_email(email):
    # Letters/digits, then "@", then letters/digits, a literal dot, and at least two letters
    pattern = r"^[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$"
    if re.match(pattern, email):
        print("Valid email")
    else:
        print("Invalid email!!")
val_email(email="[email protected]")
val_email(email="snb/smartnotebook.tech")
val_email(email="[email protected]")
In this example, we define a regular expression pattern that describes a simple email address format, then use the re module's match() function to check whether the email argument matches that pattern.
A few key points about this pattern:
- Use [] to define a character class. For example, [a-zA-Z0-9] matches a single digit from 0 to 9, an uppercase letter from A to Z, or a lowercase letter from a to z.
- ^ matches the beginning of the string (or of each line in multiline mode). In this example, it ensures that the text must start with [a-zA-Z0-9].
- $ matches the end of the string (or of each line in multiline mode).
- \ is used to escape special characters. In this example, \. matches a literal dot, whereas an unescaped . would match any character.
- The {n,m} syntax matches n to m repetitions of the preceding expression. The pattern uses {2,}, which means that the preceding part, [a-zA-Z], must be repeated at least 2 times. That's why "[email protected]" is recognized as an invalid email address.
- + matches 1 or more repetitions of the preceding expression. For example, ab+ matches an a followed by one or more b's.
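The syntax elements listed above can be checked one at a time. The following is a minimal sketch; the helper name matches() and the sample strings are made up for illustration and do not come from the original example.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches starting at the beginning of text."""
    return re.match(pattern, text) is not None

# Character class with "+": one or more letters or digits.
print(matches(r"^[a-zA-Z0-9]+$", "abc123"))   # only letters and digits
print(matches(r"^[a-zA-Z0-9]+$", "abc-123"))  # "-" is not in the class

# An escaped dot matches only a literal ".", a bare dot matches any character.
print(matches(r"a\.b", "a.b"))
print(matches(r"a\.b", "axb"))
print(matches(r"a.b", "axb"))

# "{2,}" requires at least two repetitions of the preceding element.
print(matches(r"^[a-zA-Z]{2,}$", "t"))
print(matches(r"^[a-zA-Z]{2,}$", "tech"))

# "+" requires at least one repetition, so "ab+" needs at least one "b".
print(matches(r"^ab+$", "a"))
print(matches(r"^ab+$", "abbb"))
```

Wrapping re.match() in a small boolean helper keeps each check to a single line, which makes it easy to experiment with one piece of syntax at a time.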
This classic example shows some of the basic syntax for using regular expressions in Python. Python's re module is something of a hidden gem, and it offers many more useful tricks.
Example 2: Extracting numbers from strings
The most straightforward way to find particular characters in a long piece of text is to iterate through every character with a for loop and pick out the ones you need. But you don't actually need a loop: regular expressions work naturally as filters.
import re
def extract_numbers(text):
    pattern = r"\d+"
    return re.findall(pattern, text)
print(extract_numbers("There are over 1000 views of Snb's articles."))
As you can see above, the re.findall() function takes a regular expression and a text, and returns a list of all the matches it finds. In a regular expression, \d matches a single digit, so \d+ matches a run of one or more digits.
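To make the difference between \d and \d+ concrete, here is a small sketch. The sample sentence is hypothetical, and converting the matches to integers is an illustrative extra step rather than part of the original example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped 3 items for 1200 yuan."

digits = re.findall(r"\d", text)     # every digit as its own match
numbers = re.findall(r"\d+", text)   # each run of digits as one match

# findall() returns strings, so convert them before doing arithmetic.
values = [int(n) for n in numbers]

print(digits)
print(numbers)
print(values, sum(values))
```

Note that findall() always returns strings, even when the pattern only matches digits.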
Example 3: Validating a phone number
The following example also leverages \d to check for valid phone numbers:
import re
def is_valid_phone_number(phone_number):
    # Three digits, a hyphen, four digits, a hyphen, four digits
    pattern = r"^\d{3}-\d{4}-\d{4}$"
    return bool(re.match(pattern, phone_number))
print(is_valid_phone_number("137-1234-5678"))
print(is_valid_phone_number("13712345678"))
In addition to \d, the pattern uses ^, $, and the {n} syntax (exactly n repetitions) to ensure that the whole string has the expected phone number format.
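One detail worth knowing about $: by default it also matches just before a newline at the very end of the string. If that matters for your input, re.fullmatch() is a stricter alternative. The sketch below assumes the same phone format as above; the function name is_valid_phone_number_strict is made up for illustration.

```python
import re

PHONE_PATTERN = r"\d{3}-\d{4}-\d{4}"

def is_valid_phone_number_strict(phone_number):
    """Accept the string only if the entire string matches the pattern."""
    return re.fullmatch(PHONE_PATTERN, phone_number) is not None

with_newline = "137-1234-5678\n"

# "$" matches before the trailing newline, so the anchored pattern still matches.
lenient = bool(re.match(r"^\d{3}-\d{4}-\d{4}$", with_newline))

# fullmatch() must consume the whole string, including the newline, so it fails.
strict = is_valid_phone_number_strict(with_newline)

print(lenient, strict)
print(is_valid_phone_number_strict("137-1234-5678"))
```

With fullmatch() the ^ and $ anchors are no longer needed, because the whole string has to match by definition.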
Example 4: Divide text into words
Splitting long texts into separate words is another common need in everyday programming. With the help of the split() function of the re module, we can easily accomplish this task:
import re
# Split on runs of whitespace
print(re.split(r'\s+', 'a b c'))
# Split on runs of whitespace, commas, or semicolons
print(re.split(r'[\s,]+', 'a,b, c d'))
print(re.split(r'[\s,;]+', 'a,b;; c d'))
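As shown in the code above, \s matches any whitespace character (spaces, tabs, and newlines), and putting it inside [] together with other characters lets you split on several delimiters at once.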
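When the text starts or ends with a delimiter, re.split() returns empty strings at the edges of the result. A common way to handle this, sketched here with a hypothetical helper named split_words and a made-up input string, is to filter the empty pieces out.

```python
import re

def split_words(text):
    """Split on runs of whitespace, commas, or semicolons and drop empty pieces."""
    return [word for word in re.split(r"[\s,;]+", text) if word]

# Hypothetical input with a leading space, a tab, a newline, and a trailing delimiter.
raw = " a,\tb;\nc  d, "

print(re.split(r"[\s,;]+", raw))  # includes empty strings at both ends
print(split_words(raw))           # only the actual words
```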
Example 5: Use regular expressions to find and replace text
After using a regular expression to find special characters from the text, we may need to replace them with new strings. The sub() function in the re module makes this process very smooth:
import re
text = """SmartNotebook is a modern,
enterprise-grade notebook designed
for data analysis/data science platform."""
pattern = r"book"
replacement = "Book"
new_text = re.sub(pattern, replacement, text)
print(new_text)
As shown above, you only need to pass three arguments to the sub() function: the pattern, the replacement, and the original text. It returns a new string with every match replaced, and leaves the original text unchanged.
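The sub() function also accepts a count argument to limit the number of replacements, and the replacement can be a function that receives each match object. The following sketch uses a hypothetical sample string to illustrate both options.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "notebook, textbook, bookshelf"

# count=1 replaces only the first occurrence.
first_only = re.sub(r"book", "Book", text, count=1)

# \b is a word boundary, so this only replaces "book" at the end of a word.
word_end = re.sub(r"book\b", "Book", text)

# A function as the replacement: it is called with each match object
# and must return the replacement string.
upper = re.sub(r"\b\w+book\b", lambda m: m.group().upper(), text)

print(first_only)
print(word_end)
print(upper)
```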
Example 6: Precompile a regular expression in Python
When matching strings with regular expressions in Python, there are typically two steps:
- Compile regular expressions.
- Use compiled regular expressions to match strings.
Therefore, if a regular expression needs to be reused, recompiling it each time can be a waste of resources. To avoid this, Python lets us precompile a regular expression once and then reuse the compiled object in subsequent matches. This can improve efficiency when the same pattern is applied many times, and it also gives the pattern a descriptive name.
import re
re_numbers = re.compile(r'^\d+$')
print(re_numbers.match('123'))
print(re_numbers.match('SmartNotebook'))
The example above shows how to use the re module's compile() function to precompile a regular expression and use it later. Whenever the string doesn't match the regular expression, the match() method returns None.
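A compiled pattern is most useful when it is applied to many strings. Here is a minimal sketch; the constant name RE_NUMBERS, the helper only_digits, and the sample list are assumptions made for illustration.

```python
import re

# Compile once at module level, then reuse the object everywhere.
RE_NUMBERS = re.compile(r"^\d+$")

def only_digits(values):
    """Keep only the strings that consist entirely of digits."""
    return [value for value in values if RE_NUMBERS.match(value)]

samples = ["123", "SmartNotebook", "42", "4 2"]

print(only_digits(samples))
print(RE_NUMBERS.pattern)  # the original pattern string is kept on the object
```

The compiled object offers the same operations as the module-level functions, such as match(), search(), findall(), split(), and sub(), without the pattern argument.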
Example 7: Extracting and manipulating sub-content of text
The group() method of a match object returns one or more captured subgroups of the match. It comes in handy for extracting different parts of a text.
For example, the following code shows how to extract the two parts of a time string in the format "HH:MM":
import re
time = '18:05'
matched = re.match(r'^([0-1][0-9]|2[0-3])\:([0-5][0-9])$', time)
print(matched.groups())
print(matched.group())
print(matched.group(0))
print(matched.group(1))
print(matched.group(2))
As shown above, groups() returns all captured subgroups as a tuple, while group() and group(0) both return the entire matched text. group(1) and group(2) then return the first and second captured parts (the hour and the minute), respectively.
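In practice the match may fail, in which case match() returns None and calling group() on it raises an error. The sketch below wraps the same time pattern in a hypothetical helper named parse_time that handles this case and converts the groups to integers.

```python
import re

TIME_RE = re.compile(r"^([0-1][0-9]|2[0-3]):([0-5][0-9])$")

def parse_time(value):
    """Return (hour, minute) as integers, or None if value is not a valid HH:MM time."""
    matched = TIME_RE.match(value)
    if matched is None:
        return None
    hour, minute = matched.groups()  # one string per capture group
    return int(hour), int(minute)

print(parse_time("18:05"))
print(parse_time("25:00"))  # the hour is out of range, so there is no match
```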
Example 8: Use named groups to extract sub-content
When a pattern has many subgroups, referring to them by number can make the code difficult to understand. For this reason, Python supports named groups: you can capture specific parts of a matching string by name instead of relying on numbered capture groups. This makes the code easier to read and maintain. Here's an example:
import re
text = "SmartNotebook, age 2"
pattern = r"(?P<name>\w+),\sage\s(?P<age>\d+)"
match = re.search(pattern, text)
print(match.group("name"))
print(match.group("age"))
As shown above, the key syntax for named groups is (?P<name>...). It defines the name of the corresponding group, and you can pass that name to the match object's group() method to extract the matching content.
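Named groups can also be collected all at once with the match object's groupdict() method. This is a minimal sketch that reuses the pattern above; turning the age into an integer is an illustrative extra step.

```python
import re

pattern = re.compile(r"(?P<name>\w+),\sage\s(?P<age>\d+)")
match = pattern.search("SmartNotebook, age 2")

# groupdict() maps each group name to the text it captured.
record = match.groupdict()
record["age"] = int(record["age"])  # captured values are strings

print(record)

# Named groups are still numbered, so both forms refer to the same text.
print(match.group(1) == match.group("name"))
```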
Example 9: Use the VERBOSE flag to make regular expressions more readable
In some scenarios, regular expressions become long and difficult to understand, and we need a way to make them neater and clearer. In this case, you can use the re.VERBOSE flag.
import re
text = "SmartNotebook, [email protected], 198-2133-7583"
pattern = r"""
(?P<name>\w+),\s
(?P<email>\w+@\w+\.\w+),\s
(?P<phone>\d{3}-\d{4}-\d{4})
"""
match = re.search(pattern, text, re.VERBOSE)
if match:
    print(match.group("name"))
    print(match.group("email"))
    print(match.group("phone"))
As shown above, you can split a long regex across multiple lines to improve readability. As long as the re.VERBOSE flag is passed, whitespace and line breaks inside the pattern are ignored, and the pattern is still interpreted correctly.
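The VERBOSE flag also allows comments inside the pattern, starting with # and running to the end of the line. The sketch below adds comments to the same pattern; the email address snb@example.com is a hypothetical placeholder used only for illustration.

```python
import re

# Hypothetical sample record; the email address is a placeholder.
text = "SmartNotebook, snb@example.com, 198-2133-7583"

pattern = re.compile(r"""
    (?P<name>\w+),\s                # name, followed by a comma and whitespace
    (?P<email>\w+@\w+\.\w+),\s      # a deliberately simple email pattern
    (?P<phone>\d{3}-\d{4}-\d{4})    # phone number in 3-4-4 digit groups
    """, re.VERBOSE)

match = pattern.search(text)
if match:
    print(match.group("name"))
    print(match.group("email"))
    print(match.group("phone"))

# In verbose mode a bare space in the pattern is ignored, so a real space
# must be written as \s, as an escaped space, or inside a character class.
ignored_space = re.search(r"age 2", "age 2", re.VERBOSE)
escaped_space = re.search(r"age\ 2", "age 2", re.VERBOSE)
print(ignored_space is None, escaped_space is not None)
```

This is why the pattern uses \s rather than a literal space between the fields.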
In software development and data processing, the ability to process and analyze text data is a crucial skill. Regular expressions play an integral role in this work, efficiently searching, replacing, and parsing text through a concise pattern-matching language. Mastering them not only boosts productivity, but also makes complex text processing tasks much easier. From validating email addresses to extracting and manipulating sub-content of text, each example here is meant to help you understand this powerful tool and put it to work.