
Regular expression metacharacters 

Metacharacters are the building blocks of regular expressions. Characters in RegEx are understood to be either a metacharacter with a special meaning or a regular character with a literal meaning.

The following are some common RegEx metacharacters and examples of what they would match or not match in RegEx.

Metacharacter Description Examples
\d Any single digit, 0 - 9

\d\d\d = 327

\d\d = 81

\d = 4

-----------------------------------------

\d\d\d ≠  24631

\d\d\d doesn't match 24631 because 24631 contains 5 digits. \d\d\d matches only a 3-digit string.
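The examples above treat a match as covering the entire term. The following is a minimal Python sketch of that behavior, assuming the standard `re` module and using `re.fullmatch` as a stand-in for whole-term matching. The helper name `matches_whole` is hypothetical, and the search tool's own RegEx engine may differ in details.

```python
import re

def matches_whole(pattern: str, term: str) -> bool:
    """Hypothetical helper: True when the pattern matches the entire term."""
    return re.fullmatch(pattern, term) is not None

print(matches_whole(r"\d\d\d", "327"))    # True
print(matches_whole(r"\d\d", "81"))       # True
print(matches_whole(r"\d\d\d", "24631"))  # False: five digits, the pattern allows three
```

By contrast, `re.search(r"\d\d\d", "24631")` would find `246` inside the longer number, so the whole-term assumption matters when reproducing these examples.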

\w Alphanumeric character (a letter or a digit; most RegEx engines also count the underscore _)

\w\w\w = dog

\w\w\w\w = mule

\w\w = to

-----------------------------------------

\w\w\w = 467

\w\w\w\w = 4673

-----------------------------------------

\w\w\w ≠  boat

\w\w\w doesn't match boat because boat contains 4 characters.

-----------------------------------------

\w ≠  !

\w doesn't match the exclamation point ! because it is a non-alphanumeric character.

\W Non-alphanumeric character (symbols and punctuation; in most RegEx engines, whitespace as well)

\W = %

\W = #

\W\W\W = @#%

-----------------------------------------

\W\W\W\W ≠  dog8

\W\W\W\W doesn't match dog8 because d, o, g, and 8 are alphanumeric characters.
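The \w and \W rows can be checked with a short Python sketch. It assumes the standard `re` module and whole-term matching via `fullmatch`; the variable names are illustrative only.

```python
import re

word = re.compile(r"\w\w\w")      # exactly three word characters
non_word = re.compile(r"\W\W\W")  # exactly three non-word characters

print(bool(word.fullmatch("dog")))      # True
print(bool(word.fullmatch("467")))      # True: digits are word characters too
print(bool(word.fullmatch("boat")))     # False: four characters
print(bool(re.fullmatch(r"\w", "!")))   # False: ! is not alphanumeric
print(bool(re.fullmatch(r"\w", "_")))   # True: Python counts the underscore
print(bool(non_word.fullmatch("@#%")))  # True
print(bool(re.fullmatch(r"\W\W\W\W", "dog8")))  # False
```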

[a-z]

[0-9]

Character set: matches exactly one character from the set, unless a quantifier specifies otherwise.

The order of the characters does not matter.

pand[ora] = panda

pand[ora] = pando

-----------------------------------------

pand[ora] ≠  pandora

pand[ora] doesn't match pandora because pand[ora] implies that only 1 character from [ora] can match.

(Quantifiers that allow pand[ora] to match more than one character from the set are discussed below.)

(abc)

(123)

Character group, matches the characters abc or 123 in that exact order.

pand(ora) = pandora

pand(123) = pand123

-----------------------------------------

pand(oar) ≠  pandora

pand(oar) does not match for pandora because it's looking for the exact phrase pandoar.
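The difference between a character set and a character group can be sketched in Python as follows, assuming the standard `re` module and whole-term matching via `fullmatch`. The variable names are illustrative.

```python
import re

char_set = re.compile(r"pand[ora]")  # pand + ONE of o, r, a
group = re.compile(r"pand(ora)")     # pand + the exact sequence ora

print(bool(char_set.fullmatch("panda")))    # True
print(bool(char_set.fullmatch("pando")))    # True
print(bool(char_set.fullmatch("pandora")))  # False: the set matches a single character
print(bool(group.fullmatch("pandora")))     # True
print(bool(re.fullmatch(r"pand(oar)", "pandora")))  # False: the order inside a group matters
```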

| Alternation: allows for alternate matches. | operates like the Boolean OR.

pand(ora|123) = pandora OR pand123

? Question mark matches when the character preceding ? occurs 0 or 1 time only, making the character match optional.

colou?r = colour (u is found 1 time)

colou?r = color (u is found 0 times)
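Alternation and the optional ? can be tried together in a short Python sketch. It assumes the standard `re` module and whole-term matching via `fullmatch`, and uses the alternation pattern pand(ora|123) as its example.

```python
import re

alt = re.compile(r"pand(ora|123)")  # pand followed by either ora or 123
optional_u = re.compile(r"colou?r")  # the u may appear 0 or 1 time

print(bool(alt.fullmatch("pandora")))  # True
print(bool(alt.fullmatch("pand123")))  # True
print(bool(alt.fullmatch("pandabc")))  # False: abc is not one of the alternatives

print(bool(optional_u.fullmatch("colour")))   # True: u found 1 time
print(bool(optional_u.fullmatch("color")))    # True: u found 0 times
print(bool(optional_u.fullmatch("colouur")))  # False: u found 2 times
```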

*

Asterisk matches when the character preceding * matches 0 or more times.

Note: * in RegEx is different from * in dtSearch.  RegEx * is asking to find where the character (or grouping) preceding * is found ZERO or more times.  dtSearch * is asking to find where the string of characters preceding * or following * is found 1 or more times.

tre* = tree (e is found 2 times)

tre* = tre (e is found 1 time)

tre* = tr (e is found 0 times)

-----------------------------------------

tre* ≠  trees

tre* doesn't match the term trees because although "e" is found 2 times, it is followed by "s", which is not accounted for in the RegEx.

+ Plus sign matches when the character preceding + matches 1 or more times. The + sign makes the character match mandatory.

tre+ = tree (e is found 2 times)

tre+ = tre (e is found 1 time)

-----------------------------------------

tre+ ≠  tr (e is found 0 times)

tre+ doesn't match for tr because e is found zero times in tr.
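The contrast between * and + shows up only when the preceding character is absent. A minimal Python sketch, assuming the standard `re` module and whole-term matching via `fullmatch`:

```python
import re

star = re.compile(r"tre*")  # tr followed by zero or more e
plus = re.compile(r"tre+")  # tr followed by one or more e

for term in ["tree", "tre", "tr", "trees"]:
    print(term, bool(star.fullmatch(term)), bool(plus.fullmatch(term)))
# tree True True
# tre True True
# tr True False
# trees False False
```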

. (period) The period matches any single character, whether alphanumeric or a symbol. (In most RegEx engines, it does not match a line break by default.)

ton. = tone

ton. = ton#

ton. = ton4

-----------------------------------------

ton. ≠  tones

ton. doesn't match for the term tones because . by itself will only match for a single character, here, in the 4th position of the term.  In tones, s is the 5th character and is not accounted for in the RegEx.

.*

Combine the metacharacters . and *, in that order, as .* to match any character 0 or more times.

NOTE:  .* in RegEx is equivalent to dtSearch wildcard * operator.

tr.* = tr

tr.* = tre

tr.* = tree

tr.* = trees

tr.* = trough

tr.* = treadmill
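The period and the .* combination can be compared in a short Python sketch. It assumes the standard `re` module and whole-term matching via `fullmatch`; the line-break behavior shown is Python's default and may differ in other engines.

```python
import re

single = re.compile(r"ton.")  # ton plus exactly one more character
rest = re.compile(r"tr.*")    # tr plus anything, including nothing

print(bool(single.fullmatch("tone")))   # True
print(bool(single.fullmatch("ton#")))   # True
print(bool(single.fullmatch("tones")))  # False: the 5th character is not accounted for
print(bool(single.fullmatch("ton\n")))  # False: . skips line breaks by default in Python

print(all(rest.fullmatch(t) for t in ["tr", "tre", "tree", "trees", "trough", "treadmill"]))  # True
```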

RegEx quantifiers

RegEx uses quantifiers to indicate the scope of a search string, that is, how many times a character or character group may repeat. You can use multiple quantifiers in your search string. The following table gives examples of the quantifiers you can use in your RegEx:

Quantifier Description Examples
{n} Matches when the preceding character, or character group, occurs n times exactly.

\d{3} = 836

\d{3} = 139

\d{3} = 532

-----------------------------------------

pand[ora]{2} = pandar

pand[ora]{2} = pandoo

pand(ora){2} = pandoraora

-----------------------------------------

pand[ora]{2} ≠  pandora

pand[ora]{2} doesn't match pandora because the quantifier {2} allows for exactly 2 letters from the character set [ora], and pandora has 3 letters after pand.

{n,m} Matches when the preceding character, or character group, occurs at least n times, and at most m times.

\d{2,5} = 97430

\d{2,5} = 9743

\d{2,5} = 97

-----------------------------------------

\d{2,5} ≠  9

9 does not match because it is only 1 digit, which is below the minimum of 2 set by the quantifier.
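The {n} and {n,m} quantifiers can be exercised with a minimal Python sketch, assuming the standard `re` module and whole-term matching via `fullmatch`:

```python
import re

print(bool(re.fullmatch(r"\d{3}", "836")))                 # True: exactly 3 digits
print(bool(re.fullmatch(r"pand[ora]{2}", "pandar")))       # True: 2 characters from the set
print(bool(re.fullmatch(r"pand[ora]{2}", "pandora")))      # False: 3 characters from the set
print(bool(re.fullmatch(r"pand(ora){2}", "pandoraora")))   # True: the group repeated twice

ranged = re.compile(r"\d{2,5}")  # between 2 and 5 digits
print([bool(ranged.fullmatch(t)) for t in ["97430", "9743", "97", "9", "974301"]])
# [True, True, True, False, False]
```

Applied to a set, the quantifier repeats the choice, so each repetition may pick a different character. Applied to a group, it repeats the whole sequence.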

Escaping RegEx Metacharacters

When using RegEx to search for a character that is a reserved metacharacter, use the backslash \ to escape the character so it is recognized as a literal. The following table gives an example of how to escape a reserved metacharacter when searching.

Search For RegEx Match Results
UK phone number \+[0-9]{11}

+14528280001

+38119930978

-----------------------------------------

If the + sign is not escaped with a backslash, RegEx treats + as a quantifier instead of the literal plus sign character.
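The effect of escaping can be sketched in Python, assuming the standard `re` module and whole-term matching via `fullmatch`. How an unescaped leading + is handled is engine-specific; Python rejects it as an error, while other engines may behave differently.

```python
import re

phone = re.compile(r"\+[0-9]{11}")  # literal + followed by exactly 11 digits

print(bool(phone.fullmatch("+14528280001")))  # True
print(bool(phone.fullmatch("+38119930978")))  # True
print(bool(phone.fullmatch("14528280001")))   # False: the literal + is required

# Without the backslash, + is read as a quantifier with nothing to repeat.
try:
    re.compile(r"+[0-9]{11}")
    unescaped_ok = True
except re.error:
    unescaped_ok = False
print(unescaped_ok)  # False

# re.escape adds the backslash for reserved characters automatically.
print(re.escape("+"))  # \+
```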
