Regular Expressions Demystified: RegEx Isn't as Hard as It Looks

by Vijayabharathi Balasubramanian

Are you one of those people who stay away from regular expressions because they look like a foreign language? I was one. Not anymore.

Think of all the sounds, traffic signs and smells that you can recognize. Regular expressions are no different: they are a sign language for analyzing strings.

We are going to get our heads around regular expressions today. At least, regularly used expressions.

Much like any programming language, a regular expression is a succinct language in its own right.

We will know how to put regular expressions to good use by the end of this article. We will solve simple problems and learn loads in the process.

Are you willing to invest 30 minutes and come out enlightened in RegEx? Settle down then.

Why regular expressions?

We each have our own 'why', don't we? One may be to test whether a string is a valid hex color code. Or you may be writing a processor library, such as Sass, that leverages RegEx.

I’ll let the universe throw the why at you and help you cover the how.

0. Get Your Playground Ready

References

Most of the time, I find this page adequate to get going: Regular Expressions from MDN. In fact, that page is all you need. You can stop reading this post. Right now. Close this tab.

Still with me? Thanks. You need a sandbox to play around in. Luckily, one is already available in your browser: just use the console in your browser's DevTools.

Familiarize yourself with the syntax

To start with, we are going to use the `/expression/.test('string')` syntax.

An `expression` is any regular expression that we build. A `string` is the string under test. The `test` method returns `true` or `false` depending on the match.

Slashes mark the start and end of the expression. Treat them like the double quotes (") and single quotes (') that you use to mark the start and end of a plain string.

The expression between the `/` slashes is a literal: its contents are treated as literal characters. A variable name written there is not resolved to the variable's contents.

To make it dynamic, we'll have to go via the constructor route, using the `new RegExp(variable_name)` syntax. This will come to our rescue toward the end of the post.

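Here is a minimal console sketch of the difference between a literal and the constructor. The variable `word` and the sample strings are made up for illustration; the only things used are the standard `RegExp` constructor, `RegExp.prototype.test` and the `source` property.

```javascript
// A literal: the characters between the slashes are taken as written.
const literal = /word/;

// A made-up variable whose value is only known at runtime.
const word = "cat";

// The constructor builds the pattern from the variable's contents.
const dynamic = new RegExp(word);

console.log(literal.test("cat"));         // false: it looks for the letters w, o, r, d
console.log(literal.test("a word"));      // true
console.log(dynamic.test("concatenate")); // true: "cat" appears inside it
console.log(dynamic.source);              // cat
```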

Do it right now. Just type this into your browser console.

/a/.test("a"); //true
/a/.test("b"); //false
           

If that works, you are ready. Don't worry about what it is. That's what we are going to break down into pieces in the following lines.

Let’s dive in…

1. Start Small With Letters

Let's start small. We need to find out whether a string has a particular character. Look for the character `a` in a string.

Here is the expression in all its glory:

/a/.test("abc"); //true 
/a/.test("bcd"); //false 
/a/.test("cba"); //true
           

The expression does what we asked for: "Look for `a` in the string under test." In our case, `abc` and `cba` do have the character `a`. But `bcd` does not have it.

Breakdown

Now, that's a lot of slashes. Let's break them down.

We've seen that `/expression/` is how we build regular expressions, so the slashes need no further explanation. In fact, we can even assign the expression to a variable and make it look better.

The same code:

let e=/a/; 
e.test("abc"); //true 
e.test("bcd"); //false 
e.test("cba"); //true
           

The expression between the slashes is just a single character, `a`, in our case. We are looking only for that one character.

Reach Multi-Characters

Let’s scale the solution.

What if you want to find more than one character?

Put them in sequence. Treat them as a substring.

Here is an example:

/ab/.test("abacus"); //true 
/bac/.test("abacus"); //true  
/abc/.test("abacus"); //false 
/abas/.test("abacus"); //false
           

The string under test should contain the exact expression within slashes. We get a match if that condition is met.

`bac` is within `abacus`, but `abas` is not in `abacus` as it is. Even though `abacus` contains all of those characters, they are scrambled rather than in that exact order, so we do not get a match.

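For patterns made only of ordinary letters, like the ones in this section, `test()` answers the same question as `String.prototype.includes()`: does this exact sequence appear somewhere in the string? A quick comparison sketch, where `cab` and `bacon` are made-up extra samples:

```javascript
const pattern = /bac/;
const samples = ["abacus", "cab", "bacon"];

for (const s of samples) {
  // Both checks look for the exact, contiguous sequence "bac".
  console.log(s, pattern.test(s), s.includes("bac"));
}
// abacus true true
// cab false false
// bacon true true
```

The equivalence only holds while the pattern has no special characters; once symbols such as `|`, `[]` or `^` appear in a pattern, `includes()` can no longer keep up.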

Review Ground Covered

  • /.../

    Slash (`/`) marks the start and end of the regular expression. Ignore the dots; that's where we place the pattern.

  • /a/

    The character between the slashes is a pattern matched against the string under test.

  • /abc/

    The characters between the slashes are looked up as a substring during the pattern matching test on the string under test.

2. Patterns in Numbers

Let’s spice it up a bit. Let’s say you want to find out if a string is full of numeric characters.

Here it is:

let e=/0|1|2|3|4|5|6|7|8|9/;
e.test("42"); //true 
e.test("The answer is 42"); //true
           

First of all, the pattern looks pretty long. But the same long streak of characters can be expressed in just two characters. I've saved that for the end of this section, for a dramatic closure.

The second case shouldn’t be true. We’ll deal with it a bit later.

For now, the pipe symbol (`|`) means or. Outside of regular expressions, we've used it as the bitwise or, and doubled (`||`) as the logical or. That's the same guy.

I could call that easy and call it a day. But you would scream for something better, right? We are developers. We spend the best part of our day thinking up better Bash and Git aliases to save a few keystrokes.

Should I type in nine pipe symbols? Nah.

Here we go again:

e=/[0123456789]/; 
e.test("42"); //true 
e.test("The answer is 42"); //still true
           

This is better. The nine pipes were replaced with two square brackets, which saves seven keystrokes: the pattern shrank from 19 characters to 12.

By the way, anything within square brackets is treated as "either this or that". It is a character set. In our case, the string should contain either `0`, or `1`, or `2`, or... bear with me, I promised myself to write 1000 words a day... or `3`, or `4`, or `5`. All right, let's stop. You get it.

What are you saying? It still looks quite lengthy? Not satisfied?

Okay, here we go once again:

e=/[0-9]/; 
e.test(42); //true 
e.test("42"); //true 
e.test("The answer is 42"); //true!
           

How about that? Looks much cleaner, doesn't it? Anything within square brackets `[]` means or. `0-9` marks a range, meaning zero to nine.

So the test looks for characters from zero to nine in the test string.

As you can see, the test takes numbers too: `test()` converts its argument to a string before matching, so `42` is checked as `"42"`.

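A small console sketch, using nothing beyond standard JavaScript, that runs the three digit patterns from this section side by side on a few inputs. The string `no digits here` is made up; the last input is a plain number, to show the string conversion at work.

```javascript
// Three ways of writing "one digit, anywhere in the string".
const alternation = /0|1|2|3|4|5|6|7|8|9/;
const charSet = /[0123456789]/;
const range = /[0-9]/;

const inputs = ["42", "The answer is 42", "no digits here", 42];
for (const input of inputs) {
  const results = [alternation, charSet, range].map((re) => re.test(input));
  console.log(JSON.stringify(input), results.join(" "));
}
// "42" true true true
// "The answer is 42" true true true
// "no digits here" false false false
// 42 true true true
```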

The prefix and suffix patterns

Let's now address that failing second case. `The answer is 42` matches our test because our pattern looks for numeric characters somewhere within the string, not from start to end.

Let's bring in `^` and `$` to help us.

  • ^

    means the start of the string. He is a double agent and he'll trip us up. His second avatar is unmasked only in the last section.
  • $

    means the end of the string.

Let’s get the prefix pattern sorted out:

/^a/.test("abc"); //true
/^a/.test("bca"); //false
/^http/.test("https://pineboat.in"); //true
/^http/.test("ftp://pineboat.in"); //false
           

Any pattern that follows `^` should be at the start of the string under test.

The second string starts with `b` while our pattern looks for `a`. The fourth one looks for `http` while the string starts with `ftp`. This is the reason they fail.

The suffix patterns

The suffix pattern follows. A `$` at the end of the pattern directs the test to look for the end of the string.

/js$/.test("regex.js"); //true 
/js$/.test("regex.sj"); //false
           

That should sound in your head like, "Look for `js` and then the end of the string." Better yet, "Look for a string that ends in `js`."

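For a plain prefix or suffix, the anchored patterns answer the same question as `String.prototype.startsWith()` and `endsWith()`. A minimal comparison sketch; the URLs come from the examples above and the file names are made up:

```javascript
const urls = ["https://pineboat.in", "ftp://pineboat.in"];
for (const url of urls) {
  // ^ anchors the pattern to the start of the string.
  console.log(url, /^http/.test(url), url.startsWith("http"));
}

const files = ["regex.js", "regex.sj", "js-notes.txt"];
for (const file of files) {
  // $ anchors the pattern to the end of the string.
  console.log(file, /js$/.test(file), file.endsWith("js"));
}
```

`js-notes.txt` contains `js`, but not at the end, so `/js$/` rejects it.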

Pattern match End to End

That paves the road to matching a pattern from start to end; you might as well call it end to end.

let e=/^[0-9]$/ 
e.test("42"); //false - NO! 
e.test("The answer is 42"); //false 
e.test("7"); //true
           

Surprisingly, the first one failed when we added `^` and `$`.

`/^[0-9]$/` reads like, "Go to the start of the string. Look for a single numeral from the character set. Check if the string ends right there." That's the reason the last entry returned `true`: it is just a single digit, start to end.

That's not what we wanted. We wanted to test if the string consisted of one or more numerals, and nothing else.

We are very close. One last thing we need to learn is how to instruct the pattern to look for more than one character in the set.

Tale of Three Musketeers

A question mark (`?`), a plus (`+`) and an asterisk (`*`) met at a battle ground. Each is differently sighted.

The humble question mark (`?`) says, "I can see none or just one."

Plus (`+`) says, "I need to see at least one, maybe more."

Asterisk (`*`) says, "I get you both. I can see none, one, or more."

One of them is cleverly hiding what he is capable of.

The question mark gets on stage first:

/a?/.test(""); //true 
/a?/.test("a"); //true 
/a?/.test("b"); //true! 
/a?/.test("aa"); //true 
/^a?$/.test("aa"); //false
           
  • Matches the empty string `""`, as `?` stands for 0 or 1

  • Matches `a`: one match

  • Matches `b`: 0 occurrences of `a` still count as a match

  • Matches `aa`: one match, and the second `a` is not part of the pattern

  • /^a?$/ does not match `aa`

    It looks for zero or one `a`, start to end, nothing more, nothing less

The plus (`+`) looks at the question mark and remarks, "I'm impressed, but your focus is so binary!" Then he takes the stage to show off:

/a+/.test("a"); //true 
/a+/.test("aa"); //true 
/a+/.test("ba"); //true! 
/^a+$/.test("aa"); //true  
/a+/.test(""); //false 
/a+/.test("b"); //false 
/^a+$/.test("ab"); //false
           

Remember what plus (`+`) said? It can match one or more occurrences of the preceding pattern.

All those returning `true` have one or more `a`. We even managed to match a whole string made up only of `a` in the last one that returned true, with `/^a+$/`.

The `false` results should make sense now, but a word on the last one that returned false. `/^a+$/` looks for one or more `a`, start to end, with no other characters allowed. This is why `ab` failed the test.

Finally, the star (`*`) of the show takes the stage. He boasts, "I can duel alone or duel you both at once," and says, "I can match zero, one or more."

/a*/.test("a"); //true 
/a*/.test("aa"); //true 
/a*/.test("ba"); //true 
/a*/.test(""); //true 
/a*/.test("b"); //true 
/^a*$/.test("aa"); //true 
/^a*$/.test(""); //true  
/^a*$/.test("ab"); //false
           

Except for the last one, `*` was able to handle them all. `/^a*$/` reads like "0 or more `a`, start to end", which is why the empty string `""` passed the test and `"ab"` failed.

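`test()` only answers yes or no. `String.prototype.match()` also shows what was matched and where, which explains the surprising `true` results above: `?` and `*` are satisfied by an empty match at the very first position. A console sketch with a made-up input:

```javascript
const input = "baa";

for (const re of [/a?/, /a+/, /a*/]) {
  // Without the g flag, match() returns the first (leftmost) match.
  const m = input.match(re);
  console.log(String(re), JSON.stringify(m[0]), "at index", m.index);
}
// /a?/ "" at index 0
// /a+/ "aa" at index 1
// /a*/ "" at index 0
```

Only `+` has to move past the `b` to find something; the other two succeed immediately with nothing.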

Back to the Universal Answer

Remember where we were before we met the three musketeers? Yes, "The answer is 42".

Now if we need to look for only numerals, one or more, start to end, what do we do?

//Let's throw in a plus
let e=/^[0-9]+$/;
e.test("4"); //true
e.test("42"); //true
e.test("The answer is 42"); //false - Hurray
           

The plus sign (`+`) in `[0-9]+` comes to our rescue. Plus means one or more occurrences of the character or pattern in front of it. In our case, one or more numerals.

It also fails the match for our last case, `The answer is 42`, because there are no numerals at the start of the string.

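Why plus and not star? A quick sketch: with `*`, the anchored pattern also accepts the empty string, because zero digits from start to end is still a match.

```javascript
const oneOrMore = /^[0-9]+$/;
const zeroOrMore = /^[0-9]*$/;

for (const s of ["42", "", "The answer is 42"]) {
  console.log(JSON.stringify(s), oneOrMore.test(s), zeroOrMore.test(s));
}
// "42" true true
// "" false true
// "The answer is 42" false false
```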

Practice Patterns

  • Can you try to write a pattern for hexadecimal numbers (consisting of numerals 0-9 and letters a-f, with an optional # in front)?

  • How about a binary number? Can you test if a string is full of just 0 and 1?

That Dramatic End

Oh, I almost forgot. `[0-9]` stands for any character in the numeric set, and it also has a shorthand version: `\d`.

let e=/^\d+$/;
e.test("4"); //true
e.test("42"); //true
e.test("The answer is 42"); //false - Hurray
           

Just two characters denoting numerals. And no, it doesn't get any shorter than that.

There is a whole bunch of such special patterns for common groups of characters, such as numerals (`\d`), alphanumeric characters (`\w`) and white space (`\s`).

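A short sketch of the three shorthands on a made-up string. It uses the `g` (global) flag, which this article hasn't covered, only so that `match()` returns every match instead of the first one. `isAllDigits` is a hypothetical helper name wrapping the pattern above.

```javascript
const text = "Order #42 ships_today";

// With the g flag, match() returns every match as an array of strings.
console.log(JSON.stringify(text.match(/\d/g)));  // ["4","2"]
console.log(JSON.stringify(text.match(/\w+/g))); // ["Order","42","ships_today"]
console.log(text.match(/\s/g).length);           // 2

// The anchored shorthand version from above, as a reusable check.
const isAllDigits = (s) => /^\d+$/.test(s);
console.log(isAllDigits("42"), isAllDigits("4 2")); // true false
```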

Review

  • [123]

    The expression within square brackets is a character set. A match on any one of the characters passes the test. Just ONE character.

  • [0-9]

    Looks for a single numeric digit from 0 to 9

  • [0-5]

    Looks for a single numeric digit from 0 to 5

  • [a-z]

    Looks for a single letter from a to z

  • [A-F]

    Looks for a single letter from A to F

  • [123]+

    Plus (`+`) looks for one or more occurrences of the characters within the set. This one matches the `"23132"` substring, made of 1, 2 and 3, within the larger string `"abc23132"`.

  • |

    The pipe symbol stands for or

  • \d

    A shorthand for numerals. Matches a single numeric digit.

  • \D

    A shorthand for non-numeric characters. Matches anything other than the numerals matched by `\d`
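The review items can be checked in the console. This sketch pairs each pattern with sample strings (mostly made up) that it should and should not match; `^` and `$` are added to the single-character ranges so that the whole string must be that one character.

```javascript
// Each row: a pattern, strings it should match, strings it should not.
const cases = [
  [/[123]/, ["a1", "3"], ["abc", "45"]],
  [/^[0-5]$/, ["0", "5"], ["6", "55"]],
  [/^[A-F]$/, ["A", "F"], ["G", "a"]],
  [/[123]+/, ["abc23132"], ["abc"]],
  [/^\D+$/, ["abc"], ["abc1"]],
];

for (const [re, good, bad] of cases) {
  const ok = good.every((s) => re.test(s)) && bad.every((s) => !re.test(s));
  console.log(String(re), ok ? "as described" : "MISMATCH");
}

// What [123]+ actually grabbed from the larger string:
console.log("abc23132".match(/[123]+/)[0]); // 23132
```

Note that `[A-F]` is case-sensitive, which is why `a` is rejected.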

3. Recurrence Match to Find Duplicates

This is the actual problem I was trying to solve. I dove deep into regular expressions, which eventually led to this post.

You’ve been given a string. Find out if it has been infused with duplicate characters before sunset.

Here is the solution for a character that is repeated immediately after its first occurrence:

let e=/(\w)\1/; 
e.test("abc"); //false 
e.test("abb"); //true
           

The expression does not match any part of the string `abc`, as there are no duplicate characters in sequence. So it returns false.

But it matches the `bb` part of the string `abb` and returns true.

Go ahead, type that into your DevTools console. Look at this!

Let’s break it down to understandable pieces.

Backslash \ Unleashed

I've been a little quiet about the backslash that was introduced in the last section. To those who have been there and done that, it may not have been a surprise; they might have escaped the confusion. But if you are new to the programming world, you need to know more about the backslash.

In the regular expression language, the backslash is special. It alters the meaning of the character that follows it. Ring a bell?

What do you call `\n` when you encounter it in a string? Yes, a new line. We've got something similar here.

In fact, `\n` is what you use as a pattern if you want to look for a new line. That's called escaping the usual meaning of `n` and giving it a whole new attire called new line.

  • \d

    A shorthand for numerals. Matches a single numeric digit

  • \D

    A shorthand for non-numeric characters. Matches anything other than the numerals matched by `\d`

  • \s

    Shorthand for a single white space character, such as a space, new line or tab

  • \S

    Antonym of `\s`: anything other than white space

  • \w

    Shorthand for an alphanumeric character. Matches a-z, A-Z, 0-9 and underscore `_`

  • \W

    Antonym of `\w`
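A few console checks for the backslash escapes and shorthands in the list. The sample strings are made up. The last line ties back to the `RegExp` constructor: inside an ordinary JavaScript string the backslash is itself an escape character, so it has to be doubled to reach the pattern.

```javascript
const twoLines = "first line\nsecond line";

// \n in a pattern looks for a new line character.
console.log(/\n/.test(twoLines));   // true
console.log(/\n/.test("one line")); // false

// \s covers tabs and new lines as well as spaces.
console.log(/\s/.test("\t"));       // true

// \W is the antonym of \w: the underscore counts as a word character.
console.log(/\W/.test("snake_case_42")); // false
console.log(/\W/.test("kebab-case"));    // true

// In a string passed to the constructor, the pattern \d+ is written "\\d+".
console.log(new RegExp("\\d+").test("42")); // true
```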

Recallable Matches

We started this section with the solution for finding duplicate characters. `/(\w)\1/` matched `"abb"`. That shows the use of memory and recall within regular expressions.

Consider the use of brackets in this format: `(expression)`. The part of the string that matches the expression within the brackets is remembered for later use.

`\1` recalls and reuses the match from the first expression within brackets. Likewise, `\2` recalls the match from the second set of brackets, and so on.

Let's translate our expression `(\w)\1` to plain English:

Match any alphanumeric character in the given string. Remember it as `\1`. Check whether that same character appears right after it.

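`test()` only reports whether a duplicate exists. As a sketch, `match()` also reveals which character was remembered as `\1`. The word `bookkeeper` is just a convenient made-up sample, and the `g` flag in the last line (not covered in this article) lists every doubled pair instead of the first.

```javascript
const doubled = /(\w)\1/;

// match() returns the whole match first, then what each bracket remembered.
const firstDouble = "bookkeeper".match(doubled);
console.log(firstDouble[0], firstDouble[1], firstDouble.index); // oo o 1

// With the g flag, match() lists every doubled pair.
console.log(JSON.stringify("bookkeeper".match(/(\w)\1/g))); // ["oo","kk","ee"]
```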

Extension 1: Reverse Pairs

Let's say we want to find two characters followed immediately by the same two characters in reverse order, as in `abba`: `ab` is reversed as `ba`, and the two pairs sit right next to each other.

Here is the expression:

let e=/(\w)(\w)\2\1/; 
e.test("aabb"); //false 
e.test("abba"); //true 
e.test("abab"); //false
           

The first `(\w)` matches `a` and remembers it as `\1`. The second `(\w)` matches `b` and remembers it as `\2`. Then the expression expects `\2` to occur next, followed by `\1`. Hence, of our three test strings, `abba` is the only one that matches the expression.

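A sketch of the same pattern on a made-up sentence, using `match()` to see what `\1` and `\2` remembered. It also shows a caveat: nothing in the pattern forces the two remembered characters to differ.

```javascript
const reversePair = /(\w)(\w)\2\1/;

// match() returns the whole match, then what \1 and \2 remembered.
const found = "at noon today".match(reversePair);
console.log(found[0], found[1], found[2]); // noon n o

// Nothing forces \1 and \2 to differ, so a run of one letter also matches.
console.log(reversePair.test("aaaa")); // true
```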

Extension 2: No duplicates

This time, we are going to look for a sequence of characters with no duplicates: no character should be followed by the same character. Plain and simple.

Here, take a look at the solution:

let e=/^(\w)(?!\1)$/; 
e.test("a"); //true 
e.test("ab"); //false 
e.test("aa"); //false
           

Not quite what we wanted, but close. The middle one shouldn't be false. But we threw in a few more symbols that need explaining, which means confronting the most powerful musketeer once again.

Return of the Question Mark

Remember the three musketeers we met earlier. The humble question mark is actually the most powerful manipulator that can get other symbols to do his bidding. That is, if you take the backslash for granted.

A combination of brackets, question mark and exclamation mark, `(?!)`, is called a look ahead. A negative look ahead, to be precise. `a(?!b)` matches `a` only if it is not followed by `b`.

Across the JavaScript ecosystem, the exclamation mark means not. But its cousin CSS takes a U-turn: `!important` means it is actually very important and should not be overridden. I almost scrolled past Chen's tweet thinking it was marked not important. I digress.

On the other hand, `(?=)` is a positive look ahead. `a(?=b)` matches `a` only if it is followed by `b`.

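A minimal sketch of both look aheads. Note that a look ahead only checks what comes next; the checked character does not become part of the match.

```javascript
// Positive look ahead: the a counts only if a b comes next.
console.log("ab".match(/a(?=b)/)[0]); // a  (the b is checked, not consumed)
console.log(/a(?=b)/.test("ac"));     // false

// Negative look ahead: the a counts only if a b does NOT come next.
console.log(/a(?!b)/.test("ab")); // false
console.log(/a(?!b)/.test("ac")); // true
console.log(/a(?!b)/.test("a"));  // true: nothing follows, so no b follows
```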

We had a solution: `(\w)(?!\1)` looks for a character that is not immediately repeated. But it only checks one character. We need to group it and look for one or more occurrences of the group with the use of plus (`+`). That's all.

let e=/^((\w)(?!\1))+$/; 
e.test("madam"); //false 
e.test("maam"); //false
           

But it doesn't seem to be working. If we group the pattern within plain brackets, as in `((\w)(?!\1))`, then `\1` no longer represents `(\w)`. Brackets are numbered by their opening bracket from left to right, so `\1` now represents the outer pair that groups the pattern. So it fails.

What we need is a forgetful grouping option. That's where the question mark, `?`, strikes back. It pairs with a colon, `(?:)`, and wipes out the memory function that brackets normally bring in.

One last time:

let e=/^(?:(\w)(?!\1))+$/; 
e.test("madam"); //true 
e.test("maam"); //false
           

This time, the outer brackets are not remembered, thanks to `?:`. Hence, `\1` remembers the match returned by `(\w)`.

That lets us apply the plus `+` to the overall group, repeating the check for every character from start to end, which works like magic.

In English, “Look for a character. Look ahead to ensure it is not followed by the same character. Do this from start to end for all characters.”

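Two small sketches. The first, on a made-up date-like string, shows that forgetful brackets `(?:...)` take no number, so the next capturing bracket becomes `\1` (index 1 in the `match()` result). The second wraps this section's pattern in a hypothetical helper, `noAdjacentRepeats`.

```javascript
// Capturing vs forgetful grouping: only capturing brackets get a number.
const capturing = "2024-05".match(/(\d+)-(\d+)/);
const forgetful = "2024-05".match(/(?:\d+)-(\d+)/);
console.log(capturing[1], capturing[2]); // 2024 05
console.log(forgetful[1]);               // 05

// The section's pattern, wrapped in a small helper.
const noAdjacentRepeats = (s) => /^(?:(\w)(?!\1))+$/.test(s);
for (const s of ["madam", "maam", "abc", "a"]) {
  console.log(s, noAdjacentRepeats(s));
}
// madam true
// maam false
// abc true
// a true
```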

Review

  • \w

    Represents all the alphanumeric characters, plus the underscore. If you capitalize 'w' and use `\W`, that would mean all characters other than those

  • ( )

    The expression within a bracket is remembered for later use

  • \1

    Recalls and reuses the match from the first expression within brackets. `\2` does the same for the second set of brackets, and so on.

  • a(?!b)

    A combination of brackets, question mark and exclamation mark (`?!`) is called a look ahead. This matches `a` only if it is not followed by `b`

  • a(?=b)

    The other side of the coin. Matches `a` only if it is followed by `b`.

  • (?:a)

    Forgetful grouping. Looks for `a` but doesn't remember it. You can't use the `\1` pattern to reuse this match.

4. Alternating Sequence

The use case is simple. Match a string that uses only two characters, which should alternate throughout the length of the string. Two sample tests, for "abab" and "xyxyx", will do.

It wasn’t easy. I got it wrong on several attempts. This answer directed me down the right street.

Here is the solution:

let e=/^(\S)(?!\1)(\S)(\1\2)*$/; 
e.test("abab"); //true 
e.test("$#$#"); //true 
e.test("#$%"); //false 
e.test("$ $ "); //false 
e.test("xyxyx"); //false
           

This is where you say, “I’ve had enough!” and throw in the towel.

But wait for the aha moment! You are 3 feet away from the gold ore; this is not the right time to stop digging.

Let's first make sense of the results before we arrive at 'how?'. `abab` matches. `$#$#` matches too; it is no different from `abab`.

`#$%` fails as there is a third character. `$ $ ` fails even though it is made of pairs, because `\S` excludes the space character from our pattern.

All is well except that `xyxyx` fails, because our pattern doesn't know how to handle that last `x`. We'll get there.

Let’s take a look at the tools added to our belt. It’ll start to make sense soon.

One piece at a time

You already know most of the pieces. `\S` is the opposite of `\s`: it looks for non-white-space characters.

Now comes the plain English version of `/^(\S)(?!\1)(\S)(\1\2)*$/`.

  • Start from the start: `/^`

  • Look for a non-white-space character: `(\S)`

  • Remember it as `\1`

  • Look ahead and check that the first character is not followed by the same character: `(?!\1)`. Remember, this is a negative look ahead.

  • If we are good so far, look for another character: `(\S)`

  • Remember it as `\2`

  • Then look for 0 or more pairs of the first two matches: `(\1\2)*`

  • Look for such a pattern until the end of the string: `$/`

Apply that to our test cases: `"abab"` and `"$#$#"` match.

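To see what each bracket remembered, `match()` can be run with the section's pattern. A sketch; the extra test strings `ab`, `aa` and `aba` are made up:

```javascript
const alternating = /^(\S)(?!\1)(\S)(\1\2)*$/;

const parts = "abab".match(alternating);
// [1] and [2] are the two characters; [3] is the last repeated pair.
console.log(parts[1], parts[2], parts[3]); // a b ab

console.log(alternating.test("ab"));  // true: (\1\2)* may repeat zero times
console.log(alternating.test("aa"));  // false: (?!\1) rejects a repeated first character
console.log(alternating.test("aba")); // false: the trailing a is not a full pair
```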

Tail End

After looking at the solution, you may think this does not demand a separate section. But its simplicity is elegant. Let's fix that one failing case, `xyxyx`. As we've seen, the trailing `x` is the problem. We have a solution for `xyxy`. All we need is a pattern that says "Look for an optional occurrence of the first character".

As usual, let’s start with the solution.

let e=/^(\S)(?!\1)(\S)(\1\2)*\1?$/;
e.test("xyxyx"); //true
e.test("$#$#$"); //true
           

The question mark strikes again. There is no escaping him; we'd better make him our ally rather than our enemy. A question mark `?` after a character or pattern means 0 or 1 match of the preceding pattern. It is modest in gobbling up characters: it takes at most one occurrence, although it does take that one whenever it can.

In our case, `\1?` means 0 or 1 occurrence of the first character, remembered through the first set of brackets.

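A side-by-side sketch of the pattern with and without the optional tail, on made-up strings, plus two lines showing `?` taking a character when it is there and settling for nothing otherwise:

```javascript
const evenLength = /^(\S)(?!\1)(\S)(\1\2)*$/;
const withTail = /^(\S)(?!\1)(\S)(\1\2)*\1?$/;

for (const s of ["xyxy", "xyxyx", "xyxyy"]) {
  console.log(s, evenLength.test(s), withTail.test(s));
}
// xyxy true true
// xyxyx false true
// xyxyy false false

// ? takes the character when it is there, and settles for nothing otherwise.
console.log(JSON.stringify("x".match(/x?/)[0])); // "x"
console.log(JSON.stringify("y".match(/x?/)[0])); // ""
```

The tail only accepts the first character, so a trailing `y` is still rejected.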

Easy. Relax.

Review

  • \S

    \S

    Represents all characters excluding white space such as a space and new lines

    表示所有字符,不包括空格,例如空格和换行符

    Note that it is capital S

    请注意,它是大写S

  • a*

    a*

    The asterisk or star, looks for 0 or more occurrences of the preceding character. In this case, it is 0 or more

    星号或星号查找0个或多个出现的前一个字符。 在这种情况下,等于或大于0

    a

    a

    Remember plus (

    记住加号(

    +

    ) which looks for 1 or more? Yeah, these guys are cousins.

    +

    )寻找1个或多个? 是的,这些家伙是堂兄。
  • a(?!b)

    a(?!b)

    This combination of brackets, question mark and exclamation mark (

    括号,问号和感叹号的组合(

    ?!

    ) is called a look ahead.

    ?!

    )称为前瞻 。

    This matches

    这个匹配

    a

    only if it is not followed by

    b

    .

    a

    只有当它后面没有

    b

    For example, it matches

    例如,它匹配

    a

    in

    aa

    ,

    ax

    ,

    a$

    but does not match

    ab

    a

    in

    aa

    ax

    a$

    但不匹配

    ab

    Though it uses bracket, it does not remember the matching character after

    尽管使用了方括号,但它不记得后面的匹配字符

    a

    .

    a

  • \s

    \s

    Small caps

    小帽子

    s

    matches a single white space character such as a space or new line.

    s

    匹配单个空格字符,例如空格或换行符。
  • a(?=b)

    a(?=b)

    This matches

    这个匹配

    a

    that is followed by

    b

    .

    a

    之后是

    b

  • ^ab*$

    ^ab*$

    You may think this translates to 0 or more occurrences of

    您可能会认为这意味着0次或多次出现

    ab

    , but it matches

    a

    followed by 0 or more

    b

    ab

    ,但它匹配

    a

    后跟0或更大的

    b

    For example: This matches

    例如:此匹配

    abbb

    ,

    a

    and

    ab

    , but does not match

    abab

    abbb

    a

    ab

    ,但不匹配

    abab

  • ^(ab)*$

    ^(ab)*$

    This matches 0 or more pairs of

    匹配0个或更多对

    ab

    ab

    That means it will match empty string

    这意味着它将匹配空字符串

    ""

    ,

    ab

    and

    abab

    , but not

    abb

    ""

    ab

    abab

    ,但不是

    abb

  • a?

    The ? matches 0 or 1 occurrence of the preceding character or pattern, so a? matches 0 or 1 a.

    \1? matches 0 or 1 recurrence of the first remembered match.
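
To try the list above in the console, here is a minimal sketch that runs each pattern with test. The sample strings are my own, chosen to mirror the examples in each bullet, and every pattern is anchored with ^ and $ so the whole string has to match.

```javascript
// Anchored with ^...$ so the whole string must match the pattern.
const star = /^a*$/;       // 0 or more a
const plus = /^a+$/;       // 1 or more a
const optional = /^a?$/;   // 0 or 1 a
const abStar = /^ab*$/;    // a, then 0 or more b
const abPairs = /^(ab)*$/; // 0 or more pairs of ab
const echo = /^(a)b\1?$/;  // \1? repeats the remembered "a" 0 or 1 times

console.log(star.test(""), plus.test(""));              // true false
console.log(optional.test("a"), optional.test("aa"));   // true false
console.log(abStar.test("abbb"), abStar.test("abab"));  // true false
console.log(abPairs.test(""), abPairs.test("abab"), abPairs.test("abb")); // true true false
console.log(echo.test("ab"), echo.test("aba"), echo.test("abb"));        // true true false

// Lookaheads inspect the next character without consuming it.
const notB = "aa ax a$ ab".match(/a(?!b)/g); // every a not followed by b
const beforeB = "ax ab".search(/a(?=b)/);     // index of the first a followed by b
console.log(notB.length, beforeB);            // 4 3
```

Note how ^ab*$ and ^(ab)*$ disagree on abab: the parentheses decide what the star repeats.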

5. Match an email address

Warning for Production

Regular expressions alone cannot reliably validate email addresses. Some would even argue that regular expressions should not be used at all, since no pattern can match 100% of valid addresses.

Think about all the new, fancy domain names popping up. Also consider the symbols that can appear within email addresses, such as dot (.) and plus (+).
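
To see the problem, here is a deliberately naive, hypothetical pattern of the kind you might write on a first attempt. It is not from this post and is shown only to fail: it rejects a plus sign in the part before the @ and a long, newer top-level domain, even though both are valid.

```javascript
// A deliberately naive email pattern (hypothetical, NOT a recommendation):
// lowercase letters, digits and dots, an @, a domain, and a 2-4 letter TLD.
const naiveEmail = /^[a-z0-9.]+@[a-z0-9]+\.[a-z]{2,4}$/;

const samples = [
  "jane.doe@example.com",      // accepted
  "jane+news@example.com",     // rejected: + is not in the character set
  "hello@example.photography", // rejected: TLD longer than 4 letters
];

for (const s of samples) {
  console.log(s, naiveEmail.test(s));
}
```

Every tweak to fix one of these cases tends to open another gap, which is why the confirmation email below is the real check.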

You need to validate email twice. First, on the client side, to help users avoid misspelled addresses. Start with the semantic input type <input type='email'>. Browsers that support it validate the address automatically, without any extra scripting on the front end.

Validate it once again on the server by actually sending a confirmation email.

Haven't you seen one of these lately? Just try to subscribe to pineboat. You'll get an actual email asking you to confirm that the address is yours. That confirmation is solid proof that your email is valid.

That was smooth sailing, wasn't it?

RegEx for Email

Now that we have added the disclaimer, you'd actually want to see a pattern, right? Try searching for a regular expression for an email address. One such result, from a Perl module, runs for more than a page.

So I am not even going to attempt it. Regular expressions that long are generated by computers through pattern builders, not written by mere mortals like us.

6. Match a Strong Password

If you are a coffee person, this is the right time to get a strong one, because this is the last section of the post and the longest one so far.

It introduces very few new operators and patterns, but it reuses many of the ones we have seen. As usual, we reserve the shortest, optimized one for last.

The ASCII range is the best part of this post, because I learned about it while researching the post.

Now, the problem. Remember that registration form that took several attempts before you could meet its strong password requirements? Weak, good, strong, and very strong? Yeah, we are going to build that validation.

The password should:

  • have a minimum of 4 characters

  • contain a lowercase letter

  • contain an uppercase letter

  • contain a number

  • contain a symbol

This is a tricky one. Once you start consuming letters, you can't come back to check whether they meet any other condition. Therein lies our clue. We can't look back, but we can look ahead!
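
Here is a minimal sketch of that clue. It uses two made-up conditions (an x and a y somewhere in the string) rather than the real password rules. Every lookahead starts checking from the same position because none of them consumes characters, so the order of the letters in the string does not matter.

```javascript
// Each (?=...) checks from position 0 and then "rewinds",
// so several conditions can be tested against the same characters.
const hasXandY = /^(?=.*x)(?=.*y)/;

console.log(hasXandY.test("xy")); // true
console.log(hasXandY.test("yx")); // true: order doesn't matter
console.log(hasXandY.test("xx")); // false: no y anywhere

// The match itself is empty: the lookaheads consumed nothing.
console.log("yx".match(hasXandY)[0] === ""); // true
```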

Length of the string

Let's first test if the password string is at least 4 characters long. Pretty simple: use .length on the password string. Done, right? No, who needs a simple solution? Let's spice it up.

// expression with just a lookahead
// doesn't consume any characters
let e1 = /^(?=.{4,})$/;
e1.test("abc");  // false
e1.test("abcd"); // false

// after the lookahead,
// a pattern that consumes characters is needed
let e2 = /^(?=.{4,}).*$/;
e2.test("abc");  // false
e2.test("abcd"); // true

  • You may remember (?=) from our previous work on "no duplicates". That's a lookahead. It does not consume any characters.

  • The dot (.) is an interesting character. It means any single character, except line breaks such as \n.

  • {4,}

    Stands for at least 4 occurrences of the preceding character, with no maximum limit.

  • \d{4}

    Would look for exactly 4 digits.

  • \w{4,20}

    Would look for 4 to 20 word characters: letters, digits and the underscore.
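
A quick sketch of these quantifiers. I anchor each pattern with ^ and $ here (an addition for the demo), otherwise \d{4} would also find four digits inside a longer string. The sample strings are my own.

```javascript
// {n,}: at least n; {n}: exactly n; {n,m}: between n and m (inclusive).
const atLeast4 = /^.{4,}$/;
const exactly4Digits = /^\d{4}$/;
const word4to20 = /^\w{4,20}$/;

console.log(atLeast4.test("abc"), atLeast4.test("abcdefgh"));          // false true
console.log(exactly4Digits.test("2024"), exactly4Digits.test("202"));  // true false
console.log(word4to20.test("user_42"), word4to20.test("no spaces"));   // true false

// The dot matches any character except line terminators such as "\n".
console.log(/^.$/.test("\n")); // false
```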

Let's translate /^(?=.{4,})$/: "Start from the beginning of the string. Look ahead for at least 4 characters. Don't remember the match. Come back to the beginning and check if the string ends there."

Doesn't sound right, does it? At least the last bit.

That is why we brought in the variation /^(?=.{4,}).*$/: an extra dot and a star. It reads like this: "Start from the beginning. Look ahead for at least 4 characters. Don't remember the match. Come back to the beginning. Consume all the characters using .* and see if you reach the end of the string."

This makes sense now, doesn't it?

That is why abc fails and abcd passes the pattern.
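
One way to see the difference between the two expressions is to look at what each one actually consumes, using the match method mentioned at the end of this post. In this sketch I drop the $ from the first expression (so that it matches at all), which shows that the lookahead on its own consumes nothing.

```javascript
const lookOnly = /^(?=.{4,})/;       // lookahead only: zero-width
const lookThenEat = /^(?=.{4,}).*$/; // lookahead, then consume everything

console.log(JSON.stringify("abcd".match(lookOnly)[0]));    // ""
console.log(JSON.stringify("abcd".match(lookThenEat)[0])); // "abcd"
console.log("abc".match(lookThenEat));                     // null
```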

At least One Number

This is going to be easy.

let e = /^(?=.*\d+).*$/;
e.test("");      // false
e.test("a");     // false
e.test("8");     // true
e.test("a8b");   // true
e.test("ab890"); // true

Start from the beginning of the string: /^. Look ahead for 0 or more characters: ?=.*. Check if 1 or more digits follow: \d+. Once it matches, come back to the beginning (because we were in a lookahead). Consume all the characters in the string until the end of the string: .*$/.
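
Since the lookahead only needs to find one digit, the + after \d is not strictly necessary. A small sketch, using the sample strings above, suggesting that /^(?=.*\d).*$/ gives the same answers:

```javascript
const withPlus = /^(?=.*\d+).*$/;
const withoutPlus = /^(?=.*\d).*$/;

const inputs = ["", "a", "8", "a8b", "ab890"];
for (const s of inputs) {
  // Both report whether the string contains at least one digit.
  console.log(JSON.stringify(s), withPlus.test(s), withoutPlus.test(s));
}
```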

At Least One Lowercase Letter

This one follows the same pattern as above.

let e = /^(?=.*[a-z]+).*$/;
e.test("");  // false
e.test("A"); // false
e.test("a"); // true

Translation? Sure. "Start from the… okay." Instead of \d+, we have [a-z]+, which is a character set of letters from a to z.

At least One Uppercase Letter

Let's not overdo it. Using [A-Z] instead of [a-z] from the previous section will do.
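
Spelled out, the uppercase check could look like this minimal sketch. It simply swaps [a-z] for [A-Z] in the lowercase pattern, as described, and the test strings are my own.

```javascript
const hasUpper = /^(?=.*[A-Z]+).*$/;

console.log(hasUpper.test(""));     // false
console.log(hasUpper.test("a"));    // false
console.log(hasUpper.test("A"));    // true
console.log(hasUpper.test("abcD")); // true
```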

At least One Symbol

This is going to be challenging. One way to match symbols is to place a list of symbols in a character set:

/^(?=.*[-+=_)(\*&\^%\$#@!~"':;|\}\]\{\[\/?.>,<]+).*$/.test("$")

That's all the symbols in a character set, properly escaped where necessary. It would take me months to write it out in plain English.

So to save all of us from eternal pain, here is a simple one:

// considers space a symbol
let e1 = /^(?=.*[^a-zA-Z0-9])[ -~]+$/;
e1.test("_"); // true
e1.test(" "); // true

// does not accept space
let e2 = /^(?=.*[^a-zA-Z0-9])[!-~]+$/;
e2.test(" "); // false
e2.test("_"); // true

// the underscore exception
let e3 = /^(?=.*[\W])[!-~]+$/;
e3.test("_"); // false

Wait, what's that ^ doing here, in the middle of nowhere? If you have come this far, this is where you realize that the unassuming, innocent ^ that marks the start of a string is a double agent. Which means the end is not too far. He has been exposed.

When it is the first character within a character set, ^ negates the set. That is, [^a-z] means any character other than a to z.

[^a-zA-Z0-9] then stands for any character other than lowercase letters, uppercase letters, and digits.
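
A short sketch of negated sets, with sample strings of my own. It also shows the flip side of the double agent: a ^ that is not the first character in the set is just a literal caret.

```javascript
const notLower = /[^a-z]/;       // any character other than a to z
const notAlnum = /[^a-zA-Z0-9]/; // anything but letters and digits
const caretLiteral = /[a^]/;     // ^ not first: matches "a" or "^"

console.log(notLower.test("abc"), notLower.test("abC"));  // false true
console.log(notAlnum.test("aB3"), notAlnum.test("aB3#")); // false true
console.log(caretLiteral.test("^"), caretLiteral.test("b")); // true false
```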

We could have used \W instead of the long character set. But \W is the opposite of \w, and \w stands for all alphanumeric characters including the underscore (_). So \W does not count the underscore as a symbol: as you can see in the third set of examples above, e3 will not accept underscore as a valid symbol.
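
A minimal sketch of that difference, checking a few single characters of my own choosing against \w, \W and the long negated set:

```javascript
// \w: letters, digits and underscore. \W: everything else.
console.log(/\w/.test("_"), /\W/.test("_")); // true false
console.log(/\W/.test("#"), /\W/.test(" ")); // true true

// [^a-zA-Z0-9] treats underscore as a symbol; \W does not.
console.log(/[^a-zA-Z0-9]/.test("_"), /[\W]/.test("_")); // true false
```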

CharSet Range

The curious case of [!-~]. The ! and ~ stand next to each other on the keyboard, but their ASCII values sit at opposite ends of the printable range.

Remember a-z? A-Z? 0-9? These are not special constants. They are ranges based on the ASCII values of their characters.

The ASCII table has 128 characters, numbered 0 to 127. Codes 0 to 31 are control characters and are not relevant to us. The printable characters start with space at 32 and go all the way up to 126, which is the tilde (~). The exclamation mark is 33.

So [!-~] covers all the symbols, letters and numbers we need. The seed for this idea came from another solution to the symbol problem.
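
You can check those numbers yourself with charCodeAt. This sketch also counts how many of the 128 ASCII codes the range [!-~] accepts. The loop is only a demonstration, not something you would need in a validator.

```javascript
// Character codes at the edges of the printable ASCII range.
console.log(" ".charCodeAt(0), "!".charCodeAt(0), "~".charCodeAt(0)); // 32 33 126

// [!-~] covers every printable character from 33 to 126, so not space.
const printableNoSpace = /^[!-~]+$/;
console.log(printableNoSpace.test("aZ9_#~"), printableNoSpace.test("a b")); // true false

// Count how many ASCII codes the range covers.
let count = 0;
for (let code = 0; code < 128; code++) {
  if (/[!-~]/.test(String.fromCharCode(code))) count++;
}
console.log(count); // 94
```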

Assemble the Troops

Bringing it all together, we get this nice-looking piece of regular expression:

/^(?=.{4,})(?=.*[a-z]+)(?=.*\d+)(?=.*[A-Z]+)(?=.*[^\w])[ -~]+$/

That's starting to haunt and intimidate us, even though we've studied each piece individually.

This is where the syntax for dynamically building a RegExp object comes in handy. We are going to build each piece separately and then assemble them.

// start with the prefix
let p = "^";

// lookaheads
// min 4 chars
p += "(?=.{4,})";
// lower case
p += "(?=.*[a-z]+)";
// upper case
p += "(?=.*[A-Z]+)";
// numbers
p += "(?=.*\\d+)";
// symbols (space is not counted as a symbol here)
p += "(?=.*[^ a-zA-Z0-9]+)";
// end of lookaheads

// final consumption: printable ASCII, space included
p += "[ -~]+";
// suffix
p += "$";

// construct the RegExp object
let e = new RegExp(p);
// tests
e.test("aB0#");  // true
e.test("");      // false
e.test("aB0");   // false
e.test("ab0#");  // false
e.test("AB0#");  // false
e.test("aB00");  // false
e.test("aB!!");  // false

// space is in our control
e.test("aB 0");  // false
e.test("aB 0!"); // true

If your eyes are not tired yet, you'll have noticed two strange bits of syntax in the code above.

  • One, we didn't use /^; instead we used just ^. We didn't use $/ to end the expression either, just $.

    The reason is that the slashes are only the delimiters of a regular expression literal, not part of the pattern. The RegExp constructor takes the bare pattern as a string, and shows the slashes again when the object is printed.

  • Two, to match numbers we used \\d instead of the usual \d. This is because the variable p is just a normal string within double quotes. To insert a backslash, you need to escape the backslash itself.

    In the string, \\d resolves to \d, which is what the RegExp constructor receives.
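
A small sketch of both points, using a hypothetical digits-only pattern rather than the password pattern: the literal and the constructor produce the same source, and forgetting to double the backslash silently changes the pattern.

```javascript
// The literal and the constructor build the same pattern.
const fromLiteral = /^\d+$/;
const fromString = new RegExp("^\\d+$"); // "\\d" in a string is the two characters \d

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.toString()); // /^\d+$/ (slashes added only for display)

// Forgetting to escape: "\d" in a string is just "d".
const oops = new RegExp("^\d+$");
console.log(oops.source);     // ^d+$
console.log(oops.test("42")); // false
```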

Naturally, there should be server-side validation for passwords too. Think about SQL injection vulnerabilities if your framework or language doesn't handle them already.

7. Conclusion

That brings us to the end of the story. But this is the beginning of a journey.

We have just scratched the surface of RegEx pattern matching with the test method. The exec method builds on this foundation to return the matched substrings based on the pattern.
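
A minimal sketch of exec, using a made-up date pattern that is not from this post. Parentheses remember parts of the match, and exec returns them along with the whole match and its position, or null when nothing matches.

```javascript
// Parentheses remember (capture) parts of the match.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

const result = datePattern.exec("Published on 2019-03-14.");
console.log(result[0]);    // 2019-03-14 (the whole match)
console.log(result[1]);    // 2019 (the first remembered group)
console.log(result.index); // 13 (where the match starts)

console.log(datePattern.exec("no date here")); // null
```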

String objects have methods such as match, search, replace, and split that make wide use of regular expressions.
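
A quick tour of those four String methods. The sample strings are my own, and the expected outputs assume a standard JavaScript engine such as a browser console or Node.

```javascript
const text = "Regex isn't as hard as it looks";

console.log(text.match(/as/g).length);                   // 2
console.log(text.search(/hard/));                        // 15
console.log(text.replace(/\s+/g, "-"));                  // Regex-isn't-as-hard-as-it-looks
console.log(JSON.stringify("a1b22c333".split(/\d+/)));   // ["a","b","c",""]
```

The g flag makes match return every occurrence, and replace change every whitespace run instead of only the first.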

Hope this sets you off to explore those capabilities further, with a solid understanding of composing patterns for RegEx.

8. Call To Action

No, after all this difficulty we’ve been through, I am not going to ask you to subscribe.

Just make good software.

If any code blocks presented here do not work, leave a comment on this GitHub issue I created specially for this post.

Hope it was useful! Share it if others would benefit.

You've been wonderful. I appreciate your time. This content is far longer than most posts these days. Thanks for reading.

Originally published at www.pineboat.in.

Translated from: https://www.freecodecamp.org/news/regular-expressions-demystified-regex-isnt-as-hard-as-it-looks-617b55cf787/