《正则表达式转NFA》NFA是捏着正则式去比文本,吃掉一个字符,就把它跟正则式比较,匹配就记下来:“某年某月某日在某处匹配上了!”,然后接着往下干。一旦不匹配,就把刚吃的这个字符吐出来,一个个的吐,直到回到上一次匹配的地方。
而DFA捏着文本串去比较正则式,看到一个子正则式,就把可能的匹配串全标注出来,然后再看正则式的下一个部分,根据新的匹配结果更新标注。
例如用正则式/perl|perlman/来匹配文本 ‘perlman book’。如果是NFA,则以正则式为导向,手里捏着正则式,眼睛看着文本,一个字符一个字符的吃,吃完 ‘perl’ 以后,跟第一个子正则式/perl/已经匹配上了,于是记录在案,往下再看,吃进一个 ‘m’,这下糟了,跟子式/perl/不匹配了,于是把m吐出来,向上汇报说成功匹配 ‘perl’,不再关心其他,也不尝试后面那个子正则式/perlman/,自然也就看不到那个更好的答案了。
如果是DFA,它是以文本为导向,手里捏着文本,眼睛看着正则式,一口一口的吃。吃到/p/,就在手里的 ‘p’ 上打一个钩,记上一笔,说这个字符已经匹配上了,然后往下吃。当看到 /perl/ 之后,DFA不会停,会尝试再吃一口。这时候,第一个子正则式已经山穷水尽了,没得吃了,于是就甩掉它,去吃第二个子正则式的/m/。这一吃好了,因为又匹配上了,于是接着往下吃。直到把正则式吃完,心满意足往上报告说成功匹配了 ‘perlman’。
确定型有穷自动机
确定型有穷自动机是不确定有穷自动机中的一个特例,其中:
- 没有输出ε之上的转换动作。
- 对每个状态s和每个输入符号a,有且只有一条标号为a的边离开s
确定型有穷自动机的转换
![](https://img.laitimes.com/img/__Qf2AjLwojIjJCLyojI0JCLiAzNfRHLGZkRGZkRfJ3bs92YsYTMfVmepNHL90zZjxmWzImerNDTwYVbiVHNHpleO1GTulzRilWO5xkNNh0YwIFSh9Fd4VGdsATMfd3bkFGazxyaHRGcWdUYuVzVa9GczoVdG1mWfVGc5RHLrJXYtJXZ0F2dvwVZnFWbp1zczV2YvJHctM3cv1Ce-cmbw5SM3MzM0ETM4ETOwkTMwIzLc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
- ε-closure的操作就是去ε
- move的操作就是对路径的归并
- Dstates保存着新节点[初始化是ε-closure(s0)]状态
- 发现Dstates还有末被标记的节点进入循环并打上标记
- 对新节点T(T=ε-closure(s))集合进行移边操作和去ε得到新集合U
- 如果发现U没在Dstates中,加入并不打标记
- 创建一条新边Dtran[T,a]
求出ε-closure(s0)]={0,1,3,4,5,6,7,9}并创建表格
I | Ia | Ib |
---|---|---|
{0,1,3,4,5,6,7,9} |
对基移a,b操作
移a或b集合操作的时候得到的新集合如果不第一列中,则新开一列标记上,并再对新集合移a或b.如此循环 。
新的终态的判断方法就是包含原来终态的集合就为终态,例如此题原来终态为9,所以包含9的集合就为终态,[双圈代表终态]
新的初态就是包含原来初态的集合就为初态,例如此题原来初态为0,所以包含0的集合就为初态
将DFA最小化
对于同一个语言,可以存在多个识别此语言的DFA,所以,求出DFA后,通常我们还需要对DFA进行简化操作,求出最简DFA。
简化的本质是合并性质相同的状态,以减少整个图的大小。
无关状态
(1)多于状态:对于一个状态Si,若从开始状态出发,不可能到达该状态Si,则Si为多余(无用)状态。
(2)死状态:对于一个状态Si,对于任意输入符号a,若转到它本身后,不可能从它到达终止状态,则称Si为死状态。
等价状态:若Si为自动机的一个状态,我们把从s出发能导出的所有符号串的集合记为L(s)。设有两个状态s和t,若有L(s)=L(t),则称s和t是等价状态。
(1) 一致性条件:状态s和t必须同时为终态或非终态
(2) 蔓延性条件:对于所有输入符号,状态s和状态t必须转化到等价的状态里
DFA的化简算法:对于DFA M=(S,Σ,f,S0,Z)
(1)首先将DFA的状态集进行初始化,分成Π=(Z,S-Z); S为全部状态,Z为终态。
(2) 用下面的过程对Π构造新的划分Π
for (Π中每个组G) {
把G划分成小组,G中的任意两个状态Si和Sj在同一组中,
当且仅当对于Σ中任意输入符号a ,Si和Sj的a转换是到同一组中,move(Si,a) ∈Gi ,move(Sj,a) ∈Gi。
只要Si和Sj的a转换是到不同的组中,则说明Si和Sj是可区别的,可进行划分。在Π new中用刚完成的对G的划分代替原来的G。
}
(3)重复执行(2),直到Π中每个状态集不能再划分(Π new= Π)为止;
(4)合并等价状态 ,在每个G中,取任意状态作为代表,删去其它状态;
(5)删去无关状态,从其它状态到无关状态的转换都成为无定义。
①首次划分: Π0=({2,3,4,5},{0,1})
②在G={2,3,4,5}中:f(2,a)=1,f(4,a)=0(转向终态集{0,1});f(3,a)=3,f(5,a)=5(转向非终态集{2,3,4,5}),故{2,4}和{3,5}是可区别的,得Π1=({2,4},{3,5},{0,1});
③在G={2,4}中,f(2,a)=1,f(4,a)=0(转向终态子集),而f(2,b)=3,f(4,b)=5(转向非终态子集{3,5}),所以不可区别,不再进行划分;
④考察G={3,5},f(3,a)=3,f(5,a)=5(转向非终态子集{3,5}),f(3,b)=2,f(5,b)=4(转向非终态子集{2,4}), 所以不可区别,不再进行划分;
⑤考察G={0,1},f(0,a)=f(1,a)=1(转向终态集{0,1}); f(0,b)=2f,(1,b)=4(转向非终态子集{2,4}),所以不可区别,不再进行划分;
⑦进一步进行考察,可以发现每个子集都不能再划分了;
⑧消去等价状态:{0,1}用0表示,{2,4}用2表示,{3,5}用3表示,如右图所示
⑨去掉无关状态,因DFA M’中没有无关状态,所以下图即为最后结果。
JAVA代码
package com.moyao.magic;
import com.moyao.magic.SuffixAlgorithm.Operator;
import java.util.*;
import java.util.stream.Collectors;
public class NfaAlgorithm {
private static String prepareString(String _regex) {
String regex = "";
char[] regexs = _regex.replaceAll(" ", "").toCharArray();
for (int i = 0; i < regexs.length; i++) {
if (i == 0)
regex += regexs[i];
else {
if (regexs[i] == '|' || regexs[i] == '*' || regexs[i] == ')') {
regex += regexs[i];
} else {
if (regexs[i - 1] == '(' || regexs[i - 1] == '|')
regex += regexs[i];
else
regex += ("&" + regexs[i]);
}
}
}
return regex;
}
private static String E = "epsllon";
public static Graph transformNFA(String suffixRegex){
char[] rs = suffixRegex.toCharArray();
Stack<Object> operandStack = new Stack();
for (int i =0; i<rs.length; i++){
char o = rs[i];
if(!Operator.isOperator(o)){
operandStack.push(String.valueOf(o));
}else{
Operator operator = Operator.findOperator(o);
Graph graph = null;
switch (operator){
case ADD://'&'
Object av1 = operandStack.pop();
Object av2 = operandStack.pop();
graph = Graph.parse(av2);
graph.add(av1);
break;
case MULTIPLY://'*'
Object mv = operandStack.pop();
graph = Graph.parse(mv);
graph.multiply();
break;
case DIVIDE://'|'
Object dv1 = operandStack.pop();
Object dv2 = operandStack.pop();
graph = Graph.parse(dv2);
graph.divide(dv1);
break;
}
operandStack.push(graph);
}
}
return Graph.parse(operandStack.pop());
}
public static Graph transformDFA(Graph nfa){
List<String> lbs = nfa.getLabels();
List<UNode> us = new ArrayList<>();
List<UNode> doList = new ArrayList<>();
List<Edge> edges = new ArrayList<>();
UNode s = new UNode(nfa.closure(nfa.start));
UNode e = null;
doList.add(s);
us.add(s);
while(!doList.isEmpty()){
UNode u = doList.remove(0);
for (String lb : lbs){
List<Node> nodes = nfa.closure(nfa.move(u.getNodes(),lb));
if(nodes.isEmpty()) continue;
UNode next = new UNode(nodes);
int index = us.indexOf(next);
if(index >= 0){
next = us.get(index);
}
Edge edge = new Edge(u,next,lb);
edges.add(edge);
if(index >= 0) continue;
doList.add(next);
us.add(next);
if(e != null) continue;
e = next.nodes
.stream()
.anyMatch(node -> {return node == nfa.end;}) ?
next : null;
}
}
return new Graph(edges,s, e);
}
public static Graph optimizeDFA(Graph dfa){
List<String> lbs = dfa.getLabels();
int pos = 1;
int end = 2;
Map<Integer, Node> nodeMap = dfa.getNodes();
Map<Integer,Integer> groupMap = new HashMap<>();
nodeMap.values().forEach(n->{
groupMap.put(n.id, n == dfa.end ? 0 : 1);
});
while (pos < end){
final int searchIndex = pos;
List<Node> nodes = groupMap.entrySet().stream()
.filter(e-> e.getValue() == searchIndex).map(e->{return nodeMap.get(e.getKey());})
.collect(Collectors.toList());
boolean success = true;
for (String lb:lbs) {
Map<Integer,List<Integer>> resultMap = new HashMap<>();
for (Node node: nodes) {
Node n = dfa.move(node,lb);
int groupResult = n == null ? -1 : groupMap.get(node.id);
List<Integer> ll = resultMap.get(-1);
if (ll == null){
ll = new ArrayList<>();
ll.add(node.id);
}
resultMap.put(groupResult,ll);
}
int count = resultMap.keySet().size();
if(count == 1)
continue;
int b = pos;
Iterator<List<Integer>> it = resultMap.values().iterator();
while (it.hasNext()){
List<Integer> idList = it.next();
for (Integer id:idList) {
groupMap.put(id,b++);
}
}
end = b;
success = false;
break;
}
if(success) pos++;
}
Map<Integer, Node> newNodeMap = new HashMap<>();
Node s = new Node(groupMap.get(dfa.start.id));
Node e = new Node(groupMap.get(dfa.end.id));
newNodeMap.put(s.id, s);
newNodeMap.put(e.id, e);
List<Edge> edges = dfa.edges.stream().map(edge->{
Node nodeBegin = newNodeMap.getOrDefault(edge.begin.id, new Node(groupMap.get(edge.begin.id)));
Node nodeEnd = newNodeMap.getOrDefault(edge.end.id, new Node(groupMap.get(edge.end.id)));
return new Edge(nodeBegin,nodeEnd,edge.label);
}).distinct().collect(Collectors.toList());
return new Graph(edges,s, e);
}
public static void main(String[] args) {
String exp = prepareString("(A*B|AC)D");
//String exp = prepareString("AB");
System.out.println(exp);
String regex = SuffixAlgorithm.rpn(exp);
System.out.println(regex);
Graph fna = transformNFA(regex);
System.out.println(fna);
Graph dfa = transformDFA(fna);
System.out.println(dfa);
Graph opDfa = optimizeDFA(dfa);
System.out.println(opDfa);
}
static class Graph {
private List<Edge> edges;
private Node start;
private Node end;
public Graph(List<Edge> edges, Node start, Node end) {
this.edges = edges;
this.start = start;
this.end = end;
}
private Graph(String label){
this.start = new Node();
this.end = new Node();
this.edges = new ArrayList<>();
Edge edge = new Edge(start,end,label);
this.edges.add(edge);
}
public static Graph parse(Object obj){
if(obj instanceof Graph){
return (Graph) obj;
}
if(obj instanceof String)
return new Graph((String)obj);
throw new IllegalArgumentException("not support class["+obj.getClass()+"]");
}
public void add(Object obj){
if(obj instanceof String){
add((String)obj);
return;
}
if(obj instanceof Graph) {
add((Graph) obj);
return;
}
throw new IllegalArgumentException("not support class["+obj.getClass()+"]");
}
public void divide(Object obj){
if(obj instanceof String){
divide(parse(obj));
return;
}
if(obj instanceof Graph) {
divide((Graph) obj);
return;
}
throw new IllegalArgumentException("not support class["+obj.getClass()+"]");
}
public void multiply(){
Node nodeStart = this.start;
Node nodeEnd = this.end;
this.start = new Node();
this.end = new Node();
Edge edge1 = new Edge(nodeEnd,nodeStart,E);
Edge edge2 = new Edge(this.start,nodeStart,E);
Edge edge3 = new Edge(nodeEnd,this.end,E);
Edge edge4 = new Edge(this.start,this.end,E);
this.edges.add(edge1);
this.edges.add(edge2);
this.edges.add(edge3);
this.edges.add(edge4);
}
private void divide(Graph graph){
Node nodeStart = this.start;
Node nodeEnd = this.end;
this.start = new Node();
this.end = new Node();
Edge edge1 = new Edge(this.start,graph.start,E);
Edge edge2 = new Edge(this.start,nodeStart,E);
Edge edge3 = new Edge(nodeEnd,this.end,E);
Edge edge4 = new Edge(graph.end,this.end,E);
this.edges.add(edge1);
this.edges.add(edge2);
this.edges.add(edge3);
this.edges.add(edge4);
this.edges.addAll(graph.edges);
}
private void add(String label){
Node mid = end;
this.end = new Node();
Edge edge = new Edge(mid,end,label);
this.edges.add(edge);
}
private void add(Graph graph){
Node node = this.end;
this.end = graph.end;
edges.stream().filter(edge -> edge.end == node)
.forEach(edge -> {
edge.end = graph.start;
});
this.edges.addAll(graph.edges);
}
public List<Node> closure(Node s){
return closure(Arrays.asList(s));
}
public List<Node> closure(List<Node> u){
List<Node> result = new ArrayList<>();
do{
result.addAll(u);
u = move(u,E);
} while (!u.isEmpty());
return result;
}
public Node move(Node t, String lable){
List<Node> result = move(Arrays.asList(t), lable);
if(result.isEmpty()) return null;
return result.get(0);
}
public List<Node> move(List<Node> t, String lable){
return edges.stream()
.filter(edge -> {
if(!edge.label.equals(lable))
return false;
for (Node n : t){
if(edge.begin == n)
return true;
}
return false;
})
.map(edge -> {return edge.end;})
.collect(Collectors.toList());
}
public List<String> getLabels(){
return edges.stream()
.filter(edge -> {return !edge.label.equals(E);})
.map(edge -> {return edge.label;})
.distinct()
.collect(Collectors.toList());
}
public Map<Integer,Node> getNodes(){
Map<Integer, Node> map = new HashMap<>();
edges.forEach(edge -> {
map.put(edge.begin.id, edge.begin);
map.put(edge.end.id, edge.end);
});
return map;
}
@Override
public String toString() {
String printString = "Start=" + this.start + " End=" + this.end + "\n";
for (int i = 0; i < edges.size(); i++) {
printString += edges.get(i) + "\n";
}
return printString;
}
}
//边
static class Edge {
private Node begin;
private Node end;
private String label;
public Edge(Node begin, Node end, String label) {
this.begin = begin;
this.end = end;
this.label = label;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Edge edge = (Edge) o;
if (!begin.equals(edge.begin)) return false;
if (!end.equals(edge.end)) return false;
return label.equals(edge.label);
}
@Override
public int hashCode() {
int result = begin.hashCode();
result = 31 * result + end.hashCode();
result = 31 * result + label.hashCode();
return result;
}
@Override
public String toString() {
return "Edge [begin=" + begin + ", end=" + end + ", label=" + label+ "]";
}
}
//节点
static class UNode extends Node {
private List<Node> nodes;
public UNode(List<Node> nodes) {
this.nodes = nodes;
}
public List<Node> getNodes() {
return nodes;
}
@Override
public boolean equals(Object obj) {
if(!(obj instanceof UNode)) return super.equals(obj);
UNode b = (UNode)obj;
if(nodes.size() != b.nodes.size()) return false;
List<Node> desc = new ArrayList<>(Arrays.asList(new UNode[nodes.size()]));
Collections.copy(desc, nodes);
desc.removeAll(b.nodes);
if(desc.isEmpty()) return true;
return false;
}
}
//节点
static class Node {
private int id;
private static int ID=0;
public Node(int id){this.id = id;}
public Node(){
this.id=ID++;
}
@Override
public boolean equals(Object obj) {
if(!(obj instanceof Node))
return super.equals(obj);
Node b = (Node)obj;
return id == b.id;
}
@Override
public String toString() {
return id+"";
}
}
}
·