1: Basic Specifications
Names refer to either tokens or nonterminal symbols. Yacc requires token names to be declared as such. In addition, for reasons discussed in Section 3, it is often desirable to include the lexical analyzer as part of the specification file; it may be useful to include other programs as well. Thus, every specification file consists of three sections: the declarations, (grammar) rules, and programs. The sections are separated by double percent ``%%'' marks. (The percent ``%'' is generally used in Yacc specifications as an escape character.)
In other words, a full specification file looks like
declarations
%%
rules
%%
programs
The declaration section may be empty. Moreover, if the programs section is omitted, the second %% mark may be omitted also; thus, the smallest legal Yacc specification is
%%
rules
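As an illustration of this three-part layout, the C sketch below splits a specification held in a string at its %% marks. The function split_spec and the structure spec_sections are hypothetical names, and the sketch makes the simplifying assumption that each %% mark sits alone on its own line; it is not how Yacc itself reads its input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: up to three sections of a Yacc specification. */
struct spec_sections {
    const char *part[3];   /* declarations, rules, programs */
    size_t len[3];         /* length of each section in bytes */
    int count;             /* number of sections found (1..3) */
};

/* True if the line [p, end) consists of exactly "%%" (simplifying assumption). */
static int is_mark_line(const char *p, const char *end)
{
    return end - p == 2 && p[0] == '%' && p[1] == '%';
}

/* Splits text at its %% lines.  Returns 0 on success, or -1 if there is
   no %% at all (so no rules section) or more than two %% marks. */
int split_spec(const char *text, struct spec_sections *out)
{
    const char *start = text;   /* start of the section being collected */
    const char *line = text;    /* start of the line being examined */
    int k = 0;                  /* number of %% marks seen so far */

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);
        const char *next = eol ? eol + 1 : end;

        if (is_mark_line(line, end)) {
            if (k == 2)
                return -1;      /* at most two %% marks are allowed */
            out->part[k] = start;
            out->len[k] = (size_t)(line - start);
            k++;
            start = next;       /* next section begins after the mark */
        }
        line = next;
    }
    out->part[k] = start;       /* whatever follows the last mark */
    out->len[k] = strlen(start);
    out->count = k + 1;
    return k == 0 ? -1 : 0;
}
```

For the smallest specification shown above, split_spec reports two sections: an empty declarations section followed by the rules.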
Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-character reserved symbols. Comments may appear wherever a name is legal; they are enclosed in /* . . . */, as in C and PL/I.
The rules section is made up of one or more grammar rules. A grammar rule has the form:
A : BODY ;
A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals. The colon and the semicolon are Yacc punctuation.
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'', and non-initial digits. Upper- and lower-case letters are distinct. The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
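The naming rule can be expressed as a small C predicate. The name is_yacc_name is hypothetical, and the sketch assumes ASCII text in the C locale, where isalpha accepts only a-z and A-Z.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if s is a legal name under the rule above: a non-empty run of
   letters, dots, underscores, and digits, where the first character is not
   a digit. */
bool is_yacc_name(const char *s)
{
    if (*s == '\0' || isdigit((unsigned char)*s))
        return false;                      /* empty, or digit in first position */
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!isalpha(c) && !isdigit(c) && c != '.' && c != '_')
            return false;                  /* character not allowed in a name */
    }
    return true;
}
```

Because names are compared byte for byte (for example with strcmp), expr and Expr remain two distinct names.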
A literal consists of a character enclosed in single quotes ``'''. As in C, the backslash ``\'' is an escape character within literals, and all the C escapes are recognized. Thus
'\n' newline
'\r' return
'\'' single quote ``'''
'\\' backslash ``\''
'\t' tab
'\b' backspace
'\f' form feed
'\xxx' ``xxx'' in octal
For a number of technical reasons, the NUL character ('\0' or 0) should never be used in grammar rules.
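The escape table can be read as a decoding procedure. The C sketch below (decode_literal is a hypothetical name) takes the characters between the quotes of a literal and returns the character code it denotes. It handles only the escapes listed above, treats \xxx as one to three octal digits, and rejects any literal that denotes NUL.

```c
/* Decodes the text between the single quotes of a literal, e.g. "a",
   "\\n" or "\\101".  Returns the character code, or -1 if the text is
   malformed or denotes NUL (which should not appear in grammar rules). */
int decode_literal(const char *s)
{
    int c;

    if (s[0] != '\\') {                        /* plain character */
        c = (unsigned char)s[0];
        return (c != 0 && s[1] == '\0') ? c : -1;
    }
    switch (s[1]) {
    case 'n':  c = '\n'; break;
    case 'r':  c = '\r'; break;
    case '\'': c = '\''; break;
    case '\\': c = '\\'; break;
    case 't':  c = '\t'; break;
    case 'b':  c = '\b'; break;
    case 'f':  c = '\f'; break;
    default:
        if (s[1] >= '0' && s[1] <= '7') {      /* \xxx: up to three octal digits */
            int i = 1;
            c = 0;
            while (i <= 3 && s[i] >= '0' && s[i] <= '7')
                c = c * 8 + (s[i++] - '0');
            return (c != 0 && s[i] == '\0') ? c : -1;
        }
        return -1;                             /* unknown escape */
    }
    return s[2] == '\0' ? c : -1;              /* nothing may follow the escape */
}
```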
If there are several grammar rules with the same left hand side, the vertical bar ``|'' can be used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule can be dropped before a vertical bar. Thus the grammar rules
A : B C D ;
A : E F ;
A : G ;
can be given to Yacc as
A : B C D
| E F
| G
;
It is not necessary that all grammar rules with the same left side appear together in the grammar rules section, although grouping them makes the input much more readable and easier to change.
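To make the equivalence concrete, the C sketch below prints a list of rules in the ``|'' form, gathering every rule for a left-hand side under its first occurrence even when those rules were scattered through the input. The structure rule and the function group_rules are hypothetical; rule bodies are kept as plain strings, and output that does not fit in the buffer is silently truncated.

```c
#include <string.h>

/* A grammar rule as plain strings: left-hand side and space-separated body. */
struct rule {
    const char *lhs;
    const char *body;   /* "" for an empty body */
};

/* Appends s to buf, truncating if needed; *used stays below size. */
static void append(char *buf, size_t size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len >= size)
        len = size - 1 - *used;
    memcpy(buf + *used, s, len);
    *used += len;
    buf[*used] = '\0';
}

/* Writes the rules into buf (size >= 1) using the "|" shorthand. */
void group_rules(const struct rule *r, int n, char *buf, size_t size)
{
    size_t used = 0;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < i && !seen; j++)
            seen = strcmp(r[j].lhs, r[i].lhs) == 0;
        if (seen)
            continue;                   /* already emitted with an earlier rule */
        append(buf, size, &used, r[i].lhs);
        append(buf, size, &used, " :");
        for (int j = i; j < n; j++) {   /* every rule with this left side */
            if (strcmp(r[j].lhs, r[i].lhs) != 0)
                continue;
            if (j != i)
                append(buf, size, &used, "\n\t|");
            if (r[j].body[0] != '\0') {
                append(buf, size, &used, " ");
                append(buf, size, &used, r[j].body);
            }
        }
        append(buf, size, &used, "\n\t;\n");
    }
}
```

Given the rules A : B C D, X : y, A : E F, and A : G in that order, the output lists all three A alternatives together under one ``A :'', followed by the X rule.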
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
empty : ;
Names representing tokens must be declared; this is most simply done by writing
%token name1 name2 . . .
in the declarations section. (See Sections 3, 5, and 6 for much more discussion.) Every name not defined in the declarations section is assumed to represent a nonterminal symbol. Every nonterminal symbol must appear on the left side of at least one rule.
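The two rules in this paragraph suggest a simple consistency check. In the hypothetical C function undefined_nonterminal below, each grammar rule is a NULL-terminated array of names whose first entry is the left side. Any name not in the %token list is treated as a nonterminal, and the function reports the first one that never appears on a left side. Literals such as '+' are assumed to have been filtered out beforehand.

```c
#include <string.h>

/* Returns 1 if name occurs in list[0..n-1]. */
static int in_list(const char *name, const char **list, int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Returns the first nonterminal that is not the left side of any rule,
   or NULL if every nonterminal has at least one rule. */
const char *undefined_nonterminal(const char **tokens, int ntokens,
                                  const char ***rules, int nrules)
{
    for (int i = 0; i < nrules; i++) {
        for (int k = 0; rules[i][k] != NULL; k++) {
            const char *name = rules[i][k];
            if (in_list(name, tokens, ntokens))
                continue;                    /* declared with %token */
            int defined = 0;                 /* otherwise it is a nonterminal */
            for (int j = 0; j < nrules && !defined; j++)
                defined = strcmp(rules[j][0], name) == 0;
            if (!defined)
                return name;                 /* nonterminal with no rule */
        }
    }
    return NULL;
}
```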
Of all the nonterminal symbols, one, called the start symbol, has particular importance. The parser is designed to recognize the start symbol; thus, this symbol represents the largest, most general structure described by the grammar rules. By default, the start symbol is taken to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact desirable, to declare the start symbol explicitly in the declarations section using the %start keyword:
%start symbol
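The defaulting rule fits in a few lines of C. The function start_symbol is hypothetical; it takes the name given in a %start declaration (or NULL if there is none) together with the left-hand sides of the rules in the order they appear.

```c
#include <stddef.h>

/* An explicit %start wins; otherwise the left side of the first rule is
   the start symbol.  Returns NULL only if there are no rules at all. */
const char *start_symbol(const char *explicit_start,
                         const char *const *rule_lhs, int nrules)
{
    if (explicit_start != NULL)
        return explicit_start;
    return nrules > 0 ? rule_lhs[0] : NULL;
}
```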
The end of the input to the parser is signaled by a special token, called the endmarker. If the tokens up to, but not including, the endmarker form a structure which matches the start symbol, the parser function returns to its caller after the endmarker is seen; it accepts the input. If the endmarker is seen in any other context, it is an error.
It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate; see Section 3, below. Usually the endmarker represents some reasonably obvious I/O status, such as ``end-of-file'' or ``end-of-record''.
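The interplay between the lexical analyzer and the endmarker can be sketched in C. The token code NUMBER = 257 and the convention that yylex returns 0 for the endmarker are assumptions modeled on common yacc implementations. The function accepts is a hand-written stand-in for a generated parser, not what Yacc produces; it recognizes the start symbol list : NUMBER | list ',' NUMBER ; and accepts only when the endmarker follows a complete list, treating an endmarker anywhere else as an error.

```c
#include <ctype.h>

/* Assumed token codes: 0 is the endmarker, NUMBER is a hypothetical token. */
enum { ENDMARKER = 0, NUMBER = 257 };

static const char *input;   /* text being scanned */

/* Minimal lexical analyzer: skips blanks, turns digit runs into NUMBER,
   returns other characters as themselves, and returns the endmarker at
   the end of the string. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t' || *input == '\n')
        input++;
    if (*input == '\0')
        return ENDMARKER;
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            input++;
        return NUMBER;
    }
    return (unsigned char)*input++;
}

/* Returns 1 (accept) if text matches list followed by the endmarker,
   0 (error) otherwise. */
int accepts(const char *text)
{
    input = text;
    if (yylex() != NUMBER)
        return 0;             /* includes an endmarker before any list */
    for (;;) {
        int t = yylex();
        if (t == ENDMARKER)
            return 1;         /* complete list, then endmarker: accept */
        if (t != ',' || yylex() != NUMBER)
            return 0;         /* includes an endmarker right after ',' */
    }
}
```

With this sketch, "1, 22, 333" is accepted, while "1," is rejected because the endmarker arrives where a NUMBER is required.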