The lexer hack

From Wikipedia, the free encyclopedia


The problem is that in the following code, the lexical class of <code>A</code> cannot be determined without further contextual information:

<code>(A)*B</code>

This code could be a multiplication of two variables, in which case <code>A</code> is a <code>variable</code>; written unambiguously:

<code>A * B</code>

Alternatively, it could be casting the dereferenced value of <code>B</code> to the type <code>A</code>, in which case <code>A</code> is a <code>typedef-name</code>; written in the usual human-readable form, but still ambiguously from the point of view of the grammar:

<code>(A) *B</code>
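Both readings can be reproduced in an ordinary C translation unit. In the sketch below, the function names `cast_interpretation` and `multiply_interpretation` are made up for illustration, and `A` is assumed to be a typedef for `unsigned char` so that the cast has a visible effect. The same tokens `(A)*B` compile to a cast in one scope and to a multiplication in another scope, where a local variable shadows the typedef:

```c
/* Illustrative sketch: the token sequence (A)*B means different things
   depending on whether A currently names a type or a variable. */

typedef unsigned char A;  /* file scope: A is a typedef-name */

/* Here A is a typedef-name, so (A)*B casts the value *B to type A. */
int cast_interpretation(int *B) {
    return (A)*B;
}

/* Here a local variable hides the typedef, so A is an ordinary
   identifier and (A)*B multiplies A by B. */
int multiply_interpretation(int B) {
    int A = 6;
    return (A)*B;
}
```

Assuming 8-bit `char`, `cast_interpretation` returns 44 when `*B` is 300 (300 truncated to `unsigned char`), while `multiply_interpretation(300)` returns 1800. The compiler can only pick the right parse because it knows, at each point, whether `A` names a type.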

A compiler typically runs a lexer and a parser in sequence. The lexer scans the source text to extract meaningful tokens, such as words, numbers, and strings. The parser analyzes sequences of tokens, attempting to match them to syntax rules representing language structures, such as loops and variable declarations. A problem occurs here if a single sequence of tokens can ambiguously match more than one syntax rule.

For example, in the C expression:

<code>(A)*B</code>

the lexer may find these tokens:

left parenthesis

identifier 'A'

right parenthesis

operator '*'

identifier 'B'
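A toy lexer makes the point concrete. In this sketch the token kinds and the `next_token` function are hypothetical, and only parentheses, `*`, whitespace and identifiers are handled. Both `A` and `B` come out with the same kind, `TOK_IDENT`, so the lexer by itself has no way to say whether `A` is a type:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TOK_LPAREN, TOK_RPAREN, TOK_STAR, TOK_IDENT, TOK_ERROR, TOK_END };

struct token {
    enum token_kind kind;
    const char *start;   /* first character of the token in the source */
    size_t length;       /* number of characters in the token */
};

/* Scans one token starting at *pos and advances *pos past it.
   Every identifier gets the same kind, TOK_IDENT: the lexer cannot
   tell type names from variable names. */
struct token next_token(const char **pos) {
    const char *p = *pos;
    while (isspace((unsigned char)*p))
        p++;

    struct token t = { TOK_END, p, 0 };
    if (*p == '\0') {
        /* End of input: kind stays TOK_END, length stays 0. */
    } else if (*p == '(') {
        t.kind = TOK_LPAREN;
        t.length = 1;
    } else if (*p == ')') {
        t.kind = TOK_RPAREN;
        t.length = 1;
    } else if (*p == '*') {
        t.kind = TOK_STAR;
        t.length = 1;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        const char *q = p;
        while (isalnum((unsigned char)*q) || *q == '_')
            q++;
        t.kind = TOK_IDENT;
        t.length = (size_t)(q - p);
    } else {
        /* Any other character is outside this toy lexer's alphabet. */
        t.kind = TOK_ERROR;
        t.length = 1;
    }
    *pos = p + t.length;
    return t;
}
```

Calling `next_token` repeatedly on `"(A)*B"` yields exactly the five tokens listed above, followed by `TOK_END`.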

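The lexical class of <code>A</code> cannot be determined from these tokens alone: the parser can interpret them either as the variable <code>A</code> multiplied by <code>B</code>, or as the dereferenced value of <code>B</code> cast to the type <code>A</code>.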

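The usual solution is to feed information from the parser's symbol table back into the lexer, so that the lexer can tell type names apart from other identifiers. This mixing of the lexer and parser is generally regarded as inelegant, which is why it is called a "hack".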

Because all identifiers have the same format, the lexer alone cannot distinguish type identifiers from other identifiers. With the hack, when the lexer in the above example finds the identifier <code>A</code>, it looks it up in the symbol table and, if <code>A</code> has been declared as a type, classifies the token as a type identifier. The grammar can then specify that a typecast requires a type identifier, and the ambiguity disappears.
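The hack itself can be sketched as a table of typedef names that the parser fills in and the lexer consults. The names below (`declare_typedef`, `classify_identifier`, `TYPEDEF_NAME`) are hypothetical, and the table is a flat list with no notion of scope, so unlike a real C front end it cannot handle a local variable that shadows a typedef:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token classes: the hack splits identifiers in two. */
enum ident_class { IDENTIFIER, TYPEDEF_NAME };

/* Toy symbol table of typedef names, written by the parser.
   A real compiler would use a scoped hash table instead. */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static size_t typedef_count = 0;

/* Called by the parser after it reduces a declaration such as
   "typedef int A;". The name must outlive the table. */
bool declare_typedef(const char *name) {
    if (typedef_count == MAX_TYPEDEFS)
        return false;  /* table full */
    typedef_names[typedef_count++] = name;
    return true;
}

/* Called by the lexer for every identifier it scans. This lookup is
   the "hack": the lexer now depends on state owned by the parser. */
enum ident_class classify_identifier(const char *name) {
    for (size_t i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPEDEF_NAME;
    return IDENTIFIER;
}
```

Once `A` is registered, the lexer would report `(`, a type name `A`, `)`, `*`, and an identifier `B`, and a grammar rule that only accepts a type name inside a cast's parentheses matches a single reading.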

and parser in a pipeline.

This is also the approach used in most other modern languages, which do not distinguish different classes of identifiers in the lexical grammar but instead defer the distinction to the parsing or semantic analysis phase, when sufficient information is available.
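For contrast, here is a minimal sketch of the deferred approach, assuming the lexer reports every name as a plain identifier. The hypothetical `parse_paren_star` function receives the two identifiers from `(X)*Y` and asks a stand-in semantic query, `is_type_name`, which node to build. The lookup is the same kind as in the hack, but it happens inside the parser, so the lexer keeps no semantic state:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical AST node kinds for the ambiguous construct "(X)*Y". */
enum node_kind { NODE_CAST_OF_DEREF, NODE_MULTIPLY };

struct node {
    enum node_kind kind;
    const char *left;   /* X: the type name, or the left operand */
    const char *right;  /* Y: the dereferenced pointer, or the right operand */
};

/* Stand-in for a semantic-analysis query: does `name` denote a type?
   A real front end would consult its scoped symbol table. */
static bool is_type_name(const char *name, const char *const *types, size_t ntypes) {
    for (size_t i = 0; i < ntypes; i++)
        if (strcmp(types[i], name) == 0)
            return true;
    return false;
}

/* The lexer handed over X and Y as plain identifiers; the parser
   decides what "(X)*Y" means only when it reaches the construct. */
struct node parse_paren_star(const char *x, const char *y,
                             const char *const *types, size_t ntypes) {
    struct node n = { NODE_MULTIPLY, x, y };
    if (is_type_name(x, types, ntypes))
        n.kind = NODE_CAST_OF_DEREF;
    return n;
}
```

The design choice is where the decision lives: the token stream stays context-free, and the type information is used only at the one grammar point that needs it.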

<a target="_blank" href="http://en.wikipedia.org/wiki/Dangling_else">Dangling else</a>

