Deep Dive into Clang (7): Clang Lexer Code Reading Notes on Lexer

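Lexer.cpp is the main source file of the lexical analyzer in the Clang front end, and it contains the implementation of the Lexer class. The file's header comment describes it as "This file implements the Lexer and Token interfaces.", but only two small Token functions are implemented there; everything else is Lexer code. So anyone who wants to understand how Clang's lexer works needs a solid understanding of this file.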

Let's start with the Lexer's initialization function, InitLexer:

void Lexer::InitLexer(const char *BufStart, const char *BufPtr,
                      const char *BufEnd) {
  BufferStart = BufStart;
  BufferPtr = BufPtr;
  BufferEnd = BufEnd;

  assert(BufEnd[0] == 0 &&
         "We assume that the input buffer has a null character at the end"
         " to simplify lexing!");

  // Check whether we have a BOM in the beginning of the buffer. If yes - act
  // accordingly. Right now we support only UTF-8 with and without BOM, so, just
  // skip the UTF-8 BOM if it's present.
  if (BufferStart == BufferPtr) {
    // Determine the size of the BOM.
    StringRef Buf(BufferStart, BufferEnd - BufferStart);
    size_t BOMLength = llvm::StringSwitch<size_t>(Buf)
      .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
      .Default(0);

    // Skip the BOM.
    BufferPtr += BOMLength;
  }

  Is_PragmaLexer = false;
  CurrentConflictMarkerState = CMK_None;

  // Start of the file is a start of line.
  IsAtStartOfLine = true;
  IsAtPhysicalStartOfLine = true;

  HasLeadingSpace = false;
  HasLeadingEmptyMacro = false;

  // We are not after parsing a #.
  ParsingPreprocessorDirective = false;

  // We are not after parsing #include.
  ParsingFilename = false;

  // We are not in raw mode.  Raw mode disables diagnostics and interpretation
  // of tokens (e.g. identifiers, thus disabling macro expansion).  It is used
  // to quickly lex the tokens of the buffer, e.g. when handling a "#if 0" block
  // or otherwise skipping over tokens.
  LexingRawMode = false;

  // Default to not keeping comments.
  ExtendedTokenMode = 0;
}

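This initialization function is called from both constructors of the Lexer class. The code is shown below, together with resetExtendedTokenMode(), which the first constructor calls: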

/// Lexer constructor - Create a new lexer object for the specified buffer
/// with the specified preprocessor managing the lexing process.  This lexer
/// assumes that the associated file buffer and Preprocessor objects will
/// outlive it, so it doesn't take ownership of either of them.
Lexer::Lexer(FileID FID, const llvm::MemoryBuffer *InputFile, Preprocessor &PP)
  : PreprocessorLexer(&PP, FID),
    FileLoc(PP.getSourceManager().getLocForStartOfFile(FID)),
    LangOpts(PP.getLangOpts()) {

  InitLexer(InputFile->getBufferStart(), InputFile->getBufferStart(),
            InputFile->getBufferEnd());

  resetExtendedTokenMode();
}

void Lexer::resetExtendedTokenMode() {
  assert(PP && "Cannot reset token mode without a preprocessor");
  if (LangOpts.TraditionalCPP)
    SetKeepWhitespaceMode(true);
  else
    SetCommentRetentionState(PP->getCommentRetentionState());
}

/// Lexer constructor - Create a new raw lexer object.  This object is only
/// suitable for calls to 'LexFromRawLexer'.  This lexer assumes that the text
/// range will outlive it, so it doesn't take ownership of it.
Lexer::Lexer(SourceLocation fileloc, const LangOptions &langOpts,
             const char *BufStart, const char *BufPtr, const char *BufEnd)
  : FileLoc(fileloc), LangOpts(langOpts) {

  InitLexer(BufStart, BufPtr, BufEnd);

  // We *are* in raw mode.
  LexingRawMode = true;
}
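The two constructors differ in what surrounds the shared InitLexer call. The first starts at the beginning of a file buffer (it passes getBufferStart() as both BufStart and BufPtr, so the BOM check applies), is attached to a Preprocessor, and calls resetExtendedTokenMode(): under TraditionalCPP it keeps whitespace, otherwise it adopts the preprocessor's comment-retention state. The second takes an arbitrary text range, receives no Preprocessor, and turns on raw mode. The toy C++ model below captures only that split; ToyLexer, ToyPreprocessor, ToyLangOptions and TokenMode are hypothetical simplifications, not Clang's classes, and BOM handling is left out.

```cpp
#include <cassert>

// Hypothetical stand-ins: only the two settings that
// resetExtendedTokenMode() consults are modeled.
struct ToyPreprocessor {
  bool RetainComments = false;  // plays the role of getCommentRetentionState()
};
struct ToyLangOptions {
  bool TraditionalCPP = false;
};

// Simplified token mode; Clang stores this in the integer ExtendedTokenMode,
// where 0 means "don't keep comments".
enum class TokenMode { Default, KeepComments, KeepWhitespace };

class ToyLexer {
public:
  // Like Lexer(FileID, const MemoryBuffer *, Preprocessor &):
  // lex a whole buffer from its start, with a preprocessor attached.
  ToyLexer(const char *BufStart, const char *BufEnd, ToyPreprocessor &Preproc,
           const ToyLangOptions &Opts)
      : PP(&Preproc), LangOpts(Opts) {
    init(BufStart, BufStart, BufEnd);
    resetExtendedTokenMode();
  }

  // Like the raw-lexer constructor: arbitrary start position, no
  // preprocessor, and raw mode switched on after the shared init.
  ToyLexer(const ToyLangOptions &Opts, const char *BufStart,
           const char *BufPtr, const char *BufEnd)
      : LangOpts(Opts) {
    init(BufStart, BufPtr, BufEnd);
    LexingRawMode = true;
  }

  bool isRaw() const { return LexingRawMode; }
  TokenMode mode() const { return Mode; }
  const char *position() const { return BufferPtr; }

private:
  // Shared initialization in the spirit of InitLexer: note that it
  // always resets LexingRawMode to false.
  void init(const char *Start, const char *Ptr, const char *End) {
    BufferStart = Start;
    BufferPtr = Ptr;
    BufferEnd = End;
    LexingRawMode = false;
    Mode = TokenMode::Default;
  }

  // Same branch structure as Lexer::resetExtendedTokenMode().
  void resetExtendedTokenMode() {
    assert(PP && "Cannot reset token mode without a preprocessor");
    if (LangOpts.TraditionalCPP)
      Mode = TokenMode::KeepWhitespace;
    else
      Mode = PP->RetainComments ? TokenMode::KeepComments : TokenMode::Default;
  }

  ToyPreprocessor *PP = nullptr;
  ToyLangOptions LangOpts;
  const char *BufferStart = nullptr;
  const char *BufferPtr = nullptr;
  const char *BufferEnd = nullptr;
  bool LexingRawMode = false;
  TokenMode Mode = TokenMode::Default;
};
```

As in the real code, raw mode has to be set after the shared initialization, because the init step unconditionally clears LexingRawMode.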
