[Repost] boost tokenizer (originally published at http://www.cppblog.com/zuroc/)

tokenizer - Breaks a string or other character sequence into a series of tokens. Author: John Bandela

Example 1:

// simple_example_1.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
    using namespace std;
    using namespace boost;
    string s = "This is, a test";
    tokenizer<> tok(s);  // default tokenizer function
    for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
        cout << *beg << "\n";
    }
}

Output:

This
is
a
test

By default, tokenizer splits the input into words at whitespace and punctuation.

Example 2:

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
    using namespace std;
    using namespace boost;
    // The quoted middle field contains a comma that must not split the field.
    string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3";
    tokenizer<escaped_list_separator<char> > tok(s);
    for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
        cout << *beg << "\n";
    }
}

Output:

Field 1
putting quotes around fields, allows commas
Field 3

A field enclosed in double quotes may contain the separator (here, a comma).

Example 3:

// simple_example_3.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
    using namespace std;
    using namespace boost;
    string s = "12252001";
    int offsets[] = {2,2,4};  // token lengths: 2, 2, then 4 characters
    offset_separator f(offsets, offsets+3);
    tokenizer<offset_separator> tok(s,f);
    for(tokenizer<offset_separator>::iterator beg=tok.begin(); beg!=tok.end();++beg){
        cout << *beg << "\n";
    }
}

This splits 12252001 into:

12
25
2001
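To make the offset semantics concrete, here is a minimal hand-written sketch in standard C++ that reproduces the split above. It is only a model, not how Boost implements offset_separator: split_by_offsets is a hypothetical helper name, and unlike offset_separator's default behavior it does not wrap around and reuse the offsets once they are exhausted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): cut `text` into consecutive
// pieces whose lengths are given by `offsets`. Stops when either the
// offsets or the text run out; the last piece may be shorter than asked.
std::vector<std::string> split_by_offsets(const std::string& text,
                                          const std::vector<std::size_t>& offsets) {
    std::vector<std::string> pieces;
    std::size_t pos = 0;
    for (std::size_t len : offsets) {
        if (pos >= text.size()) break;            // nothing left to cut
        pieces.push_back(text.substr(pos, len));  // substr clamps at the end
        pos += len;
    }
    return pieces;
}
```

Calling split_by_offsets("12252001", {2, 2, 4}) yields the three pieces 12, 25 and 2001, matching the tokenizer output in Example 3.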

Example 4:

// char_sep_example_1.cpp
#include <cstdlib>
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    std::string str = ";!!;Hello|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    boost::char_separator<char> sep("-;|");  // all three are dropped delimiters
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin();
         tok_iter != tokens.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
    std::cout << "\n";
    return EXIT_SUCCESS;
}

Output:

<!!> <Hello> <world> <foo> <bar> <yow> <baz>

The delimiter characters are user-defined.

Example 5:

// char_sep_example_2.cpp
#include <cstdlib>
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    std::string str = ";;Hello|world||-foo--bar;yow;baz|";
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    // "-;" are dropped delimiters, "|" is a kept delimiter, empty tokens are kept
    boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens);
    tokenizer tokens(str, sep);
    for (tokenizer::iterator tok_iter = tokens.begin();
         tok_iter != tokens.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
    std::cout << "\n";
    return EXIT_SUCCESS;
}

Output:

<> <> <Hello> <|> <world> <|> <> <|> <> <foo> <> <bar> <yow> <baz> <|> <>

The characters - and ; are dropped. The character | is kept as a token of its own but still acts as a delimiter. When two delimiters are adjacent, an empty token is emitted between them.
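The interaction of dropped delimiters, kept delimiters, and the empty-token policy can be modeled in a few lines of standard C++. The sketch below is an assumption-laden model written for this note, not Boost's actual implementation; split_tokens and its parameters are hypothetical names, and it has only been checked against the inputs shown in Examples 4 and 5.

```cpp
#include <string>
#include <vector>

// Hypothetical model (not part of Boost) of char_separator-style splitting.
// Every character in `dropped` or `kept` ends the current field; characters
// in `kept` are additionally emitted as one-character tokens. Empty fields
// are emitted only when `keep_empty` is true.
std::vector<std::string> split_tokens(const std::string& text,
                                      const std::string& dropped,
                                      const std::string& kept,
                                      bool keep_empty) {
    std::vector<std::string> out;
    std::string field;
    for (char c : text) {
        const bool is_dropped = dropped.find(c) != std::string::npos;
        const bool is_kept = kept.find(c) != std::string::npos;
        if (!is_dropped && !is_kept) {
            field += c;  // ordinary character: extend the current field
            continue;
        }
        if (keep_empty || !field.empty()) out.push_back(field);
        field.clear();
        if (is_kept) out.push_back(std::string(1, c));  // delimiter as token
    }
    if (keep_empty || !field.empty()) out.push_back(field);  // trailing field
    return out;
}
```

With the Example 5 input, split_tokens(str, "-;", "|", true) produces the same sixteen tokens listed above, including the empty ones; passing false and moving | into the dropped set gives the Example 4 behavior.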

Example 6:

// char_sep_example_3.cpp
#include <cstdlib>
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    std::string str = "This is, a test";
    typedef boost::tokenizer<boost::char_separator<char> > Tok;
    boost::char_separator<char> sep; // default constructed
    Tok tok(str, sep);
    for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
        std::cout << "<" << *tok_iter << "> ";
    std::cout << "\n";
    return EXIT_SUCCESS;
}

Output:

<This> <is> <,> <a> <test>

With a default-constructed char_separator, whitespace is dropped, while punctuation is kept as a token of its own and still acts as a delimiter.

====================================

Programming Notes

How to use the tokenizer in boost

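The string splitting that boost provides is much easier to use than strtok from the CRT. Usage is as follows:

1. typedef a tokenizer of your own. This is not strictly necessary, but the full boost::tokenizer declaration is long, so a typedef is more convenient:

typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

2. Define a separator, for example the vertical bar. Note the boost::keep_empty_tokens argument. It is a policy defined by boost's tokenizer, which has two of them, drop_empty_tokens and keep_empty_tokens; they differ in how empty tokens are handled. With drop_empty_tokens, the two consecutive '|' in "s234||345" are treated as a single delimiter. With keep_empty_tokens, the two consecutive '|' in "s234||345" produce an empty string between them. The delimiters are passed as a string, not as a character:

boost::char_separator<char> sep("|", "", boost::keep_empty_tokens);

3. Define a tokenizer instance, where str is the string to split and sep is the separator:

tokenizer tokens(str, sep);

4. Define an iterator to visit each token:

tokenizer::iterator tok_iter = tokens.begin();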

- Author: karl, Friday, February 20, 2004, 10:17
