[Repost] boost tokenizer (originally published at http://www.cppblog.com/zuroc/)

tokenizer - breaks a string or other character sequence into a series of tokens. Author: John Bandela

Example 1:

// simple_example_1.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "This is, a test";
   tokenizer<> tok(s);  // default tokenizer: no separator given
   for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Output:

This
is
a
test

By default, tokenizer splits words at whitespace and punctuation.

Example 2:

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
   using namespace std;
   using namespace boost;
   // The inner double quotes are escaped so that they are part of the string.
   string s = "Field 1,\"putting quotes around fields, allows commas\",Field 3";
   tokenizer<escaped_list_separator<char> > tok(s);
   for(tokenizer<escaped_list_separator<char> >::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

Output:

Field 1
putting quotes around fields, allows commas
Field 3

A field enclosed in double quotes may contain punctuation such as commas.
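To make the quoting rule concrete, here is a minimal standard-library sketch of the same idea. It is not Boost's implementation: split_quoted is a hypothetical helper, and it simplifies escape handling by keeping whatever character follows a backslash literally.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits a comma-separated line,
// treating commas inside double quotes as ordinary characters.
std::vector<std::string> split_quoted(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (c == '\\' && i + 1 < line.size()) {
            current += line[++i];        // simplification: keep the escaped char as-is
        } else if (c == '"') {
            in_quotes = !in_quotes;      // quote marks toggle state and are not output
        } else if (c == ',' && !in_quotes) {
            fields.push_back(current);   // an unquoted comma ends the field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);           // the last field has no trailing comma
    return fields;
}
```

Calling split_quoted on the string from Example 2 yields the same three fields as shown in the output above.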

Example 3:

// simple_example_3.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
   using namespace std;
   using namespace boost;
   string s = "12252001";
   int offsets[] = {2,2,4};  // token widths: 2, 2 and 4 characters
   offset_separator f(offsets, offsets+3);
   tokenizer<offset_separator> tok(s,f);
   for(tokenizer<offset_separator>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}

This splits 12252001 into:

12
25
2001
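The fixed-width splitting above can be sketched with std::string::substr alone. This is an illustration under assumptions, not Boost's code: split_by_offsets is a hypothetical name, and unlike offset_separator it stops after one pass over the widths instead of wrapping around.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): cuts s into consecutive pieces
// whose lengths are given by widths. Stops when the string is exhausted.
std::vector<std::string> split_by_offsets(const std::string& s,
                                          const std::vector<std::size_t>& widths) {
    std::vector<std::string> pieces;
    std::size_t pos = 0;
    for (std::size_t w : widths) {
        if (pos >= s.size()) break;        // nothing left to cut
        pieces.push_back(s.substr(pos, w)); // substr clamps a too-long final piece
        pos += w;
    }
    return pieces;
}
```

With the input "12252001" and widths {2, 2, 4}, the pieces are 12, 25 and 2001, matching the example.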

Example 4:

// char_sep_example_1.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
  std::string str = ";!!;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer<boost::char_separator<char> >
    tokenizer;
  boost::char_separator<char> sep("-;|");  // dropped delimiters
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}

Output:

<!!> <Hello> <world> <foo> <bar> <yow> <baz>

Here the delimiter characters are user-defined.

Example 5:

// char_sep_example_2.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer<boost::char_separator<char> >
    tokenizer;
  // "-;" are dropped delimiters, "|" is a kept delimiter
  boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}

The output is:

<> <> <Hello> <|> <world> <|> <> <|> <> <foo> <> <bar> <yow> <baz> <|> <>

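The characters - and ; are dropped, while | is kept as a token of its own but still acts as a delimiter. When two delimiters are adjacent, an empty token is emitted between them.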
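The interplay of dropped delimiters, kept delimiters and empty tokens is easier to follow in code. The sketch below is a standard-library approximation and not Boost's implementation; split_keep_empty is a hypothetical name, and it always behaves as if keep_empty_tokens were set.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): every delimiter ends the current
// token, even if that token is empty. Kept delimiters are also emitted as
// one-character tokens; dropped delimiters are discarded.
std::vector<std::string> split_keep_empty(const std::string& s,
                                          const std::string& dropped,
                                          const std::string& kept) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : s) {
        if (dropped.find(c) != std::string::npos) {
            tokens.push_back(current);             // may be an empty token
            current.clear();
        } else if (kept.find(c) != std::string::npos) {
            tokens.push_back(current);             // may be an empty token
            current.clear();
            tokens.push_back(std::string(1, c));   // the delimiter itself
        } else {
            current += c;
        }
    }
    tokens.push_back(current);                     // token after the last delimiter
    return tokens;
}
```

Called with the string from Example 5, "-;" as dropped and "|" as kept, this produces the same 16 tokens as the output above, including the empty ones.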

Example 6:

// char_sep_example_3.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
   std::string str = "This is, a test";
   typedef boost::tokenizer<boost::char_separator<char> > Tok;
   boost::char_separator<char> sep; // default constructed
   Tok tok(str, sep);
   for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
     std::cout << "<" << *tok_iter << "> ";
   std::cout << "\n";
   return EXIT_SUCCESS;
}

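The output is:

<This> <is> <,> <a> <test>

A default-constructed char_separator drops whitespace and keeps punctuation as tokens, while still treating punctuation as a delimiter.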

====================================

Programming Notes


How to use the tokenizer in boost

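The string splitting provided by boost is much easier to use than strtok from the CRT. Usage is as follows:

1. typedef your own tokenizer. This is not strictly necessary, but the full boost::tokenizer declaration is long, so a typedef is more convenient:

typedef boost::tokenizer<boost::char_separator<char> > tokenizer;

2. Define a separator, for example the vertical bar. Note the boost::keep_empty_tokens argument: it is a policy defined by boost's tokenizer. There are two policies, drop_empty_tokens and keep_empty_tokens, and they differ in how empty tokens are handled. With drop_empty_tokens, the two consecutive '|' characters in "s234||345" are treated as one. With keep_empty_tokens, they yield an empty string between them. The delimiters must be passed as a string, not as a single character:

boost::char_separator<char> sep("|", 0, boost::keep_empty_tokens);

3. Define a tokens instance, where str is the string to split and sep is the separator:

tokenizer tokens(str, sep);

4. Define an iterator to access each token:

tokenizer::iterator tok_iter = tokens.begin();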
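The difference between the two empty-token policies can be sketched without Boost. The code below is an assumption-laden illustration rather than the library's implementation: split_on is a hypothetical helper, and its keep_empty flag stands in for keep_empty_tokens (true) and drop_empty_tokens (false).

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits s on a single delimiter.
// keep_empty == true  mimics keep_empty_tokens,
// keep_empty == false mimics drop_empty_tokens.
std::vector<std::string> split_on(const std::string& s, char delim, bool keep_empty) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : s) {
        if (c == delim) {
            if (keep_empty || !current.empty()) {
                tokens.push_back(current);   // empty tokens only when allowed
            }
            current.clear();
        } else {
            current += c;
        }
    }
    if (keep_empty || !current.empty()) {
        tokens.push_back(current);           // trailing token
    }
    return tokens;
}
```

For "s234||345" split on '|', dropping empty tokens gives s234 and 345, while keeping them gives s234, an empty string, and 345.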

- Author: karl, Friday, February 20, 2004, 10:17
