
[Google Guava] 6 - String Processing: Splitting, Joining, Padding

<code>Joiner joiner = Joiner.on("; ").skipNulls();</code>

<code>return joiner.join("harry", null, "ron", "hermione");</code>

The code above returns "harry; ron; hermione". Alternatively, useForNull(String) substitutes a given string for each null, whereas skipNulls() simply drops the nulls. Joiner can also join arbitrary objects, in which case it joins their toString() values.

<code>Joiner.on(",").join(Arrays.asList(1, 5, 7)); // returns "1,5,7"</code>

Warning: Joiner instances are always immutable. The configuration methods that define a Joiner's target semantics always return a new Joiner instance. This makes every Joiner thread-safe, so you can define one as a static final constant.
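The difference between skipNulls() and useForNull(String) can be sketched as follows. This is a minimal illustration that assumes Guava is on the classpath; the class name, the method names, and the "(none)" placeholder are hypothetical choices made for this example.

```java
import com.google.common.base.Joiner;
import java.util.Arrays;
import java.util.List;

public class JoinerSketch {
    // Joiner is immutable, so configured instances can be shared as constants.
    private static final Joiner SKIPPING = Joiner.on("; ").skipNulls();
    private static final Joiner SUBSTITUTING = Joiner.on("; ").useForNull("(none)");

    // Drops null elements entirely.
    static String joinSkippingNulls(List<String> parts) {
        return SKIPPING.join(parts);
    }

    // Replaces each null element with the placeholder text.
    static String joinWithPlaceholder(List<String> parts) {
        return SUBSTITUTING.join(parts);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("harry", null, "ron", "hermione");
        System.out.println(joinSkippingNulls(names));   // harry; ron; hermione
        System.out.println(joinWithPlaceholder(names)); // harry; (none); ron; hermione
    }
}
```

Because each configuration call returns a new Joiner, the two constants do not affect each other.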

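The JDK's built-in string splitting utilities have some quirky behaviors. For example, String.split silently discards trailing separators. Question: what does ",a,,b,".split(",") return?

"", "a", "", "b", ""

null, "a", null, "b", null

"a", null, "b"

"a", "b"

None of the above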
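The quiz can be checked directly against the JDK. The sketch below uses only the standard library (class and method names are hypothetical) and indicates that the last option is the right one: the leading empty string is kept, while the trailing one is silently dropped. Passing a negative limit keeps the trailing empty string as well.

```java
import java.util.Arrays;

public class JdkSplitQuiz {
    // Plain String.split: trailing empty strings are removed from the result.
    static String[] splitOnComma(String input) {
        return input.split(",");
    }

    // A negative limit tells String.split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String input) {
        return input.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnComma(",a,,b,")));         // [, a, , b]
        System.out.println(Arrays.toString(splitKeepingTrailing(",a,,b,"))); // [, a, , b, ]
    }
}
```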

<code>Splitter.on(',')</code>

<code>        .trimResults()</code>

<code>        .omitEmptyStrings()</code>

<code>        .split("foo,bar,,   qux");</code>

The code above returns an Iterable&lt;String&gt; containing "foo", "bar", and "qux". A Splitter can be configured to split on any pattern, character, string, or CharMatcher.

<table>
<tr><th>Method</th><th>Description</th><th>Example</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#on(char)">Splitter.on(char)</a></td><td>Split on a single character.</td><td>Splitter.on(';')</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#on(com.google.common.base.charmatcher)">Splitter.on(CharMatcher)</a></td><td>Split on any character matched by a CharMatcher.</td><td>Splitter.on(CharMatcher.BREAKING_WHITESPACE)</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#on(java.lang.string)">Splitter.on(String)</a></td><td>Split on a literal string.</td><td>Splitter.on(", ")</td></tr>
<tr><td>Splitter.onPattern(String)</td><td>Split on a regular expression.</td><td>Splitter.onPattern("\r?\n")</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#fixedlength(int)">Splitter.fixedLength(int)</a></td><td>Split into pieces of a fixed length; the last piece may be shorter than the given length, but is never empty.</td><td>Splitter.fixedLength(3)</td></tr>
</table>

<table>
<tr><th>Method</th><th>Description</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#omitemptystrings()">omitEmptyStrings()</a></td><td>Automatically omit empty strings from the result.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#trimresults()">trimResults()</a></td><td>Remove leading and trailing whitespace from each result string.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#trimresults(com.google.common.base.charmatcher)">trimResults(CharMatcher)</a></td><td>Remove leading and trailing characters that match the given CharMatcher from each result string.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#limit(int)">limit(int)</a></td><td>Limit the number of strings produced by the split.</td></tr>
</table>

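If you want the splitter to return a List, just use Lists.newArrayList(splitter.split(string)) or something similar.

Warning: Splitter instances are always immutable. The configuration methods that define a Splitter's target semantics always return a new Splitter instance. This makes every Splitter thread-safe, so you can define one as a static final constant.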
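The Splitter factories and modifiers described above can be combined as in the following sketch. It assumes Guava is on the classpath; the class and method names are hypothetical, and the results are copied into a List with Lists.newArrayList as suggested in the text.

```java
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import java.util.List;

public class SplitterSketch {
    // Splitter is immutable, so a configured instance can be a shared constant.
    private static final Splitter CSV = Splitter.on(',').trimResults().omitEmptyStrings();

    // Splits on commas, trims each piece, and drops empty pieces.
    static List<String> splitCsv(String line) {
        return Lists.newArrayList(CSV.split(line));
    }

    // Cuts the text into pieces of the given length; the last piece may be shorter.
    static List<String> chunk(String text, int size) {
        return Lists.newArrayList(Splitter.fixedLength(size).split(text));
    }

    // Produces at most maxParts pieces; the last piece holds the unsplit remainder.
    static List<String> splitAtMost(String text, int maxParts) {
        return Lists.newArrayList(Splitter.on(',').limit(maxParts).split(text));
    }

    public static void main(String[] args) {
        System.out.println(splitCsv("foo,bar,,   qux")); // [foo, bar, qux]
        System.out.println(chunk("abcdefgh", 3));        // [abc, def, gh]
        System.out.println(splitAtMost("a,b,c,d", 3));   // [a, b, c,d]
    }
}
```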

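In earlier versions of Guava, the StringUtil class grew out of control and accumulated a large number of string-handling methods: allAscii, collapse, collapseControlChars, collapseWhitespace, indexOfChars, lastIndexNotOf, numSharedChars, removeChars, removeCrLf, replaceChars, retainAllChars, strip, stripAndCollapse, stripNonDigits. All of these methods come down to two conceptual questions:

What counts as a matching character?

What should be done with those matching characters?

To clean up this morass, we developed CharMatcher.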

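The real benefit of CharMatcher is that it provides a set of methods for performing specific kinds of operations on characters: trimming, collapsing, removing, retaining, and so on. A CharMatcher instance first answers the first question: what counts as a matching character? It then offers many operations that answer the second question: what should be done with those matching characters? With this design, a linear increase in API complexity buys growth in both flexibility and power.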

<code>String noControl = CharMatcher.JAVA_ISO_CONTROL.removeFrom(string); // remove control characters</code>

<code>String theDigits = CharMatcher.DIGIT.retainFrom(string); // keep only the digit characters</code>

<code>String spaced = CharMatcher.WHITESPACE.trimAndCollapseFrom(string, ' ');</code>

<code>// trim whitespace at both ends and collapse each inner run of whitespace into a single space</code>

<code>String noDigits = CharMatcher.JAVA_DIGIT.replaceFrom(string, "*"); // replace every digit with *</code>

<code>String lowerAndDigit = CharMatcher.JAVA_DIGIT.or(CharMatcher.JAVA_LOWER_CASE).retainFrom(string);</code>

<code>// keep only digits and lowercase letters</code>
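As a runnable variation on the snippets above, the following sketch builds its matchers with factory methods instead of the constants. It assumes a Guava version (19 or later) that provides CharMatcher.whitespace(); the class name, the method names, and the restriction to ASCII digits and letters are choices made for this example only.

```java
import com.google.common.base.CharMatcher;

public class CharMatcherSketch {
    // ASCII-only matchers built from ranges (an assumption of this sketch).
    private static final CharMatcher ASCII_DIGIT = CharMatcher.inRange('0', '9');
    private static final CharMatcher ASCII_LOWER = CharMatcher.inRange('a', 'z');

    // Keeps only ASCII digits and lowercase letters.
    static String keepLowerAndDigits(String s) {
        return ASCII_DIGIT.or(ASCII_LOWER).retainFrom(s);
    }

    // Trims whitespace at both ends and collapses inner runs into one space.
    static String normalizeSpaces(String s) {
        return CharMatcher.whitespace().trimAndCollapseFrom(s, ' ');
    }

    // Replaces every ASCII digit with an asterisk.
    static String maskDigits(String s) {
        return ASCII_DIGIT.replaceFrom(s, "*");
    }

    public static void main(String[] args) {
        System.out.println(keepLowerAndDigits("Ab1-cD2"));    // b1c2
        System.out.println(normalizeSpaces("  a   b \t c ")); // a b c
        System.out.println(maskDigits("a1b22"));              // a*b**
    }
}
```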

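Note: CharMatcher deals only with characters represented by the char type; it does not understand Unicode supplementary characters in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String as surrogate pairs, and CharMatcher treats each of them as two separate characters.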
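The surrogate-pair limitation comes from how the JDK itself stores strings, which a short standard-library sketch can show. The class and method names are hypothetical, and U+1D11E (a musical symbol) is just one convenient supplementary character.

```java
public class SurrogatePairSketch {
    // Builds a one-character string from a supplementary code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of char units, which is what char-based APIs iterate over.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of logical Unicode characters (code points).
    static int logicalChars(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String clef = fromCodePoint(0x1D11E);
        System.out.println(charUnits(clef));                       // 2
        System.out.println(logicalChars(clef));                    // 1
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true
    }
}
```

Any char-by-char matcher therefore sees the two surrogate halves individually, never the logical character as a whole.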

The constants in CharMatcher cover most character-matching needs:

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#any">ANY</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#none">NONE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#whitespace">WHITESPACE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#breaking_whitespace">BREAKING_WHITESPACE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#invisible">INVISIBLE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#digit">DIGIT</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_letter">JAVA_LETTER</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_digit">JAVA_DIGIT</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_letter_or_digit">JAVA_LETTER_OR_DIGIT</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_iso_control">JAVA_ISO_CONTROL</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_lower_case">JAVA_LOWER_CASE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_upper_case">JAVA_UPPER_CASE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#ascii">ASCII</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#single_width">SINGLE_WIDTH</a>

Other common ways to obtain a CharMatcher include:

<table>
<tr><th>Method</th><th>Description</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#anyof(java.lang.charsequence)">anyOf(CharSequence)</a></td><td>Enumerate the matching characters. For example, CharMatcher.anyOf("aeiou") matches lowercase English vowels.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#is(char)">is(char)</a></td><td>Match exactly one given character.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#inrange(char,%20char)">inRange(char, char)</a></td><td>Match any character in a given range, for example CharMatcher.inRange('a', 'z').</td></tr>
</table>

Once you have a CharMatcher, the following methods operate on the characters it matches:

<table>
<tr><th>Method</th><th>Description</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#collapsefrom(java.lang.charsequence,%20char)">collapseFrom(CharSequence, char)</a></td><td>Replace each group of consecutive matching characters with the given character. For example, WHITESPACE.collapseFrom(string, ' ') replaces every run of whitespace in the string with a single space.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#matchesallof(java.lang.charsequence)">matchesAllOf(CharSequence)</a></td><td>Test whether every character in the sequence matches.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#removefrom(java.lang.charsequence)">removeFrom(CharSequence)</a></td><td>Remove all matching characters from the sequence.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#retainfrom(java.lang.charsequence)">retainFrom(CharSequence)</a></td><td>Keep the matching characters in the sequence and remove all others.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#trimfrom(java.lang.charsequence)">trimFrom(CharSequence)</a></td><td>Remove leading and trailing matching characters from the sequence.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#replacefrom(java.lang.charsequence,%20java.lang.charsequence)">replaceFrom(CharSequence, CharSequence)</a></td><td>Replace each matching character with the given character sequence.</td></tr>
</table>

All of these methods return String, except matchesAllOf, which returns boolean.
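A few of the factory methods and operations listed above can be exercised together as in this sketch. It assumes Guava is on the classpath; the class name, method names, and sample strings are hypothetical.

```java
import com.google.common.base.CharMatcher;

public class CharMatcherOpsSketch {
    private static final CharMatcher VOWELS = CharMatcher.anyOf("aeiou");
    private static final CharMatcher LOWER = CharMatcher.inRange('a', 'z');
    private static final CharMatcher DASH = CharMatcher.is('-');

    // matchesAllOf is the one operation here that returns boolean.
    static boolean isAllLowercase(String s) {
        return LOWER.matchesAllOf(s);
    }

    // Removes every lowercase English vowel.
    static String dropVowels(String s) {
        return VOWELS.removeFrom(s);
    }

    // Collapses each run of dashes into a single dash.
    static String squeezeDashes(String s) {
        return DASH.collapseFrom(s, '-');
    }

    // Strips dashes from both ends only.
    static String stripDashes(String s) {
        return DASH.trimFrom(s);
    }

    public static void main(String[] args) {
        System.out.println(isAllLowercase("guava"));   // true
        System.out.println(isAllLowercase("Guava"));   // false
        System.out.println(dropVowels("hermione"));    // hrmn
        System.out.println(squeezeDashes("a--b---c")); // a-b-c
        System.out.println(stripDashes("--a-b--"));    // a-b
    }
}
```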

Don't handle character sets like this:

<code>try {</code>

<code>    bytes = string.getBytes("UTF-8");</code>

<code>} catch (UnsupportedEncodingException e) {</code>

<code>    // how can this possibly happen?</code>

<code>    throw new AssertionError(e);</code>

<code>}</code>

Try writing this instead:

<code>bytes = string.getBytes(Charsets.UTF_8);</code>
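The same idea works without Guava on Java 7 and later, where the JDK ships equivalent constants in java.nio.charset.StandardCharsets. The sketch below deliberately uses the JDK constant rather than Guava's Charsets so that it runs with the standard library alone; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Passing a Charset instead of a charset name avoids the checked
    // UnsupportedEncodingException entirely.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"; the last character needs two bytes in UTF-8
        byte[] bytes = encodeUtf8(text);
        System.out.println(bytes.length);                  // 5
        System.out.println(decodeUtf8(bytes).equals(text)); // true
    }
}
```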

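CaseFormat is a handy utility for converting strings between ASCII case conventions, such as the naming conventions of programming languages. It supports the following formats: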

<table>
<tr><th>Format</th><th>Example</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#lower_camel">LOWER_CAMEL</a></td><td>lowerCamel</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#lower_hyphen">LOWER_HYPHEN</a></td><td>lower-hyphen</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#lower_underscore">LOWER_UNDERSCORE</a></td><td>lower_underscore</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#upper_camel">UPPER_CAMEL</a></td><td>UpperCamel</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#upper_underscore">UPPER_UNDERSCORE</a></td><td>UPPER_UNDERSCORE</td></tr>
</table>

Using CaseFormat is straightforward:

<code>CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, "CONSTANT_NAME"); // returns "constantName"</code>

We find CaseFormat especially useful in certain situations, for example when writing code generators.
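In a code generator, such conversions typically turn one naming convention into another, for instance database column names into Java identifiers. The sketch below assumes Guava is on the classpath; the class name, method names, and sample names are hypothetical.

```java
import com.google.common.base.CaseFormat;

public class CaseFormatSketch {
    // e.g. a column name such as "user_account" to a class name.
    static String columnToClassName(String column) {
        return CaseFormat.LOWER_UNDERSCORE.to(CaseFormat.UPPER_CAMEL, column);
    }

    // e.g. a hyphenated option such as "max-retry-count" to a field name.
    static String optionToFieldName(String option) {
        return CaseFormat.LOWER_HYPHEN.to(CaseFormat.LOWER_CAMEL, option);
    }

    // e.g. a field name back to a constant name.
    static String fieldToConstantName(String field) {
        return CaseFormat.LOWER_CAMEL.to(CaseFormat.UPPER_UNDERSCORE, field);
    }

    public static void main(String[] args) {
        System.out.println(columnToClassName("user_account"));     // UserAccount
        System.out.println(optionToFieldName("max-retry-count"));  // maxRetryCount
        System.out.println(fieldToConstantName("maxRetryCount"));  // MAX_RETRY_COUNT
    }
}
```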