
[Google Guava] 6 - String Handling: Splitting, Joining, Padding

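Joining "Harry", null, "Ron", and "Hermione" with Joiner.on("; ").skipNulls() returns "Harry; Ron; Hermione". Alternatively, useForNull(String) substitutes a given string for each null, whereas skipNulls() simply drops nulls. Joiner can also join arbitrary objects, in which case it joins their toString() values.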

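Warning: Joiner instances are always immutable. The configuration methods that define a joiner's target semantics always return a new Joiner instance. This makes every Joiner thread-safe, so it can be defined as a static final constant.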
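The behaviour described above can be sketched as follows. This is a minimal illustration, assuming Guava is on the classpath; the class name `JoinerSketch`, its helper methods, and the "(none)" placeholder are hypothetical names chosen for the example.

```java
import com.google.common.base.Joiner;
import java.util.Arrays;
import java.util.List;

class JoinerSketch {
    // Joiner instances are immutable, so sharing them as constants is safe.
    static final Joiner SKIP_NULLS = Joiner.on("; ").skipNulls();
    static final Joiner NULL_AS_TEXT = Joiner.on("; ").useForNull("(none)");

    static String joinSkippingNulls(List<String> names) {
        return SKIP_NULLS.join(names);
    }

    static String joinReplacingNulls(List<String> names) {
        return NULL_AS_TEXT.join(names);
    }

    static String joinObjects() {
        // Non-String elements are joined via their toString() values.
        return Joiner.on(",").join(Arrays.asList(1, 5, 7));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Harry", null, "Ron", "Hermione");
        System.out.println(joinSkippingNulls(names));  // Harry; Ron; Hermione
        System.out.println(joinReplacingNulls(names)); // Harry; (none); Ron; Hermione
        System.out.println(joinObjects());             // 1,5,7
    }
}
```

Because each configuration call returns a new Joiner, the two constants never interfere with each other.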

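The JDK's built-in string splitting utilities have some quirky behaviors. For example, String.split silently discards trailing separators. Quiz: what does ",a,,b,".split(",") return?

"", "a", "", "b", ""

null, "a", null, "b", null

"a", null, "b"

"a", "b"

None of the above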
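The quiz can be checked directly with the JDK. The result is "", "a", "", "b", which matches none of the first four options: String.split keeps the leading empty string and the one in the middle, and drops only the trailing one. The class name `SplitQuiz` is a hypothetical name for this sketch.

```java
import java.util.Arrays;

class SplitQuiz {
    static String[] jdkSplit(String input) {
        // With no limit argument, String.split removes trailing empty strings only.
        return input.split(",");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(jdkSplit(",a,,b,"))); // [, a, , b]
    }
}
```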

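Splitter.on(',').trimResults().omitEmptyStrings().split("foo,bar,,   qux") returns an Iterable<String> containing "foo", "bar", and "qux". A Splitter can be configured to split on any pattern, character, string, or CharMatcher.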

<table>
<tr><th>Method</th><th>Description</th><th>Example</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#on(char)">Splitter.on(char)</a></td><td>Split on occurrences of a single character.</td><td>Splitter.on(';')</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#on(com.google.common.base.charmatcher)">Splitter.on(CharMatcher)</a></td><td>Split on any character matched by a CharMatcher.</td><td>Splitter.on(CharMatcher.BREAKING_WHITESPACE)</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#on(java.lang.string)">Splitter.on(String)</a></td><td>Split on a literal string.</td><td>Splitter.on(", ")</td></tr>
<tr><td>Splitter.onPattern(String)</td><td>Split on a regular expression.</td><td>Splitter.onPattern("\r?\n")</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#fixedlength(int)">Splitter.fixedLength(int)</a></td><td>Split into pieces of a fixed length; the last piece may be shorter than the given length but is never empty.</td><td>Splitter.fixedLength(3)</td></tr>
</table>

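<table>
<tr><th>Method</th><th>Description</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#omitemptystrings()">omitEmptyStrings()</a></td><td>Automatically omits empty strings from the result.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#trimresults()">trimResults()</a></td><td>Removes leading and trailing whitespace from each result string.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#trimresults(com.google.common.base.charmatcher)">trimResults(CharMatcher)</a></td><td>Removes leading and trailing characters matched by the given CharMatcher from each result string.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/splitter.html#limit(int)">limit(int)</a></td><td>Limits the number of strings produced; the last string holds the unsplit remainder.</td></tr>
</table>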
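The factory and modifier methods listed above can be exercised in a short sketch, assuming Guava is on the classpath. The class name `SplitterSketch` and the helper `split` are hypothetical; copying into an ImmutableList is only there to make the results easy to print and compare.

```java
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableList;
import java.util.List;

class SplitterSketch {
    // Split on commas, trim whitespace from each piece, and drop empty pieces.
    static final Splitter COMMA = Splitter.on(',').trimResults().omitEmptyStrings();

    static List<String> split(Splitter splitter, String input) {
        // Splitter.split returns a lazy Iterable<String>; copy it into a list.
        return ImmutableList.copyOf(splitter.split(input));
    }

    public static void main(String[] args) {
        System.out.println(split(COMMA, "foo,bar,,   qux"));                 // [foo, bar, qux]
        // Unlike String.split, a plain Splitter keeps every empty string.
        System.out.println(split(Splitter.on(','), ",a,,b,"));               // [, a, , b, ]
        System.out.println(split(Splitter.fixedLength(3), "abcdefgh"));      // [abc, def, gh]
        System.out.println(split(Splitter.on(',').limit(3), "a,b,c,d"));     // [a, b, c,d]
        System.out.println(split(Splitter.onPattern("\r?\n"), "x\r\ny\nz")); // [x, y, z]
    }
}
```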

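If you want the splitter to return a List, use Lists.newArrayList(splitter.split(string)) or a similar method.

Warning: Splitter instances are always immutable. The configuration methods that define a splitter's target semantics always return a new Splitter instance. This makes every Splitter thread-safe, so it can be defined as a static final constant.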
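A small sketch of both points, assuming Guava is on the classpath: the split result is copied into a mutable list, and a configured Splitter is kept as a constant while the instance it was derived from stays unchanged. The class and field names are hypothetical.

```java
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import java.util.List;

class SplitterToListSketch {
    // Immutable and thread-safe, so a single shared constant is enough.
    private static final Splitter SEMICOLON = Splitter.on(';');
    // trimResults() does not modify SEMICOLON; it returns a new Splitter.
    private static final Splitter SEMICOLON_TRIMMED = SEMICOLON.trimResults();

    static List<String> raw(String input) {
        return Lists.newArrayList(SEMICOLON.split(input));
    }

    static List<String> trimmed(String input) {
        return Lists.newArrayList(SEMICOLON_TRIMMED.split(input));
    }

    public static void main(String[] args) {
        List<String> parts = trimmed("harry ; ron;hermione ");
        parts.add("neville"); // Lists.newArrayList yields a mutable ArrayList
        System.out.println(parts);              // [harry, ron, hermione, neville]
        System.out.println(raw("harry ; ron")); // [harry ,  ron]
    }
}
```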

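In the early days, the StringUtil class kept growing unchecked and accumulated many string-handling methods: allAscii, collapse, collapseControlChars, collapseWhitespace, indexOfChars, lastIndexNotOf, numSharedChars, removeChars, removeCrLf, replaceChars, retainAllChars, strip, stripAndCollapse, stripNonDigits. All of these methods come down to two conceptual questions:

1. What counts as a matching character?

2. What should be done with those matching characters?

To clean up this swamp, we developed CharMatcher.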

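The greater benefit of CharMatcher, however, is the set of methods it provides for performing specific kinds of operations on characters: trimming, collapsing, removing, retaining, and so on. A CharMatcher instance first represents question 1 (what counts as a matching character?) and then offers many operations that answer question 2 (what should be done with those characters?). With this design, a linear increase in API complexity buys growth in both flexibility and power.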

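Note: CharMatcher deals only with characters represented by a single char; it does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String as surrogate pairs, and a CharMatcher treats each of them as two separate characters.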
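The note can be illustrated with a single supplementary character. This is a sketch assuming Guava is on the classpath; the class name `SurrogateSketch` and the sample string (the letter a, U+1F600, the letter b) are hypothetical choices for the example.

```java
import com.google.common.base.CharMatcher;

class SurrogateSketch {
    // Matches any UTF-16 surrogate code unit (high or low).
    static final CharMatcher SURROGATE = CharMatcher.inRange('\uD800', '\uDFFF');

    static int charCount(String s) {
        return s.length();
    }

    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    static int surrogateUnits(String s) {
        // CharMatcher works char by char, so it sees both halves of the pair.
        return SURROGATE.countIn(s);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 as a surrogate pair, 'b'
        System.out.println(charCount(s));      // 4
        System.out.println(codePointCount(s)); // 3
        System.out.println(surrogateUnits(s)); // 2
    }
}
```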

The constants in CharMatcher cover most character-matching needs:

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#any">ANY</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#none">NONE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#whitespace">WHITESPACE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#breaking_whitespace">BREAKING_WHITESPACE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#invisible">INVISIBLE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#digit">DIGIT</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_letter">JAVA_LETTER</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_digit">JAVA_DIGIT</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_letter_or_digit">JAVA_LETTER_OR_DIGIT</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_iso_control">JAVA_ISO_CONTROL</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_lower_case">JAVA_LOWER_CASE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#java_upper_case">JAVA_UPPER_CASE</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#ascii">ASCII</a>

<a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#single_width">SINGLE_WIDTH</a>

Other common ways to obtain a CharMatcher include:

<table>
<tr><th>Method</th><th>Description</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#anyof(java.lang.charsequence)">anyOf(CharSequence)</a></td><td>Matches any of the listed characters. For example, CharMatcher.anyOf("aeiou") matches lowercase English vowels.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#is(char)">is(char)</a></td><td>Matches exactly one given character.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#inrange(char,%20char)">inRange(char, char)</a></td><td>Matches any character in the given range, for example CharMatcher.inRange('a', 'z').</td></tr>
</table>
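A minimal sketch of these factory methods, assuming Guava is on the classpath. The combinators `or` and `negate` are part of the CharMatcher API but are not covered in the text above; they are used here only to show how matchers compose. The class name and sample strings are hypothetical.

```java
import com.google.common.base.CharMatcher;

class CharMatcherFactorySketch {
    static final CharMatcher VOWELS = CharMatcher.anyOf("aeiou");
    static final CharMatcher LOWER_OR_DIGIT =
        CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('0', '9'));

    public static void main(String[] args) {
        System.out.println(VOWELS.matches('e'));                      // true
        System.out.println(CharMatcher.is('x').matches('y'));         // false
        System.out.println(VOWELS.countIn("hermione"));               // 4
        // Keep only lowercase ASCII letters and ASCII digits.
        System.out.println(LOWER_OR_DIGIT.retainFrom("Hello World 42")); // elloorld42
        // negate() matches everything the original matcher does not.
        System.out.println(VOWELS.negate().removeFrom("guava"));      // uaa
    }
}
```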

<table>
<tr><th>Method</th><th>Description</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#collapsefrom(java.lang.charsequence,%20char)">collapseFrom(CharSequence, char)</a></td><td>Replaces each group of consecutive matching characters with the given character. For example, WHITESPACE.collapseFrom(string, ' ') collapses each run of whitespace into a single space.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#matchesallof(java.lang.charsequence)">matchesAllOf(CharSequence)</a></td><td>Tests whether every character in the sequence matches.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#removefrom(java.lang.charsequence)">removeFrom(CharSequence)</a></td><td>Removes all matching characters from the sequence.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#retainfrom(java.lang.charsequence)">retainFrom(CharSequence)</a></td><td>Keeps the matching characters and removes all others.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#trimfrom(java.lang.charsequence)">trimFrom(CharSequence)</a></td><td>Removes leading and trailing matching characters from the sequence.</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/charmatcher.html#replacefrom(java.lang.charsequence,%20java.lang.charsequence)">replaceFrom(CharSequence, CharSequence)</a></td><td>Replaces each matching character with the given character sequence.</td></tr>
</table>

All of these methods return String, except matchesAllOf, which returns boolean.
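Each of the operations above can be tried in a short sketch, assuming Guava is on the classpath. The named constants such as WHITESPACE belong to older Guava releases, and newer releases expose them as static factory methods such as CharMatcher.whitespace(); to stay independent of the version, this sketch builds its matchers with is and inRange. The class name and sample strings are hypothetical.

```java
import com.google.common.base.CharMatcher;

class CharMatcherOpsSketch {
    static final CharMatcher DIGITS = CharMatcher.inRange('0', '9');
    static final CharMatcher SPACE = CharMatcher.is(' ');

    public static void main(String[] args) {
        // Each run of spaces becomes a single space.
        System.out.println(SPACE.collapseFrom("a   b  c", ' ')); // a b c
        System.out.println(DIGITS.matchesAllOf("12345"));        // true
        System.out.println(DIGITS.removeFrom("a1b2c3"));         // abc
        System.out.println(DIGITS.retainFrom("a1b2c3"));         // 123
        // Only leading and trailing matches are removed.
        System.out.println(SPACE.trimFrom("  hi there  "));      // hi there
        // Every matching character is replaced individually.
        System.out.println(DIGITS.replaceFrom("pin 1234", "*")); // pin ****
    }
}
```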

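Don't handle charsets by name, as in string.getBytes("UTF-8"), which forces you to catch an UnsupportedEncodingException that can never actually occur.

Try a charset constant instead: string.getBytes(Charsets.UTF_8).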
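The two styles can be compared side by side. Guava's Charsets.UTF_8 is a java.nio.charset.Charset constant; this sketch uses the JDK 7+ equivalent StandardCharsets.UTF_8 so that it runs without extra dependencies, on the assumption that swapping in Charsets.UTF_8 behaves the same. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetSketch {
    // Discouraged: the charset name is a string, so a checked exception must be handled.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this cannot happen.
            throw new AssertionError(e);
        }
    }

    // Preferred: a Charset constant needs no exception handling.
    static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeByName("guava").length);     // 5
        System.out.println(encodeByConstant("guava").length); // 5
    }
}
```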

CaseFormat is a handy utility for converting strings between ASCII case conventions, such as the naming conventions of programming languages. The supported formats are:

<table>
<tr><th>Format</th><th>Example</th></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#lower_camel">LOWER_CAMEL</a></td><td>lowerCamel</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#lower_hyphen">LOWER_HYPHEN</a></td><td>lower-hyphen</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#lower_underscore">LOWER_UNDERSCORE</a></td><td>lower_underscore</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#upper_camel">UPPER_CAMEL</a></td><td>UpperCamel</td></tr>
<tr><td><a href="http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/base/caseformat.html#upper_underscore">UPPER_UNDERSCORE</a></td><td>UPPER_UNDERSCORE</td></tr>
</table>

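Using CaseFormat is straightforward: CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, "CONSTANT_NAME") returns "constantName".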
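A few conversions as a sketch, assuming Guava is on the classpath. The class name `CaseFormatSketch` and the sample identifiers are hypothetical.

```java
import com.google.common.base.CaseFormat;

class CaseFormatSketch {
    static String constantToField(String constantName) {
        // The source format is the receiver; the target format is the first argument.
        return CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, constantName);
    }

    public static void main(String[] args) {
        System.out.println(constantToField("CONSTANT_NAME")); // constantName
        System.out.println(
            CaseFormat.LOWER_HYPHEN.to(CaseFormat.UPPER_CAMEL, "my-widget-id")); // MyWidgetId
        System.out.println(
            CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, "userAccountId")); // user_account_id
    }
}
```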

We find CaseFormat especially useful in certain situations, such as writing code generators.