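Some commonly used Java APIs are worth a closer look, for example String.replace() and String.replaceAll(). This post examines the two APIs using the Android 7.1 source code as the reference. They are defined as follows: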
/**
* Replaces each substring of this string that matches the literal target
* sequence with the specified literal replacement sequence. The
* replacement proceeds from the beginning of the string to the end, for
* example, replacing "aa" with "b" in the string "aaa" will result in
* "ba" rather than "ab".
*
* @param target The sequence of char values to be replaced
* @param replacement The replacement sequence of char values
* @return The resulting string
* @throws NullPointerException if <code>target</code> or
* <code>replacement</code> is <code>null</code>.
* @since
*/
public String replace(CharSequence target, CharSequence replacement)
/**
* Replaces each substring of this string that matches the given <a
* href="../util/regex/Pattern.html#sum">regular expression</a> with the
* given replacement.
*
* <p> An invocation of this method of the form
* <i>str</i><tt>.replaceAll(</tt><i>regex</i><tt>,</tt> <i>repl</i><tt>)</tt>
* yields exactly the same result as the expression
*
* <blockquote><tt>
* {@link java.util.regex.Pattern}.{@link java.util.regex.Pattern#compile
* compile}(</tt><i>regex</i><tt>).{@link
* java.util.regex.Pattern#matcher(java.lang.CharSequence)
* matcher}(</tt><i>str</i><tt>).{@link java.util.regex.Matcher#replaceAll
* replaceAll}(</tt><i>repl</i><tt>)</tt></blockquote>
*
*<p>
* Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in the
* replacement string may cause the results to be different than if it were
* being treated as a literal replacement string; see
* {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
* Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
* meaning of these characters, if desired.
*
* @param regex
* the regular expression to which this string is to be matched
* @param replacement
* the string to be substituted for each match
*
* @return The resulting <tt>String</tt>
*
* @throws PatternSyntaxException
* if the regular expression's syntax is invalid
*
* @see java.util.regex.Pattern
*
* @since
* @spec JSR-
*/
public String replaceAll(String regex, String replacement)
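As the definitions show:
(1) replace(): returns a copy of the input string in which every occurrence of the literal substring target is replaced with replacement.
(2) replaceAll(): returns a copy of the input string in which every substring that matches the regular expression regex is replaced with replacement.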
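The difference between a literal target and a regular expression is easiest to see on input that contains regex metacharacters. The following is a minimal sketch for a plain JVM; the class name ReplaceSemanticsDemo and its helper methods are made up for illustration, while Pattern.quote and Matcher.quoteReplacement are the standard java.util.regex helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the Android sources.
final class ReplaceSemanticsDemo {
    // Literal replacement: "." is an ordinary character here.
    static String literalDots(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: an unescaped "." matches any character.
    static String regexDots(String s) {
        return s.replaceAll(".", "-");
    }

    // Quoting the target makes replaceAll() treat it literally again.
    static String quotedDots(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    // "$" in the replacement is a group reference unless it is quoted.
    static String literalDollar(String s) {
        return s.replaceAll("USD", Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        System.out.println(literalDots("a.b.c"));     // a-b-c
        System.out.println(regexDots("a.b.c"));       // -----
        System.out.println(quotedDots("a.b.c"));      // a-b-c
        System.out.println(literalDollar("5 USD"));   // 5 $
        System.out.println("aaa".replace("aa", "b")); // ba, as the Javadoc states
    }
}
```

Passing an unquoted "$" as the replacement to replaceAll() would be read as the start of a group reference, which is why the Javadoc points to Matcher.quoteReplacement.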
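Clearly the two APIs serve different use cases, and replaceAll() is the more powerful one. Because replaceAll() has to process a regular expression, it should also be slower than replace(). But for the same task, how large is the gap? A simple experiment: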
// Time 1,000,000 literal replacements with replace().
long now = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    "aabbbc".replace("b", "a");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

// Time the same 1,000,000 replacements with replaceAll().
now = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    "aabbbc".replaceAll("b", "a");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
Result:
10-20 16:26:08.401 19518 19670 I TEST : replace() : 2170
10-20 16:27:47.828 19518 19670 I TEST : replaceAll() : 99427
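At a scale of one million iterations, replaceAll() takes close to 50 times as long as replace() (99427 ms versus 2170 ms, about 46x). (Thread scheduling is ignored here; timing relies only on the system timestamps taken at the start and the end. Also, the example above exists purely for comparison: when the string being replaced has length 1, the more appropriate API in real code is of course replace(char, char).)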
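For readers who want to repeat the measurement off-device, here is a rough sketch of the same experiment for a plain JVM. It is not the author's test code: the class name ReplaceTimingSketch, the timeMillis helper, the warm-up pass and the use of System.nanoTime() and System.out in place of Log.i are all assumptions of this sketch, and the numbers it prints will differ from the Android 7.1 logs above.

```java
import java.util.function.Supplier;

// Hypothetical harness, not the test code used in the article.
final class ReplaceTimingSketch {
    // Written on every iteration so the result of each call is actually used.
    static volatile int sink;

    // Runs the task `iterations` times and returns the elapsed wall-clock time in ms.
    static long timeMillis(int iterations, Supplier<String> task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.get().length();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // the scale used in the article

        // Warm-up pass: an addition of this sketch, not part of the original test.
        timeMillis(10_000, () -> "aabbbc".replaceAll("b", "a"));

        System.out.println("replace(CharSequence) : "
                + timeMillis(iterations, () -> "aabbbc".replace("b", "a")) + " ms");
        System.out.println("replace(char, char)   : "
                + timeMillis(iterations, () -> "aabbbc".replace('b', 'a')) + " ms");
        System.out.println("replaceAll()          : "
                + timeMillis(iterations, () -> "aabbbc".replaceAll("b", "a")) + " ms");
    }
}
```

The sketch also times replace(char, char), the overload recommended above for single-character replacements. Like the original test, it is a coarse wall-clock comparison rather than a rigorous benchmark.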
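Of course, fixed-string replacement normally uses replace(), and complex regular expressions leave no choice but replaceAll(). The two overlap mostly in simple combinations, such as
replace("a", "1").replace("b", "1") vs replaceAll("[ab]", "1")
Leaving code tidiness aside and looking only at performance: given the roughly 50x gap seen above, intuition suggests that 50 might be a break-even point, that is, the point where replace() has to be called 50 times while replaceAll() needs only one call. Does such a break-even point exist? More experiments follow.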
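Before comparing speed, it helps to confirm that the two forms really produce the same output. The sketch below checks this for the [ab] example; the class and method names are invented for illustration. It also shows a caveat that is an observation of this sketch rather than a claim from the article: the equivalence relies on the replacement text not containing any of the target characters, because each call in a chain sees the output of the previous one.

```java
// Hypothetical demo class used only to compare the two forms.
final class ChainVersusClassDemo {
    // Two literal passes, one per target character.
    static String chained(String s) {
        return s.replace("a", "1").replace("b", "1");
    }

    // One regex pass with a character class covering both targets.
    static String charClass(String s) {
        return s.replaceAll("[ab]", "1");
    }

    // A chain whose first replacement produces the second target.
    static String interferingChain(String s) {
        return s.replace("a", "b").replace("b", "c");
    }

    public static void main(String[] args) {
        System.out.println(chained("cabbage"));     // c1111ge
        System.out.println(charClass("cabbage"));   // c1111ge
        // The "b" created by the first call is replaced again by the second.
        System.out.println(interferingChain("ab")); // cc
    }
}
```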
Start with this snippet as a probe:
// Ten chained replace() calls, 100,000 iterations.
long now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz"
            .replace("a", "1")
            .replace("b", "1")
            .replace("c", "1")
            .replace("d", "1")
            .replace("e", "1")
            .replace("f", "1")
            .replace("g", "1")
            .replace("h", "1")
            .replace("i", "1")
            .replace("j", "1");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

// One replaceAll() call covering the same ten letters, 100,000 iterations.
now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-j]", "1");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
Scale: 100,000 iterations; candidate break-even point: 10. Result:
10-20 17:04:55.656 24206 24326 I TEST : replace() : 3274
10-20 17:05:13.844 24206 24326 I TEST : replaceAll() : 18188
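replaceAll() still takes about 6 times as long (18188 / 3274 ≈ 5.6); measured against a single replace() call, that is roughly 56 times.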
Next, extend the candidate point to 26 (all 26 English letters):
// 26 chained replace() calls. The iteration count is assumed to match
// the previous 100,000-iteration run.
long now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz"
            .replace("a", "1")
            .replace("b", "1")
            .replace("c", "1")
            .replace("d", "1")
            .replace("e", "1")
            .replace("f", "1")
            .replace("g", "1")
            .replace("h", "1")
            .replace("i", "1")
            .replace("j", "1")
            .replace("k", "1")
            .replace("l", "1")
            .replace("m", "1")
            .replace("n", "1")
            .replace("o", "1")
            .replace("p", "1")
            .replace("q", "1")
            .replace("r", "1")
            .replace("s", "1")
            .replace("t", "1")
            .replace("u", "1")
            .replace("v", "1")
            .replace("w", "1")
            .replace("x", "1")
            .replace("y", "1")
            .replace("z", "1");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

// One replaceAll() call covering all 26 letters, same iteration count.
now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-z]", "1");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
Result:
10-20 17:02:50.440 22178 22248 I TEST : replace() : 8232
10-20 17:03:21.954 22178 22248 I TEST : replaceAll() : 31514
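replaceAll() still takes nearly 4 times as long (31514 / 8232 ≈ 3.8); measured against a single replace() call, that is about 100 times.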
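The ratios quoted for the three runs can be recomputed directly from the logged timings. The sketch below only does that arithmetic; the class name RatioArithmetic and its helpers are invented for illustration, and the inputs are the millisecond values from the logs above.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the ratios from the logged timings.
final class RatioArithmetic {
    // How many times longer replaceAll() took than the replace() chain.
    static double ratio(long replaceAllMs, long replaceMs) {
        return (double) replaceAllMs / replaceMs;
    }

    // The same ratio measured against a single replace() call in the chain.
    static double perCallRatio(long replaceAllMs, long replaceMs, int calls) {
        return ratio(replaceAllMs, replaceMs) * calls;
    }

    static String describe(String label, long replaceAllMs, long replaceMs, int calls) {
        return String.format(Locale.ROOT, "%s: %.1fx overall, %.1fx per replace() call",
                label,
                ratio(replaceAllMs, replaceMs),
                perCallRatio(replaceAllMs, replaceMs, calls));
    }

    public static void main(String[] args) {
        // 1 call:   45.8x overall, 45.8x per replace() call
        System.out.println(describe("1 call", 99427, 2170, 1));
        // 10 calls: 5.6x overall, 55.6x per replace() call
        System.out.println(describe("10 calls", 18188, 3274, 10));
        // 26 calls: 3.8x overall, 99.5x per replace() call
        System.out.println(describe("26 calls", 31514, 8232, 26));
    }
}
```

If the per-call ratio had stayed near the 46x of the first run, a chain of about 46 calls would break even with one replaceAll(). In these runs the per-call ratio rises as the chain grows, so the break-even point moves further away.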
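As the regular expression widens (and matches more characters), replaceAll() also takes longer.
With all 26 letters covered, the break-even point has still not been reached, so the conclusion is fairly safe: for simple replacements, judged on performance alone, replace() is clearly the better choice.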
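The Javadoc quoted at the top states that str.replaceAll(regex, repl) yields the same result as Pattern.compile(regex).matcher(str).replaceAll(repl). When a regular expression is genuinely needed in a hot loop, the pattern can therefore be compiled once and reused. The sketch below illustrates that form; the class and field names are invented, and the article does not measure how much time this saves, so any speed-up is an assumption to verify on the target device.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: reuse one compiled Pattern instead of passing the regex string each time.
final class PrecompiledPatternSketch {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern LOWERCASE = Pattern.compile("[a-z]");

    // The form used in the article: the regex string is handed over on every call.
    static String viaReplaceAll(String s) {
        return s.replaceAll("[a-z]", "1");
    }

    // Equivalent result, reusing the compiled pattern.
    static String viaPrecompiled(String s) {
        return LOWERCASE.matcher(s).replaceAll("1");
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz";
        System.out.println(viaReplaceAll(alphabet));
        System.out.println(viaPrecompiled(alphabet));
        System.out.println(viaReplaceAll(alphabet).equals(viaPrecompiled(alphabet))); // true
    }
}
```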