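Some commonly used Java APIs are worth a closer look, for example String.replace() and String.replaceAll(). This post examines both, using the Android 7.1 source code as the reference. They are defined as follows: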
/**
* Replaces each substring of this string that matches the literal target
* sequence with the specified literal replacement sequence. The
* replacement proceeds from the beginning of the string to the end, for
* example, replacing "aa" with "b" in the string "aaa" will result in
* "ba" rather than "ab".
*
* @param target The sequence of char values to be replaced
* @param replacement The replacement sequence of char values
* @return The resulting string
* @throws NullPointerException if <code>target</code> or
* <code>replacement</code> is <code>null</code>.
* @since
*/
public String replace(CharSequence target, CharSequence replacement)
/**
* Replaces each substring of this string that matches the given <a
* href="../util/regex/Pattern.html#sum">regular expression</a> with the
* given replacement.
*
* <p> An invocation of this method of the form
* <i>str</i><tt>.replaceAll(</tt><i>regex</i><tt>,</tt> <i>repl</i><tt>)</tt>
* yields exactly the same result as the expression
*
* <blockquote><tt>
* {@link java.util.regex.Pattern}.{@link java.util.regex.Pattern#compile
* compile}(</tt><i>regex</i><tt>).{@link
* java.util.regex.Pattern#matcher(java.lang.CharSequence)
* matcher}(</tt><i>str</i><tt>).{@link java.util.regex.Matcher#replaceAll
* replaceAll}(</tt><i>repl</i><tt>)</tt></blockquote>
*
*<p>
* Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in the
* replacement string may cause the results to be different than if it were
* being treated as a literal replacement string; see
* {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
* Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
* meaning of these characters, if desired.
*
* @param regex
* the regular expression to which this string is to be matched
* @param replacement
* the string to be substituted for each match
*
* @return The resulting <tt>String</tt>
*
* @throws PatternSyntaxException
* if the regular expression's syntax is invalid
*
* @see java.util.regex.Pattern
*
* @since
* @spec JSR-
*/
public String replaceAll(String regex, String replacement)
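As the definitions show:
(1) replace(): returns a copy of the input string in which every occurrence of the literal substring target is replaced with replacement.
(2) replaceAll(): returns a copy of the input string in which every substring matching the regular expression regex is replaced with replacement.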
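The difference between literal and regex matching is easiest to see on an input that contains a regex metacharacter. The following is a minimal sketch; the class and method names are made up for illustration, while the calls themselves are the standard java.lang.String and java.util.regex APIs quoted in the Javadoc above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceSemantics {
    // Literal replacement: "." is an ordinary character here.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." matches any character except line terminators.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // The expansion that the replaceAll() Javadoc describes as equivalent.
    static String viaPattern(String s, String regex, String repl) {
        return Pattern.compile(regex).matcher(s).replaceAll(repl);
    }

    // quoteReplacement() makes "$" and "\" in the replacement literal.
    static String quoted(String s, String regex, String repl) {
        return s.replaceAll(regex, Matcher.quoteReplacement(repl));
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b"));                 // a-b
        System.out.println(regex("a.b"));                   // ---
        System.out.println("aaa".replace("aa", "b"));       // ba
        System.out.println(viaPattern("a.b", "\\.", "-"));  // a-b
        System.out.println(quoted("price", "price", "$5")); // $5
    }
}
```

Without quoteReplacement(), the replacement "$5" would be read as a reference to capture group 5 and the call would throw instead of returning a string.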
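Clearly the two APIs serve different use cases, and replaceAll() is the more powerful of the two. Because replaceAll() has to process a regular expression, it should also be slower than replace(). But for the same task, how large is the difference? A simple experiment: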
// Iteration count restored from the text: on the order of 1,000,000.
long now = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    "aabbbc".replace("b", "a");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));
// Same replacement, this time through the regex engine.
now = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    "aabbbc".replaceAll("b", "a");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
Results:
10-20 16:26:08.401 19518 19670 I TEST : replace() : 2170
10-20 16:27:47.828 19518 19670 I TEST : replaceAll() : 99427
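At a scale of roughly one million iterations, replaceAll() takes about 46 times as long as replace() (99427 ms versus 2170 ms), which is close to a 50x gap. (Thread scheduling is ignored here; timing is based only on the system timestamps at the start and end. Also, this example exists only for comparison: in real code, when the string being replaced has length 1, the more suitable API is of course replace(char, char).)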
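For readers who want to repeat the comparison off-device, here is a rough desktop sketch that also includes the replace(char, char) variant mentioned above. The class name, the timeMillis helper and the sink field are assumptions made for this illustration, not part of the original test, and the absolute numbers will differ by device and runtime.

```java
import java.util.function.Supplier;

public class ReplaceTiming {
    // Accumulates result lengths so the runtime cannot discard the work.
    static long sink;

    // Hypothetical helper: runs the task `iterations` times, returns elapsed milliseconds.
    static long timeMillis(Supplier<String> task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // same order of magnitude as the test above
        System.out.println("replace(char,char)  : " + timeMillis(() -> "aabbbc".replace('b', 'a'), n));
        System.out.println("replace(CharSequence): " + timeMillis(() -> "aabbbc".replace("b", "a"), n));
        System.out.println("replaceAll()        : " + timeMillis(() -> "aabbbc".replaceAll("b", "a"), n));
    }
}
```

All three calls produce the same string for this input, so only the elapsed time differs between them.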
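Of course, for fixed-string replacement one normally uses replace(), and for complex regular expressions there is no alternative to replaceAll(). Where the two overlap is usually in simple combinations, such as:
replace("a", "1").replace("b", "1") vs replaceAll("[ab]", "1")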
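Before comparing their speed, it may help to confirm that the two forms really produce the same string. The sketch below uses made-up class and method names. Note that the equivalence relies on the replacement "1" not being one of the later targets; a chain such as replace("a", "b").replace("b", "c") would behave differently from a single regex pass.

```java
public class ChainVsCharClass {
    // Two literal replacements applied one after the other.
    static String chained(String s) {
        return s.replace("a", "1").replace("b", "1");
    }

    // One regex pass with a character class covering both targets.
    static String charClass(String s) {
        return s.replaceAll("[ab]", "1");
    }

    public static void main(String[] args) {
        String input = "abcab";
        System.out.println(chained(input));   // 11c11
        System.out.println(charClass(input)); // 11c11
    }
}
```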
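Set code cleanliness aside and look only at performance. Given the roughly 50x gap above, the intuitive question is whether 50 is a break-even point: that is, whether a chain of 50 replace() calls costs about as much as the single replaceAll() call that does the same work. Does such a break-even point exist? More experiments follow.
Start with this code as a first probe: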
// 100,000 iterations, as stated in the text; 10 chained replace() calls.
long now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz"
            .replace("a", "1")
            .replace("b", "1")
            .replace("c", "1")
            .replace("d", "1")
            .replace("e", "1")
            .replace("f", "1")
            .replace("g", "1")
            .replace("h", "1")
            .replace("i", "1")
            .replace("j", "1");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));
// The same 10 letters handled by one regex character class.
now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-j]", "1");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
With 100,000 iterations and 10 chained replace() calls, the results are:
10-20 17:04:55.656 24206 24326 I TEST : replace() : 3274
10-20 17:05:13.844 24206 24326 I TEST : replaceAll() : 18188
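replaceAll() still takes about 5.6 times as long; measured against a single replace() call, that is roughly a 56x cost.
Next, extend the chain to 26 calls, covering all 26 English letters: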
// Iteration count is missing in the source; 100,000 is assumed, as in the previous test.
long now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz"
            .replace("a", "1")
            .replace("b", "1")
            .replace("c", "1")
            .replace("d", "1")
            .replace("e", "1")
            .replace("f", "1")
            .replace("g", "1")
            .replace("h", "1")
            .replace("i", "1")
            .replace("j", "1")
            .replace("k", "1")
            .replace("l", "1")
            .replace("m", "1")
            .replace("n", "1")
            .replace("o", "1")
            .replace("p", "1")
            .replace("q", "1")
            .replace("r", "1")
            .replace("s", "1")
            .replace("t", "1")
            .replace("u", "1")
            .replace("v", "1")
            .replace("w", "1")
            .replace("x", "1")
            .replace("y", "1")
            .replace("z", "1");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));
// All 26 letters handled by one regex character class.
now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-z]", "1");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
Results:
10-20 17:02:50.440 22178 22248 I TEST : replace() : 8232
10-20 17:03:21.954 22178 22248 I TEST : replaceAll() : 31514
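replaceAll() still takes close to 4 times as long (about 3.8x); measured against a single replace() call, that is roughly a 100x cost.
As the regular expression itself grows, the time taken by replaceAll() also increases (18188 ms for [a-j] versus 31514 ms for [a-z]).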
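The ratios quoted in this post can be recomputed directly from the logged timings. The snippet below is only that arithmetic; the class and method names are illustrative, and the per-call figure divides the time of the whole replace() chain evenly across its calls, which is a simplifying assumption.

```java
import java.util.Locale;

public class TimingRatios {
    // How many times longer replaceAll() took than the whole replace() chain.
    static double ratio(long replaceAllMs, long replaceChainMs) {
        return (double) replaceAllMs / replaceChainMs;
    }

    // Cost of one replaceAll() relative to a single replace() call in the chain.
    static double perCallRatio(long replaceAllMs, long replaceChainMs, int calls) {
        return ratio(replaceAllMs, replaceChainMs) * calls;
    }

    public static void main(String[] args) {
        // Timings (ms) copied from the three log outputs in this post.
        System.out.println(String.format(Locale.ROOT, "1 call   : %.1fx", ratio(99427, 2170)));
        System.out.println(String.format(Locale.ROOT, "10 calls : %.1fx overall, %.1fx per call",
                ratio(18188, 3274), perCallRatio(18188, 3274, 10)));
        System.out.println(String.format(Locale.ROOT, "26 calls : %.1fx overall, %.1fx per call",
                ratio(31514, 8232), perCallRatio(31514, 8232, 26)));
    }
}
```

The overall ratio shrinks as the chain grows (about 46x, 5.6x, 3.8x), but it never drops to 1 within 26 calls.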
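All 26 letters are now covered and the break-even point has still not been reached. The conclusion is fairly safe: for simple replacements, judged on performance alone, replace() is clearly the better choice in these Android 7.1 measurements.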
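To probe other chain lengths without writing each chain by hand, the experiment can be parameterized. This is a rough sketch rather than the code used for the measurements above: the class, the helper names and the sink field are hypothetical, and building each target with String.valueOf adds a small overhead that the hand-written chains do not have.

```java
public class BreakEvenSketch {
    static final String INPUT = "abcdefghijklmnopqrstuvwxyz";
    static long sink; // keeps the results alive so the work is not optimized away

    // Applies replace() for the first k letters, mimicking the hand-written chain.
    static String chained(String s, int k) {
        for (int i = 0; i < k; i++) {
            s = s.replace(String.valueOf((char) ('a' + i)), "1");
        }
        return s;
    }

    // Uses one character class that covers the first k letters, e.g. [a-j] for k = 10.
    static String viaRegex(String s, int k) {
        String regex = "[a-" + (char) ('a' + k - 1) + "]";
        return s.replaceAll(regex, "1");
    }

    public static void main(String[] args) {
        int iterations = 100_000; // assumed scale, matching the 10-call test
        for (int k : new int[] {1, 10, 26}) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                sink += chained(INPUT, k).length();
            }
            long chainMs = (System.nanoTime() - start) / 1_000_000;

            start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                sink += viaRegex(INPUT, k).length();
            }
            long regexMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println("k=" + k + " replace(): " + chainMs + " ms, replaceAll(): " + regexMs + " ms");
        }
    }
}
```

On a desktop JVM the ratios may look quite different from the Android 7.1 numbers, so any break-even point found this way applies only to the runtime it was measured on.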