String.replace() vs. String.replaceAll(): A Performance Comparison

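Some commonly used Java APIs are worth a closer look, String.replace() and String.replaceAll() among them. This article examines both, using the Android 7.1 source code as the reference. They are declared as follows: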

/**
     * Replaces each substring of this string that matches the literal target
     * sequence with the specified literal replacement sequence. The
     * replacement proceeds from the beginning of the string to the end, for
     * example, replacing "aa" with "b" in the string "aaa" will result in
     * "ba" rather than "ab".
     *
     * @param  target The sequence of char values to be replaced
     * @param  replacement The replacement sequence of char values
     * @return  The resulting string
     * @throws NullPointerException if <code>target</code> or
     *         <code>replacement</code> is <code>null</code>.
     * @since 
     */
    public String replace(CharSequence target, CharSequence replacement)
           
/**
     * Replaces each substring of this string that matches the given <a
     * href="../util/regex/Pattern.html#sum">regular expression</a> with the
     * given replacement.
     *
     * <p> An invocation of this method of the form
     * <i>str</i><tt>.replaceAll(</tt><i>regex</i><tt>,</tt> <i>repl</i><tt>)</tt>
     * yields exactly the same result as the expression
     *
     * <blockquote><tt>
     * {@link java.util.regex.Pattern}.{@link java.util.regex.Pattern#compile
     * compile}(</tt><i>regex</i><tt>).{@link
     * java.util.regex.Pattern#matcher(java.lang.CharSequence)
     * matcher}(</tt><i>str</i><tt>).{@link java.util.regex.Matcher#replaceAll
     * replaceAll}(</tt><i>repl</i><tt>)</tt></blockquote>
     *
     *<p>
     * Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in the
     * replacement string may cause the results to be different than if it were
     * being treated as a literal replacement string; see
     * {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
     * Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
     * meaning of these characters, if desired.
     *
     * @param   regex
     *          the regular expression to which this string is to be matched
     * @param   replacement
     *          the string to be substituted for each match
     *
     * @return  The resulting <tt>String</tt>
     *
     * @throws  PatternSyntaxException
     *          if the regular expression's syntax is invalid
     *
     * @see java.util.regex.Pattern
     *
     * @since 
     * @spec JSR-
     */
    public String replaceAll(String regex, String replacement) 
           

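As the documentation shows:

(1) replace(): returns a copy of the input string in which every occurrence of the literal substring target is replaced with replacement.

(2) replaceAll(): returns a copy of the input string in which every substring matching the regular expression regex is replaced with replacement.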
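The difference between literal and regex matching is easiest to see with a target that contains a regex metacharacter. The following is a minimal sketch for a plain JVM; the class and method names are made up for illustration and are not part of the Android source.

```java
import java.util.regex.Matcher;

class ReplaceSemantics {
    // Literal replacement: "." is matched as an ordinary character.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." matches any character other than a line terminator.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // "$" is special in a replaceAll() replacement string;
    // Matcher.quoteReplacement() makes it literal, as the Javadoc recommends.
    static String price(String s) {
        return s.replaceAll("X", Matcher.quoteReplacement("$5"));
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c")); // a-b-c
        System.out.println(regex("a.b.c"));   // -----
        System.out.println(price("cost: X")); // cost: $5
    }
}
```

Without quoteReplacement(), the "$5" in the last call would be read as a reference to capture group 5 rather than as literal text.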

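Clearly the two APIs serve different purposes, and replaceAll() is the more powerful of the two. Because replaceAll() has to process a regular expression, it should also be slower than replace(). But how large is the gap when both are used for the same task? A simple experiment: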

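// Requires android.util.Log; intended to run inside an Android method body.
// 1,000,000 iterations, the scale stated in the discussion of the result.
long now = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    "aabbbc".replace("b", "a");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

now = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    "aabbbc".replaceAll("b", "a");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));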
           

Result:

10-20 16:26:08.401 19518 19670 I TEST    : replace() : 2170
10-20 16:27:47.828 19518 19670 I TEST    : replaceAll() : 99427
           

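At a scale of 1 million iterations, replaceAll() takes about 46 times as long as replace() (99427 ms vs. 2170 ms), so close to 50x. (Thread scheduling is ignored here; timing is based only on the system timestamps at the start and end. Also, the example above is only for comparison: to replace a string of length 1 in practice, the more suitable API is of course replace(char, char).)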
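For readers who want to repeat the comparison off-device, here is a rough sketch of the same measurement for a plain JVM. It is not the author's harness: the class name, the timeMillis helper, and the sink field are assumptions introduced for illustration, System.nanoTime() replaces the wall-clock timestamps, and replace(char, char) is included as a third variant. Absolute numbers will differ from the Android 7.1 results above.

```java
import java.util.function.Supplier;

class ReplaceTiming {
    // Accumulates result lengths so the JIT cannot discard the calls as dead code.
    static long sink;

    // Runs the task `iterations` times and returns the elapsed time in milliseconds.
    static long timeMillis(int iterations, Supplier<String> task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // same scale as the first experiment
        System.out.println("replace(CharSequence, CharSequence): "
                + timeMillis(iterations, () -> "aabbbc".replace("b", "a")) + " ms");
        System.out.println("replace(char, char): "
                + timeMillis(iterations, () -> "aabbbc".replace('b', 'a')) + " ms");
        System.out.println("replaceAll(String, String): "
                + timeMillis(iterations, () -> "aabbbc".replaceAll("b", "a")) + " ms");
    }
}
```

A single pass like this includes JIT warm-up, so it only indicates an order of magnitude; a benchmarking framework would be needed for precise figures.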

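Of course, fixed-string replacement normally calls for replace(), and a complex regular expression leaves no choice but replaceAll(). Where the two overlap is usually in simple combinations, for example: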

replace("a", "1").replace("b", "1") vs replaceAll("[ab]", "1")
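The two forms are interchangeable here only because the replacement "1" is not itself a target of any later replace() call. A small check, with illustrative class and method names:

```java
class ChainedVsRegex {
    // Two literal passes over the string, one per target.
    static String chained(String s) {
        return s.replace("a", "1").replace("b", "1");
    }

    // One regex pass using a character class that covers both targets.
    static String regex(String s) {
        return s.replaceAll("[ab]", "1");
    }

    public static void main(String[] args) {
        System.out.println(chained("abcab")); // 11c11
        System.out.println(regex("abcab"));   // 11c11
    }
}
```

If an earlier replacement produced a character that a later call targets (say replace("a", "b").replace("b", "c")), the chained form would rewrite it a second time and the two approaches would no longer agree.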

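Set code tidiness aside and look only at performance. Given the roughly 50x gap above, intuition suggests that 50 might be a break-even point: replace() would have to be chained about 50 times before it costs as much as a single replaceAll() call. Does such a break-even point exist? More experiments can help answer that.

Start with the following probe: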

// Requires android.util.Log; intended to run inside an Android method body.
// 100,000 iterations; ten chained replace() calls vs. one replaceAll() call.
long now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz"
            .replace("a", "1")
            .replace("b", "1")
            .replace("c", "1")
            .replace("d", "1")
            .replace("e", "1")
            .replace("f", "1")
            .replace("g", "1")
            .replace("h", "1")
            .replace("i", "1")
            .replace("j", "1");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-j]", "1");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
           

With 100,000 iterations and a candidate break-even point of 10, the result is:

10-20 17:04:55.656 24206 24326 I TEST    : replace() : 3274
10-20 17:05:13.844 24206 24326 I TEST    : replaceAll() : 18188
           

replaceAll() still takes about 5.6 times as long; measured against a single replace() call, that is roughly 56 times.

Next, raise the candidate break-even point to 26 (all 26 letters of the English alphabet):

// Requires android.util.Log; intended to run inside an Android method body.
// Assumption: the iteration count for this run is not stated;
// 100000 is used here to match the previous test.
long now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz"
            .replace("a", "1")
            .replace("b", "1")
            .replace("c", "1")
            .replace("d", "1")
            .replace("e", "1")
            .replace("f", "1")
            .replace("g", "1")
            .replace("h", "1")
            .replace("i", "1")
            .replace("j", "1")
            .replace("k", "1")
            .replace("l", "1")
            .replace("m", "1")
            .replace("n", "1")
            .replace("o", "1")
            .replace("p", "1")
            .replace("q", "1")
            .replace("r", "1")
            .replace("s", "1")
            .replace("t", "1")
            .replace("u", "1")
            .replace("v", "1")
            .replace("w", "1")
            .replace("x", "1")
            .replace("y", "1")
            .replace("z", "1");
}
Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

now = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-z]", "1");
}
Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
           

Result:

10-20 17:02:50.440 22178 22248 I TEST    : replace() : 8232
10-20 17:03:21.954 22178 22248 I TEST    : replaceAll() : 31514
           

replaceAll() still takes nearly 4 times as long (about 3.8x); measured against a single replace() call, that is roughly 100 times.
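The ratios quoted for the three experiments can be recomputed directly from the logged timings. The sketch below does only that arithmetic; the class and method names are illustrative.

```java
import java.util.Locale;

class RatioCheck {
    // How many times longer replaceAll() took than the replace() variant.
    static double ratio(long replaceAllMs, long replaceMs) {
        return (double) replaceAllMs / replaceMs;
    }

    // Prints the overall ratio and the ratio scaled to a single replace() call.
    static void report(String label, long replaceAllMs, long replaceMs, int replaceCalls) {
        double overall = ratio(replaceAllMs, replaceMs);
        System.out.println(String.format(Locale.ROOT,
                "%s: %.1fx overall, %.1fx per replace() call",
                label, overall, overall * replaceCalls));
    }

    public static void main(String[] args) {
        report("1 call", 99427, 2170, 1);    // 45.8x overall, 45.8x per call
        report("10 calls", 18188, 3274, 10); // 5.6x overall, 55.6x per call
        report("26 calls", 31514, 8232, 26); // 3.8x overall, 99.5x per call
    }
}
```

The overall ratio shrinks as more replace() calls are chained, but it stays well above 1 in all three runs, which is why no break-even point shows up.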

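As the regular expression itself grows, so does the cost of replaceAll(): from 18188 ms with [a-j] to 31514 ms with [a-z].

Even with all 26 letters covered, the break-even point has not been reached. The conclusion is therefore fairly safe: for simple replacements, judged on performance alone, replace() is clearly the better choice.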
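When a regular expression cannot be avoided, the Javadoc equivalence quoted earlier (Pattern.compile(regex).matcher(str).replaceAll(repl)) suggests one possible mitigation: compile the pattern once and reuse it. This was not measured in the experiments above, so whether it narrows the gap on Android 7.1 is an open question. The class and field names below are hypothetical.

```java
import java.util.regex.Pattern;

class PrecompiledReplace {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern A_TO_J = Pattern.compile("[a-j]");

    // Same result as s.replaceAll("[a-j]", "1"), without recompiling the regex per call.
    static String replaceAToJ(String s) {
        return A_TO_J.matcher(s).replaceAll("1");
    }

    public static void main(String[] args) {
        System.out.println(replaceAToJ("abcdefghijklmnopqrstuvwxyz"));
        // 1111111111klmnopqrstuvwxyz
    }
}
```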