
String.replace() vs. String.replaceAll(): A Performance Comparison

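Some commonly used Java APIs deserve a closer look, String.replace() and String.replaceAll() among them. This post examines both, using the Android 7.1 source code as the reference. They are declared as follows: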

/**
     * Replaces each substring of this string that matches the literal target
     * sequence with the specified literal replacement sequence. The
     * replacement proceeds from the beginning of the string to the end, for
     * example, replacing "aa" with "b" in the string "aaa" will result in
     * "ba" rather than "ab".
     *
     * @param  target The sequence of char values to be replaced
     * @param  replacement The replacement sequence of char values
     * @return  The resulting string
     * @throws NullPointerException if <code>target</code> or
     *         <code>replacement</code> is <code>null</code>.
     * @since 
     */
    public String replace(CharSequence target, CharSequence replacement)
           
/**
     * Replaces each substring of this string that matches the given <a
     * href="../util/regex/Pattern.html#sum">regular expression</a> with the
     * given replacement.
     *
     * <p> An invocation of this method of the form
     * <i>str</i><tt>.replaceAll(</tt><i>regex</i><tt>,</tt> <i>repl</i><tt>)</tt>
     * yields exactly the same result as the expression
     *
     * <blockquote><tt>
     * {@link java.util.regex.Pattern}.{@link java.util.regex.Pattern#compile
     * compile}(</tt><i>regex</i><tt>).{@link
     * java.util.regex.Pattern#matcher(java.lang.CharSequence)
     * matcher}(</tt><i>str</i><tt>).{@link java.util.regex.Matcher#replaceAll
     * replaceAll}(</tt><i>repl</i><tt>)</tt></blockquote>
     *
     *<p>
     * Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in the
     * replacement string may cause the results to be different than if it were
     * being treated as a literal replacement string; see
     * {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
     * Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
     * meaning of these characters, if desired.
     *
     * @param   regex
     *          the regular expression to which this string is to be matched
     * @param   replacement
     *          the string to be substituted for each match
     *
     * @return  The resulting <tt>String</tt>
     *
     * @throws  PatternSyntaxException
     *          if the regular expression's syntax is invalid
     *
     * @see java.util.regex.Pattern
     *
     * @since 
     * @spec JSR-
     */
    public String replaceAll(String regex, String replacement) 
           

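As the Javadoc shows:

(1) replace(): returns a copy of the input string in which every occurrence of the literal substring target is replaced with replacement.

(2) replaceAll(): returns a copy of the input string in which every substring matching the regular expression regex is replaced with replacement.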
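The difference between literal and regex matching is easy to see with a target that is also a regex metacharacter. The following is a minimal sketch; the class name ReplaceSemantics and its helper methods are made up for illustration, and only the String methods come from the JDK.

```java
final class ReplaceSemantics {
    // replace() treats the target as literal text: only real dots are replaced.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll() treats its first argument as a regex: "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b"));            // a-b
        System.out.println(regex("a.b"));              // ---
        // The Javadoc example: replacement proceeds from left to right.
        System.out.println("aaa".replace("aa", "b"));  // ba
    }
}
```

Both methods replace every occurrence; despite the name, replaceAll() differs from replace() only in how the first argument is interpreted.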

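Clearly the two APIs target different use cases, and replaceAll() is the more powerful of the two. Because replaceAll() has to process a regular expression, it should also be slower than replace(). But how large is the gap when both are used for the same task? A simple experiment: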

        // 1,000,000 iterations, matching the scale quoted in the discussion of the results.
        long now = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            "aabbbc".replace("b", "a");
        }
        Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

        now = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            "aabbbc".replaceAll("b", "a");
        }
        Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
           

Result:

10-20 16:26:08.401 19518 19670 I TEST    : replace() : 2170
10-20 16:27:47.828 19518 19670 I TEST    : replaceAll() : 99427
           

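At a scale of one million iterations, replaceAll() takes close to 50 times as long as replace() (99427 ms vs. 2170 ms, about 46x). (Thread scheduling is ignored here; timing relies only on the system timestamps taken at the start and end. Also, the example exists purely for comparison: in real code, when the string being replaced has length 1, the more suitable API is of course replace(char, char).)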
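For readers who want to repeat the measurement off-device, here is a rough sketch of the same test on a plain JVM, with replace(char, char) added as a third candidate. The ReplaceTiming class, the timeMillis helper, the sink field, and the warm-up pass are assumptions of this sketch, not part of the original test, and its numbers will not match the Android logs above.

```java
final class ReplaceTiming {
    // Accumulates result lengths so the timed work has an observable effect.
    static long sink;

    // Hypothetical helper: runs `task` `iterations` times and returns elapsed milliseconds.
    static long timeMillis(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // same order of magnitude as the test above
        Runnable charReplace = () -> sink += "aabbbc".replace('b', 'a').length();
        Runnable literalReplace = () -> sink += "aabbbc".replace("b", "a").length();
        Runnable regexReplace = () -> sink += "aabbbc".replaceAll("b", "a").length();

        // Warm-up pass so JIT compilation is less likely to distort the timed runs.
        timeMillis(n / 10, charReplace);
        timeMillis(n / 10, literalReplace);
        timeMillis(n / 10, regexReplace);

        System.out.println("replace(char, char): " + timeMillis(n, charReplace) + " ms");
        System.out.println("replace(): " + timeMillis(n, literalReplace) + " ms");
        System.out.println("replaceAll(): " + timeMillis(n, regexReplace) + " ms");
    }
}
```

System.nanoTime() is used because it is intended for measuring elapsed time, whereas System.currentTimeMillis() reports wall-clock time. For serious measurements a dedicated benchmark harness would be more trustworthy than a hand-rolled loop like this one.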

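Of course, for fixed-string replacement one normally uses replace(), and for complex regular expressions there is no alternative to replaceAll(). Where the two overlap is usually in simple combinations, such as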

replace("a", "1").replace("b", "1") vs replaceAll("[ab]", "1")
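The two expressions above produce the same output, which is what makes the performance comparison fair. A small sketch to confirm it (the class and method names are hypothetical):

```java
final class ChainVsCharClass {
    // Two literal passes over the string, one per target.
    static String chained(String s) {
        return s.replace("a", "1").replace("b", "1");
    }

    // One regex pass using a character class that covers both targets.
    static String charClass(String s) {
        return s.replaceAll("[ab]", "1");
    }

    public static void main(String[] args) {
        System.out.println(chained("abcab"));    // 11c11
        System.out.println(charClass("abcab"));  // 11c11
    }
}
```

The equivalence holds here because the replacement "1" contains none of the targets. If a replacement could itself be matched by a later call in the chain, the chained version would rewrite it again, while the single regex pass would not.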

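Setting code tidiness aside and looking only at performance: given the roughly 50x gap above, intuition suggests that 50 might be a break-even point. That is, would replace() have to be called about 50 times before it costs as much as a single replaceAll() call? Does such a break-even point exist? More experiments are needed.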

Start with this probe:

                // 100,000 iterations, per the scale stated with the results.
                long now = System.currentTimeMillis();
                for (int i = 0; i < 100000; i++) {
                    "abcdefghijklmnopqrstuvwxyz"
                            .replace("a", "1")
                            .replace("b", "1")
                            .replace("c", "1")
                            .replace("d", "1")
                            .replace("e", "1")
                            .replace("f", "1")
                            .replace("g", "1")
                            .replace("h", "1")
                            .replace("i", "1")
                            .replace("j", "1");
                }
                Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

                now = System.currentTimeMillis();
                for (int i = 0; i < 100000; i++) {
                    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-j]", "1");
                }
                Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
           

Scale: 100,000 iterations; candidate break-even point: 10 chained calls. Result:

10-20 17:04:55.656 24206 24326 I TEST    : replace() : 3274
10-20 17:05:13.844 24206 24326 I TEST    : replaceAll() : 18188
           

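replaceAll() still takes about 5.6 times as long (18188 ms vs. 3274 ms); measured against a single replace() call, that is roughly 56x.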

Next, raise the candidate break-even point to 26 (all 26 letters of the English alphabet):

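                // Iteration count assumed to be 100,000, as in the previous test;
                // the source does not restate it.
                long now = System.currentTimeMillis();
                for (int i = 0; i < 100000; i++) {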
                    "abcdefghijklmnopqrstuvwxyz"
                            .replace("a", "1")
                            .replace("b", "1")
                            .replace("c", "1")
                            .replace("d", "1")
                            .replace("e", "1")
                            .replace("f", "1")
                            .replace("g", "1")
                            .replace("h", "1")
                            .replace("i", "1")
                            .replace("j", "1")
                            .replace("k", "1")
                            .replace("l", "1")
                            .replace("m", "1")
                            .replace("n", "1")
                            .replace("o", "1")
                            .replace("p", "1")
                            .replace("q", "1")
                            .replace("r", "1")
                            .replace("s", "1")
                            .replace("t", "1")
                            .replace("u", "1")
                            .replace("v", "1")
                            .replace("w", "1")
                            .replace("x", "1")
                            .replace("y", "1")
                            .replace("z", "1");
                }
                Log.i("TEST", "replace() : " + (System.currentTimeMillis() - now));

                now = System.currentTimeMillis();
                for (int i = 0; i < 100000; i++) {
                    "abcdefghijklmnopqrstuvwxyz".replaceAll("[a-z]", "1");
                }
                Log.i("TEST", "replaceAll() : " + (System.currentTimeMillis() - now));
           

Result:

10-20 17:02:50.440 22178 22248 I TEST    : replace() : 8232
10-20 17:03:21.954 22178 22248 I TEST    : replaceAll() : 31514
           

replaceAll() still takes nearly 4 times as long (31514 ms vs. 8232 ms); measured against a single replace() call, that is about 100x.
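The ratios quoted for the three tests can be recomputed directly from the logged timings. The sketch below does only that arithmetic; the class and method names are hypothetical, and the per-call figure simply multiplies the overall ratio by the number of chained replace() calls.

```java
import java.util.Locale;

final class RatioCheck {
    // Overall ratio: replaceAll() time divided by the time of the whole replace() chain.
    static double ratio(long replaceAllMs, long replaceMs) {
        return (double) replaceAllMs / replaceMs;
    }

    // Ratio against a single replace() call, assuming the chain's cost splits evenly.
    static double perCallRatio(long replaceAllMs, long replaceMs, int chainedCalls) {
        return ratio(replaceAllMs, replaceMs) * chainedCalls;
    }

    public static void main(String[] args) {
        // Timings in milliseconds, copied from the three log excerpts.
        System.out.println(String.format(Locale.ROOT, "1 call:   %.1fx",
                ratio(99427, 2170)));
        System.out.println(String.format(Locale.ROOT, "10 calls: %.1fx overall, %.1fx per call",
                ratio(18188, 3274), perCallRatio(18188, 3274, 10)));
        System.out.println(String.format(Locale.ROOT, "26 calls: %.1fx overall, %.1fx per call",
                ratio(31514, 8232), perCallRatio(31514, 8232, 26)));
    }
}
```

The even-split assumption is a simplification: each call in a chain runs on the output of the previous one, but since every replacement here keeps the string length at 26 characters, it is a reasonable first approximation.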

As the regular expression itself grows to cover more characters, the time taken by replaceAll() increases as well.

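All 26 letters are now covered and the break-even point still has not been reached. The conclusion is therefore fairly safe: for simple replacements, judged on performance alone, replace() is clearly the better choice.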
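When a regular expression really is required, the Javadoc quoted earlier points to one possible mitigation: replaceAll() is specified as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so the compiled Pattern can be built once and reused. The sketch below illustrates that idea only; the class and method names are hypothetical, and the tests in this post did not measure how much of the gap it would close.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrecompiledReplace {
    // Compiled once and reused, instead of on every String.replaceAll() call.
    private static final Pattern A_TO_J = Pattern.compile("[a-j]");

    // Replaces every letter a-j in `s` with `replacement`, taken literally.
    static String replaceAtoJ(String s, String replacement) {
        // quoteReplacement() suppresses the special meaning of '\' and '$'.
        return A_TO_J.matcher(s).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceAtoJ("abcdefghijklmnopqrstuvwxyz", "1"));
        // prints 1111111111klmnopqrstuvwxyz
        System.out.println(replaceAtoJ("ab", "$"));
        // prints $$
    }
}
```

Matcher.quoteReplacement() is included because the Javadoc warns that backslashes and dollar signs in the replacement string are otherwise interpreted specially.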