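replace and replaceAll are two methods of the String class for replacing characters or substrings. Going by the names alone, it is easy to assume that replace substitutes a single match while replaceAll substitutes every match; in fact that is not the case at all :P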
1. Overview
2. Related classes: String, Pattern, Matcher
3. Related methods
3.1. Matcher
3.2. Pattern
3.3. String
4. Conclusion
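1. Overview
The String class provides four methods for replacing characters or substrings: the two overloads of replace, plus replaceAll and replaceFirst.
- replace(char, char): replaces every occurrence; both parameters are of type char; does not use Pattern or Matcher.
- replace(CharSequence, CharSequence): replaces every occurrence; the parameters can be any CharSequence implementation (such as String); the target is matched literally, not as a regular expression; in the source shown in section 3.3 it calls Pattern.compile with the Pattern.LITERAL flag and then Matcher's replaceAll method.
- replaceAll(String, String): replaces every match; the parameters are Strings and the first one is treated as a regular expression; calls Pattern.compile (regex mode) and Matcher's replaceAll method.
- replaceFirst(String, String): replaces only the first match; the parameters are Strings and the first one is treated as a regular expression; calls Pattern.compile (regex mode) and Matcher's replaceFirst method.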
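A quick way to see the difference is to run all four methods on the same input. The snippet below is only an illustrative sketch: the class name ReplaceOverviewDemo, its helper methods, and the sample string are made up for this example.

```java
// Hypothetical demo class; the names and the sample input are illustrative only.
class ReplaceOverviewDemo {
    // replace(char, char): every 'a' becomes 'x'
    static String byChar(String s) { return s.replace('a', 'x'); }

    // replace(CharSequence, CharSequence): "." is a literal dot here
    static String byLiteral(String s) { return s.replace(".", "-"); }

    // replaceAll: "." is a regex that matches any character
    static String byRegexAll(String s) { return s.replaceAll(".", "-"); }

    // replaceFirst: the escaped regex "\\." matches only the first literal dot
    static String byRegexFirst(String s) { return s.replaceFirst("\\.", "-"); }

    public static void main(String[] args) {
        String s = "a.b.a.b";
        System.out.println(byChar(s));       // x.b.x.b
        System.out.println(byLiteral(s));    // a-b-a-b
        System.out.println(byRegexAll(s));   // -------
        System.out.println(byRegexFirst(s)); // a-b.a.b
    }
}
```

Both replace overloads touch every occurrence; what sets replaceAll apart is the regular expression, not the number of replacements.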
2. Related classes: String, Pattern, Matcher
- The String class:
public final class String implements java.io.Serializable, Comparable<String>, CharSequence
The class that represents strings and provides their methods: The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.
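- Pattern && Matcher
See the following two blog posts for details:
The concept of capturing groups in regular expressions: https://blog.csdn.net/kofandlizi/article/details/7323863
An overview of Pattern and Matcher: https://blog.csdn.net/yin380697242/article/details/52049999
In short, the Pattern class compiles a regular expression into a matching pattern, and the Matcher class uses the pattern held by a Pattern instance to match it against an input character sequence.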
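As a rough illustration of that division of labour, the sketch below compiles a regular expression once and then lets a Matcher walk over the input. The class PatternMatcherDemo, the helper findAll, and the "text@index" output format are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class PatternMatcherDemo {
    // Collects every match as "matchedText@startIndex".
    static List<String> findAll(String regex, String input) {
        Pattern p = Pattern.compile(regex); // compile the regex into a pattern
        Matcher m = p.matcher(input);       // bind the pattern to one input
        List<String> hits = new ArrayList<>();
        while (m.find()) {                  // advance to the next match
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1bb22ccc333")); // [1@1, 22@4, 333@9]
    }
}
```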
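- Figure: call graph of the relevant methods of String, Pattern, and Matcher
3. Related methods
3.1. Matcher
For details, see this blog post: https://www.cnblogs.com/SQP51312/p/6134324.html
- Matcher(Pattern parent, CharSequence text);
The Matcher constructor. It is package-private, so code outside java.util.regex cannot create a Matcher instance directly.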
/**
* All matchers have the state used by Pattern during a match.
*/
Matcher(Pattern parent, CharSequence text) {
    this.parentPattern = parent;
    this.text = text;
    // Allocate state storage
    int parentGroupCount = Math.max(parent.capturingGroupCount, 10);
    // groups is the storage used by groups: it holds the first and last
    // positions of each capturing group in the current match.
    groups = new int[parentGroupCount * 2];
    locals = new int[parent.localCount];
    // Put fields into initial states
    reset();
}
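Since the constructor is not accessible, a Matcher is normally obtained through Pattern.matcher. The per-group positions kept in the groups array are private, but the public start(int) and end(int) methods presumably report the same information. The following is a small sketch under that assumption; GroupPositionsDemo and describe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class GroupPositionsDemo {
    // Lists "group:[start,end)" for group 0 (the whole match) and each capturing group.
    static String describe(String input) {
        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder out = new StringBuilder();
        for (int g = 0; g <= m.groupCount(); g++) {
            out.append(g).append(":[")
               .append(m.start(g)).append(',').append(m.end(g)).append(") ");
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("range 10-25")); // 0:[6,11) 1:[6,8) 2:[9,11)
    }
}
```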
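- public Matcher appendReplacement(StringBuffer sb, String replacement);
Replaces the current match with the given replacement string: it appends to the StringBuffer the text between the end of the previous match and the start of the current match, then the processed replacement, and finally returns this matcher.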
/**
* Implements a non-terminal append-and-replace step.
*
* <p> This method performs the following actions: </p>
*
* <ol>
*
* <li><p> It reads characters from the input sequence, starting at the
* append position, and appends them to the given string buffer. It
* stops after reading the last character preceding the previous match,
* that is, the character at index {@link
* #start()} <tt>-</tt> <tt>1</tt>. </p></li>
*
* <li><p> It appends the given replacement string to the string buffer.
* </p></li>
*
* <li><p> It sets the append position of this matcher to the index of
* the last character matched, plus one, that is, to {@link #end()}.
* </p></li>
*
* </ol>
*
* <p> The replacement string may contain references to subsequences
* captured during the previous match: Each occurrence of
* <tt>${</tt><i>name</i><tt>}</tt> or <tt>$</tt><i>g</i>
* will be replaced by the result of evaluating the corresponding
* {@link #group(String) group(name)} or {@link #group(int) group(g)</tt>}
* respectively. For <tt>$</tt><i>g</i><tt></tt>,
* the first number after the <tt>$</tt> is always treated as part of
* the group reference. Subsequent numbers are incorporated into g if
* they would form a legal group reference. Only the numerals '0'
* through '9' are considered as potential components of the group
* reference. If the second group matched the string <tt>"foo"</tt>, for
* example, then passing the replacement string <tt>"$2bar"</tt> would
* cause <tt>"foobar"</tt> to be appended to the string buffer. A dollar
* sign (<tt>$</tt>) may be included as a literal in the replacement
* string by preceding it with a backslash (<tt>\$</tt>).
*
* <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
* the replacement string may cause the results to be different than if it
* were being treated as a literal replacement string. Dollar signs may be
* treated as references to captured subsequences as described above, and
* backslashes are used to escape literal characters in the replacement
* string.
*
* <p> This method is intended to be used in a loop together with the
* {@link #appendTail appendTail} and {@link #find find} methods. The
* following code, for example, writes <tt>one dog two dogs in the
* yard</tt> to the standard-output stream: </p>
*
* <blockquote><pre>
* Pattern p = Pattern.compile("cat");
* Matcher m = p.matcher("one cat two cats in the yard");
* StringBuffer sb = new StringBuffer();
* while (m.find()) {
* m.appendReplacement(sb, "dog");
* }
* m.appendTail(sb);
* System.out.println(sb.toString());</pre></blockquote>
*
* @param sb
* The target string buffer
*
* @param replacement
* The replacement string
*
* @return This matcher
*
* @throws IllegalStateException
* If no match has yet been attempted,
* or if the previous match operation failed
*
* @throws IllegalArgumentException
* If the replacement string refers to a named-capturing
* group that does not exist in the pattern
*
* @throws IndexOutOfBoundsException
* If the replacement string refers to a capturing group
* that does not exist in the pattern
*/
public Matcher appendReplacement(StringBuffer sb, String replacement) {
    // If no match, return error
    if (first < 0)
        throw new IllegalStateException("No match available");
    // Process substitution string to replace group references with groups
    int cursor = 0;
    StringBuilder result = new StringBuilder();
    while (cursor < replacement.length()) { // 1start
        char nextChar = replacement.charAt(cursor);
        if (nextChar == '\\') { // 2start
            cursor++;
            nextChar = replacement.charAt(cursor);
            result.append(nextChar);
            cursor++;
        } else if (nextChar == '$') { // 2end,3start
            // Skip past $
            cursor++;
            // A StringIndexOutOfBoundsException is thrown if
            // this "$" is the last character in replacement
            // string in current implementation, a IAE might be
            // more appropriate.
            nextChar = replacement.charAt(cursor);
            int refNum = -1;
            if (nextChar == '{') { // 4start
                cursor++;
                StringBuilder gsb = new StringBuilder();
                while (cursor < replacement.length()) { // 5start
                    nextChar = replacement.charAt(cursor);
                    if (ASCII.isLower(nextChar) || ASCII.isUpper(nextChar) || ASCII.isDigit(nextChar)) { // 6start
                        gsb.append(nextChar);
                        cursor++;
                    } else { // 6end,7start
                        break;
                    } // 7end
                } // 5end
                if (gsb.length() == 0)
                    throw new IllegalArgumentException("named capturing group has 0 length name");
                if (nextChar != '}')
                    throw new IllegalArgumentException("named capturing group is missing trailing '}'");
                String gname = gsb.toString();
                if (ASCII.isDigit(gname.charAt(0)))
                    throw new IllegalArgumentException("capturing group name {" + gname + "} starts with digit character");
                if (!parentPattern.namedGroups().containsKey(gname))
                    throw new IllegalArgumentException("No group with name {" + gname + "}");
                refNum = parentPattern.namedGroups().get(gname);
                cursor++;
            } else { // 4end,8start
                // The first number is always a group
                refNum = (int)nextChar - '0';
                if ((refNum < 0)||(refNum > 9))
                    throw new IllegalArgumentException("Illegal group reference");
                cursor++;
                // Capture the largest legal group string
                boolean done = false;
                while (!done) { // 9start
                    if (cursor >= replacement.length()) { // 10start
                        break;
                    } // 10end
                    int nextDigit = replacement.charAt(cursor) - '0';
                    if ((nextDigit < 0)||(nextDigit > 9)) { // 11start
                        // not a number
                        break;
                    } // 11end
                    int newRefNum = (refNum * 10) + nextDigit;
                    if (groupCount() < newRefNum) { // 12start
                        done = true;
                    } else { // 12end,13start
                        refNum = newRefNum;
                        cursor++;
                    } // 13end
                } // 9end
            } // 8end
            // Append group
            if (start(refNum) != -1 && end(refNum) != -1)
                result.append(text, start(refNum), end(refNum));
        } else { // 3end,14start
            result.append(nextChar);
            cursor++;
        } // 14end
    } // 1end
    // Append the intervening text
    sb.append(text, lastAppendPosition, first);
    // Append the match substitution
    sb.append(result);
    lastAppendPosition = last;
    return this;
}
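The replacement-string rules handled by the loop above (numbered references such as $2, named references such as ${name}, and a backslash-escaped dollar sign) can be tried out with a short sketch. AppendReplacementDemo, rewrite, the group name key, and the sample inputs are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class AppendReplacementDemo {
    // Runs the usual find / appendReplacement / appendTail loop by hand.
    static String rewrite(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // $2 is a numbered reference, ${key} refers to the named group "key"
        System.out.println(rewrite("(?<key>\\w+)=(\\w+)", "a=1, b=2;", "$2=${key}")); // 1=a, 2=b;
        // "\\$" in Java source is the two characters \$ and yields a literal dollar sign
        System.out.println(rewrite("(\\d+)", "cost 5", "\\$$1")); // cost $5
    }
}
```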
- public StringBuffer appendTail(StringBuffer sb);
Appends the text that remains after the last match to a StringBuffer object.
/**
* Implements a terminal append-and-replace step.
*
* <p> This method reads characters from the input sequence, starting at
* the append position, and appends them to the given string buffer. It is
* intended to be invoked after one or more invocations of the {@link
* #appendReplacement appendReplacement} method in order to copy the
* remainder of the input sequence. </p>
*
* @param sb
* The target string buffer
*
* @return The target string buffer
*/
public StringBuffer appendTail(StringBuffer sb) {
    sb.append(text, lastAppendPosition, getTextLength());
    return sb;
}
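To see why the final appendTail call matters, the following sketch runs the cat/dog example from the Javadoc with and without it. The class AppendTailDemo and the withTail switch are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class AppendTailDemo {
    // Replaces "cat" with "dog"; the tail is copied only when withTail is true.
    static String replaceCats(String input, boolean withTail) {
        Matcher m = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "dog");
        }
        if (withTail) {
            m.appendTail(sb); // copy everything after the last match
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "one cat two cats in the yard";
        System.out.println(replaceCats(s, false)); // one dog two dog
        System.out.println(replaceCats(s, true));  // one dog two dogs in the yard
    }
}
```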
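- public String replaceAll(String replacement);
Replaces every matching subsequence with the given string. The method first resets the matcher and then checks whether there is a match. If there is, it creates a StringBuffer, calls appendReplacement once per match in a loop, and finally calls appendTail and returns the contents of the StringBuffer as a String. If there is no match, it returns the input text unchanged.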
/**
* Replaces every subsequence of the input sequence that matches the
* pattern with the given replacement string.
*
* <p> This method first resets this matcher. It then scans the input
* sequence looking for matches of the pattern. Characters that are not
* part of any match are appended directly to the result string; each match
* is replaced in the result by the replacement string. The replacement
* string may contain references to captured subsequences as in the {@link
* #appendReplacement appendReplacement} method.
*
* <p> Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
* the replacement string may cause the results to be different than if it
* were being treated as a literal replacement string. Dollar signs may be
* treated as references to captured subsequences as described above, and
* backslashes are used to escape literal characters in the replacement
* string.
*
* <p> Given the regular expression <tt>a*b</tt>, the input
* <tt>"aabfooaabfooabfoob"</tt>, and the replacement string
* <tt>"-"</tt>, an invocation of this method on a matcher for that
* expression would yield the string <tt>"-foo-foo-foo-"</tt>.
*
* <p> Invoking this method changes this matcher's state. If the matcher
* is to be used in further matching operations then it should first be
* reset. </p>
*
* @param replacement
* The replacement string
*
* @return The string constructed by replacing each matching subsequence
* by the replacement string, substituting captured subsequences
* as needed
*/
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
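A small sketch of the behaviour described above, using the a*b example from the Javadoc. It deliberately calls find() before replaceAll to show that the reset() at the top of replaceAll rewinds the matcher. MatcherReplaceAllDemo and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class MatcherReplaceAllDemo {
    static String dashAll(String input) {
        Matcher m = Pattern.compile("a*b").matcher(input);
        m.find();                 // consume the first match on purpose
        return m.replaceAll("-"); // replaceAll resets first, so nothing is skipped
    }

    // With no match at all, the input comes back unchanged.
    static String noMatch(String input) {
        return Pattern.compile("xyz").matcher(input).replaceAll("-");
    }

    public static void main(String[] args) {
        System.out.println(dashAll("aabfooaabfooabfoob")); // -foo-foo-foo-
        System.out.println(noMatch("hello"));              // hello
    }
}
```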
- public String replaceFirst(String replacement);
Replaces the first matching subsequence with the given string.
/**
* Replaces the first subsequence of the input sequence that matches the
* pattern with the given replacement string.
*
* <p> This method first resets this matcher. It then scans the input
* sequence looking for a match of the pattern. Characters that are not
* part of the match are appended directly to the result string; the match
* is replaced in the result by the replacement string. The replacement
* string may contain references to captured subsequences as in the {@link
* #appendReplacement appendReplacement} method.
*
* <p>Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in
* the replacement string may cause the results to be different than if it
* were being treated as a literal replacement string. Dollar signs may be
* treated as references to captured subsequences as described above, and
* backslashes are used to escape literal characters in the replacement
* string.
*
* <p> Given the regular expression <tt>dog</tt>, the input
* <tt>"zzzdogzzzdogzzz"</tt>, and the replacement string
* <tt>"cat"</tt>, an invocation of this method on a matcher for that
* expression would yield the string <tt>"zzzcatzzzdogzzz"</tt>. </p>
*
* <p> Invoking this method changes this matcher's state. If the matcher
* is to be used in further matching operations then it should first be
* reset. </p>
*
* @param replacement
* The replacement string
* @return The string constructed by replacing the first matching
* subsequence by the replacement string, substituting captured
* subsequences as needed
*/
public String replaceFirst(String replacement) {
    if (replacement == null)
        throw new NullPointerException("replacement");
    reset();
    if (!find())
        return text.toString();
    StringBuffer sb = new StringBuffer();
    appendReplacement(sb, replacement);
    appendTail(sb);
    return sb.toString();
}
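The dog/cat example from the Javadoc, plus the explicit null check at the top of the method, can be exercised as follows. This is only a sketch; MatcherReplaceFirstDemo and its methods are illustrative names. The null is passed through a String variable so that the call stays unambiguous on JDKs where Matcher has further replaceFirst overloads.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class MatcherReplaceFirstDemo {
    static String firstDogToCat(String input) {
        return Pattern.compile("dog").matcher(input).replaceFirst("cat");
    }

    // Returns true if a null replacement is rejected with a NullPointerException.
    static boolean nullReplacementThrows() {
        String replacement = null; // typed variable avoids overload ambiguity
        try {
            Pattern.compile("dog").matcher("dog").replaceFirst(replacement);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDogToCat("zzzdogzzzdogzzz")); // zzzcatzzzdogzzz
        System.out.println(nullReplacementThrows());          // true
    }
}
```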
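3.2. Pattern
For details, see this blog post: http://www.cnblogs.com/SQP51312/p/6136304.html
- private Pattern(String p, int f);
The constructor of the Pattern class. Because it is private, outside code cannot instantiate Pattern directly; instances are created through Pattern.compile(regex) instead.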
/**
* This private constructor is used to create all Patterns. The pattern
* string and match flags are all that is needed to completely describe
* a Pattern. An empty pattern string results in an object tree with
* only a Start node and a LastNode node.
*/
private Pattern(String p, int f) {
    pattern = p;
    flags = f;
    // to use UNICODE_CASE if UNICODE_CHARACTER_CLASS present
    if ((flags & UNICODE_CHARACTER_CLASS) != 0)
        flags |= UNICODE_CASE;
    // Reset group index count
    capturingGroupCount = 1;
    localCount = 0;
    if (pattern.length() > 0) {
        compile();
    } else {
        root = new Start(lastAccept);
        matchRoot = lastAccept;
    }
}
- public Matcher matcher(CharSequence input);
Lets outside code obtain a Matcher instance for the given input.
/**
* Creates a matcher that will match the given input against this pattern.
* </p>
*
* @param input
* The character sequence to be matched
*
* @return A new matcher for this pattern
*/
public Matcher matcher(CharSequence input) {
    if (!compiled) {
        synchronized(this) {
            if (!compiled)
                compile();
        }
    }
    Matcher m = new Matcher(this, input);
    return m;
}
- public static Pattern compile(String regex, int flags);
Calls the Pattern constructor to create a Pattern instance.
public static Pattern compile(String regex, int flags) {
    return new Pattern(regex, flags);
}
- public static Pattern compile(String regex);
public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}
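The flags argument is what later separates the literal and regex variants of the String methods. As a hedged sketch, the example below compiles similar patterns with no flags, with Pattern.LITERAL, and with Pattern.CASE_INSENSITIVE; CompileFlagsDemo and replaceWith are names invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class CompileFlagsDemo {
    // Replaces every match of regex (compiled with the given flags) by "#".
    static String replaceWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        // No flags: '.' matches any character, so both "abc" and "a.c" match
        System.out.println(replaceWith("a.c", 0, "abc a.c"));               // # #
        // LITERAL: the pattern is the plain three characters a . c
        System.out.println(replaceWith("a.c", Pattern.LITERAL, "abc a.c")); // abc #
        // CASE_INSENSITIVE: "ABC" matches as well
        System.out.println(replaceWith("abc", Pattern.CASE_INSENSITIVE, "ABC abc")); // # #
    }
}
```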
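3.3. String
- public String replace(char oldChar, char newChar);
The String class overloads replace: the arguments can be single characters or instances of a class that implements the CharSequence interface (String is one of them). For character replacement, replace allocates a new buf array, then walks the source array and copies each character into buf, writing the new character wherever the old one occurs.
Note: do not take the name at face value. As the source shows, replace still replaces every occurrence of the target character!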
/**
* Returns a new string resulting from replacing all occurrences of
* <code>oldChar</code> in this string with <code>newChar</code>.
* <p>
* If the character <code>oldChar</code> does not occur in the
* character sequence represented by this <code>String</code> object,
* then a reference to this <code>String</code> object is returned.
* Otherwise, a new <code>String</code> object is created that
* represents a character sequence identical to the character sequence
* represented by this <code>String</code> object, except that every
* occurrence of <code>oldChar</code> is replaced by an occurrence
* of <code>newChar</code>.
* <p>
* Examples:
* <blockquote><pre>
* "mesquite in your cellar".replace('e', 'o')
* returns "mosquito in your collar"
* "the war of baronets".replace('r', 'y')
* returns "the way of bayonets"
* "sparring with a purple porpoise".replace('p', 't')
* returns "starring with a turtle tortoise"
* "JonL".replace('q', 'x') returns "JonL" (no change)
* </pre></blockquote>
*
* @param oldChar the old character.
* @param newChar the new character.
* @return a string derived from this string by replacing every
* occurrence of <code>oldChar</code> with <code>newChar</code>.
*/
public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */
        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }
        if (i < len) {
            char buf[] = new char[len];
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            while (i < len) {
                char c = val[i];
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            return new String(buf, true);
        }
    }
    return this;
}
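Two things in this method are easy to check from outside: every occurrence is replaced, and when oldChar does not occur the very same String object is returned. A minimal sketch, with ReplaceCharDemo and its methods as hypothetical names:

```java
// Hypothetical demo class; not part of the JDK.
class ReplaceCharDemo {
    static String swap(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar);
    }

    // True when replace returns the original reference, i.e. no copy was made.
    static boolean sameInstanceWhenAbsent(String s, char absent) {
        return s.replace(absent, 'x') == s;
    }

    public static void main(String[] args) {
        System.out.println(swap("mesquite in your cellar", 'e', 'o')); // mosquito in your collar
        System.out.println(sameInstanceWhenAbsent("JonL", 'q'));       // true
    }
}
```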
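- public String replace(CharSequence target, CharSequence replacement);
This overload of replace replaces every occurrence of a substring. It actually calls Matcher's replaceAll method.
Note: as the source shows, although Pattern.compile() is called, the flag passed is Pattern.LITERAL, so the target is not interpreted as a regular expression! The replacement is also wrapped in Matcher.quoteReplacement, so any '$' or '\' in it is taken literally.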
/**
* Replaces each substring of this string that matches the literal target
* sequence with the specified literal replacement sequence. The
* replacement proceeds from the beginning of the string to the end, for
* example, replacing "aa" with "b" in the string "aaa" will result in
* "ba" rather than "ab".
*
* @param target The sequence of char values to be replaced
* @param replacement The replacement sequence of char values
* @return The resulting string
* @throws NullPointerException if <code>target</code> or
* <code>replacement</code> is <code>null</code>.
* @since 1.5
*/
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
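Because both the target and the replacement are treated literally, characters that are special in regular expressions or in replacement strings need no escaping here. The sketch below illustrates this; ReplaceLiteralDemo and literal are made-up names.

```java
// Hypothetical demo class; not part of the JDK.
class ReplaceLiteralDemo {
    static String literal(String s, CharSequence target, CharSequence replacement) {
        return s.replace(target, replacement);
    }

    public static void main(String[] args) {
        // "+" would be a dangling metacharacter as a regex; here it is a plain character
        System.out.println(literal("1+1=2", "+", " plus ")); // 1 plus 1=2
        // "$1" is not a group reference here, just two characters
        System.out.println(literal("total: X", "X", "$1"));  // total: $1
        // Replacement proceeds from left to right, as the Javadoc describes
        System.out.println(literal("aaa", "aa", "b"));       // ba
        // Any CharSequence works as an argument, not only String
        System.out.println(literal("a.b", new StringBuilder("."), "-")); // a-b
    }
}
```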
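- public String replaceAll(String regex, String replacement);
The replaceAll method replaces every substring that matches the given regular expression.
Note: as the source shows, this method matches using a regular expression!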
/**
* Replaces each substring of this string that matches the given <a
* href="../util/regex/Pattern.html#sum">regular expression</a> with the
* given replacement.
*
* <p> An invocation of this method of the form
* <i>str</i><tt>.replaceAll(</tt><i>regex</i><tt>,</tt> <i>repl</i><tt>)</tt>
* yields exactly the same result as the expression
*
* <blockquote><tt>
* {@link java.util.regex.Pattern}.{@link java.util.regex.Pattern#compile
* compile}(</tt><i>regex</i><tt>).{@link
* java.util.regex.Pattern#matcher(java.lang.CharSequence)
* matcher}(</tt><i>str</i><tt>).{@link java.util.regex.Matcher#replaceAll
* replaceAll}(</tt><i>repl</i><tt>)</tt></blockquote>
*
*<p>
* Note that backslashes (<tt>\</tt>) and dollar signs (<tt>$</tt>) in the
* replacement string may cause the results to be different than if it were
* being treated as a literal replacement string; see
* {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
* Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
* meaning of these characters, if desired.
*
* @param regex
* the regular expression to which this string is to be matched
* @param replacement
* the string to be substituted for each match
*
* @return The resulting <tt>String</tt>
*
* @throws PatternSyntaxException
* if the regular expression's syntax is invalid
*
* @see java.util.regex.Pattern
*
* @since 1.4
* @spec JSR-51
*/
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
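The following sketch tries out the points made in the Javadoc: the equivalence with the explicit Pattern/Matcher chain, group references in the replacement, and quoting when a literal replacement is wanted. ReplaceAllRegexDemo and its helper methods are hypothetical; Pattern.quote and Matcher.quoteReplacement are the standard JDK helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of the JDK.
class ReplaceAllRegexDemo {
    static String viaString(String s, String regex, String repl) {
        return s.replaceAll(regex, repl);
    }

    // The chain that the Javadoc says yields exactly the same result.
    static String viaChain(String s, String regex, String repl) {
        return Pattern.compile(regex).matcher(s).replaceAll(repl);
    }

    // A bare "+" is not a valid regular expression.
    static boolean rejectsBarePlus() {
        try {
            "1+1".replaceAll("+", "x");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    // Quotes both arguments so that replaceAll behaves literally.
    static String quoted(String s, String literalTarget, String literalRepl) {
        return s.replaceAll(Pattern.quote(literalTarget), Matcher.quoteReplacement(literalRepl));
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)-(\\d+)";
        System.out.println(viaString("2024-01-15", regex, "$3/$2/$1")); // 15/01/2024
        System.out.println(viaChain("2024-01-15", regex, "$3/$2/$1"));  // 15/01/2024
        System.out.println(rejectsBarePlus());                          // true
        System.out.println(quoted("1+1", "+", "$"));                    // 1$1
    }
}
```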
- public String replaceFirst(String regex, String replacement);
replaceFirst is the method that String actually provides for partial replacement: it replaces only the first matching substring, by calling Matcher's replaceFirst method.
Note: as the source shows, this method matches using a regular expression!
public String replaceFirst(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}
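String has no literal counterpart to replaceFirst, so replacing only the first occurrence of a plain substring takes a little care. One possible approach, sketched here as an assumption rather than a recommendation from the JDK, is to quote both arguments; ReplaceFirstDemo and its methods are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK.
class ReplaceFirstDemo {
    // Regex mode: only the first run of digits is replaced.
    static String firstNumber(String s) {
        return s.replaceFirst("\\d+", "N");
    }

    // Literal mode, emulated by quoting the target and the replacement.
    static String firstLiteral(String s, String target, String repl) {
        return s.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(repl));
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("a1b22c333"));        // aNb22c333
        System.out.println(firstLiteral("a.b.c", ".", "-")); // a-b.c
    }
}
```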
4. Conclusion
String method | Parameter type | Matches replaced | Regex? | Pattern method called | Matcher method called |
replace(char, char) | char | all | no | none | none |
replace(CharSequence, CharSequence) | CharSequence | all | no | Pattern.compile (literal mode) | replaceAll |
replaceAll | String | all | yes | Pattern.compile (regex mode) | replaceAll |
replaceFirst | String | first match only | yes | Pattern.compile (regex mode) | replaceFirst |