2024-07-12
In regular expressions, ? can act as a quantifier meaning zero or one occurrence, which is equivalent to {0,1}. It can also appear as a special character with other meanings.
When ? follows another quantifier (for example *?, +? or {n,m}?), it makes that quantifier non-greedy: the quantifier matches as few characters as possible while still allowing the overall match to succeed.
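As a quick illustration of the quantifier meaning, the sketch below checks an optional letter. The class name, helper name, and sample words are made up for this example and do not come from the original article.

```java
import java.util.regex.Pattern;

class OptionalQuantifierDemo {
    // Returns true when the whole input matches the regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "u?" makes the letter u optional: zero or one occurrence.
        System.out.println(fullMatch("colou?r", "color"));      // true
        System.out.println(fullMatch("colou?r", "colour"));     // true
        System.out.println(fullMatch("colou?r", "colouur"));    // false
        // The explicit form {0,1} behaves the same way.
        System.out.println(fullMatch("colou{0,1}r", "colour")); // true
    }
}
```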
Let's look at an example:
- @Test
- public void test() {
- Pattern pattern = Pattern.compile("a.*?");
- Matcher matcher = pattern.matcher("abcabc");
- if (matcher.matches()) {
- System.out.println(matcher.group());
- }
- }
Output after execution: abcabc
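This is not the shortest possible match, so why does the non-greedy quantifier appear to fail?
The reason is that matches() requires the entire input to match the pattern. A non-greedy quantifier tries the shortest option first and extends only when the rest of the match would otherwise fail. Here nothing follows .*?, so the only way for matches() to succeed is for .*? to expand until the whole input is consumed.
That is to say, when "a.*?" is the entire pattern and matches() is used, the result is the same as with a greedy quantifier. With find(), the same pattern would match only "a".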
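To see how much the result depends on the matching method, here is a minimal sketch that runs the same pattern through both find() and matches(). The class and helper names are illustrative and not part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyFindVsMatches {
    // Returns the first substring located by find(), or null if there is none.
    public static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the whole-input match required by matches(), or null on failure.
    public static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstFind("a.*?", "abcabc"));  // a
        System.out.println(firstFind("a.*", "abcabc"));   // abcabc
        System.out.println(wholeMatch("a.*?", "abcabc")); // abcabc
    }
}
```

find() is free to stop as soon as the pattern is satisfied, so the non-greedy version returns just "a", while matches() forces the quantifier to cover the full input.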
Here's the correct usage:
- @Test
- public void test() {
- Pattern pattern = Pattern.compile("(a.*?)(.*)");
- Matcher matcher = pattern.matcher("afcafc");
- if (matcher.matches()) {
- System.out.println(matcher.group(0));
- System.out.println(matcher.group(1));
- System.out.println(matcher.group(2));
- }
- }
Output after execution:
- afcafc
- a
- fcafc
You can see that the first capture group captures the shortest string "a", and the second capture group captures "fcafc".
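A more common situation is a non-greedy quantifier followed by a literal delimiter. The sketch below is an assumption about typical usage rather than code from the article; it collects every match of a greedy and a non-greedy pattern on the same input, using a hypothetical helper named allMatches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyWithDelimiter {
    // Collects every non-overlapping match that find() locates in the input.
    public static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Non-greedy: stops at the first 'c' that completes the match.
        System.out.println(allMatches("a.*?c", "afcafc")); // [afc, afc]
        // Greedy: runs to the last 'c' in the input.
        System.out.println(allMatches("a.*c", "afcafc"));  // [afcafc]
    }
}
```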
Let's look at two other situations:
Inside a group, ?: placed at the start of the group, as in (?:...), makes the group non-capturing: it takes part in matching, but the text it matches cannot be retrieved through the group method.
Let's look at an example:
- @Test
- public void test0() {
- Pattern pattern = Pattern.compile("\\d{4}-(?:[a-z]+)");
- Matcher matcher = pattern.matcher("3214-opo");
- if (matcher.matches()) {
- System.out.println(matcher.group());
- System.out.println(matcher.group(1)); // throws IndexOutOfBoundsException: there is no group 1
- }
- }
Calling group(1) throws an IndexOutOfBoundsException because the pattern contains no capturing group: the text is matched but not captured. If ?: is removed, group(1) returns "opo".
(?s) turns on single-line mode (DOTALL in Java) for the part of the pattern to its right, making . match any character, including the newline character \n.
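One way to avoid the exception is to check groupCount() before asking for a group. The following sketch compares the capturing and non-capturing forms of the same pattern; the class and helper names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingGroupDemo {
    // Number of capturing groups declared in the pattern.
    public static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns group 1 of a full match, or null if the input does not
    // match or the pattern declares no capturing group.
    public static String group1OrNull(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches() || m.groupCount() < 1) {
            return null;
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(groupCount("\\d{4}-(?:[a-z]+)"));               // 0
        System.out.println(groupCount("\\d{4}-([a-z]+)"));                 // 1
        System.out.println(group1OrNull("\\d{4}-([a-z]+)", "3214-opo"));   // opo
        System.out.println(group1OrNull("\\d{4}-(?:[a-z]+)", "3214-opo")); // null
    }
}
```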
Let's look at an example:
- private static final String DEFAULT_VARIABLE_PATTERN = "((?s).*)";
-
-
- /**
- * The output shows that the newline character '\n' is matched
- */
- @Test
- public void test4() {
- Pattern pattern = Pattern.compile(DEFAULT_VARIABLE_PATTERN);
- Matcher matcher = pattern.matcher("abc\nsdf");
- if (matcher.matches()) {
- System.out.println(matcher.group());
- System.out.println(matcher.group(1));
- System.out.println(matcher.group(2)); // (?s) is not a capture group, so this throws IndexOutOfBoundsException
- }
- }
(?s) is an inline flag, not a capture group, so "((?s).*)" contains only one capture group. group(1) is the highest valid index, and calling group(2) throws an IndexOutOfBoundsException.
- @Test
- public void test5() {
- Pattern pattern = Pattern.compile("(.*)");
- Matcher matcher = pattern.matcher("abc\nsdf");
- if (matcher.matches()) {
- System.out.println(matcher.group());
- System.out.println(matcher.group(1));
- }
- }
After removing (?s), we try to match "abc\nsdf" again. Because . no longer matches the newline character, matches() fails and nothing is printed.
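The same effect can be obtained without the inline flag by passing Pattern.DOTALL to Pattern.compile. The sketch below compares the three variants on an input containing a newline; the class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Compiles the regex with the given flags and tests the whole input.
    public static boolean matchesWithFlags(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Returns the first substring located by find(), or null if there is none.
    public static String firstFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "abc\nsdf";
        System.out.println(matchesWithFlags("(.*)", 0, input));              // false
        System.out.println(matchesWithFlags("(.*)", Pattern.DOTALL, input)); // true
        System.out.println(matchesWithFlags("((?s).*)", 0, input));          // true
        // Without DOTALL, find() still succeeds but stops at the newline.
        System.out.println(firstFind("(.*)", input));                        // abc
    }
}
```

The flag argument applies to the whole pattern, while the inline (?s) form can be placed partway through a pattern so that only the part to its right is affected.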