Regular Expression - Java Programming Tutorial
1. Introduction
Regular expressions (regex) are extremely useful in programming, especially for processing text files.
I assume that you are familiar with regex and Java. Otherwise, read up on the regex syntax in:
1. My article on "Regular Expressions".
2. The Online Java Tutorial Trail on "Regular Expressions".
3. JavaDoc for java.util.regex Package.
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexFindText {
    public static void main(String[] args) {

        // Input String for matching the regex pattern
        String inputStr = "This is an apple. These are 33 (thirty-three) apples.";
        // Regex to be matched
        String regexStr = "Th";

        // Step 1: Compile a regex via static method Pattern.compile(), default is case-sensitive
        Pattern pattern = Pattern.compile(regexStr);
        // Pattern.compile(regex, Pattern.CASE_INSENSITIVE);  // for case-insensitive matching

        // Step 2: Allocate a matching engine from the compiled regex pattern,
        //         and bind to the input string
        Matcher matcher = pattern.matcher(inputStr);

        // Step 3: Perform matching and process the matching results

        // Try Matcher.find(), which finds the next match
        while (matcher.find()) {
            System.out.println("find() found substring \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        }

        // Try Matcher.matches(), which tries to match the entire input string
        if (matcher.matches()) {
            System.out.println("matches() found substring \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        } else {
            System.out.println("matches() found nothing");
        }

        // Try Matcher.lookingAt(), which tries to match from the beginning of the input string
        if (matcher.lookingAt()) {
            System.out.println("lookingAt() found substring \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        } else {
            System.out.println("lookingAt() found nothing");
        }
    }
}
Output
find() found substring "Th" starting at index 0 and ending at index 2
find() found substring "Th" starting at index 18 and ending at index 20
matches() found nothing
lookingAt() found substring "Th" starting at index 0 and ending at index 2
How It Works
Three steps are required to perform regex matching:
Allocate a Pattern object. There is no constructor for the Pattern class. Instead, you invoke the static method
Pattern.compile(regexStr) to compile the regexStr, which returns a Pattern instance.
https://www.ntu.edu.sg/home/ehchua/programming/java/Java_Regexe.html 2/6
1/24/2020 Regular Expression - Java Programming Tutorial
Allocate a Matcher object (a matching engine). Again, there is no constructor for the Matcher class. Instead, you invoke the
matcher(inputStr) method on the Pattern instance (created in Step 1), which binds the input string to this Matcher.
Use the Matcher instance (created in Step 2) to perform the matching and process the matching result. The Matcher class
provides a few boolean methods for performing the matches:
boolean find(): scans the input sequence for the next subsequence that matches the pattern. If a match is found, you
can use group(), start() and end() to retrieve the matched subsequence and its starting and ending indices, as shown
in the above example.
boolean matches(): tries to match the entire input sequence against the regex pattern. It returns true only if the entire input
sequence matches the pattern, much as if the pattern were enclosed in the begin and end position anchors ^ and $.
boolean lookingAt(): tries to match the input sequence, starting from the beginning, against the regex pattern. It returns
true if a prefix of the input sequence matches the pattern, much as if the pattern began with the begin position anchor ^.
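The difference between the three boolean methods can be seen side by side in a small sketch. The class name MatcherMethodsDemo and its helper methods are hypothetical names chosen for illustration; only Pattern.compile(), matcher(), find(), lookingAt() and matches() come from java.util.regex.

```java
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // true if the regex matches anywhere in the input
    static boolean findAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // true if the regex matches a prefix of the input
    static boolean matchesPrefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // true if the regex matches the entire input
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "These are apples";
        System.out.println(findAnywhere("apple", input));    // true:  "apple" occurs inside the input
        System.out.println(matchesPrefix("apple", input));   // false: the input does not start with "apple"
        System.out.println(matchesPrefix("These", input));   // true:  the input starts with "These"
        System.out.println(matchesWhole("These", input));    // false: "These" is only part of the input
        System.out.println(matchesWhole("These.*", input));  // true:  the pattern covers the whole input
    }
}
```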
To perform case-insensitive matching, use Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE) to create the Pattern
instance (as commented out in the above example).
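As a rough illustration of the effect of the flag, the sketch below counts matches of the lowercase regex "th" in the input string of the above example, with and without Pattern.CASE_INSENSITIVE. The class CaseInsensitiveDemo and the helper countMatches() are hypothetical names, not part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Count the matches of regex in input, compiling the regex with the given flags (0 = no flags)
    static int countMatches(String regex, String input, int flags) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (thirty-three) apples.";
        // Case-sensitive: only "thirty" and "three" contain lowercase "th"
        System.out.println(countMatches("th", inputStr, 0));                         // 2
        // Case-insensitive: "This" and "These" are matched as well
        System.out.println(countMatches("th", inputStr, Pattern.CASE_INSENSITIVE));  // 4
    }
}
```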
Try changing the regex pattern of the above example and observe the outputs. Take note that you need to use the escape
sequence '\\' for '\' inside a Java string.
Check out the Javadoc for the Class java.util.regex.Pattern for the list of regular expression constructs supported by Java.
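To illustrate the escaping rule, here is a minimal sketch that applies a few patterns containing backslashes to the input string of the above example. The patterns, the class EscapeDemo and the helper firstMatch() are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Return the first subsequence of input that matches regex, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (thirty-three) apples.";
        // The regex \d+ (one or more digits) is written as "\\d+" in Java source
        System.out.println(firstMatch("\\d+", inputStr));       // 33
        // Literal parentheses are escaped in the regex: \(.*\)
        System.out.println(firstMatch("\\(.*\\)", inputStr));   // (thirty-three)
        // Two words joined by a hyphen: \w+-\w+
        System.out.println(firstMatch("\\w+-\\w+", inputStr));  // thirty-three
    }
}
```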
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexFindReplace {
    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (Thirty-three) apples";
        String regexStr = "apple";         // pattern to be matched
        String replacementStr = "orange";  // replacement pattern

        // Step 1: Allocate a Pattern object to compile a regex
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);

        // Step 2: Allocate a Matcher object from the pattern, and provide the input
        Matcher matcher = pattern.matcher(inputStr);

        // Step 3: Perform the matching and process the matching result
        //String outputStr = matcher.replaceAll(replacementStr);   // all matches
        String outputStr = matcher.replaceFirst(replacementStr);   // first match only
        System.out.println(outputStr);
    }
}
How It Works
First, create a Pattern object to compile a regex pattern. Next, create a Matcher object from the Pattern and bind it to the input string.
The Matcher class provides replaceAll(replacementStr) to replace all the matched subsequences with the replacementStr, and
replaceFirst(replacementStr) to replace the first match only.
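The two methods can be compared on the same input in a short sketch. The class ReplaceDemo and its two helper methods are hypothetical names; the input, regex and replacement are taken from the example above.

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    // Replace every case-insensitive match of regex in input
    static String replaceAllIgnoreCase(String regex, String input, String replacement) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).replaceAll(replacement);
    }

    // Replace only the first case-insensitive match of regex in input
    static String replaceFirstIgnoreCase(String regex, String input, String replacement) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (Thirty-three) apples";
        System.out.println(replaceAllIgnoreCase("apple", inputStr, "orange"));
        // This is an orange. These are 33 (Thirty-three) oranges
        System.out.println(replaceFirstIgnoreCase("apple", inputStr, "orange"));
        // This is an orange. These are 33 (Thirty-three) apples
    }
}
```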
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexBackReference {
    public static void main(String[] args) {
        String inputStr = "One:two:three:four";
        String regexStr = "(.+):(.+):(.+):(.+)";  // pattern to be matched
        String replacementStr = "$4-$3-$2-$1";    // replacement pattern with back references

        // Step 1: Allocate a Pattern object to compile a regex
        Pattern pattern = Pattern.compile(regexStr);

        // Step 2: Allocate a Matcher object from the Pattern, and provide the input
        Matcher matcher = pattern.matcher(inputStr);

        // Step 3: Perform the matching and process the matching result
        String outputStr = matcher.replaceAll(replacementStr);     // all matches
        //String outputStr = matcher.replaceFirst(replacementStr); // first match only
        System.out.println(outputStr);   // Output: four-three-two-One
    }
}
2. Parenthesized Back Reference: Provide back references to the matched subsequences. The matched subsequence of the first pair of
parentheses can be referred to as $1, the second pair of parentheses as $2, and so on. In the above example, there are 4 pairs of parentheses,
which were referenced in the replacement pattern as $1, $2, $3, and $4. You can use groupCount() (of the Matcher) to get the
number of capturing groups in the pattern, and group(groupNumber), start(groupNumber), end(groupNumber) to retrieve each matched
subsequence and its indices. In Java, $0 (group 0) denotes the entire match, and it is not included in groupCount(). Try the following code and check the output:
while (matcher.find()) {
    System.out.println("find() found substring \"" + matcher.group()
            + "\" starting at index " + matcher.start()
            + " and ending at index " + matcher.end());
    System.out.println("Group count is: " + matcher.groupCount());
    // Group 0 is the entire match; groups 1 to groupCount() are the parenthesized groups
    for (int i = 0; i <= matcher.groupCount(); ++i) {
        System.out.println("Group " + i + ": substring="
                + matcher.group(i) + ", start=" + matcher.start(i)
                + ", end=" + matcher.end(i));
    }
}
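The fragment above needs a Pattern and a Matcher to run. A self-contained sketch, assuming the input and regex of the back-reference example, might look like this. The class GroupDemo and the helper groups() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Return group 0 (the entire match) followed by groups 1..groupCount() of the first match,
    // or an empty list if the regex does not match
    static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); ++i) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("(.+):(.+):(.+):(.+)", "One:two:three:four"));
        // [One:two:three:four, One, two, three, four]
    }
}
```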
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.io.File;

public class RegexRenameFiles {
    public static void main(String[] args) {
        String regexStr = "\\.class$";    // ending with ".class" (the dot is escaped to match a literal '.')
        String replacementStr = ".out";   // replace with ".out"

        // Allocate a Pattern object to compile a regex
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);
        Matcher matcher;

        File dir = new File(".");         // directory to be processed
        int count = 0;
        File[] files = dir.listFiles();   // list all files and directories
        if (files == null) {              // listFiles() returns null if the directory cannot be listed
            files = new File[0];
        }
        for (File file : files) {
            if (file.isFile()) {          // file only, not directory
                String inFilename = file.getName();     // get filename, exclude path
                matcher = pattern.matcher(inFilename);  // allocate Matcher with input
                if (matcher.find()) {
                    ++count;
                    String outFilename = matcher.replaceFirst(replacementStr);
                    System.out.print(inFilename + " -> " + outFilename);

                    // Execute rename; File(dir, name) avoids a platform-specific path separator
                    if (file.renameTo(new File(dir, outFilename))) {
                        System.out.println(" SUCCESS");
                    } else {
                        System.out.println(" FAIL");
                    }
                }
            }
        }
        System.out.println(count + " files processed");
    }
}
You can use regex to specify the pattern, and back references in the replacement, as in the previous example.
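As a sketch of that idea, the following computes a new filename using a back reference in the replacement, without touching the file system. The class RenameSketch, the helper newName() and the choice of pattern are assumptions for illustration, not part of the program above.

```java
import java.util.regex.Pattern;

class RenameSketch {
    // Group 1 captures the base name; the extension ".class" is matched case-insensitively
    private static final Pattern CLASS_FILE =
            Pattern.compile("(.+)\\.class$", Pattern.CASE_INSENSITIVE);

    // Return the filename with ".class" replaced by ".out", or the filename unchanged if it does not match
    static String newName(String filename) {
        return CLASS_FILE.matcher(filename).replaceFirst("$1.out");  // $1 is the captured base name
    }

    public static void main(String[] args) {
        System.out.println(newName("Hello.class"));  // Hello.out
        System.out.println(newName("Test.CLASS"));   // Test.out
        System.out.println(newName("Hello.java"));   // Hello.java (no match, unchanged)
    }
}
```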
// In String class
public String[] split(String regexStr)
For example,
There
are
thirty
three
big
apple
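Output of this shape can be produced by String.split() with a regex delimiter. The sketch below is an assumption about what such an example might look like: the source string, the delimiter regex, the class SplitDemo and the helper splitWords() are chosen for illustration.

```java
class SplitDemo {
    // Split on one or more whitespace characters, or on a single hyphen
    static String[] splitWords(String source) {
        return source.split("\\s+|-");
    }

    public static void main(String[] args) {
        String source = "There are thirty-three big-apple";  // assumed input
        for (String token : splitWords(source)) {
            System.out.println(token);  // prints There, are, thirty, three, big, apple on separate lines
        }
    }
}
```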
For example,
import java.util.Scanner;
public class ScannerUseDelimiterTest {
    public static void main(String[] args) {
        String source = "There are thirty-three big-apple";
        Scanner in = new Scanner(source);
        in.useDelimiter("\\s+|-");  // whitespace(s) or -
        while (in.hasNext()) {
            System.out.println(in.next());
        }
    }
}
Feedback, comments, corrections, and errata can be sent to Chua Hock-Chuan.