Table of contents
1. Introduction
2. Regular expressions: Classes and Interfaces
3. Pattern class
   3.1. Methods
   3.2. Flags
   3.3. Metacharacters
   3.4. Character classes
   3.5. Quantifiers
4. Matcher class
   4.1. Methods
5. PatternSyntaxException class
   5.1. Methods
6. MatchResult Interface
   6.1. Methods
7. Example of Java Regular Expressions:
8. Regular Expression Character Classes and Quantifiers Example :
9. Summary
   9.1. Example
10. Frequently Asked Questions
   10.1. What is the difference between an ordinary character and a metacharacter?
   10.2. What is the difference between matches() and find() methods?
11. Conclusion
Last Updated: Oct 7, 2024

Regular Expressions in Java

Author Yashesvinee V

Introduction

Regular expressions, often shortened to regex, are a powerful tool in Java for working with strings. A regular expression serves as a search pattern that helps you find, replace, or modify text in a structured way. The java.util.regex package provides the classes and interface that handle all regex operations.

This API consists of three classes and one interface that together make it easier to define, match, and manipulate string patterns.

Regular expressions: Classes and Interfaces

The regex package in Java contains the following classes and interface:

  • The Pattern class defines the regular expression pattern.
     
  • The Matcher class interprets the pattern and performs match operations on a string.
     
  • The PatternSyntaxException class is an unchecked exception that is thrown when there is a syntax error in the regular expression pattern.
     
  • The MatchResult interface exposes the result of a match operation.

Pattern class

The Pattern class is a compiled representation of a regular expression. A typical invocation sequence starts by compiling a regular expression into a Pattern object. That object is then used to create a Matcher, which finds matches of the pattern in an input string.

Methods

Method | Description
--- | ---
compile(String regex) | Static method that compiles the given regular expression into a pattern.
compile(String regex, int flags) | Static method that compiles the given regular expression into a pattern with the given flags.
flags() | Returns the pattern's match flags.
matcher(CharSequence input) | Creates a matcher that will match the given input against the pattern.
matches(String regex, CharSequence input) | Static method that compiles the given regular expression and attempts to match the entire input against it.
pattern() | Returns the regular expression from which this pattern was compiled.
quote(String s) | Static method that returns a literal pattern String for the given string.
split(CharSequence input) | Splits the given input sequence around the matches of this pattern.
toString() | Returns a string representation of this pattern.
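
As a rough sketch of how a few of these methods fit together, the snippet below uses the static matches() and quote() helpers, then split(), pattern() and flags() on a compiled pattern. The class name, helper names and sample strings are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // Hypothetical helper: splits a comma-separated line, tolerating spaces around commas.
    static String[] splitCsv(String line) {
        Pattern comma = Pattern.compile("\\s*,\\s*");
        return comma.split(line);
    }

    // Hypothetical helper: true if input equals the literal text, with no regex meaning applied.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Static convenience method: compiles the regex and matches the whole input.
        System.out.println(Pattern.matches("\\d+", "2024"));   // true
        System.out.println(Pattern.matches("\\d+", "v2024"));  // false

        // After quote(), the dot is an ordinary character rather than "any character".
        System.out.println(matchesLiterally("1.5", "1.5"));    // true
        System.out.println(matchesLiterally("1.5", "1x5"));    // false

        System.out.println(Arrays.toString(splitCsv("red , green,blue"))); // [red, green, blue]

        Pattern p = Pattern.compile("a.b", Pattern.CASE_INSENSITIVE);
        System.out.println(p.pattern());                            // a.b
        System.out.println(p.flags() == Pattern.CASE_INSENSITIVE);  // true
    }
}
```

Compiling once and reusing the Pattern object is usually preferable to calling the static matches() in a loop, since matches() recompiles the expression on every call.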

Flags

Flag | Description
--- | ---
CASE_INSENSITIVE | Enables case-insensitive matching. By default only US-ASCII letters are affected; combine it with UNICODE_CASE for Unicode-aware matching.
COMMENTS | Permits whitespace and comments in the pattern.
DOTALL | Allows the expression ( . ) to match any character, including a line terminator.
LITERAL | Treats the pattern as a sequence of literal characters, so metacharacters lose their special meaning.
MULTILINE | Makes ^ and $ match at the start and end of each line, not only at the start and end of the entire input.
UNIX_LINES | Enables Unix lines mode, where only the '\n' line terminator is recognized in the behaviour of ., ^, and $.
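
A minimal sketch of how flags change matching behaviour. Flags are int constants that can be combined with the bitwise OR operator; the countMatches helper and the sample text are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: counts how often the regex is found in the input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java\njava\nJAVA";

        // No flags: ^ anchors only at the start of the whole input, and case matters.
        System.out.println(countMatches("^java", 0, text)); // 0

        // MULTILINE: ^ also matches at the start of every line.
        System.out.println(countMatches("^java", Pattern.MULTILINE, text)); // 1

        // Flags are combined with bitwise OR.
        System.out.println(countMatches("^java",
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE, text)); // 3

        // By default the dot does not match a line terminator; DOTALL lifts that restriction.
        System.out.println(countMatches("Java.java", 0, text));              // 0
        System.out.println(countMatches("Java.java", Pattern.DOTALL, text)); // 1
    }
}
```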

Metacharacters 

Character | Description
--- | ---
[ ] | To define a set of characters.
{ } | To specify how many times the preceding element must occur, either an exact count or a range.
( ) | To capture and group patterns.
+ | To search for one or more occurrences.
- | To specify a range of characters inside [ ].
* | To search for zero or more occurrences.
. | To match any single character except a line terminator (unless DOTALL is set).
^ | To exclude a given set of characters if placed first within [ ], or otherwise to anchor the match at the start of the input.
$ | To anchor the match at the end of the input.
? | To search for zero or one occurrence.
\ | To escape a metacharacter or to introduce a special sequence. It is used in predefined character classes and boundary matchers.
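
The sketch below combines several of these metacharacters in one pattern: anchors, a negated set, an escaped dot and a capturing group. The extension helper and the file names are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharacterDemo {
    // Hypothetical helper: returns the extension of a file name that contains exactly one dot.
    static String extension(String fileName) {
        // ^ and $ anchor the match to the whole input, [^.]+ is one or more non-dot characters,
        // \\. is a literal dot, and ( ) captures the lowercase extension as group 1.
        Matcher m = Pattern.compile("^[^.]+\\.([a-z]+)$").matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extension("report.pdf"));     // pdf
        System.out.println(extension("README"));         // null
        System.out.println(extension("archive.tar.gz")); // null (two dots)

        // An unescaped dot matches any character; an escaped dot matches only a dot.
        System.out.println(Pattern.matches("a.c", "abc"));   // true
        System.out.println(Pattern.matches("a\\.c", "abc")); // false
        System.out.println(Pattern.matches("a\\.c", "a.c")); // true

        // ? makes the preceding character optional.
        System.out.println(Pattern.matches("colou?r", "color")); // true
    }
}
```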

Character classes

Character Class | Description
--- | ---
\d | Set of all digits from 0 to 9.
\D | Set of all non-digit characters.
\w | Set of all letters (a-z, A-Z), digits and the underscore character.
\W | Set of all characters excluding letters, digits and the underscore.
\s | Set of all whitespace characters (space, tab, newline, vertical tab, form feed and carriage return).
\S | Set of all characters excluding whitespace characters.
\b | A word boundary (a boundary matcher rather than a character class).
\B | A non-word boundary.
[abc] | Set of a, b and c characters.
[a-m] | Set of all characters from a to m.
[^abc] | Set of all characters except a, b and c.
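
A short sketch of these classes in use. The helper names digitsOnly and replaceWholeWord are invented for this example; the calls themselves belong to java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical helper: removes every non-digit character (\D).
    static String digitsOnly(String s) {
        return Pattern.compile("\\D").matcher(s).replaceAll("");
    }

    // Hypothetical helper: replaces a word only where it stands alone, using \b on both sides.
    static String replaceWholeWord(String text, String word, String replacement) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(123) 456-7890")); // 1234567890

        // \s+ treats any run of spaces or tabs as one separator.
        System.out.println(Arrays.toString(Pattern.compile("\\s+").split("a  b\tc"))); // [a, b, c]

        // "concat" is left alone because there is no word boundary before its "cat".
        System.out.println(replaceWholeWord("cat concat cat", "cat", "dog")); // dog concat dog

        System.out.println(Pattern.matches("[a-m]+", "deal"));  // true
        System.out.println(Pattern.matches("[a-m]+", "hello")); // false, 'o' is outside a-m
        System.out.println(Pattern.matches("[^abc]+", "xyz"));  // true
    }
}
```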

 

Quantifiers

Quantifier | Description
--- | ---
X? | Matches X once or not at all.
X* | Matches X zero or more times.
X+ | Matches X one or more times.
X{n} | Matches X exactly n times.
X{n,} | Matches X at least n times.
X{n,m} | Matches X at least n but not more than m times (both bounds are inclusive).
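
To make the counting rules concrete, here is a small sketch that checks whole strings against each quantifier. The fullMatch helper is just a thin, made-up wrapper around Pattern.matches(). Note that both bounds of {n,m} are inclusive.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab?c", "ac"));    // true  (zero b)
        System.out.println(fullMatch("ab?c", "abbc"));  // false (two b)
        System.out.println(fullMatch("ab*c", "abbbc")); // true  (any number of b)
        System.out.println(fullMatch("ab+c", "ac"));    // false (at least one b needed)

        System.out.println(fullMatch("a{2}", "aa"));    // true
        System.out.println(fullMatch("a{2}", "aaa"));   // false (exactly two)
        System.out.println(fullMatch("a{2,}", "aaaa")); // true  (two or more)

        System.out.println(fullMatch("a{2,3}", "aa"));   // true  (lower bound included)
        System.out.println(fullMatch("a{2,3}", "aaa"));  // true  (upper bound included)
        System.out.println(fullMatch("a{2,3}", "aaaa")); // false
    }
}
```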

Matcher class

The Matcher class interprets a pattern and performs match operations on a given character sequence. A Matcher object is created by invoking the matcher() method on a Pattern object.

Methods

Method | Description
--- | ---
end() | Returns the offset after the last character matched.
find() | Attempts to find the next subsequence of the input that matches the pattern.
find(int start) | Resets the matcher and attempts to find the next matching subsequence, starting at the given index.
group() | Returns the input subsequence matched by the previous match.
group(int group) | Returns the input subsequence captured by the given group during the previous match.
groupCount() | Returns the number of capturing groups in the matcher's pattern.
lookingAt() | Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Unlike matches(), the entire region need not match.
matches() | Attempts to match the entire region against the pattern.
pattern() | Returns the pattern that is interpreted by this matcher.
quoteReplacement(String s) | Static method that returns a literal replacement String for the given string.
region(int start, int end) | Sets the limits of the matcher's region.
regionEnd() | Returns the end index (exclusive) of the matcher's region.
regionStart() | Returns the start index (inclusive) of the matcher's region.
replaceAll(String replacement) | Replaces every subsequence of the input that matches the pattern with the given replacement string.
replaceFirst(String replacement) | Replaces the first subsequence of the input that matches the pattern with the given replacement string.
reset() | Resets the matcher.
reset(CharSequence input) | Resets the matcher with a new input sequence.
start() | Returns the starting index of the previous match.
start(int group) | Returns the starting index of the subsequence captured by the given group during the previous match.
toString() | Returns the string representation of the matcher.
usePattern(Pattern newPattern) | Changes the pattern that this matcher uses to find matches.
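
The following sketch exercises several Matcher methods on one input: groups, start() and end() offsets, lookingAt() versus matches(), regions, and group references in a replacement string. The year-month pattern and the swapYearMonth helper are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Illustrative pattern: a four-digit year and a two-digit month, each in its own group.
    static final Pattern YEAR_MONTH = Pattern.compile("(\\d{4})-(\\d{2})");

    // Hypothetical helper: rewrites every "yyyy-MM" as "MM/yyyy" via $1 and $2 group references.
    static String swapYearMonth(String text) {
        return YEAR_MONTH.matcher(text).replaceAll("$2/$1");
    }

    public static void main(String[] args) {
        String text = "2023-11 and 2024-02";
        Matcher m = YEAR_MONTH.matcher(text);
        System.out.println(m.groupCount()); // 2

        while (m.find()) {
            // start() is inclusive, end() is one past the last matched character.
            System.out.println(m.group() + " at [" + m.start() + ", " + m.end() + ")"
                    + " year=" + m.group(1) + " month=" + m.group(2));
        }
        // 2023-11 at [0, 7) year=2023 month=11
        // 2024-02 at [12, 19) year=2024 month=02

        m.reset();
        System.out.println(m.lookingAt()); // true, the input starts with a match
        System.out.println(m.matches());   // false, the whole input is not one match

        // Restrict matching to the second date only.
        m.region(12, text.length());
        System.out.println(m.regionStart() + " " + m.regionEnd()); // 12 19
        System.out.println(m.matches());   // true within the region

        System.out.println(swapYearMonth(text)); // 11/2023 and 02/2024
    }
}
```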

PatternSyntaxException class

PatternSyntaxException extends the IllegalArgumentException class, which indicates that an invalid or illegal argument has been passed to a method. It is therefore an unchecked exception. It has a single constructor to create a new instance.

PatternSyntaxException(String desc, String regex, int index)

  • desc: A description of the error
  • regex: The pattern with the error
  • index: The approximate position (index) of the error in the pattern, or -1 if the index is not known

Methods

Method | Description
--- | ---
getDescription() | Retrieves the description of the error.
getIndex() | Retrieves the approximate index of the error in the pattern, or -1 if the index is not known.
getMessage() | Returns a multi-line string containing the description of the error and its index, the erroneous pattern, and a visual indication of the error index within the pattern.
getPattern() | Retrieves the pattern that has the error.
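
A minimal sketch of catching the exception and reading its details. The describeError helper is hypothetical, and the exact wording of the description and message depends on the JDK, so no output is shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Hypothetical helper: returns null if the regex compiles, otherwise the error description.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        try {
            Pattern.compile("[a-z"); // the character class is never closed
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index:       " + e.getIndex());
            System.out.println("Pattern:     " + e.getPattern());
            // getMessage() combines the details above into a multi-line string.
            System.out.println(e.getMessage());
        }
    }
}
```

Because the exception is unchecked, the compiler does not force a try/catch. Catching it is mainly useful when the pattern comes from user input or configuration.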

MatchResult Interface

This interface contains the query methods used to get the results of a match against a regular expression. The match boundaries, groups and group boundaries can be read but not modified through this interface.

Methods

Method | Description
--- | ---
end() | Returns the offset after the last character matched.
group() | Returns the input subsequence matched by the previous match.
groupCount() | Returns the number of capturing groups in the matcher's pattern.
start() | Returns the starting index of the previous match.
start(int group) | Returns the starting index of the subsequence captured by the given group during the previous match.
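
One way to obtain a MatchResult is Matcher.toMatchResult(), which returns a snapshot that is unaffected by later operations on the matcher. The sketch below collects one snapshot per match; the snapshots helper and the key=value sample are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultDemo {
    // Hypothetical helper: stores a read-only snapshot of every match found in the input.
    static List<MatchResult> snapshots(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<MatchResult> results = new ArrayList<>();
        while (m.find()) {
            // toMatchResult() copies the current match state, so it survives the next find().
            results.add(m.toMatchResult());
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : snapshots("(\\w+)=(\\d+)", "x=10, y=20")) {
            System.out.println(r.group(1) + " -> " + r.group(2)
                    + " at [" + r.start() + ", " + r.end() + ")");
        }
        // x -> 10 at [0, 4)
        // y -> 20 at [6, 10)
    }
}
```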

Example of Java Regular Expressions: 

Suppose we want to check if a string contains a phone number in a standard US format (e.g., 123-456-7890). Here’s how you could write a regex to find this:


import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexExample {
   public static void main(String[] args) {
       String text = "Call me at 123-456-7890.";
       String patternString = "\\d{3}-\\d{3}-\\d{4}";
       Pattern pattern = Pattern.compile(patternString);
       Matcher matcher = pattern.matcher(text);
       boolean matches = matcher.find();
       System.out.println("Does the text contain a phone number? " + matches);
   }
}

 

In this example:

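  • `\\d{3}-\\d{3}-\\d{4}` is the regex as written in a Java string literal, where each `\\` stands for a single backslash in the actual pattern. `\\d` means any digit, and `{3}` specifies exactly three occurrences, so `\\d{3}` means three digits.
  • The `Pattern` class compiles the regex into a pattern that can be used by the `Matcher` class.
  • The `Matcher` tries to find the pattern in the text. `find()` returns true if the pattern occurs anywhere in the text.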

Regular Expression Character Classes and Quantifiers Example : 

Character classes allow you to define a set of characters you wish to match, while quantifiers specify how many times a character or group of characters should appear.

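For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCharacterClass {
   public static void main(String[] args) {
       String text = "My email is contact@example.com";
       // Letters, an @ sign, letters, a literal dot, then at least two letters
       String patternString = "[a-zA-Z]+@[a-zA-Z]+\\.[a-zA-Z]{2,}";
       Pattern pattern = Pattern.compile(patternString);
       Matcher matcher = pattern.matcher(text);
       // find() succeeds if the pattern occurs anywhere in the text
       boolean matches = matcher.find();
       System.out.println("Does the text contain an email? " + matches);
   }
}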

In this regex:

  • `[a-zA-Z]+` matches one or more letters. The `+` quantifier means "one or more".
  • `@` is a literal character that must appear exactly as it is.
  • `\\.[a-zA-Z]{2,}` matches a period (escaped as `\\.` because a plain period means "any character" in regex) followed by two or more letters. `{2,}` means "two or more".

Summary

  1. Create a Pattern object using the compile() method of the Pattern class.
     
  2. Create a Matcher object by calling the matcher() method on that Pattern object with the input sequence.
     
  3. Call find() repeatedly to locate each match of the pattern in the input. Methods are also provided to split the sequence and to replace matches in the sequence.
     
  4. A syntax error in the pattern causes compile() to throw a PatternSyntaxException whose message describes the error.

Example

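import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main 
{
    public static void main(String[] args)
    {
      // Any CharSequence works as matcher input; a StringBuffer is used here
      StringBuffer sb = new StringBuffer("AbbenjoyAAByouraaBdayAbb");

      // "a", then any single character, then "b", ignoring case
      Pattern pattern = Pattern.compile("a.b", Pattern.CASE_INSENSITIVE);

      Matcher m = pattern.matcher(sb);

      // end() is one past the last matched character, hence the "- 1"
      while (m.find())
         System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1) + " : " + m.group());

      System.out.println();
      // replaceAll() resets the matcher and replaces every match
      System.out.println(m.replaceAll("-"));

      // split() yields a leading empty string because the input starts with a match
      String[] list = pattern.split(sb);
      for (String temp : list)
         System.out.println(temp);
    }
}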

Output:

Pattern found from 0 to 2 : Abb
Pattern found from 8 to 10 : AAB
Pattern found from 15 to 17 : aaB
Pattern found from 21 to 23 : Abb

-enjoy-your-day-

enjoy
your
day

Frequently Asked Questions

What is the difference between an ordinary character and a metacharacter?

An ordinary character in a regular expression matches itself. A metacharacter is a special character that affects how a pattern is matched. For example, @ is an ordinary character, whereas the hyphen (-) acts as a metacharacter inside square brackets, where it defines a range of characters such as [a-z]. To match a metacharacter literally, escape it with a backslash.
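
A brief sketch of the distinction, using a made-up fullMatch helper. The same character can be ordinary in one position and special in another.

```java
import java.util.regex.Pattern;

class OrdinaryVsMetaDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // @ is ordinary and matches itself.
        System.out.println(fullMatch("user@host", "user@host")); // true

        // Outside brackets the hyphen is ordinary; inside brackets it defines a range.
        System.out.println(fullMatch("a-z", "a-z"));   // true
        System.out.println(fullMatch("[a-z]", "m"));   // true

        // + is a metacharacter: "1+1" means one or more 1s followed by a 1.
        System.out.println(fullMatch("1+1", "1+1"));   // false
        System.out.println(fullMatch("1+1", "111"));   // true

        // Escaping with a backslash restores the literal meaning.
        System.out.println(fullMatch("1\\+1", "1+1")); // true
    }
}
```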

What is the difference between matches() and find() methods? 

The matches() method returns true only if the whole sequence matches the regular expression. On the other hand, the find() method returns true even if a substring of the sequence matches the regular expression.
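
The difference can be seen with a single pattern applied to two inputs. This is a minimal sketch; the helper names are invented for the example.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: true only if the whole input consists of digits.
    static boolean matchesWhole(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Hypothetical helper: true if digits occur anywhere in the input.
    static boolean findsAnywhere(String input) {
        return DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("12345"));   // true
        System.out.println(matchesWhole("abc123"));  // false, "abc" is not part of the match
        System.out.println(findsAnywhere("abc123")); // true, "123" is a matching substring
        System.out.println(findsAnywhere("abc"));    // false
    }
}
```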

Conclusion

The majority of the data handled by application programmers consists of strings. Regular expressions are used in text processing, data validation, web scraping and many more applications. This blog briefly explains regular expressions in Java and the related classes and interface. It also discusses the characters, flags, quantifiers and methods used for working with regular expressions.
