Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Regular expressions - Classes and Interfaces
  2.1. Pattern class
    2.1.1. Methods
    2.1.2. Flags
    2.1.3. Metacharacters
    2.1.4. Character classes
    2.1.5. Quantifiers
  2.2. Matcher class
    2.2.1. Methods
  2.3. PatternSyntaxException class
    2.3.1. Methods
  2.4. MatchResult Interface
    2.4.1. Methods
3. Summary
  3.1. Example
4. Frequently Asked Questions
  4.1. Q: What is the difference between an ordinary character and a metacharacter?
  4.2. Q: What is the difference between matches() and find() methods?
5. Conclusion
Last Updated: Mar 27, 2024

Regular Expressions in Java

Author Yashesvinee V

Introduction

Regular expressions are one of the most widely used features for string processing in Java. A regular expression is a string that acts as a search template: it describes a pattern, and the regex engine finds the parts of a given string that fit that pattern. Regular expressions can be used to search, replace or otherwise manipulate strings. The java.util.regex API provides three classes and one interface that together support working with regular expressions.

Regular expressions - Classes and Interfaces

Following are the classes and interfaces of the regex package in Java. 

  • Pattern class is used for defining the regular expression pattern.
     
  • Matcher class interprets the pattern and performs match operations on a string.
     
  • PatternSyntaxException class is an unchecked exception that is thrown when there is a syntax error in the regular expression pattern.
     
  • MatchResult interface represents the result of a match operation.

Pattern class

The Pattern class is a compiled representation of a regular expression. The typical invocation sequence starts by compiling a regular expression into an instance of this class. The resulting Pattern object is then used to create a Matcher, through its matcher() method, which finds matches of the pattern in an input string.
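The compile-then-match sequence described above can be sketched as follows. This is a minimal illustration, not code from the original article: the class name PatternBasics, the helper firstNumber and the sample inputs are assumptions chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternBasics {
    // Compile once and reuse: a Pattern is immutable and can be shared.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Hypothetical helper: returns the first run of digits, or null if there is none.
    public static String firstNumber(CharSequence input) {
        Matcher m = DIGITS.matcher(input); // a new Matcher is created for each input
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 66 shipped")); // prints 66
        System.out.println(firstNumber("no digits here"));   // prints null
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the same expression on every call, which is usually worthwhile when a pattern is used repeatedly.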

Methods

| Method | Description |
| --- | --- |
| compile(String regex) | Compiles the given regular expression into a pattern. |
| compile(String regex, int flags) | Compiles the given regular expression into a pattern with the given flags. |
| flags() | Returns the pattern's match flags. |
| matcher(CharSequence input) | Creates a matcher that will match the given input against this pattern. |
| matches(String regex, CharSequence input) | Compiles the given regular expression and attempts to match the entire input against it. |
| pattern() | Returns the regular expression from which this pattern was compiled. |
| quote(String s) | Returns a literal pattern String for the given string. |
| split(CharSequence input) | Splits the given input sequence around the matches of this pattern. |
| toString() | Returns the string representation of this pattern. |
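A few of the methods listed above can be tried out in a short sketch. The class name PatternMethodsDemo, its helper methods and the sample strings are illustrative assumptions; only the Pattern calls themselves come from the java.util.regex API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternMethodsDemo {
    // Pattern.matches compiles the regex and tests the whole input in one call.
    public static boolean isThreeDigits(String s) {
        return Pattern.matches("[0-9]{3}", s);
    }

    // Pattern.quote makes every character literal, so "." no longer means "any character".
    public static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // split cuts the input around each match of the pattern (here: a comma with optional spaces).
    public static String[] splitOnCommas(String csv) {
        return Pattern.compile("\\s*,\\s*").split(csv);
    }

    public static void main(String[] args) {
        System.out.println(isThreeDigits("123"));                        // prints true
        System.out.println(isThreeDigits("1234"));                       // prints false
        System.out.println(containsLiteral("version 1x5", "1.5"));       // prints false
        System.out.println(Arrays.toString(splitOnCommas("a , b,c")));   // prints [a, b, c]
    }
}
```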

Flags

| Flag | Description |
| --- | --- |
| CASE_INSENSITIVE | Enables case-insensitive matching. By default, only US-ASCII characters are matched case-insensitively. |
| COMMENTS | Permits the use of comments and whitespace in the pattern. |
| DOTALL | Allows the expression ( . ) to match any character, including a line terminator. |
| LITERAL | Treats the pattern as a sequence of literal characters, so metacharacters and escape sequences have no special meaning. |
| MULTILINE | Enables multiline mode, in which ^ and $ match just after and just before a line terminator, as well as at the beginning and end of the entire input. |
| UNIX_LINES | Enables Unix lines mode, where only the '\n' line terminator is recognized in the behaviour of ., ^, and $. |
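The effect of a flag is easiest to see by running the same expression with and without it. The sketch below assumes a hypothetical class FlagsDemo with two helpers; flags are combined with the bitwise OR operator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // Counts how many times "item" appears at a position matched by ^ under the given flags.
    public static int countLineStarts(String text, int flags) {
        Matcher m = Pattern.compile("^item", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Tests whether "a.b" matches the whole text under the given flags.
    public static boolean dotMatches(String text, int flags) {
        return Pattern.compile("a.b", flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "item one\nItem two\nitem three";
        System.out.println(countLineStarts(text, 0));                  // prints 1
        System.out.println(countLineStarts(text, Pattern.MULTILINE)); // prints 2
        System.out.println(countLineStarts(text,
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE));        // prints 3
        System.out.println(dotMatches("a\nb", 0));                     // prints false
        System.out.println(dotMatches("a\nb", Pattern.DOTALL));        // prints true
    }
}
```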

Metacharacters 

| Character | Description |
| --- | --- |
| [ ] | To define a set of characters (a character class). |
| { } | To specify how many times the preceding element must occur: exactly n times with {n}, or within a range with {n,} and {n,m}. |
| ( ) | To capture and group patterns. |
| + | To search for one or more occurrences of the preceding element. |
| - | To specify a range of characters inside [ ]. |
| * | To search for zero or more occurrences of the preceding element. |
| . | To find any character except a line terminator (unless the DOTALL flag is set). |
| ^ | To exclude a given set of characters if placed at the start of [ ], or otherwise to match at the beginning of the input. |
| $ | To match at the end of the input. |
| ? | To search for zero or one occurrence of the preceding element. |
| \ | To escape a metacharacter so that it is matched literally, or to signal a special sequence. It is used in predefined character classes and boundary matchers. |
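As a rough sketch of how several metacharacters combine, the expression below checks for a number with an optional two-digit decimal part. The class name MetacharDemo and the helper looksLikePrice are assumptions made for illustration. Note that the escape character in a Java regular expression is the backslash, and it must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

public class MetacharDemo {
    // ^ and $ anchor the match to the whole input,
    // [0-9]+ requires one or more digits,
    // ( ... )? makes the grouped decimal part optional,
    // \\. is an escaped dot, so it matches only a literal '.'.
    private static final Pattern PRICE = Pattern.compile("^[0-9]+(\\.[0-9]{2})?$");

    // Hypothetical helper: true for inputs such as "19" or "19.99".
    public static boolean looksLikePrice(String s) {
        return PRICE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikePrice("19.99")); // prints true
        System.out.println(looksLikePrice("19"));    // prints true
        System.out.println(looksLikePrice("19x99")); // prints false
        System.out.println(looksLikePrice("19.9"));  // prints false
    }
}
```

Without the backslash, the dot would accept any character, so "19x99" would be treated as a valid price.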

Character classes

| Character Class | Description |
| --- | --- |
| \d | Set of all digits from 0 to 9. |
| \D | Set of all non-digit characters. |
| \w | Set of all letters, digits and the underscore character. |
| \W | Set of all characters excluding letters, digits and the underscore. |
| \s | Set of all whitespace characters (space, tab, newline, vertical tab, form feed and carriage return). |
| \S | Set of all characters excluding whitespace characters. |
| \b | A word boundary (a boundary matcher rather than a set of characters). |
| \B | A non-word boundary. |
| [abc] | Set of a, b and c characters. |
| [a-m] | Set of all characters from a to m. |
| [^abc] | Set of all characters except a, b and c. |
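The predefined classes above can be sketched in a few one-line helpers. The class CharClassDemo and its method names are hypothetical; String.replaceAll and String.split are used because both accept a regular expression.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CharClassDemo {
    // \D matches every non-digit, so removing those leaves only the digits.
    public static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ matches one or more whitespace characters between words.
    public static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \b requires a word boundary on both sides, so "cat" inside "concatenate" is not a hit.
    public static boolean hasWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("+1 (555) 010-9999"));              // prints 15550109999
        System.out.println(Arrays.toString(words("  to be   or not ")));  // prints [to, be, or, not]
        System.out.println(hasWord("the cat sat", "cat"));                // prints true
        System.out.println(hasWord("concatenate strings", "cat"));        // prints false
    }
}
```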

 

Quantifiers

| Quantifier | Description |
| --- | --- |
| X? | Matches X once or not at all. |
| X* | Matches zero or more occurrences of X. |
| X+ | Matches one or more occurrences of X. |
| X{n} | Matches exactly n occurrences of X. |
| X{n,} | Matches at least n occurrences of X. |
| X{n,m} | Matches at least n but not more than m occurrences of X. |
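The bounds of each quantifier can be checked with whole-input matches. This is a small sketch; the class QuantifierDemo and its helper are illustrative names, and Pattern.matches is used so that the entire input has to fit the expression.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    public static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("colou?r", "color"));  // prints true  (u occurs zero times)
        System.out.println(matchesAll("colou?r", "colour")); // prints true  (u occurs once)
        System.out.println(matchesAll("a*", ""));            // prints true  (zero occurrences allowed)
        System.out.println(matchesAll("a+", ""));            // prints false (at least one required)
        System.out.println(matchesAll("a{2,3}", "a"));       // prints false (fewer than 2)
        System.out.println(matchesAll("a{2,3}", "aaa"));     // prints true  (3 is within bounds)
        System.out.println(matchesAll("a{2,3}", "aaaa"));    // prints false (more than 3)
    }
}
```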

Matcher class

The Matcher class performs match operations on a given character sequence by interpreting a pattern. A Matcher object is created by invoking the matcher() method on a Pattern object, with the input sequence as the argument.

Methods

| Method | Description |
| --- | --- |
| end() | Returns the offset after the last character matched. |
| find() | Attempts to find the next substring of the input sequence that matches the pattern. |
| find(int start) | Resets the matcher and attempts to find the next substring of the input sequence that matches the pattern, starting at the given index. |
| group() | Returns the substring matched by the previous match. |
| group(int group) | Returns the substring captured by the given group in the previous match. |
| groupCount() | Returns the number of capturing groups in the matcher's pattern. |
| lookingAt() | Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Unlike matches(), the entire region does not have to match. |
| matches() | Attempts to match the entire region against the pattern. |
| pattern() | Returns the pattern that is interpreted by this matcher. |
| quoteReplacement(String s) | Static method that returns a literal replacement String for the given string. |
| region(int start, int end) | Sets the limits of the matcher's region. |
| regionEnd() | Returns the end index (exclusive) of the matcher's region. |
| regionStart() | Returns the start index of the matcher's region. |
| replaceAll(String replacement) | Replaces every occurrence of a matched string pattern in the sequence with the given string. |
| replaceFirst(String replacement) | Replaces the first occurrence of a matched string pattern in the sequence with the given string. |
| reset() | Resets the matcher. |
| reset(CharSequence input) | Resets the matcher with a new input sequence. |
| start() | Returns the starting index of the previous match. |
| start(int group) | Returns the starting index of the given group in the previous match. |
| toString() | Returns the string representation of the matcher. |
| usePattern(Pattern newPattern) | Changes the pattern that the matcher uses to find matches. |
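Several of these methods can be seen working together in a short sketch built around a date pattern. The class MatcherDemo, its helpers and the yyyy-MM-dd format are assumptions chosen for the example, not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    // Three capturing groups: year, month and day.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Describes the first date found: its position, its year group and the group count.
    public static String describeFirstDate(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.find()) {
            return "none";
        }
        // end() is exclusive: it points one past the last matched character.
        return m.start() + ".." + m.end() + " year=" + m.group(1) + " groups=" + m.groupCount();
    }

    // Replaces every date in the text with a fixed mask.
    public static String maskDates(String text) {
        return DATE.matcher(text).replaceAll("****-**-**");
    }

    // lookingAt() needs a match at the start of the input, but not over the whole input.
    public static boolean startsWithDate(String text) {
        return DATE.matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(describeFirstDate("due 2024-03-27 ok"));         // prints 4..14 year=2024 groups=3
        System.out.println(maskDates("from 2024-01-01 to 2024-12-31"));     // prints from ****-**-** to ****-**-**
        System.out.println(startsWithDate("2024-03-27 is the date"));       // prints true
        System.out.println(startsWithDate("on 2024-03-27"));                // prints false
    }
}
```

In a replacement string only the dollar sign and the backslash are special; Matcher.quoteReplacement can be applied when the replacement text comes from user input.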

 

 

PatternSyntaxException class

It extends the IllegalArgumentException class, which indicates that an invalid or illegal argument has been passed to a method. It has a single constructor to create a new instance.

PatternSyntaxException(String desc, String regex, int index)

 

desc: A description of the error

regex: The pattern with the error

index: An approximate position (index) of the error in the pattern

Methods

| Method | Description |
| --- | --- |
| getDescription() | Retrieves the description of the error. |
| getIndex() | Retrieves the approximate index of the error in the pattern, or -1 if the index is not known. |
| getMessage() | Returns a multi-line string that contains the description of the error, the erroneous pattern and a visual indication of the error index within the pattern. |
| getPattern() | Retrieves the pattern that has the error. |
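Because PatternSyntaxException is unchecked, it only needs to be caught where an invalid pattern is a real possibility, for example when the expression comes from user input. The sketch below assumes a hypothetical class SyntaxErrorDemo with a validate helper; the exact wording of the description is left to the JDK and is not shown here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {
    // Hypothetical helper: returns null if the regex compiles, otherwise a one-line error summary.
    public static String validate(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getMessage() would give a multi-line report; here the parts are combined by hand.
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("[a-z]+")); // prints null, the pattern is valid
        System.out.println(validate("(abc"));   // prints a summary of the unclosed group error
    }
}
```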

MatchResult Interface

This interface contains the query methods used to get the results of a match against a regular expression. The match boundaries, groups and group boundaries can be read but not modified through this interface.

Methods

| Method | Description |
| --- | --- |
| end() | Returns the offset after the last character matched. |
| group() | Returns the substring matched by the previous match. |
| groupCount() | Returns the number of capturing groups in the matcher's pattern. |
| start() | Returns the starting index of the previous match. |
| start(int group) | Returns the starting index of the given group in the previous match. |
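One way to obtain MatchResult objects is the toMatchResult() method of Matcher, which returns a snapshot that is not affected by later operations on the matcher. The sketch below is illustrative: the class MatchResultDemo and the helper allMatches are assumed names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchResultDemo {
    // Collects a read-only snapshot of every match of the regex in the input.
    public static List<MatchResult> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<MatchResult> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.toMatchResult()); // snapshot of the current match state
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : allMatches("[a-z]+", "ab 12 cde")) {
            System.out.println(r.group() + " at " + r.start() + ".." + r.end());
        }
        // prints:
        // ab at 0..2
        // cde at 6..9
    }
}
```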

 


Summary

  1. Create a pattern object using the compile() method of the Pattern class.
     
  2. Create a Matcher object by calling the matcher() method of the Pattern class on the created pattern object, passing the input sequence.
     
  3. Find all the matches of the pattern in a given string. Methods are provided to split the sequence and replace patterns in the sequence.
     
  4. Any syntax error in the pattern causes a PatternSyntaxException to be thrown, with a message that describes the error.

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

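public class Main
{
    public static void main(String[] args)
    {
      // Any CharSequence can be matched; a StringBuffer is used here.
      StringBuffer sb = new StringBuffer("AbbenjoyAAByouraaBdayAbb");

      // "a.b": an 'a', then any one character, then a 'b', ignoring case.
      Pattern pattern = Pattern.compile("a.b", Pattern.CASE_INSENSITIVE);

      Matcher m = pattern.matcher(sb);

      // end() is one past the last matched character, hence the "- 1".
      while (m.find())
         System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1) + " : " + m.group());

      System.out.println();
      // replaceAll() resets the matcher and replaces every match with "-".
      System.out.println(m.replaceAll("-"));

      // The input starts with a match, so the first element is an empty string.
      String[] list = pattern.split(sb);
      for (String temp : list)
         System.out.println(temp);
    }
}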

 

Output:
Pattern found from 0 to 2 : Abb
Pattern found from 8 to 10 : AAB
Pattern found from 15 to 17 : aaB
Pattern found from 21 to 23 : Abb

-enjoy-your-day-

enjoy
your
day

Frequently Asked Questions

Q: What is the difference between an ordinary character and a metacharacter?

An ordinary character in a regular expression matches itself. A metacharacter is a special character that affects how a pattern is matched. For example, @ is an ordinary character. Inside a character class such as [a-z], the hyphen (-) acts as a metacharacter that defines a range of characters; outside a character class it is an ordinary character.

Q: What is the difference between matches() and find() methods? 

The matches() method returns true only if the whole sequence matches the regular expression. On the other hand, the find() method returns true even if a substring of the sequence matches the regular expression.
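The difference can be sketched with a single pattern applied to the same input in both ways. The class MatchesVsFind and its two helpers are illustrative names.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // matches(): the whole input must fit the pattern.
    public static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): any substring of the input may fit the pattern.
    public static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d+", "abc123"));   // prints false
        System.out.println(partialMatch("\\d+", "abc123")); // prints true
        System.out.println(wholeMatch("\\d+", "123"));      // prints true
    }
}
```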

Conclusion

The majority of the data handled by application programmers consists of strings. Regular expressions are used in text processing, data validation, web scraping and many more applications. This blog briefly explains regular expressions in Java and the related classes and interface. It also discusses the metacharacters, flags, quantifiers and methods used for working with regular expressions.
