Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. What is Python Regular Expression (RegEx)?
2.1. Example
3. Python RegEx Functions
3.1. re.search(pattern, string)
3.2. re.match(pattern, string)
3.3. re.findall(pattern, string)
3.4. re.finditer(pattern, string)
4. MetaCharacters
5. RegEx: Special Sequences
6. Python RegEx - Sets
7. Frequently Asked Questions
7.1. What is Python regular expression?
7.2. What is the R in Python RegEx?
7.3. What are the applications of regular expressions in Python?
8. Conclusion
Last Updated: Mar 27, 2024
Easy

Python Regular Expression

Author Nikunj Goel

Introduction

Regular expressions (regex) are a powerful tool for working with text, and Python supports them through the re module in its standard library. With regular expressions you can match patterns, process text, and extract data in a few lines of code.

This article covers the main functions of the re module, the building blocks of pattern syntax (metacharacters, special sequences, and sets), and common real-world uses.

What is Python Regular Expression (RegEx)?

A regular expression, or RegEx, is a sequence of characters that defines a search pattern. It is used for matching, searching, and manipulating strings based on that pattern. Python's re module provides support for regular expressions.

Example

import re

# Define a simple pattern to match email addresses
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Sample text containing email addresses
text = "Contact us at info@example.com or support@company.com for assistance."

# Use re.findall() to find all matches in the text
matches = re.findall(pattern, text)

# Print the matches
print(matches)

Output:

['info@example.com', 'support@company.com']

Explanation

The example searches for email addresses in the given text using a regular expression pattern. The pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b represents a basic email address: a local part, an @ sign, a domain, a dot, and a top-level domain of at least two letters. Note that the final set is written [A-Za-z], not [A-Z|a-z]; inside square brackets the | character is a literal pipe, not an OR operator. re.findall() finds all matches in the text and returns them as a list. This pattern is a simplification and does not cover every valid email address.

Python RegEx Functions

Python provides several functions in the re module for working with Regular Expressions (RegEx). Here are some key functions:

re.search(pattern, string)

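Scans the whole string and returns a match object for the first occurrence of the pattern, or None if there is no match.

import re
# The pattern may occur anywhere in the string
result = re.search(r'pattern', 'search this pattern')
print(result.group())  # pattern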
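
Because re.search() can return None, it is usually worth checking the result before using it. The following is a minimal sketch of that check, reusing the sample string from the snippet above; the variable names are illustrative only.

```python
import re

text = "search this pattern"  # same sample string as above

# re.search returns a match object on success and None on failure,
# so test the result before calling methods on it.
result = re.search(r"pattern", text)
if result is not None:
    print(result.group())  # the matched text: pattern
    print(result.span())   # (start, end) indices: (12, 19)

# A pattern that does not occur anywhere in the string yields None.
missing = re.search(r"absent", text)
print(missing)  # None
```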

re.match(pattern, string)

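Checks whether the pattern matches at the beginning of the string. It returns a match object if it does, and None otherwise, even if the pattern occurs later in the string.

import re
# The pattern must match starting at index 0
result = re.match(r'pattern', 'pattern at the start')
print(result.group())  # pattern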
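
The difference between re.match() and re.search() is easiest to see when the pattern is not at the start of the string. A small sketch, using a made-up sample string:

```python
import re

text = "the pattern is not at the start"  # illustrative sample

# re.match only tries at index 0, so it fails here.
at_start = re.match(r"pattern", text)

# re.search scans forward and finds the pattern at index 4.
anywhere = re.search(r"pattern", text)

print(at_start)          # None
print(anywhere.start())  # 4
```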

re.findall(pattern, string)

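Finds all non-overlapping occurrences of a pattern in a string and returns them as a list of strings. If the pattern contains capturing groups, the list holds the captured groups instead of the whole matches.

import re
# Returns an empty list when nothing matches
results = re.findall(r'pattern', 'find all instances of this pattern')
print(results)  # ['pattern']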
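
The shape of the list returned by re.findall() depends on whether the pattern has capturing groups. The sketch below illustrates this with a hypothetical "name=number" string that is not part of the original article.

```python
import re

text = "alice=30, bob=25"  # hypothetical sample data

# No groups: each result is the whole match.
whole = re.findall(r"\w+=\d+", text)
print(whole)   # ['alice=30', 'bob=25']

# Two groups: each result is a tuple of the captured parts.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)   # [('alice', '30'), ('bob', '25')]
```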

re.finditer(pattern, string)

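Returns an iterator yielding a match object for each non-overlapping occurrence of a pattern in a string. Unlike re.findall(), it gives access to the position of every match.

import re
matches = re.finditer(r'pattern', 'find all instances of this pattern')
for match in matches:
    print(match.group(), match.span())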
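
Since re.finditer() yields match objects, each match carries its own position. A short sketch, assuming a sample string with several occurrences, that collects the spans and uses them to slice the original text:

```python
import re

text = "pattern after pattern after pattern"  # illustrative sample

# Collect the (start, end) span of every occurrence.
spans = [m.span() for m in re.finditer(r"pattern", text)]

# Each span can be used to slice the original string.
for start, end in spans:
    print(start, end, text[start:end])
```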

MetaCharacters

In Python Regular Expressions (RegEx), metacharacters are special characters that have a unique meaning and are used to define the search pattern. Here are some common metacharacters in Python RegEx:

  • . (Dot): Matches any character except a newline (unless the re.DOTALL flag is set).
  • ^ (Caret): Matches the start of a string, and also the start of each line when the re.MULTILINE flag is set.
  • $ (Dollar): Matches the end of a string or just before a trailing newline, and also the end of each line when the re.MULTILINE flag is set.
  • * (Asterisk): Matches 0 or more occurrences of the preceding element (a character, set, or group).
  • + (Plus): Matches 1 or more occurrences of the preceding element.
  • ? (Question Mark): Matches 0 or 1 occurrence of the preceding element.
  • [] (Square Brackets): Matches any one of the characters inside the brackets.
  • | (Pipe): Acts like a logical OR. Matches either the pattern on the left or the pattern on the right.
  • () (Round Brackets): Groups patterns together. Capturing groups allow you to extract parts of the matched text.
  • \ (Backslash): Escapes a metacharacter, allowing you to match it as a literal character. It also introduces special sequences such as \d.
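
The metacharacters listed above can be tried out one at a time with re.findall(). The sample strings in this sketch are made up for illustration.

```python
import re

dot = re.findall(r"c.t", "cat cut c\nt")         # ['cat', 'cut']
caret = re.findall(r"^\w+", "hello world")       # ['hello']
dollar = re.findall(r"\w+$", "hello world")      # ['world']
star = re.findall(r"ab*", "a ab abbb")           # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")           # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']
either = re.findall(r"cat|dog", "cat dog bird")  # ['cat', 'dog']
groups = re.findall(r"(\d+)-(\d+)", "10-20")     # [('10', '20')]
escaped = re.findall(r"\.", "a.b.c")             # ['.', '.']

for name, value in [("dot", dot), ("caret", caret), ("dollar", dollar),
                    ("star", star), ("plus", plus), ("optional", optional),
                    ("either", either), ("groups", groups),
                    ("escaped", escaped)]:
    print(name, value)
```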

RegEx: Special Sequences

In Python Regular Expressions (RegEx), special sequences are escape sequences that represent predefined sets of characters or behaviors. Here are some common special sequences in Python RegEx:

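  • \d: Matches any digit. For ASCII text this is 0-9; with str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set.
  • \D: Matches any non-digit character.
  • \w: Matches any word character: letters, digits, and the underscore (a-z, A-Z, 0-9, _ for ASCII text).
  • \W: Matches any non-word character.
  • \s: Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab).
  • \S: Matches any non-whitespace character.
  • \b: Matches a word boundary, that is, the position between a word character and a non-word character. It is used to perform a whole-word match.
  • \B: Matches a position that is not a word boundary.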
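
A brief sketch of the special sequences in use follows. The sample strings are invented for illustration, and the \b and \B lines show the difference between a whole-word match and a match inside a longer word.

```python
import re

text = "Order 66 shipped on day 7"  # illustrative sample

digits = re.findall(r"\d+", text)        # ['66', '7']
words = re.findall(r"\w+", text)         # every run of word characters
non_digits = re.findall(r"\D+", "a1b22c")    # ['a', 'b', 'c']
chunks = re.findall(r"\S+", "a b\tc\nd")     # ['a', 'b', 'c', 'd']

sample = "on and on, onward, button"
whole_words = re.findall(r"\bon\b", sample)  # only the two standalone "on"
inside_word = re.findall(r"\Bon", sample)    # only the "on" inside "button"

print(digits, words, non_digits, chunks)
print(len(whole_words), len(inside_word))    # 2 1
```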

Python RegEx - Sets

In Python Regular Expressions (RegEx), sets are used to define a specific set of characters that you want to match. Sets are specified using square brackets []. Here are some examples of using sets in Python RegEx:

  • [abc]: Matches any one of the characters 'a', 'b', or 'c'. 
  • [0-9]: Matches any digit from 0 to 9. 
  • [a-z]: Matches any lowercase letter from 'a' to 'z'.
  • [A-Z]: Matches any uppercase letter from 'A' to 'Z'.
  • [a-zA-Z]: Matches any uppercase or lowercase letter. 
  • [^0-9]: Matches any character that is not a digit. 
  • [^a-z]: Matches any character that is not a lowercase letter.
  • [a-zA-Z0-9]: Matches any alphanumeric character.
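
The sets above can be checked against a short string. This is a minimal sketch with a made-up sample; the last line also shows that most metacharacters, such as . and |, are treated as literal characters inside square brackets.

```python
import re

text = "Room B12, floor 3"  # illustrative sample

digits = re.findall(r"[0-9]", text)              # ['1', '2', '3']
upper = re.findall(r"[A-Z]", text)               # ['R', 'B']
lower_runs = re.findall(r"[a-z]+", text)         # ['oom', 'floor']
not_alnum = re.findall(r"[^a-zA-Z0-9]", text)    # [' ', ',', ' ', ' ']
room_code = re.findall(r"[A-Z][0-9]+", text)     # ['B12']

# Inside a set, . and | are literal characters, not metacharacters.
literals = re.findall(r"[.|]", "a.b|c")          # ['.', '|']

print(digits, upper, lower_runs, not_alnum, room_code, literals)
```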

Frequently Asked Questions

What is Python regular expression? 

A Python regular expression is a pattern, written as a string, that the re module uses to search, match, and manipulate text. It provides a concise and flexible way to describe what you are looking for.

What is the R in Python RegEx?

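The "r" stands for "raw." It is a feature of Python string literals rather than of the regular expression syntax itself: when you put the r prefix before a string, as in r'\d+', Python creates a raw string. Raw strings treat backslashes (\) as literal characters instead of interpreting them as the start of a string escape sequence, so the backslash reaches the re module unchanged. Without the prefix, '\b' means a backspace character, while r'\b' is the regex word boundary.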
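
The effect of the r prefix can be seen by comparing the same pattern written with and without it. A small sketch, using an invented sample string:

```python
import re

# "\n" is one character (a newline); r"\n" is two (backslash and n).
print(len("\n"), len(r"\n"))  # 1 2

text = "cat concat"  # illustrative sample

# Without r, "\b" becomes a backspace character, so nothing matches.
plain = re.findall("\bcat\b", text)

# With r, the regex engine sees \b and treats it as a word boundary.
raw = re.findall(r"\bcat\b", text)

print(plain)  # []
print(raw)    # ['cat']
```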

What are the applications of regular expressions in Python?

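Applications of regular expressions in Python include data validation (checking that input has the expected format), text processing (searching, replacing, and splitting strings), web scraping (extracting fields from page text), and more.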
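
As a rough illustration of two of these applications, the sketch below validates input with re.fullmatch() and cleans up text with re.sub(). The function names and the username rule are hypothetical, chosen only for this example.

```python
import re

def is_valid_username(name: str) -> bool:
    """Hypothetical rule: 3 to 16 letters, digits, or underscores."""
    # re.fullmatch succeeds only if the whole string fits the pattern.
    return re.fullmatch(r"[A-Za-z0-9_]{3,16}", name) is not None

def normalize_spaces(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text).strip()

print(is_valid_username("nikunj_01"))   # True
print(is_valid_username("bad name!"))   # False
print(normalize_spaces("  too   many\tspaces\n"))  # too many spaces
```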

Conclusion

Python regular expressions are a cornerstone of text processing, offering both flexibility and efficiency. Despite the initial learning curve, regex is worth learning for anyone who works with text in programming or data analysis.

You can refer to our guided paths on Coding Ninjas. You can check our courses to learn more about DSA, DBMS, Competitive Programming, Python, Java, JavaScript, etc.

You can also consider our paid courses such as DSA in Python to give your career an edge over others!
 
