Python RegEx

Regular Expressions (RegEx) are powerful tools in Python for searching, matching and manipulating text. They allow you to define search patterns to find specific strings in a larger text or validate input formats like email addresses, phone numbers or dates. Python provides the re module to work with RegEx efficiently.

Why Use Python RegEx?

  1. Text Processing: Essential for parsing logs, cleaning data and extracting information from unstructured text.
  2. Input Validation: Verify formats like emails, passwords, and IDs.
  3. Search and Replace: Perform advanced search-and-replace operations with patterns.

Importing the re Module

To use RegEx, you must first import the re module.

import re

Basic RegEx Functions in Python

The re module provides several functions for working with patterns. Here are the most commonly used ones:

  1. re.search(): Searches for a match anywhere in the string.
  2. re.match(): Matches the pattern only at the beginning of the string.
  3. re.findall(): Returns all non-overlapping matches in a string.
  4. re.finditer(): Returns an iterator of match objects for all matches.
  5. re.sub(): Replaces occurrences of a pattern with a replacement string.
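Of the functions listed above, re.finditer() is the one not demonstrated later in this article, so here is a minimal sketch of it. The sample sentence and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to contain several numbers.
text = "Order 66 shipped 3 items on day 12"

# re.finditer() yields one match object per non-overlapping match,
# so both the matched text and its position are available.
found = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

for value, position in found:
    print(f"{value!r} starts at index {position}")
```

Unlike re.findall(), which returns plain strings, each item here is a full match object, which is useful when the position of a match matters.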

Understanding RegEx Patterns

A RegEx pattern is a string containing special characters, known as metacharacters, that define the search criteria. Some important metacharacters include:

Metacharacter   Description                               Example
.               Matches any character (except newline)    a.b → acb, a0b
^               Matches the start of a string             ^hello → hello world
$               Matches the end of a string               world$ → hello world
*               Matches zero or more occurrences          a* → aaa, a
+               Matches one or more occurrences           a+ → aaa
?               Matches zero or one occurrence            a? → a, "" (empty string)
[ ]             Matches any character inside brackets     [abc] → a, b, c
{ }             Matches specific repetitions              a{2} → aa
|               Logical OR                                a|b → a, b
\               Escapes special characters                \. → .
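To see several of these metacharacters in action, the sketch below runs a handful of patterns through re.findall(). The sample strings are assumptions picked for illustration, not part of the table above.

```python
import re

# (pattern, sample text) pairs; the sample texts are illustrative only.
samples = [
    (r"a.b", "acb a0b ab"),        # . needs exactly one character between a and b
    (r"^hello", "hello world"),    # ^ anchors the match to the start
    (r"world$", "hello world"),    # $ anchors the match to the end
    (r"a+", "caaat bat"),          # + matches runs of one or more a
    (r"[abc]", "cab fed"),         # [ ] matches any one listed character
    (r"a{2}", "aaaa"),             # {2} matches exactly two repetitions
    (r"cat|dog", "cat dog bird"),  # | matches either alternative
    (r"\.", "v1.2.3"),             # \. matches a literal dot
]

results = {pattern: re.findall(pattern, text) for pattern, text in samples}

for pattern, matches in results.items():
    print(f"{pattern:10} {matches}")
```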

Using re.search()

The re.search() function scans the string and returns a match object for the first location where the pattern matches, or None if there is no match.

import re

text = "The price is $100."
match = re.search(r"\$\d+", text)

# re.search() returns None when nothing matches, so check before using the result.
if match:
    print("Match found:", match.group())

Output:

Match found: $100
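Besides the matched text, a match object also records where the match occurred. The following sketch reuses the same string and pattern; the euro pattern is a made-up case to show what a failed search returns.

```python
import re

text = "The price is $100."
match = re.search(r"\$\d+", text)

if match:
    # start() and end() give the slice boundaries of the match in text.
    print("Text:", match.group())
    print("Span:", match.span())

# A pattern that does not occur in the text yields None, not an empty match.
missing = re.search(r"€\d+", text)
print("Euro price:", missing)
```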

Using re.match()

The re.match() function checks for a match at the start of the string.

import re

text = "hello world"
match = re.match(r"hello", text)

# re.match() returns None if the pattern does not match at position 0.
if match:
    print("Match found:", match.group())

Output:

Match found: hello
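The difference between re.match() and re.search() is easiest to see when the pattern occurs later in the string. This is a small sketch with a made-up input string.

```python
import re

# Illustrative text where "hello" is not at the beginning.
text = "say hello world"

at_start = re.match(r"hello", text)    # only tries position 0
anywhere = re.search(r"hello", text)   # scans the whole string
anchored = re.search(r"^hello", text)  # ^ makes search behave like match here

print("re.match:", at_start)
print("re.search:", anywhere.group(), "at index", anywhere.start())
print("re.search with ^:", anchored)
```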

Using re.findall()

The re.findall() function returns all matches as a list.

import re

text = "cat bat rat mat"
matches = re.findall(r"\b\w+at\b", text)

print("Matches:", matches)

Output:

Matches: ['cat', 'bat', 'rat', 'mat']
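One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below illustrates this with an assumed key=value string.

```python
import re

# Hypothetical settings string used only for this example.
text = "width=80 height=24"

# No groups: each list item is the full matched text.
whole = re.findall(r"\w+=\d+", text)

# Two groups: each list item is a tuple of the captured parts.
pairs = re.findall(r"(\w+)=(\d+)", text)

print("Whole matches:", whole)
print("Captured pairs:", pairs)
```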

Using re.sub()

The re.sub() function replaces occurrences of a pattern with a specified string.

import re

text = "I have 2 cats and 3 dogs."
result = re.sub(r"\d+", "many", text)

print("Modified text:", result)

Output:

Modified text: I have many cats and many dogs.
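re.sub() also accepts a count argument to limit the number of replacements, and the replacement can be a function that receives each match object. The sketch below shows both on the same sentence; the doubling logic is purely illustrative.

```python
import re

text = "I have 2 cats and 3 dogs."

# count=1 replaces only the first occurrence.
first_only = re.sub(r"\d+", "many", text, count=1)

# A callable replacement computes the new text from each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

print("First only:", first_only)
print("Doubled:", doubled)
```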

Using Groups in RegEx

Groups allow you to extract specific parts of a match.

import re

text = "My email is example@test.com."
match = re.search(r"(\w+)@(\w+\.\w+)", text)

if match:
    # group(1) and group(2) are the first and second parenthesized groups.
    print("Username:", match.group(1))
    print("Domain:", match.group(2))

Output:

Username: example
Domain: test.com
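Groups can also be given names with the (?P<name>...) syntax, which makes the extraction code easier to read than numeric indexes. This sketch rewrites the same pattern; the group names user and domain are chosen here for illustration.

```python
import re

text = "My email is example@test.com."

# (?P<name>...) defines a named group; "user" and "domain" are assumed names.
match = re.search(r"(?P<user>\w+)@(?P<domain>\w+\.\w+)", text)

if match:
    print("Username:", match.group("user"))
    print("Domain:", match.group("domain"))
    print("All groups:", match.groupdict())
```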

Flags in RegEx

Flags modify the behavior of a pattern. Common flags include:

  • re.IGNORECASE (re.I): Makes the pattern case-insensitive.
  • re.MULTILINE (re.M): Allows ^ and $ to match at the start and end of each line, not only of the whole string.
  • re.DOTALL (re.S): Allows . to match newline characters.

Example: Using Flags

import re

text = "Hello\nWorld"
match = re.search(r"world", text, re.IGNORECASE)

# Without re.IGNORECASE, "world" would not match "World".
if match:
    print("Match found:", match.group())

Output:

Match found: World
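The other two flags can be sketched on the same two-line string. The patterns here are assumptions picked to make the effect of each flag visible.

```python
import re

text = "Hello\nWorld"

# Without re.MULTILINE, ^ matches only at the very start of the string.
first_line_only = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, . does not match the newline between the two words.
no_dotall = re.search(r"Hello.World", text)
with_dotall = re.search(r"Hello.World", text, re.DOTALL)

print("Default ^:", first_line_only)
print("MULTILINE ^:", every_line)
print("Default dot:", no_dotall)
print("DOTALL dot:", repr(with_dotall.group()))
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.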
