Regular Expressions (RegEx) are patterns used in Python for searching, matching and manipulating text. They let you define a search pattern to find specific strings in a larger text or to validate input formats such as email addresses, phone numbers or dates. Python's standard library provides the re module for working with RegEx.
Why Use Python RegEx?
- Text Processing: Essential for parsing logs, cleaning data and extracting information from unstructured text.
- Input Validation: Verify formats like emails, passwords, and IDs.
- Search and Replace: Perform advanced search-and-replace operations with patterns.
Importing the re Module
To use RegEx, you must first import the re module.
import re
Basic RegEx Functions in Python
The re module provides several functions for working with patterns. Here are the most commonly used ones:
- re.search(): Searches for a match anywhere in the string.
- re.match(): Matches the pattern only at the beginning of the string.
- re.findall(): Returns all non-overlapping matches in a string.
- re.finditer(): Returns an iterator of match objects for all matches.
- re.sub(): Replaces occurrences of a pattern with a replacement string.
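Of the functions listed above, re.finditer() is the one that yields match objects rather than plain strings. The following is a minimal sketch of what that makes possible; the sample text and variable names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 12 shipped, order 345 pending"

# Each item yielded by finditer() is a match object, so the position
# of every match is available alongside the matched text.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(positions)  # [('12', 6, 8), ('345', 24, 27)]
```

When only the matched strings are needed, re.findall() is usually the shorter choice.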
Understanding RegEx Patterns
A RegEx pattern is a string containing special characters, known as metacharacters, that define the search criteria. Some important metacharacters include:
Metacharacter | Description | Example |
---|---|---|
. | Matches any character (except newline) | a.b → acb, a0b |
^ | Matches the start of a string | ^hello → hello world |
$ | Matches the end of a string | world$ → hello world |
* | Matches zero or more occurrences | a* → aaa, a |
+ | Matches one or more occurrences | a+ → aaa |
? | Matches zero or one occurrence | a? → a, "" (empty string) |
[ ] | Matches any character inside brackets | [abc] → a, b, c |
{ } | Matches specific repetitions | a{2} → aa |
`\|` | Logical OR (matches either alternative) | `a\|b` → a, b |
\ | Escapes special characters | \. → . |
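The table rows can be checked directly with re.search(). The snippet below is a small sketch that pairs each metacharacter with a sample string; the pattern and text pairs are illustrative assumptions, not an exhaustive test.

```python
import re

# Hypothetical (pattern, text) pairs, one per metacharacter from the table.
checks = [
    (r"a.b", "a0b"),             # . matches any single character except newline
    (r"^hello", "hello world"),  # ^ anchors the match to the start
    (r"world$", "hello world"),  # $ anchors the match to the end
    (r"a{2}", "caab"),           # {2} requires exactly two repetitions
    (r"[abc]", "xbz"),           # [ ] matches one character from the set
    (r"cat|dog", "hot dog"),     # | matches either alternative
    (r"\.", "v1.0"),             # \ escapes the dot so it matches a literal "."
]

results = {pattern: bool(re.search(pattern, text)) for pattern, text in checks}
for pattern, found in results.items():
    print(f"{pattern!r}: {found}")

# By default the dot does not match a newline, so this search finds nothing.
dot_skips_newline = re.search(r"a.b", "a\nb") is None
print(dot_skips_newline)  # True
```

Writing patterns as raw strings (the r"..." prefix) keeps Python from interpreting the backslashes before the re module sees them.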
Using re.search()
The re.search() function scans through the string and returns a match object for the first location where the pattern matches, or None if there is no match.
import re
text = "The price is $100."
match = re.search(r"\$\d+", text)
if match:
    print("Match found:", match.group())
Output:
Match found: $100
Using re.match()
The re.match() function checks for a match at the start of the string.
import re
text = "hello world"
match = re.match(r"hello", text)
if match:
    print("Match found:", match.group())
Output:
Match found: hello
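The difference between re.match() and re.search() is easiest to see when the pattern does not sit at the start of the string. This is a short sketch using an assumed sample string:

```python
import re

# Hypothetical sample string where "hello" is not at position 0.
text = "say hello world"

# re.match() only tries the pattern at the start of the string, so this fails.
at_start = re.match(r"hello", text)

# re.search() scans forward and finds the same pattern at index 4.
anywhere = re.search(r"hello", text)

print(at_start)          # None
print(anywhere.start())  # 4
```

If the whole string has to match the pattern, re.fullmatch() is the usual alternative.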
Using re.findall()
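The re.findall() function returns all non-overlapping matches as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead of the full matches.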
import re
text = "cat bat rat mat"
matches = re.findall(r"\b\w+at\b", text)
print("Matches:", matches)
Output:
Matches: ['cat', 'bat', 'rat', 'mat']
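The shape of the list returned by re.findall() depends on whether the pattern contains capturing groups. Here is a brief sketch of both cases, using an assumed key=value string:

```python
import re

# Hypothetical key=value text for illustration.
text = "width=10 height=25 depth=7"

# No groups: each list item is the full matched string.
whole = re.findall(r"\w+=\d+", text)

# Two groups: each list item is a tuple of the captured parts.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['width=10', 'height=25', 'depth=7']
print(pairs)  # [('width', '10'), ('height', '25'), ('depth', '7')]
```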
Using re.sub()
The re.sub() function replaces occurrences of a pattern with a specified string.
import re
text = "I have 2 cats and 3 dogs."
result = re.sub(r"\d+", "many", text)
print("Modified text:", result)
Output:
Modified text: I have many cats and many dogs.
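re.sub() also accepts a count argument and, in place of a replacement string, a function that receives each match object. The sketch below reuses the sentence from the example above; the doubling logic is only an illustration.

```python
import re

text = "I have 2 cats and 3 dogs."

# count=1 limits the substitution to the first match only.
first_only = re.sub(r"\d+", "many", text, count=1)

# A function can compute each replacement from the match object;
# here every number is doubled.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

print(first_only)  # I have many cats and 3 dogs.
print(doubled)     # I have 4 cats and 6 dogs.
```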
Using Groups in RegEx
Groups allow you to extract specific parts of a match.
import re
text = "My email is example@test.com."
match = re.search(r"(\w+)@(\w+\.\w+)", text)
if match:
    print("Username:", match.group(1))
    print("Domain:", match.group(2))
Output:
Username: example
Domain: test.com
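Groups can also be given names with the (?P<name>...) syntax, which tends to be easier to read than numeric indexes. A minimal sketch based on the same email example follows; the group names user and domain are assumptions chosen for illustration.

```python
import re

text = "My email is example@test.com."

# (?P<name>...) gives a group a name in addition to its number.
pattern = r"(?P<user>\w+)@(?P<domain>\w+\.\w+)"
match = re.search(pattern, text)

if match:
    print(match.group("user"))    # example
    print(match.group("domain"))  # test.com
    print(match.groupdict())      # {'user': 'example', 'domain': 'test.com'}
```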
Flags in RegEx
Flags modify the behavior of a pattern. Common flags include:
- re.IGNORECASE (re.I): Makes the pattern case-insensitive.
- re.MULTILINE (re.M): Allows ^ and $ to match at the start and end of each line, not only at the start and end of the whole string.
- re.DOTALL (re.S): Allows . to match newline characters.
Example: Using Flags
import re
text = "Hello\nWorld"
match = re.search(r"world", text, re.IGNORECASE)
if match:
    print("Match found:", match.group())
Output:
Match found: World
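The other two flags, re.MULTILINE and re.DOTALL, can be sketched on the same two-line string. The patterns below are illustrative assumptions meant to show the behavior with and without each flag.

```python
import re

text = "Hello\nWorld"

# Without re.MULTILINE, ^ matches only at the very start of the string.
no_flag = re.findall(r"^\w+", text)
with_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, . does not match the newline, so the pattern cannot span lines.
dot_default = re.search(r"Hello.World", text)
dot_all = re.search(r"Hello.World", text, re.DOTALL)

# Flags can be combined with the | operator.
combined = re.findall(r"^[a-z]+", text, re.IGNORECASE | re.MULTILINE)

print(no_flag)                  # ['Hello']
print(with_multiline)           # ['Hello', 'World']
print(dot_default)              # None
print(dot_all.group() == text)  # True
print(combined)                 # ['Hello', 'World']
```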