Introduction
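This guide covers advanced regular expression techniques for complex text processing in Python: lookahead and lookbehind assertions, named groups, and greedy versus non-greedy matching.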
Lookahead and Lookbehind
import re
# Positive lookahead
pattern = r"\d+(?=px)" # Digits followed by px
re.findall(pattern, "100px 200px 300em") # ['100', '200']
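# Negative lookahead
# The \d alternative keeps the engine from backtracking to a partial number such as "10"
pattern = r"\d+(?!\d|px)" # Whole numbers NOT followed by px
re.findall(pattern, "100px 200px 300em") # ['300']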
# Positive lookbehind
pattern = r"(?<=\$)\d+" # Digits preceded by $
re.findall(pattern, "$100 $200 $300") # ['100', '200', '300']
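# Negative lookbehind
# The \d in the class keeps a match from starting mid-number (e.g. "00" in "$100")
pattern = r"(?<![\$\d])\d+" # Whole numbers NOT preceded by $
re.findall(pattern, "$100 200 $300") # ['200']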
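Lookarounds are zero-width: they check the surrounding text without including it in the match. The sketch below wraps the lookahead patterns in two small helper functions to show this. The names px_values and non_px_values are made up for illustration, and the extra (?<!\d) and \d guards are one possible way to keep negative lookarounds from matching part of a number.

```python
import re

def px_values(text):
    """Return the numbers that are immediately followed by 'px', as ints."""
    return [int(n) for n in re.findall(r"\d+(?=px)", text)]

def non_px_values(text):
    """Return whole numbers that are NOT immediately followed by 'px'."""
    # (?<!\d) stops a match from starting in the middle of a number;
    # (?!\d|px) stops it from ending early or sitting right before 'px'.
    return [int(n) for n in re.findall(r"(?<!\d)\d+(?!\d|px)", text)]

sample = "100px 200px 300em"
print(px_values(sample))      # [100, 200]
print(non_px_values(sample))  # [300]

# The assertion itself is not part of the match: only the digits are.
print(re.search(r"\d+(?=px)", sample).span())  # (0, 3)
```

Without the \d guards, the regex engine can satisfy a negative lookaround by giving back or skipping a digit, which produces partial numbers rather than rejecting the value.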
Named Groups
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.match(pattern, "2024-06-15")
print(match.group("year")) # 2024
print(match.groupdict()) # {'year': '2024', 'month': '06', 'day': '15'}
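Named groups are most useful when a pattern has several fields. As a minimal sketch, the example below parses a log line into a dictionary. The log format, the LOG_PATTERN constant, and the parse_log_line helper are all assumptions made for this illustration. Note that re.match returns None when the text does not match, so the result is checked before calling groupdict().

```python
import re

# Hypothetical log format: "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)

def parse_log_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

entry = parse_log_line("2024-06-15 12:30:45 ERROR Disk full")
print(entry["level"])    # ERROR
print(entry["message"])  # Disk full
print(parse_log_line("not a log line"))  # None
```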
Greedy vs Non-Greedy
# Greedy (matches as much as possible)
re.findall(r"<.+>", "<a><b>") # ['<a><b>']
# Non-greedy (matches as little as possible)
re.findall(r"<.+?>", "<a><b>") # ['<a>', '<b>']
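The same difference shows up when extracting delimited text such as quoted strings. The sketch below compares a greedy quantifier, a non-greedy one, and a negated character class, which is a common alternative to non-greedy matching. The sample text is invented for illustration.

```python
import re

text = 'say "hello" and "bye"'

# Greedy: .+ runs to the last quote in the string.
greedy = re.findall(r'"(.+)"', text)

# Non-greedy: .+? stops at the first closing quote it can.
lazy = re.findall(r'"(.+?)"', text)

# Negated class: [^"]+ can never cross a quote, so greediness is harmless.
negated = re.findall(r'"([^"]+)"', text)

print(greedy)   # ['hello" and "bye']
print(lazy)     # ['hello', 'bye']
print(negated)  # ['hello', 'bye']
```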
Practice Problems
- Extract dates with lookahead
- Match passwords using lookbehind
- Parse log entries with named groups
- Replace using backreferences
- Validate complex patterns
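As a starting point for the backreference problem, here is a minimal sketch of two replacements with re.sub. The function names and sample strings are made up for illustration. In the replacement string, \g<name> refers to a named group, and inside the pattern \1 refers to whatever the first group matched.

```python
import re

def reformat_date(text):
    """Rewrite YYYY-MM-DD dates as DD/MM/YYYY using named backreferences."""
    pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    return re.sub(pattern, r"\g<day>/\g<month>/\g<year>", text)

def collapse_repeated_words(text):
    """Replace an immediately repeated word ("is is") with a single copy."""
    # \1 inside the pattern must match the same text that group 1 captured.
    return re.sub(r"\b(\w+)( \1\b)+", r"\1", text)

print(reformat_date("Released on 2024-06-15"))       # Released on 15/06/2024
print(collapse_repeated_words("this is is a test"))  # this is a test
```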