Python's Regular Expression Library (re)
Putting aside for a moment how much I value clean, readable, well-tested code, I admit I'm envious of those regular expression aces on Stack Overflow who seem to read (and type!) regular expressions as a second language.
Regular expressions don't come naturally to me, not yet anyway, perhaps in part because I don't use them on a regular (pun acknowledged) basis.
Hence these notes, which I update as needed.
Quick tip: I recommend making your own notes on this or any other topic you're learning. They help greatly with retention and quick reference.
Example: Find contents within matching pairs of angle brackets
Find all contents within matching angle brackets.
<hello><world>
 ^^^^^  ^^^^^
In the example above, we want to retrieve 'hello' and 'world'.
Build up a regular expression in conceptual pieces.
[^<>] means any single character except < or >. The ^ character is a logical "not" when it is the first character between square brackets.
[^<>]+ means the same as the above, but stipulates one or more such characters.
([^<>]+) adds enclosing parentheses, which define that part of the match as a subexpression (a capturing group) that can be referenced independently.
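To convince myself of what each piece does on its own, here is a small sketch that runs the three pieces against the sample string. The variable names (sample, single, runs, grouped) are my own, chosen just for illustration.

```python
import re

sample = '<hello><world>'

# One character that is neither '<' nor '>': each letter matches separately.
single = re.findall(r'[^<>]', sample)

# One or more such characters: each run of letters matches as a whole.
runs = re.findall(r'[^<>]+', sample)

# Parentheses add a group; without the surrounding < and > in the
# pattern, the returned text is the same as for the ungrouped version.
grouped = re.findall(r'([^<>]+)', sample)

print(single)   # ['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']
print(runs)     # ['hello', 'world']
print(grouped)  # ['hello', 'world']
```

The group only starts to matter once the pattern also contains the bounding < and >.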
Use findall() to return all matches. Notice how only the subexpression is returned, not the bounding < and >, even though they are part of the match.
Note also the r prefix before the opening quote. It marks a raw string literal, in which backslashes are kept as-is, so that e.g. '\n' is treated as two separate characters (a backslash and an n), not a single newline character.
>>> import re
>>> exp = r'<([^<>]+)>'
>>> re.findall(exp, '<hello><world>')
['hello', 'world']
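As an aside on the r prefix: a quick sketch of what a raw string literal changes. The names normal and raw are just mine for illustration; the sample text 'a1b22' is made up.

```python
import re

# A normal string literal turns backslash-n into one newline character.
normal = '\n'
# A raw string literal keeps the backslash and the 'n' as two characters.
raw = r'\n'

print(len(normal))  # 1
print(len(raw))     # 2

# Without the r prefix, the backslash in a regex has to be doubled.
print(r'\d+' == '\\d+')  # True

# Both spellings hand the same pattern to the re module.
same = re.findall(r'\d+', 'a1b22') == re.findall('\\d+', 'a1b22')
print(same)  # True
```

The pattern in this example has no backslashes, so the prefix makes no difference here, but using it for every pattern seems like a safe habit.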
Omitting the subexpression, on the other hand, includes the bounding < and > in the results.
>>> import re
>>> exp = r'<[^<>]+>'
>>> re.findall(exp, '<hello><world>')
['<hello>', '<world>']
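The "referenced independently" part is easier to see with match objects. A minimal sketch, assuming re.finditer() (not used elsewhere in these notes) to walk over the matches one by one; the pairs list is my own name.

```python
import re

exp = r'<([^<>]+)>'

pairs = []
for m in re.finditer(exp, '<hello><world>'):
    # group(0) is the whole match, brackets included;
    # group(1) is only the text captured by the first pair of parentheses.
    pairs.append((m.group(0), m.group(1)))

print(pairs)  # [('<hello>', 'hello'), ('<world>', 'world')]
```

So the whole match is still available; findall() simply returns the group when the pattern has one.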
*   0 or more
+   1 or more
?   0 or 1
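A small sketch of the three quantifiers applied to the same text. The sample text 'a ab abb' and the variable names are made up for illustration.

```python
import re

text = 'a ab abb'

star = re.findall(r'ab*', text)      # 'a' followed by zero or more 'b'
plus = re.findall(r'ab+', text)      # 'a' followed by one or more 'b'
optional = re.findall(r'ab?', text)  # 'a' followed by zero or one 'b'

print(star)      # ['a', 'ab', 'abb']
print(plus)      # ['ab', 'abb']
print(optional)  # ['a', 'ab', 'ab']
```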
Mentioned in several places: "note that this is different from finding a zero-length match". Huh?
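My current reading of that remark, which may well be incomplete: a pattern that is allowed to match nothing (for example one using *) can succeed with an empty match, and that is not the same as failing to match. A sketch, with variable names of my own choosing:

```python
import re

# \d* may match zero digits, so it succeeds at position 0 with an empty match.
empty = re.search(r'\d*', 'abc')

# \d+ needs at least one digit; 'abc' has none, so there is no match at all.
missing = re.search(r'\d+', 'abc')

print(empty is not None)    # True
print(repr(empty.group()))  # ''
print(empty.span())         # (0, 0)
print(missing)              # None
```

A match object counts as true even when the matched text is empty, so a check like "if empty:" passes in the first case and fails only in the second.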
Differences between findall, match, search
re.findall() will find all matches:
>>> re.findall(r'\d+', 'a1b2c3')
['1', '2', '3']
re.match() requires the match to be at the beginning of the string; otherwise it returns None.
>>> re.match(r'\d+', '123abc')
<_sre.SRE_Match object; span=(0, 3), match='123'>
>>>
>>> re.match(r'\d+', 'foo123abc')
>>>
re.search() checks for a match anywhere in the string. It returns only the first result.
>>> re.search(r'\d+', 'a1b2c3')
<_sre.SRE_Match object; span=(1, 2), match='1'>
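For quick reference, here are the three functions side by side on the same input. This is only a sketch; the variable names are mine.

```python
import re

pattern = r'\d+'
text = 'a1b2c3'

all_hits = re.findall(pattern, text)  # list of every match
at_start = re.match(pattern, text)    # None, because text starts with 'a'
first_hit = re.search(pattern, text)  # match object for the first hit only

print(all_hits)           # ['1', '2', '3']
print(at_start)           # None
print(first_hit.group())  # 1
print(first_hit.span())   # (1, 2)
```

findall() returns plain strings, while match() and search() return a match object (or None), so the matched text has to be pulled out with group().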