Ben Chuanlong Du's Blog

It is never too late to learn.

Hands on the re.match object in Python

In [1]:
s = 'It is "a" good "day" today.'
s
Out[1]:
'It is "a" good "day" today.'
In [2]:
import re

m = re.search('".*?"', s)
m
Out[2]:
<_sre.SRE_Match object; span=(6, 9), match='"a"'>
In [3]:
m.group(0)
Out[3]:
'"a"'
In [6]:
help(m.group)
Help on built-in function group:

group(...) method of _sre.SRE_Match instance
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.

expand

In [6]:
m.expand()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-205cc63d2100> in <module>()
----> 1 m.expand()

TypeError: Required argument 'template' (pos 1) not found
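
The call above fails because expand needs a template string. It returns that template with backreferences replaced by the text the groups captured. Below is a minimal sketch of a working call; the pattern, the input string, and the template strings are chosen here only for illustration.

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Isaac Newton, physicist")

# Numeric backreferences (\1, \2) are replaced by the corresponding groups.
by_number = m.expand(r"\2, \1")

# Named backreferences use the \g<name> form.
by_name = m.expand(r"\g<last_name>, \g<first_name>")

print(by_number)
print(by_name)
```

Both calls should produce the same string, with the last name first.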

group

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.

In [9]:
m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

The entire match.

In [10]:
m.group(0)
Out[10]:
'Isaac Newton'

The first parenthesized subgroup.

In [13]:
m.group(1)
Out[13]:
'Isaac'

The second parenthesized subgroup.

In [14]:
m.group(2)
Out[14]:
'Newton'

Multiple arguments give us a tuple.

In [16]:
m.group(1, 2)
Out[16]:
('Isaac', 'Newton')

If the regular expression uses the (?P<name>...) syntax, the groupN arguments may also be strings identifying groups by their group name. If a string argument is not used as a group name in the pattern, an IndexError exception is raised.

A moderately complicated example:

In [19]:
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
In [20]:
m.group("first_name")
Out[20]:
'Malcolm'
In [21]:
m.group("last_name")
Out[21]:
'Reynolds'

Named groups can also be referred to by their index:

In [22]:
m.group(1)
Out[22]:
'Malcolm'
In [23]:
m.group(2)
Out[23]:
'Reynolds'

If a group matches multiple times, only the last match is accessible:

In [25]:
m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
In [27]:
m.group(1)  # Returns only the last match.
Out[27]:
'c3'
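
The description of group also mentions two edge cases that the cells above do not exercise: a group that did not take part in the match, and a group index or name that does not exist in the pattern. The sketch below illustrates both; the pattern and the group name "nickname" are made up for this example.

```python
import re

# The second group is optional, so it does not participate when the
# input contains no trailing number.
m = re.match(r"(\w+)(?: (\d+))?", "Isaac")

unmatched = m.group(2)  # the group exists in the pattern but did not match

# A group number larger than the number of groups raises IndexError.
try:
    m.group(3)
    bad_index_raised = False
except IndexError:
    bad_index_raised = True

# A name that is not defined in the pattern raises IndexError as well.
try:
    m.group("nickname")
    bad_name_raised = False
except IndexError:
    bad_name_raised = True

print(unmatched, bad_index_raised, bad_name_raised)
```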

groups
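
groups returns a tuple with all subgroups of the match, from group 1 up to the last group in the pattern. It takes an optional default argument, which replaces None for groups that did not participate in the match. A short sketch, with example inputs chosen for illustration:

```python
import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
names = m.groups()  # one item per parenthesized group

# The second group is optional and does not match the input "24".
m2 = re.match(r"(\d+)\.?(\d+)?", "24")
with_none = m2.groups()        # unmatched groups default to None
with_default = m2.groups("0")  # or to the value passed as default

print(names, with_none, with_default)
```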

start
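
start returns the index in the searched string where the match, or a given group, begins. With no argument it refers to the whole match (group 0). For a group that exists in the pattern but did not contribute to the match, it returns -1. A small sketch reusing the strings from earlier in this post; the third pattern is added here for illustration:

```python
import re

s = 'It is "a" good "day" today.'
m = re.search('".*?"', s)
whole_start = m.start()  # index of the opening quote of the first match

m2 = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
first_start = m2.start(1)   # where "Isaac" begins
second_start = m2.start(2)  # where "Newton" begins

# An optional group that did not participate reports -1.
m3 = re.match(r"(\w+)(?: (\d+))?", "Isaac")
missing_start = m3.start(2)

print(whole_start, first_start, second_start, missing_start)
```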

end
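
end is the counterpart of start: it returns the index just past the last character of the match, or of a given group. This means the matched text is the slice of the searched string from start() to end(). The sketch below checks this on the first string from this post and then uses the two indices to cut a match out of a string; the email address is an illustrative input only.

```python
import re

s = 'It is "a" good "day" today.'
m = re.search('".*?"', s)
whole_end = m.end()
sliced = s[m.start():m.end()]  # same text as m.group(0)

# Remove the matched part by joining what lies before and after it.
email = "tony@tiremove_thisger.net"
m2 = re.search("remove_this", email)
cleaned = email[:m2.start()] + email[m2.end():]

print(whole_end, sliced, cleaned)
```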

span
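
span returns the pair (start, end) for the whole match or for a given group. This is the same pair shown in the repr of the match object earlier in this post. For a group that did not contribute to the match it returns (-1, -1). A brief sketch; the last pattern is invented for illustration:

```python
import re

s = 'It is "a" good "day" today.'
m = re.search('".*?"', s)
whole_span = m.span()  # equivalent to (m.start(), m.end())

m2 = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
second_span = m2.span(2)  # span of "Newton"

# An optional group that did not participate in the match.
m3 = re.match(r"(\w+)(?: (\d+))?", "Isaac")
missing_span = m3.span(2)

print(whole_span, second_span, missing_span)
```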
