-
The order of precedence of operators in POSIX extended regular expressions (ERE), from highest to lowest, is as follows.
  - Collation-related bracket symbols: `[==]`, `[::]`, `[..]`
  - Escaped characters: `\`
  - Character set (bracket expression): `[]`
  - Grouping: `()`
  - Single-character-ERE duplication: `*`, `+`, `?`, `{m,n}`
  - Concatenation
  - Anchoring: `^`, `$`
  - Alternation: `|`
-
Some regular expression patterns are written with a single leading backslash, e.g., `\s`, `\b`, etc. However, since special characters (e.g., `\`) need to be escaped in string literals in most programming languages, you need the string `"\\s"` to represent the regular expression pattern `\s`, and similarly for other patterns with a leading backslash. Python is special in that it provides raw strings (which do not process escapes) to make regular expression patterns easier to write. It even goes one step further and keeps unrecognized escape sequences (e.g., `"\s"`) unchanged in ordinary strings, although recent Python versions emit a warning for them. For more discussion of Python regular expressions, please refer to Regular Expression in Python.
-
It becomes tricky if you use one programming language to call another to perform regular expression operations. Taking `\s` as an example, since `\` needs to be escaped in both languages, you end up using `"\\\\s"` to represent `\s`. If you use Python to call other languages to perform regular expression operations, things can be simplified by using raw strings in Python. For example, instead of `"\\\\s"`, you can use `r"\\s"` in Python.
-
In some programming languages, you have to compile a plain-text pattern into a regular expression object before using it. The Python module `re` automatically compiles a plain-text pattern (using `re.compile`) and caches it, so there is not much benefit in compiling regular expressions yourself in Python.
-
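`\W` does not cover the anchors `^` and `$`. It always consumes one non-word character, so a pattern such as `\Wfoo` does not match `foo` at the very start of the text.
-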
Regular expression modifiers make regular expressions more flexible and powerful. They are also more universal than the options of individual programming languages and tools, which differ from one to another and have to be remembered separately. It is suggested that you use regular expression modifiers when possible.
-
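A word boundary (`\b`) is more general than white space (`\s`) as a word delimiter. `\b` is a zero-width assertion that matches between a word character and a non-word character, so it matches next to white space, next to punctuation, and at the start or end of the text, whereas `\s` consumes an actual white-space character.
-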
`[[:alnum:]]` contains all letters and numbers, while `\w` contains not only letters and numbers but also some special characters such as `_`. In short, `\w` is a superset of `[[:alnum:]]`.
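
The points in the list above can be checked with a short Python sketch. It is only an illustration under a few assumptions: the sample strings are made up, and because Python's `re` module has no `[[:alnum:]]` class, the ASCII class `[a-zA-Z0-9]` is used as a stand-in for it.

```python
import re

# 1. Escaping levels: both spellings denote the two-character pattern \s.
escaped = "\\s"
raw = r"\s"
same_pattern = escaped == raw and len(raw) == 2

# When the pattern is forwarded to a second language that unescapes
# backslashes again, the backslash has to be doubled once more.
doubly_escaped = "\\\\s"
raw_doubly_escaped = r"\\s"
same_forwarded = doubly_escaped == raw_doubly_escaped

# 2. Module-level functions compile (and cache) the pattern internally,
#    so they give the same result as an explicitly compiled pattern.
text = "foo bar_1, baz!"
implicit = re.findall(r"\w+", text)
explicit = re.compile(r"\w+").findall(text)

# 3. \W consumes one non-word character, so it finds nothing in front of
#    "foo" at the very start of a string, where the anchor ^ would match.
no_char_before = re.search(r"\Wfoo", "foo")
anchor_or_nonword = re.search(r"(?:^|\W)foo", "foo")

# 4. \b is zero-width and also matches next to punctuation and at the
#    ends of the string; \s needs an actual white-space character.
sentence = "cat, a cat. concat cat"
by_boundary = re.findall(r"\bcat\b", sentence)
by_space = re.findall(r"\scat\s", sentence)

# 5. \w also matches "_", unlike an alphanumeric class.
#    [a-zA-Z0-9] is an ASCII stand-in for [[:alnum:]] (an assumption).
alnum_runs = re.findall(r"[a-zA-Z0-9]+", "bar_1")
word_runs = re.findall(r"\w+", "bar_1")

print(same_pattern, same_forwarded)
print(implicit, explicit)
print(no_char_before, anchor_or_nonword is not None)
print(by_boundary, by_space)
print(alnum_runs, word_runs)
```

Writing the patterns as raw strings keeps each one identical to what the regular expression engine sees, which is why the raw spellings are used throughout.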
Vim search | Python | JavaScript | Teradata SQL | Oracle SQL | grep | sed | |
---|---|---|---|---|---|---|---|
Modifiers | Partial[1] | Partial[1] | Full | No[2] | Full[3] | ||
Greedy or not | Both[4] | ||||||
Popular functions | `re.search`, `re.sub` | `regexp_instr` | |||||
White spaces | `\s` | `"\\s"` or `r"\s"` [5] | `[[:blank:]]`, `[[:space:]]` | `\s` or `[[:space:]]` | `[[:space:]]` (recommended) or `\s` | ||
Non-white space | `\S` | `"\\S"` or `r"\S"` | `[^[:blank:]]`, `[^[:space:]]` | `\S` | `[^[:space:]]` or `\S` | ||
Lower-case letters | `[a-z]` or `\l` | `[a-z]` | `[a-z]` | `[a-z]` | |||
Non lower-case characters | `[^a-z]` or `\L` | `[^a-z]` | `[^a-z]` | `[^a-z]` | |||
Upper-case letters | `[A-Z]` or `\u` | `[A-Z]` | `[A-Z]` | `[A-Z]` | |||
Non upper-case characters | `[^A-Z]` or `\U` | `[^A-Z]` | `[^A-Z]` | `[^A-Z]` | |||
Letters | `[a-zA-Z]` or `\a` | `[a-zA-Z]` | `[a-zA-Z]` | `[a-zA-Z]` | |||
Non letters | `[^a-zA-Z]` or `\A` | `[^a-zA-Z]` | `[^a-zA-Z]` | `[^a-zA-Z]` | |||
Digits | `\d` | `"\\d"` or `r"\d"` | `[[:digit:]]` | `\d` | |||
Non digits | `\D` | `"\\D"` or `r"\D"` | `[^[:digit:]]` | `\D` | |||
Hex digits | `[0-9a-fA-F]` or `\x` | `[0-9a-fA-F]` | `[0-9a-fA-F]` | `[0-9a-fA-F]` | |||
Non-hex digit characters | `[^0-9a-fA-F]` or `\X` | `[^0-9a-fA-F]` | `[^0-9a-fA-F]` | `[^0-9a-fA-F]` | |||
Octal digits | `[0-7]` or `\o` | `[0-7]` | `[0-7]` | `[0-7]` | |||
Non-octal digit characters | `[^0-7]` or `\O` | `[^0-7]` | `[^0-7]` | `[^0-7]` | |||
Head of word | `[a-zA-Z_]` or `\h` | `[a-zA-Z_]` | `[a-zA-Z_]` | `[a-zA-Z_]` | |||
Non-head of word | `[^a-zA-Z_]` or `\H` | `[^a-zA-Z_]` | `[^a-zA-Z_]` | `[^a-zA-Z_]` | |||
Printable characters | `\p` | ||||||
Non printable characters | `\P` | ||||||
Word characters | `\w` | `"\\w"` or `r"\w"` | `\w` | `\w` | |||
Word boundary | `\b` | `"\\b"` or `r"\b"` | `\b` | `\b` | |||
Non word characters | `\W` | `\W` | `\W` | `\W` | |||
Grouping | `\(\)` | `()` | `()` | `()` | `()` | `\(\)` | `()` |
0 or more matches | `*` | `*` | `*` | `*` | |||
0 or more matches (as few as possible) | `\{-\}` | ||||||
0 or 1 matches | `\=` | `?` | `?` | `?` | |||
1 or more matches | `\+` | `+` | `+` | `+` | |||
Exactly m matches | `\{m\}` | `{m}` | `{m}` | `{m}` | |||
m or more matches | `\{m,\}` | `{m,}` | `{m,}` | `{m,}` | |||
m or more matches (as few as possible) | `\{-m,\}` | ||||||
m to n matches | `\{m,n\}` | `{m,n}` | `{m,n}` | `{m,n}` | |||
m to n matches (as few as possible) | `\{-m,n\}` | ||||||
Up to n matches | `\{,n\}` | `{,n}` | `{,n}` | `{,n}` | |||
Up to n matches (as few as possible) | `\{-,n\}` | ||||||
Any character except a newline | `.` | `.` | `.` | `.` | |||
Start of a line | `^` | `^` | `^` | `^` | |||
End of a line | `$` | `$` | `$` | `$` | |||
Literal / | `\/` (need to escape) | `/` (no need to escape) | |||||
Literal dot | `\.` | ||||||
Lookahead | `(?=...)` | `\.` | |||||
Negative lookahead | `(?!...)` | `\.` | |||||
Positive lookbehind | `(?<=...)` | `\.` | |||||
Negative lookbehind | `(?<!...)` | `\.` |
[1]: Python and JavaScript only partially support regular expression modifiers. More specifically, turning modifiers on is supported but turning them off is not: a modifier such as `(?i)`, once turned on, applies to the entire regular expression and cannot be turned off. (Python 3.6 and later additionally accept modifiers scoped to a group, e.g., `(?i:...)` and `(?-i:...)`.)
[2]: The behavior of regular expressions in Oracle SQL is controlled via parameters of the regular expression functions instead of via regular expression modifiers.
[3]: `grep` fully supports regular expression modifiers via Perl-style regular expressions (the `-P` option).
[4]: `grep` matches patterns greedily by default. However, in Perl-style syntax you can use the modifier `?` after a quantifier to perform a non-greedy match. For example, instead of `.*` you can use `.*?` to do a non-greedy match.
[5]: As a matter of fact, `"\s"` also works in Python and is equivalent to `"\\s"` and `r"\s"`. However, it is suggested that you avoid `"\s"`, as it causes confusion, especially when you call other programming languages (e.g., Spark SQL) to run regular expression operations from Python. The raw string pattern `r"\s"` is preferred for its unambiguity and simplicity. For more discussion of Python regular expressions, please refer to Regular Expression in Python.
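
Footnotes [1] and [4] can be tried out in Python, whose `re` module uses Perl-style syntax. The following is a minimal sketch with made-up sample strings. The group-scoped modifier on the last line assumes Python 3.6 or later.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first closing tag, giving two matches.
lazy = re.findall(r"<b>.*?</b>", html)

# A global inline modifier applies to the entire pattern.
global_flag = re.findall(r"(?i)python", "Python PYTHON python")

# A group-scoped modifier (Python 3.6+) affects only "py";
# "thon" must still match case-sensitively.
scoped_flag = re.findall(r"(?i:py)thon", "PYthon PYTHON python")

print(greedy)
print(lazy)
print(global_flag)
print(scoped_flag)
```

The same `.*?` spelling should carry over to `grep -P`, since both follow Perl-style quantifier syntax.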