Python Regular Expressions (RegEx)

In this tutorial, you will learn about regular expressions (RegEx) and use Python's re module with RegEx (with the help of examples).

A regular expression (RegEx) is a sequence of characters that defines a search pattern. For example,

^a...s$

The above code defines a RegEx pattern. The pattern is: any five-letter string starting with a and ending with s.

Patterns defined using RegEx can be used for string matching.

Expression	String	Match?
^a...s$	abs	No match
	alias	Matches
	abyss	Matches
	Alias	No match
	An abacus	No match

Python has a module named re for working with RegEx. Here is an example:

import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)
if result:
    print("Search successful.")
else:
    print("Search not successful.")

Here, we use the re.match() function to search for patterns in the test string. If the search is successful, this method will return a match object. If not, it will return None.
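As a quick check, the rows of the table above can be run through re.match(). This is a minimal sketch; the names rows and outcomes are chosen here for illustration only.

```python
import re

pattern = r'^a...s$'
rows = ['abs', 'alias', 'abyss', 'Alias', 'An abacus']

# Map each test string to True/False depending on whether the pattern matches.
outcomes = {s: re.match(pattern, s) is not None for s in rows}

for s, matched in outcomes.items():
    print(f"{s!r}: {'Matches' if matched else 'No match'}")
```

Only alias and abyss should be reported as matches, which agrees with the table.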

The re module defines several other functions that can be used with RegEx. Before we explore them, let's learn about regular expressions themselves.

If you are already familiar with the basics of RegEx, skip to the "Python regular expressions" section.

Specify the pattern using regular expressions

To specify a regular expression, meta-characters are used. In the above example, ^ and $ are meta-characters.

Meta characters

Meta characters are characters that the RegEx engine interprets in a special way. Here is a list of meta characters:

[] . ^ $ * + ? {} () \ |

[] - Square brackets

Square brackets specify the set of characters you want to match.

Expression	String	Match?
[abc]	a	1 match
	ac	2 matches
	Hey Jude	No match
	abc de ca	5 matches

Here, [abc] matches if the string you are trying to match contains any of a, b, or c.

You can also specify a range of characters using - inside the square brackets.

[a-e] is the same as [abcde].
[1-4] is the same as [1234].
[0-39] is the same as [01239].

You can complement (invert) the character set by placing the caret symbol ^ at the start of the square brackets.

[^abc] means any character except a, b, or c.
[^0-9] indicates any non-digit character.
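The character sets described above can be tried out with re.findall(), which is covered later in this tutorial. The following is only a small sketch; the sample strings and variable names are made up for illustration.

```python
import re

# re.findall() returns every non-overlapping match as a list.
set_matches = re.findall(r'[abc]', 'abc de ca')     # characters a, b, or c
range_matches = re.findall(r'[a-e]', 'abc de ca')   # characters a through e
non_digits = re.findall(r'[^0-9]', 'a1b22c')        # anything that is not a digit

print(set_matches)
print(range_matches)
print(non_digits)
```

The first call finds 5 matches, the same count shown for 'abc de ca' in the table.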

. - Dot

The dot matches any single character (except newline '\n').

Expression	String	Match?
..	a	No match
	ac	1 match
	acd	1 match
	acde	2 matches (contains 4 characters)

^ - Caret

The caret symbol ^ is used to check whether a string starts with a certain character.

Expression	String	Match?
^a	a	1 match
	abc	1 match
	bac	No match
^ab	abc	1 match
^ab	acb	No match (starts with a but is not followed by b)

$ - Dollar

The dollar symbol $ is used to check whether a string ends with a certain character.

Expression	String	Match?
a$	a	1 match
	formula	1 match
	cab	No match
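The two anchors can be wrapped in small helper functions. The helpers starts_with and ends_with below are hypothetical names introduced for this sketch; they are not part of the re module.

```python
import re

def starts_with(prefix_pattern, text):
    """Return True if text begins with the given pattern (uses ^)."""
    return re.search('^' + prefix_pattern, text) is not None

def ends_with(suffix_pattern, text):
    """Return True if text ends with the given pattern (uses $)."""
    return re.search(suffix_pattern + '$', text) is not None

print(starts_with('ab', 'abc'))     # ^ab against abc
print(starts_with('ab', 'acb'))     # ^ab against acb
print(ends_with('a', 'formula'))    # a$ against formula
print(ends_with('a', 'cab'))        # a$ against cab
```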

* - Star

The star symbol * matches zero or more occurrences of the pattern to its left.

Expression	String	Match?
ma*n	mn	1 match
	man	1 match
	maaan	1 match
	main	No match (a is not followed by n)
	woman	1 match

+ - Plus

The plus symbol + matches one or more occurrences of the pattern to its left.

Expression	String	Match?
ma+n	mn	No match (no a character)
	man	1 match
	maaan	1 match
	main	No match (a is not followed by n)
	woman	1 match

? - Question mark

The question mark symbol ? matches zero or one occurrence of the pattern to its left.

Expression	String	Match?
ma?n	mn	1 match
	man	1 match
	maaan	No match (more than one a character)
	main	No match (a is not followed by n)
	woman	1 match
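The three quantifier tables can be reproduced in one pass. This is a sketch under the assumption that "match" in the tables means the pattern is found somewhere in the string, so it uses re.search(); the names words and results are illustrative.

```python
import re

words = ['mn', 'man', 'maaan', 'main', 'woman']

# For each quantifier pattern, record which words contain a match.
results = {
    pattern: [w for w in words if re.search(pattern, w)]
    for pattern in (r'ma*n', r'ma+n', r'ma?n')
}

for pattern, matched in results.items():
    print(pattern, matched)
```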

{} - Braces

Consider this code: {n,m}. It means at least n and at most m repetitions of the pattern to its left.

Expression	String	Match?
a{2,3}	abc dat	No match
	abc daat	1 match (aa in daat)
	aabc daaat	2 matches (aa in aabc and aaa in daaat)
	aabc daaaat	2 matches (aa in aabc and aaa in daaaat)

Let's try one more example. The RegEx [0-9]{2,4} matches at least 2 digits but not more than 4 digits.

Expression	String	Match?
[0-9]{2,4}	ab123csde	1 match (123)
	12 and 345673	3 matches (12, 3456, and 73)
	1 and 2	No match
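The counts in the brace tables can be verified with re.findall(). A minimal sketch, with variable names chosen here for illustration:

```python
import re

# a{2,3}: runs of two or three consecutive "a" characters.
a_runs = re.findall(r'a{2,3}', 'aabc daaaat')

# [0-9]{2,4}: runs of two to four digits; longer runs are split greedily.
digit_runs = re.findall(r'[0-9]{2,4}', '12 and 345673')

# Single digits are too short to match.
too_short = re.findall(r'[0-9]{2,4}', '1 and 2')

print(a_runs, digit_runs, too_short)
```

Because the quantifier is greedy, 345673 is consumed as 3456 first, and the leftover 73 forms a separate match.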

| - Alternation

The vertical bar | is used for alternation (the or operator).

Expression	String	Match?
a|b	cde	No match
	ade	1 match (a)
	acdbea	3 matches (a, b, and a)

Here, a|b matches any string that contains either a or b.

() - Group

Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains a, b, or c followed by xz.

Expression	String	Match?
(a|b|c)xz	ab xz	No match
	abxz	1 match (bxz)
	axz cabxz	2 matches (axz and bxz)
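Alternation and grouping can be compared side by side. Note that the non-capturing form (?:...) used below is not covered in this tutorial; it is included here as an assumption-free standard feature of the re module so that re.findall() returns whole matches rather than group contents.

```python
import re

# Without parentheses, | splits the whole pattern: "a" OR "b".
plain_alternation = re.findall(r'a|b', 'acdbea')

# With a group, the alternation is limited to the part inside the parentheses.
# (?:...) is a non-capturing group, so findall returns the whole match.
whole_matches = re.findall(r'(?:a|b|c)xz', 'axz cabxz')

# With a capturing group, findall returns only the captured text.
captured_only = re.findall(r'(a|b|c)xz', 'axz cabxz')

print(plain_alternation, whole_matches, captured_only)
```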

\ - Backslash

The backslash \ is used to escape various characters, including all metacharacters. For example,

\$a matches if a string contains $ followed by a. Here, $ is not interpreted by the RegEx engine in a special way.

If you are unsure whether a character has a special meaning, you can place a \ in front of it. This ensures that the character is not treated as a special character.
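The effect of escaping can be seen by running the same search with and without the backslash. The sketch below also uses re.escape(), a function of the re module that this tutorial does not otherwise discuss; the sample text is invented.

```python
import re

price_text = 'cost: $a or $b'

# Unescaped, $ means "end of string", so '$a' cannot match here.
unescaped = re.findall(r'$a', price_text)

# Escaped, \$ is a literal dollar sign.
escaped = re.findall(r'\$a', price_text)

# re.escape() adds the backslashes automatically.
auto_escaped = re.escape('$a')

print(unescaped, escaped, auto_escaped)
```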

Special sequences

Special sequences make common patterns easier to write. Here is a list of special sequences:

\A - Matches if the specified characters are at the start of a string.

Expression	String	Match?
\Athe	the sun	Matches
\Athe	In the sun	No match

\b - Matches if the specified characters are at the beginning or end of a word.

Expression	String	Match?
\bfoo	football	Matches
	a football	Matches
	afootball	No match
foo\b	the foo	Matches
	the afoo test	Matches
	the afootest	No match

\B - The opposite of \b. Matches if the specified characters are not at the beginning or end of a word.

Expression	String	Match?
\Bfoo	football	No match
	a football	No match
	afootball	Matches
foo\B	the foo	No match
	the afoo test	No match
	the afootest	Matches
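The \b and \B tables can be checked with a short script. This is a sketch only; the list names are illustrative.

```python
import re

samples = ['football', 'a football', 'afootball']

# \bfoo: "foo" must start at a word boundary.
at_boundary = [s for s in samples if re.search(r'\bfoo', s)]

# \Bfoo: "foo" must NOT start at a word boundary.
inside_word = [s for s in samples if re.search(r'\Bfoo', s)]

print(at_boundary)
print(inside_word)
```

The patterns are written as raw strings because in a regular Python string '\b' is the backspace character.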

\d - Matches any decimal digit. Equivalent to [0-9]

Expression	String	Match?
\d	12abc3	3 matches (1, 2, and 3)
\d	Python	No match

\D - Matches any character that is not a decimal digit. Equivalent to [^0-9]

Expression	String	Match?
\D	1ab34"50	3 matches (a, b, and ")
\D	1345	No match

\s - Matches any whitespace character in the string. Equivalent to [ \t\n\r\f\v].

Expression	String	Match?
\s	Python RegEx	1 match
\s	PythonRegEx	No match

\S - Matches any non-whitespace character in the string. Equivalent to [^ \t\n\r\f\v].

Expression	String	Match?
\S	a b	2 matches (a and b)
\S		No match

\w - Matches any alphanumeric character (digits and letters). For ASCII text, equivalent to [a-zA-Z0-9_]. Note that the underscore _ is also treated as an alphanumeric character.

Expression	String	Match?
\w	12&": ;c	3 matches (1, 2, and c)
\w	%"> !	No match

\W - Matches any non-alphanumeric character. Equivalent to [^a-zA-Z0-9_]

Expression	String	Match?
\W	1a2%c	1 match (%)
\W	Python	No match

\Z - Matches if the specified characters are at the end of a string.

Expression	String	Match?
Python\Z	I like Python	1 match
	I like Python Programming	No match
	Python is fun.	No match
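Several of the special sequences above can be tried on one string. The sample text below is invented for this sketch and the variable names are illustrative.

```python
import re

text = 'Room 12B, floor_3 !'

digits = re.findall(r'\d', text)        # single decimal digits
spaces = re.findall(r'\s', text)        # whitespace characters
word_runs = re.findall(r'\w+', text)    # runs of letters, digits, and _

# \Z anchors the pattern to the very end of the string.
ends_with_python = re.search(r'Python\Z', 'I like Python') is not None

print(digits, spaces, word_runs, ends_with_python)
```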

Tip: To build and test regular expressions, you can use a RegEx tester tool. Such a tool not only helps you create regular expressions, it also helps you learn them.

Now that you have learned the basics of RegEx, let's discuss how to use RegEx in Python code.

Python regular expressions

Python has a module named re for regular expressions. To use it, we need to import the module.

import re

This module defines some functions and constants that can be used with RegEx.

re.findall()

The re.findall() method returns a list of strings containing all the matches.

Example 1: re.findall()

# Program to extract numbers from a string
import re
string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'
result = re.findall(pattern, string)
print(result)
# Output: ['12', '89', '34']

If the pattern is not found, re.findall() returns an empty list.
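Since re.findall() always returns a list, its result can be processed directly without checking for None. A small sketch that converts the matches to integers (the variable names are illustrative):

```python
import re

text = 'hello 12 hi 89. Howdy 34'

# findall returns strings; convert them when numeric values are needed.
numbers = [int(n) for n in re.findall(r'\d+', text)]

# With no match the result is an empty list, so the loop body never runs.
no_numbers = [int(n) for n in re.findall(r'\d+', 'no digits here')]

print(numbers, no_numbers)
```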

re.split()

The re.split() method splits the string wherever there is a match and returns a list of the resulting strings.

Example 2: re.split()

import re
string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'
result = re.split(pattern, string)
print(result)
# Output: ['Twelve:', ' Eighty nine:', '.']

If the pattern is not found, re.split() returns a list containing the original string.

You can pass the maxsplit parameter to the re.split() method. This is the maximum number of splits to be performed.

import re
string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'
# maxsplit = 1
# Split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)
# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']

By the way, the default value of maxsplit is 0, which means all possible splits are made.
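The three behaviours of re.split() described above (split everywhere, split once, and no match) can be compared in one short sketch. The variable names are illustrative.

```python
import re

text = 'Twelve:12 Eighty nine:89 Nine:9.'

all_parts = re.split(r'\d+', text)                 # maxsplit defaults to 0
first_only = re.split(r'\d+', text, maxsplit=1)    # stop after one split
unchanged = re.split(r'\d+', 'no digits here')     # pattern never matches

print(all_parts)
print(first_only)
print(unchanged)
```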

re.sub()

Syntax of re.sub():

re.sub(pattern, replace, string)

This method returns a string in which the matched items are replaced with the content of the replace variable.

Example 3: re.sub()

# Program to remove all whitespace
import re
# Multiline string
string = 'abc 12\
de 23 \n f45 6'
# Match all whitespace characters
pattern = r'\s+'
# Empty string
replace = ''
new_string = re.sub(pattern, replace, string)
print(new_string)
# Output: abc12de23f456

If the pattern is not found, re.sub() returns the original string.

You can pass count as the fourth argument to the re.sub() method. If omitted, it defaults to 0, which replaces all occurrences of the match.

import re
# Multiline string
string = 'abc 12\
de 23 \n f45 6'
# Match all whitespace characters
pattern = r'\s+'
replace = ''
# Replace only the first match
new_string = re.sub(pattern, replace, string, count=1)
print(new_string)
# Output:
# abc12de 23
#  f45 6

re.subn()

re.subn() is similar to re.sub(), except that it returns a tuple of 2 items: the new string and the number of substitutions made.

Example 4: re.subn()

# Program to remove all whitespace
import re
# Multiline string
string = 'abc 12\
de 23 \n f45 6'
# Match all whitespace characters
pattern = r'\s+'
# Empty string
replace = ''
new_string = re.subn(pattern, replace, string)
print(new_string)
# Output: ('abc12de23f456', 4)
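Because re.subn() returns a tuple, it can be unpacked directly. The sketch below collapses runs of whitespace into a single space and contrasts this with re.sub() limited by count; the sample text and names are invented.

```python
import re

text = 'a  b   c d'

# Unpack the (new_string, number_of_substitutions) tuple.
collapsed, n_replaced = re.subn(r'\s+', ' ', text)

# count=1 replaces only the first run of whitespace.
first_only = re.sub(r'\s+', ' ', text, count=1)

print(repr(collapsed), n_replaced, repr(first_only))
```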

re.search()

The re.search() method takes two parameters: pattern and string. This method searches for the first occurrence of the RegEx pattern in the string.

If the search is successful, re.search() returns a match object. If not, it returns None.

match = re.search(pattern, str)

Example 5: re.search()

import re
string = "Python is fun"
# Check if "Python" is at the beginning
match = re.search(r'\APython', string)
if match:
    print("pattern found inside the string")
else:
    print("pattern not found")
# Output: pattern found inside the string

Here, match contains a match object.
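The difference between re.search() and the re.match() function used at the start of this tutorial is where they look: re.search() scans the whole string, while re.match() only tries the beginning. A minimal sketch with an invented sample string:

```python
import re

text = 'I think Python is fun'

found_anywhere = re.search(r'Python', text)   # scans the whole string
found_at_start = re.match(r'Python', text)    # only tries position 0

print(found_anywhere is not None)
print(found_at_start is not None)
```

Both functions return None on failure, so the result should be tested before calling methods on it.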

Match object

You can get the methods and attributes of a match object using the dir() function.

Some commonly used methods and properties of the match object are:

match.group()

The group() method returns the matching part of the string.

Example 6: Match object

import re
string = '39801 356, 2102 1111'
# Three digits, followed by a space, then two digits
pattern = r'(\d{3}) (\d{2})'
# The match variable contains a match object.
match = re.search(pattern, string)
if match:
    print(match.group())
else:
    print("pattern not found")
# Output: 801 35

Here, the match variable contains a match object.

Our pattern (\d{3}) (\d{2}) has two subgroups, (\d{3}) and (\d{2}). You can get the part of the string matched by each of these parenthesized subgroups. Here's how:

>>> match.group(1)
'801'
>>> match.group(2)
'35'
>>> match.group(1, 2)
('801', '35')
>>> match.groups()
('801', '35')

match.start(), match.end() and match.span()

The start() function returns the index of the start of the matched substring. Similarly, end() returns the end index of the matched substring.

>>> match.start()
2
>>> match.end()
8

The span() function returns a tuple containing the start and end indices of the matched part.

>>> match.span()
(2, 8)

match.re and match.string

The re attribute of the match object returns the regular expression object. Similarly, the string attribute returns the passed string.

>>> match.re
re.compile('(\\d{3}) (\\d{2})')
>>> match.string
'39801 356, 2102 1111'
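The match object methods shown above can be combined in a script rather than an interactive session. This is a sketch using the same pattern and string as Example 6; the local variable names are illustrative.

```python
import re

match = re.search(r'(\d{3}) (\d{2})', '39801 356, 2102 1111')

if match:  # re.search() returns None when nothing matches
    whole = match.group()             # the entire matched text
    first, second = match.groups()    # the two parenthesized subgroups
    start, end = match.span()         # indices of the match in the string
    print(whole, first, second, start, end)
```

Slicing the original string with the span, as in match.string[start:end], gives back the same text as match.group().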

We have covered all the commonly used methods defined in the re module. If you want to learn more, see the Python 3 re module documentation.

Using the r prefix before RegEx

When the r or R prefix is used before a regular expression, it denotes a raw string. For example, '\n' is a new line, whereas r'\n' means two characters: a backslash \ followed by n.

The backslash \ is used to escape various characters, including all metacharacters. However, with the r prefix, Python treats \ as a normal character and passes it to the RegEx engine unchanged.

Example 7: Raw string using the r prefix

import re
string = '\n and \r are escape sequences.'
result = re.findall(r'[\n\r]', string)
print(result)
# Output: ['\n', '\r']
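To see what the r prefix changes, it can help to compare string lengths and the equivalent non-raw spelling of a pattern. A minimal sketch with illustrative names:

```python
import re

# A regular string needs a doubled backslash; a raw string does not.
plain = '\\d+'
raw = r'\d+'
same_pattern = plain == raw

# '\n' is one character (newline); r'\n' is two (backslash and n).
lengths = (len('\n'), len(r'\n'))

digits = re.findall(raw, 'a1 b22')

print(same_pattern, lengths, digits)
```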
