Summary Python 1
.: Matches any single character except newline
^: Matches the start of the string
$: Matches the end of the string
*: Matches 0 or more repetitions
+: Matches 1 or more repetitions
?: Matches 0 or 1 repetitions
\: Used to escape various characters including all metacharacters
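A quick sketch of how each metacharacter above behaves. The sample strings and variable names are made up for illustration only.

```python
import re

# . matches any single character except a newline
dot = re.findall(r"c.t", "cat cot c\nt")
# ^ anchors the pattern at the start of the string, $ at the end
starts = bool(re.search(r"^cat", "cat nap"))
ends = bool(re.search(r"nap$", "cat nap"))
# * = zero or more, + = one or more, ? = zero or one of the preceding item
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")
optional = re.findall(r"colou?r", "color colour")
# \ escapes a metacharacter so it is matched literally
literal_dot = re.findall(r"\.", "v1.2.3")

print(dot)          # ['cat', 'cot']
print(starts, ends) # True True
print(star)         # ['a', 'ab', 'abb']
print(plus)         # ['ab', 'abb']
print(optional)     # ['color', 'colour']
print(literal_dot)  # ['.', '.']
```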
EXAMPLES:
Here are some ways you can use the re module with the above metacharacters:
1. Basic matching:
import re
pattern = r"Cookie"
sequence = "I want a Cookie"
if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")
Here the match() function checks for a match only at the beginning of the string. If the word "Cookie" were at the
start of the string, it would return a Match object. In this example the string starts with "I", so match() returns
None and the code prints "Not a match!".
2. Searching:
import re
pattern = r"Cook"
sequence = "I want a Cookie"
print(re.search(pattern, sequence))
The search() function scans through the string, looking for any location where the REGEX pattern matches. If it
finds a match, it returns a match object. Otherwise, it returns None.
3. Finding all instances:
import re
pattern = r"Cook"
sequence = "Cookie Cooker cooks cookies while looking cool"
print(re.findall(pattern, sequence))
In this case, the findall() function returns all non-overlapping matches of the pattern in the string, as a list of
strings. Matching is case-sensitive, so the result here is ['Cook', 'Cook'].
These are just basics! Regular expressions can get much more complex and powerful, allowing you to define
intricate patterns to match.
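To see the difference between the three functions side by side, here is a small sketch that runs them on the same string. The variable names are only for illustration.

```python
import re

sequence = "I want a Cookie"

# match() only tries the pattern at position 0
at_start = re.match(r"Cookie", sequence)
# search() scans forward until the pattern fits somewhere
anywhere = re.search(r"Cookie", sequence)
# findall() collects every non-overlapping match as a list of strings
every_word = re.findall(r"\w+", sequence)

print(at_start)                           # None
print(anywhere.group(), anywhere.span())  # Cookie (9, 15)
print(every_word)                         # ['I', 'want', 'a', 'Cookie']
```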
Remember, when writing regular expressions in Python, you typically use raw strings (r"mystring"). This makes them
easier to write and read: Python does not treat the backslash in a raw string as an escape character, so it reaches
the regex engine unchanged.
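A small sketch of why raw strings matter, using \b as the example (the test string is made up):

```python
import re

# In a regular string "\b" is the backspace character (one character).
# In a raw string r"\b" is a backslash plus "b" (two characters),
# which the regex engine reads as a word boundary.
plain = "\bcat\b"
raw = r"\bcat\b"

print(len(plain), len(raw))             # 5 7
print(re.search(plain, "a cat"))        # None
print(re.search(raw, "a cat").group())  # cat
```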
print(re.sub(r'colour', 'color', 'I love the colour red')) # substitutes 'colour' with 'color'
17. re.split() to split by a pattern
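A minimal sketch of re.split(), assuming a made-up string whose items are separated by commas or semicolons:

```python
import re

# Split on a comma or semicolon followed by optional whitespace
parts = re.split(r"[,;]\s*", "apple, banana;cherry,  date")
print(parts)      # ['apple', 'banana', 'cherry', 'date']

# maxsplit limits how many splits are made
first_two = re.split(r"[,;]\s*", "apple, banana;cherry", maxsplit=1)
print(first_two)  # ['apple', 'banana;cherry']
```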
print(re.search(r'fox(?= hunts)', 'The quick fox hunts')) # Matches 'fox' if followed by ' hunts'
5. Negative look ahead (?!
print(re.search(r'fox(?! hunts)', 'The quick fox')) # Matches 'fox' if not followed by ' hunts'
6. Positive look behind (?<=
print(re.search(r'(?<=\bThe\b).+', 'The quick fox')) # Matches ' quick fox' if it is preceded by 'The'
7. Negative look behind (?<!
print(re.search(r'(?<!\bThe\b).+', ' quick fox')) # Matches ' quick fox' if it is not preceded by 'The'
8. Word boundary \b, non-word boundary \B
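A short sketch of \b versus \B. The sample text is an assumption chosen so that 'cat' appears as a whole word, at the end of a word, and at the start of a word:

```python
import re

text = "cat concat catalog"

# \bcat\b: 'cat' as a whole word only
whole = [m.span() for m in re.finditer(r"\bcat\b", text)]
# \Bcat: 'cat' that does NOT start at a word boundary (the one inside 'concat')
inside = [m.span() for m in re.finditer(r"\Bcat", text)]

print(whole)   # [(0, 3)]
print(inside)  # [(7, 10)]
```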
print(re.search(r'(?P<animal>cat|dog)', 'I like dogs')) # Matches 'dog' and names the group 'animal'
13. Named backreferences \g<name>
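A sketch of the two places a named group can be referenced in the re module: (?P=name) inside the pattern itself, and \g<name> inside a replacement string. The group names and sample strings are made up.

```python
import re

# Inside a pattern, (?P=word) must match the same text the 'word' group matched
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was the the best")
print(doubled.group("word"))  # the

# In a replacement string, \g<name> inserts the text of the named group
swapped = re.sub(r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace")
print(swapped)                # Lovelace, Ada
```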
print(re.search(r'(?:cat|dog)', 'I like dogs')) # Matches 'dog' but does not create a backreference
15. Atomic groups (?>...) (supported by the re module from Python 3.11)
print(re.search(r'fox(?#Matches fox)', 'The quick fox')) # Matches 'fox' (comments are ignored)
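18. Unicode property escapes \p, \P (not supported by the standard re module; available in the third-party regex module)
import regex # third-party module (pip install regex); the standard re module rejects \X
print(regex.findall(r'\X', 'naïve')) # Matches ['n', 'a', 'ï', 'v', 'e'] (includes the accent as part of 'ï')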
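20. Sub-routines (?(DEFINE)...), (?&name), etc. (third-party regex module only; the standard re module does not support them)
import regex # third-party module: pip install regex
# (?&letter) calls the 'letter' group defined in the DEFINE block as a subroutine
pattern = regex.compile(r'''
(?(DEFINE)
(?P<letter>[a-z])
)
(?&letter)+
''', regex.VERBOSE)
print(pattern.search('hello')) # Matches 'hello'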
print(re.search(r'ha{2}', "haa")) # Matches 'haa': {2} applies only to 'a', so 'h' must be followed by two 'a's (use (ha){2} to match 'haha')
2. Matching at least n repetitions {n,}
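A minimal sketch of {n,} next to {n}, using a made-up string of numbers:

```python
import re

numbers = "7 42 123 98765"

# {3,} means three or more repetitions of the preceding item
at_least_three = re.findall(r"\d{3,}", numbers)
# {2} means exactly two; the \b anchors keep it from matching part of a longer number
exactly_two = re.findall(r"\b\d{2}\b", numbers)

print(at_least_three)  # ['123', '98765']
print(exactly_two)     # ['42']
```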
print(re.findall(r'\d', '123')) # Matches ['1', '2', '3'] since these are digits
7. Matching any non-digit \D
print(re.findall(r'\D', '123abc')) # Matches ['a', 'b', 'c'] since these are non-digits
8. Matching any white space character \s
print(re.search(r'Eat\sCake', 'Eat Cake')) # Matches 'Eat Cake' since there is a whitespace character
9. Matching any non-white space character \S
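A short sketch of \S, assuming a sample string that mixes spaces and a tab:

```python
import re

# \S matches one non-whitespace character; \S+ grabs whole runs of them
chars = re.findall(r"\S", "a b")
words = re.findall(r"\S+", "Eat  Cake\tnow")

print(chars)  # ['a', 'b']
print(words)  # ['Eat', 'Cake', 'now']
```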
print(re.findall(r'\bThe\b', 'The cat in The hat')) # Matches both 'The' since they are whole words separated by word boundaries
13. \A for matching only at start of string
print(re.search(r'\AThe', 'The cat')) # Matches because 'The' is at the beginning of the string
14. \Z for matching only at end of string
print(re.search(r'cat\Z', 'The cat')) # Matches because 'cat' is at the end of the string
15. Matching a character n number of times until another character is met .*?
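A sketch of the lazy quantifier .*? compared with the greedy .*, using a made-up HTML-like string. The lazy form stops at the first closing character it meets, while the greedy form runs to the last one.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)   # .* takes as much as possible
lazy = re.findall(r"<.*?>", html)    # .*? takes as little as possible

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```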
More regex examples and what they do:
1. Inverse Matching with ^
print(re.findall(r'[^A-Za-z ]', "Hello World 123!")) # Find all characters that are not A-Z, a-z or a space
2. Matching 0 or More Repetitions with *
pattern = re.compile(r'[A-Z]')
print(pattern.search('Hello')) # Matches 'H'
7. Escape Special Characters \\
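A sketch of escaping by hand and with re.escape(). The price string and the user_text variable are assumptions for illustration.

```python
import re

# Escape $ and . by hand so they are matched literally
prices = re.findall(r"\$\d+\.\d{2}", "Total: $12.50 (was $15.00)")
print(prices)  # ['$12.50', '$15.00']

# re.escape() escapes every special character in arbitrary text for you
user_text = "1+1=2?"
unescaped = re.search(user_text, "Is 1+1=2? Yes.")            # + and ? act as quantifiers
escaped = re.search(re.escape(user_text), "Is 1+1=2? Yes.")   # matched literally
print(unescaped)        # None
print(escaped.group())  # 1+1=2?
```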
print(re.search(r'\D', '123 Rocky 456')) # Matches ' ', the first non-digit character
11. match() to Determine if the RE matches at the start of the string
String
LIST
TUPLE
SET
DICT
--------------------------------------------------------------------------------------------------------------------------------------------------------
SUMMARY OF ALL
REGEX
Symbol/Method | Description | Input Example | Output Example
\s | Matches any whitespace character | re.findall(r'\s', 'a b c') | [' ', ' ']
[abc] | Matches any character in the set | re.findall(r'[abc]', 'abcdef') | ['a', 'b', 'c']
[^abc] | Matches any character not in the set | re.findall(r'[^abc]', 'abcdef') | ['d', 'e', 'f']
re.IGNORECASE / re.I | Ignore case | re.findall('a', 'abc ABC', re.I) | ['a', 'A']
re.DOTALL / re.S | Make . match newlines as well | re.findall('a.b', 'a\nb a b', re.S) | ['a\nb', 'a b']
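A small sketch of the two flags in action, including how to combine them with |. It reuses the inputs from the table rows above.

```python
import re

ignore_case = re.findall('a', 'abc ABC', re.I)
default_dot = re.findall('a.b', 'a\nb a b')         # without re.S, . skips the newline
dotall = re.findall('a.b', 'a\nb a b', re.S)
combined = re.findall('A.B', 'a\nb a b', re.I | re.S)

print(ignore_case)  # ['a', 'A']
print(default_dot)  # ['a b']
print(dotall)       # ['a\nb', 'a b']
print(combined)     # ['a\nb', 'a b']
```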
STRING
capitalize() | Returns the string with its first character capitalized and the rest lowercased | 'HELLO'.capitalize() | 'Hello'
replace(old, new[, count]) | Returns a copy of the string with all occurrences of substring old replaced by new | 'hello'.replace('l','y') | 'heyyo'
split([sep[, maxsplit]]) | Returns a list of words in the string, using sep as the delimiter string | 'hello world'.split(' ') | ['hello', 'world']
strip([chars]) | Returns a copy of the string with the leading and trailing characters removed | ' hello '.strip() | 'hello'
find(sub[, start[, end]]) | Returns the lowest index in the string where substring sub is found, -1 if not found | 'hello'.find('l') | 2
index(sub[, start[, end]]) | Like find(), but raises ValueError when the substring is not found | 'hello'.index('l') | 2
startswith(prefix[, start[, end]]) | Returns True if string starts with the prefix, otherwise returns False | 'hello'.startswith('h') | True
endswith(suffix[, start[, end]]) | Returns True if string ends with the suffix, otherwise returns False | 'hello'.endswith('o') | True
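A short sketch of the one practical difference between find() and index(): what happens when the substring is missing. The variable names are illustrative.

```python
word = 'hello'

found_at = word.find('l')       # 2, same as word.index('l')
missing_find = word.find('z')   # find() signals "not found" with -1

try:
    word.index('z')             # index() raises instead
    index_raised = False
except ValueError:
    index_raised = True

print(found_at, missing_find, index_raised)  # 2 -1 True
```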
TUPLE
count() | Returns the number of times a specified value occurs in a tuple | (1, 2, 2, 3, 4).count(2) | 2
index() | Searches the tuple for a specified value and returns the position of its first occurrence | (1, 2, 2, 3, 4).index(2) | 1
Slicing [a:b] | Fetches the elements from index 'a' to 'b-1' | (1, 2, 3, 4, 5)[1:3] | (2, 3)
LIST
Note: append(), clear(), extend(), insert(), remove() and reverse() modify the list in place and return None. For these rows the last column shows the list after the call.
append() | Adds an element at the end of the list | ['a', 'b'].append('c') | ['a', 'b', 'c']
clear() | Removes all the elements from the list | ['a', 'b'].clear() | []
count() | Returns the number of elements with the specified value | ['a', 'b', 'a'].count('a') | 2
extend() | Adds the elements of a list (or any iterable) to the end of the current list | ['a', 'b'].extend(['c', 'd']) | ['a', 'b', 'c', 'd']
index() | Returns the index of the first element with the specified value | ['a', 'b', 'a'].index('b') | 1
insert() | Adds an element at the specified position | ['a', 'b'].insert(1, 'c') | ['a', 'c', 'b']
pop() | Removes the element at the specified position and returns it | ['a', 'b', 'c'].pop(1) | 'b'
remove() | Removes the first item with the specified value | ['a', 'b', 'a'].remove('a') | ['b', 'a']
reverse() | Reverses the order of the list | ['a', 'b', 'c'].reverse() | ['c', 'b', 'a']
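One thing the list rows can hide: most list methods change the list in place and return None, so the call itself has to be kept separate from the list you print. A minimal sketch with made-up variable names:

```python
letters = ['a', 'b']
result = letters.append('c')    # mutates letters, returns None
print(result)                   # None
print(letters)                  # ['a', 'b', 'c']

popped = letters.pop(1)         # pop() does return the removed element
print(popped, letters)          # b ['a', 'c']

# To keep the original list unchanged, build a new list instead
original = ['a', 'b', 'c']
reversed_copy = original[::-1]
print(original, reversed_copy)  # ['a', 'b', 'c'] ['c', 'b', 'a']
```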
DICT
copy() | Returns a shallow copy of the dictionary | {'a': 1, 'b': 2}.copy() | {'a': 1, 'b': 2}
fromkeys(seq[, v]) | Returns a new dictionary with keys from seq and value equal to v | dict.fromkeys(['a', 'b'], 1) | {'a': 1, 'b': 1}
get(key[,d]) | Returns the value of the key. If the key does not exist, returns d (defaults to None) | {'a': 1, 'b': 2}.get('a') | 1
items() | Returns a view object of the dictionary's items in (key, value) format | {'a': 1, 'b': 2}.items() | dict_items([('a', 1), ('b', 2)])
keys() | Returns a view object of the dictionary's keys | {'a': 1, 'b': 2}.keys() | dict_keys(['a', 'b'])
pop(key[,d]) | Removes the item with the key and returns its value, or d if key is not found. If d is not provided and key is not found, it raises KeyError | {'a': 1, 'b': 2}.pop('a') | 1
popitem() | Removes and returns the last inserted (key, value) pair as a 2-tuple | {'a': 1, 'b': 2}.popitem() | ('b', 2)
setdefault(key[,d]) | Returns the value of key. If key does not exist, inserts key with a value of d and returns d (defaults to None) | {'a': 1, 'b': 2}.setdefault('c', 3) | 3
update([other]) | Updates the dictionary in place with the key/value pairs from other, overwriting existing keys; returns None | {'a': 1, 'b': 2}.update({'b':3}) | dictionary becomes {'a': 1, 'b': 3}
values() | Returns a view object of the dictionary's values | {'a': 1, 'b': 2}.values() | dict_values([1, 2])
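A short sketch that chains several of the dictionary methods on one dictionary, so the return value and the effect on the dictionary can be seen separately. The name stock is hypothetical.

```python
stock = {'a': 1, 'b': 2}

missing = stock.get('z', 0)             # 0; stock is unchanged
added = stock.setdefault('c', 3)        # 3; inserts 'c' because it was missing
existing = stock.setdefault('a', 99)    # 1; 'a' already exists, nothing changes
update_result = stock.update({'b': 3})  # None; stock is changed in place
print(stock)                            # {'a': 1, 'b': 3, 'c': 3}

removed = stock.pop('a')                # 1
fallback = stock.pop('zz', 'missing')   # 'missing' instead of a KeyError
last = stock.popitem()                  # ('c', 3), the last inserted pair
print(stock)                            # {'b': 3}
```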
String:
Method Description
endswith() Returns True if the string ends with the specified value
find() Searches the string for a specified value and returns the position of where it was found (-1 if it is not found)
index() Searches the string for a specified value and returns the position of where it was found (raises ValueError if it is not found)
isalpha() Returns True if all characters in the string are alphabetic
islower() Returns True if all characters in the string are lower case
isupper() Returns True if all characters in the string are upper case
replace() Returns a string where a specified value is replaced with another specified value
split() Splits the string at the specified separator and returns a list
List:
Method Description
extend() Adds the elements of a list (or any iterable) to the end of the current list
index() Returns the index of the first element with the specified value
Set:
Method Description
difference() Returns a set containing the difference between two or more sets
difference_update() Removes the items in this set that are also included in another, specified set
intersection_update() Removes the items in this set that are not present in other, specified set(s)
symmetric_difference_update() Updates this set to contain only the items found in exactly one of this set and another
union() Returns a set containing the union of sets
update() Updates the set with another set, or any other iterable
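A sketch of the set methods listed above, run on two small made-up sets. Methods ending in _update change the set in place, while difference() and union() return a new set.

```python
a = {1, 2, 3}
b = {3, 4}

diff = a.difference(b)             # new set: items in a but not in b
both = a.union(b)                  # new set: items in either
print(sorted(diff), sorted(both))  # [1, 2] [1, 2, 3, 4]

a.symmetric_difference_update(b)   # a keeps items found in exactly one set
after_sym = set(a)
a.intersection_update({2, 4, 6})   # a keeps only items also in the other set
after_inter = set(a)
a.difference_update({4})           # a drops items found in the other set
a.update([7, 8])                   # a gains items from any iterable
print(sorted(after_sym), sorted(after_inter), sorted(a))  # [1, 2, 4] [2, 4] [2, 7, 8]
```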
Tuple:
Method Description
index() Searches the tuple for a specified value and returns the position of where it was found
Dictionary:
Method Description
setdefault() Returns the value of the specified key. If the key does not exist, inserts the key with the specified value