CHAPTER – 3
REGULAR EXPRESSIONS:
Regular expressions (called REs, regexes, regexps, or regex patterns) are essentially a tiny, highly specialized
programming language embedded inside Python and made available through the re module. A regular
expression is a special sequence of characters that helps you match or find other strings or sets of strings,
using a specialized syntax held in a pattern.
For example:
\d → matches any digit (0–9)
\w → matches any word character (letters, digits, underscore)
. → matches any character (except newline by default)
Example:
import re
# Match a phone number pattern
pattern = r"\d{3}-\d{3}-\d{4}"
text = "My number is 987-654-3210"
match = re.search(pattern, text)
if match:
    print("Found:", match.group())
# Output: Found: 987-654-3210
Usually, such patterns are used by string-searching algorithms for "find" or "find and replace" operations
on strings, or for input validation.
Since the patterns passed to the functions of the re module are usually written as raw strings, let us first
understand what raw strings are.
Raw Strings
Regular expressions use the backslash character ('\') to indicate special forms or to allow special
characters to be used without invoking their special meaning. Python, on the other hand, uses the same
character as the escape character in ordinary string literals. To avoid this clash, regex patterns are
usually written in raw string notation.
A string becomes a raw string if it is prefixed with r or R before the opening quotation mark. Hence 'Hello'
is a normal string whereas r'Hello' is a raw string.
Example:
normal="Hello"
print (normal)
Output: Hello
raw=r"Hello"
print (raw)
Output: Hello
In normal circumstances, there is no difference between the two. However, when an escape sequence is
embedded in the string, the normal string interprets the escape sequence, whereas the raw string
leaves the backslash and the character after it unchanged.
Example:
normal="Hello\nWorld"
print (normal)
Output: Hello
World
raw=r"Hello\nWorld"
print (raw)
Output: Hello\nWorld
In the above example, when the normal string is printed, the escape sequence '\n' is processed to introduce
a newline. In the raw string, however, the 'r' prefix stops the escape sequence from being translated, so
the backslash and the letter n are printed as they are.
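The difference matters most once a pattern is handed to the re module. The following is a minimal sketch, using a sample sentence chosen only for illustration. It shows that the sequence \b means "backspace" in a normal string but "word boundary" when it reaches the regex engine through a raw string.

```python
import re

text = "cat concat category"  # sample text, assumed for illustration

# Normal string: Python converts "\b" into a backspace character
# before the re module sees the pattern, so nothing in the text matches.
normal_matches = re.findall("\bcat\b", text)

# Raw string: the backslash reaches the regex engine unchanged,
# where \b means "word boundary".
raw_matches = re.findall(r"\bcat\b", text)

print(normal_matches)  # []
print(raw_matches)     # ['cat']
```

Writing every pattern as a raw string avoids having to remember which escape sequences Python would otherwise translate.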
Special Symbols and Characters
Regular expressions in Python use special symbols and characters, called metacharacters, to define search
patterns. These metacharacters have specific meanings that extend the capabilities of simple string
matching.
Metacharacters
Most letters and characters will simply match themselves. However, some characters are special
metacharacters, and don't match themselves. Metacharacters are characters having a special meaning,
similar to * in a wildcard pattern.
Here's a complete list of the metacharacters –
. ^ $ * + ? {} [] \ | ()
'\' is an escaping metacharacter. When followed by various characters it forms various special sequences.
If you need to match a [ or \, you can precede them with a backslash to remove their special meaning: \[
or \\.
The metacharacters, including the special sequences that begin with '\', are described below –
Character Description Example
[] A set of characters "[a-m]"
Example:
import re
txt = "The rain in Spain"
#Find all lower case characters alphabetically between "a" and "m":
x = re.findall("[a-m]", txt)
print(x)
Output: ['h', 'e', 'a', 'i', 'i', 'a', 'i']
\ Signals a special sequence (can also be used to escape special characters) "\d"
Example:
import re
txt = "These are 159 colors"
#Find all digit characters:
x = re.findall(r"\d", txt)
print(x)
Output: ['1', '5', '9']
. Any character (except newline character) "he..o"
Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":
x = re.findall("he..o", txt)
print(x)
Output: ['hello']
^ Starts with "^hello"
Example:
import re
txt = "hello planet"
#Check if the string starts with 'hello':
x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")
Output: Yes, the string starts with 'hello'
$ Ends with "planet$"
Example:
import re
txt = "hello planet"
#Check if the string ends with 'planet':
x = re.findall("planet$", txt)
if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")
Output: Yes, the string ends with 'planet'
* Zero or more occurrences "he.*o"
Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or more (any) characters, and an "o":
x = re.findall("he.*o", txt)
print(x)
Output: ['hello']
+ One or more occurrences "he.+o"
Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 1 or more (any) characters, and an "o":
x = re.findall("he.+o", txt)
print(x)
Output: ['hello']
? Zero or one occurrences "he.?o"
Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or 1 (any) character, and an "o":
x = re.findall("he.?o", txt)
print(x)
#No match, because "he" and "o" are separated by two characters in "hello"
Output: []
{} Exactly the specified number of occurrences "he.{2}o"
Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by exactly 2 (any) characters, and an "o":
x = re.findall("he.{2}o", txt)
print(x)
Output: ['hello']
| Either or "falls|stays"
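The | row above has no example of its own. A minimal sketch in the same style as the other rows follows; the sample sentence is an assumption made for illustration.

```python
import re

txt = "The rain in Spain falls mainly in the plain!"

# Check whether the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
print(x)  # ['falls']
```

Only "falls" occurs in the sentence, so it is the only alternative that is returned.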
The re.match() Function
This function attempts to match RE pattern at the start of string with optional flags. Following is the
syntax for this function –
re.match(pattern, string, flags=0)
pattern: This is the regular expression to be matched.
string: This is the string, which would be searched to match the pattern at the beginning of
string.
flags: You can specify different flags using bitwise OR (|). These are modifiers such as
re.IGNORECASE (re.I) and re.MULTILINE (re.M).
The re.match() function returns a match object on success, None on failure. A match object instance
contains information about the match: where it starts and ends, the substring it matched, etc.
The match object's start() method returns the starting position of the match in the string, and end()
returns the position just past its last character. If the pattern is not found, re.match() returns None
instead of a match object.
We use group(num) or groups() function of match object to get matched expression.
group(num=0): This method returns entire match (or specific subgroup num)
groups(): This method returns all matching subgroups in a tuple (empty if there weren't any)
Example:
import re
text = "Hello World"
pattern = r"Hello"
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")
match.group() is your way to pull out the actual matched string (or parts of it if you used parentheses).
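The methods described above (group(num), groups(), start() and end()) can be sketched together as follows. The date string and the pattern are assumptions chosen only for illustration.

```python
import re

text = "2024-05-17 report"            # sample input, assumed
pattern = r"(\d{4})-(\d{2})-(\d{2})"  # three parenthesized subgroups

match = re.match(pattern, text)
if match:
    print(match.group())               # 2024-05-17  (entire match)
    print(match.group(1))              # 2024        (first subgroup)
    print(match.groups())              # ('2024', '05', '17')
    print(match.start(), match.end())  # 0 10
```

Checking the result with if match: before calling any method avoids an AttributeError when re.match() returns None.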
The re.search() Function
This function searches for first occurrence of RE pattern within the string, with optional flags.
Following is the syntax for this function –
re.search(pattern, string, flags=0)
The re.search() function returns a match object on success, None on failure. We use group(num) or groups()
function of match object to get the matched expression.
Example:
import re
text = "Python programming"
pattern = r"Python"
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")
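The difference between re.match() and re.search() shows up when the pattern does not occur at the very beginning of the string. A minimal sketch, with a sample sentence assumed for illustration:

```python
import re

text = "Learn Python programming"

at_start = re.match(r"Python", text)   # only looks at the start of the string
anywhere = re.search(r"Python", text)  # scans the whole string

print(at_start)          # None
print(anywhere.group())  # Python
print(anywhere.start())  # 6
```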
The re.findall() Function
The findall() function returns all non-overlapping matches of pattern in string, as a list of strings
or tuples. The string is scanned left-to-right, and matches are returned in the order found. Empty matches
are included in the result.
Syntax:
re.findall(pattern, string, flags=0)
Example 1: Find all numbers
import re
text = "My roll number is 12345 and ID is 678"
pattern = r"\d+" # one or more digits
matches = re.findall(pattern, text)
print(matches)
Output:
['12345', '678']
Example 2: Find all vowels
text = "Regular Expressions in Python"
pattern = r"[aeiouAEIOU]"
matches = re.findall(pattern, text)
print(matches)
Output:
['e', 'u', 'a', 'E', 'e', 'i', 'o', 'i', 'o']
Example 3: Find words
text = "Python is fun and powerful"
pattern = r"\w+" # word characters
matches = re.findall(pattern, text)
print(matches)
Output:
['Python', 'is', 'fun', 'and', 'powerful']
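The examples above all return lists of strings. The "or tuples" case mentioned in the description arises when the pattern contains more than one parenthesized group. A minimal sketch, with sample data assumed for illustration:

```python
import re

text = "alice=30, bob=25"

no_groups = re.findall(r"\w+=\d+", text)       # whole matches
one_group = re.findall(r"\w+=(\d+)", text)     # only the group's text
two_groups = re.findall(r"(\w+)=(\d+)", text)  # one tuple per match

print(no_groups)   # ['alice=30', 'bob=25']
print(one_group)   # ['30', '25']
print(two_groups)  # [('alice', '30'), ('bob', '25')]
```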
List of Special Symbols & Characters in Regex with examples
1. Anchors
^ → Matches start of string
$ → Matches end of string
Example:
import re
print(re.findall(r"^Hello", "Hello World"))
Output: ['Hello']
print(re.findall(r"World$", "Hello World"))
Output: ['World']
2. Dot
. → Matches any single character except newline
Example:
print(re.findall(r"h.t", "hat hot hit h9t h@t"))
# ['hat', 'hot', 'hit', 'h9t', 'h@t']
3. Quantifiers
* → 0 or more occurrences
+ → 1 or more occurrences
? → 0 or 1 occurrence
{n} → exactly n times
{n,} → n or more times
{n,m} → between n and m times
Example:
print(re.findall(r"ab*", "a ab abb abbb a"))
Output: ['a', 'ab', 'abb', 'abbb', 'a']
print(re.findall(r"ab+", "a ab abb abbb a"))
Output: ['ab', 'abb', 'abbb']
print(re.findall(r"\d{2,4}", "12 123 1234 12345"))
Output: ['12', '123', '1234', '1234']
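The quantifiers ?, {n} and {n,} are listed above without an example. A minimal sketch in the same style follows; the sample strings are assumptions.

```python
import re

# ? makes the preceding character optional (0 or 1 occurrence)
optional_u = re.findall(r"colou?r", "color colour")
print(optional_u)     # ['color', 'colour']

# {n} matches exactly n repetitions; matches do not overlap
exactly_three = re.findall(r"\d{3}", "12 123 12345")
print(exactly_three)  # ['123', '123']

# {n,} matches n or more repetitions
two_or_more = re.findall(r"\d{2,}", "1 12 123")
print(two_or_more)    # ['12', '123']
```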
4. Escaping Special Characters
\ → Escape character
\. → matches a literal dot .
\\ → matches a literal backslash
Example:
print(re.findall(r"\.", "end. finish. done"))
Output: ['.', '.']
Q. Res in Python
In the context of Python, "res" is commonly used as a variable name to represent the "result" of a
calculation or operation. It is not a reserved keyword or built-in function, but rather a conventional
variable name chosen for its readability. It should not be confused with "re", the built-in module for
working with regular expressions, also known as RegEx or regex patterns.
"res" as a variable name:
Common Practice: Programmers often use "res" as a variable name to store the result of a
function call, a calculation, or any other operation.
Meaning: "res" is short for "result," and it clearly indicates that the variable will hold the outcome
of a process.
Readability: This convention improves the readability of code by making it easy to understand
what a variable represents.
"re" module for Regular Expressions:
Built-in module: The "re" module in Python provides tools for working with regular expressions.
Purpose of Regular Expressions: Regular expressions are powerful patterns used to search, match,
and manipulate text based on predefined patterns.
Common Operations: The "re" module allows you to search for patterns, extract information,
validate formats, and split strings based on patterns.
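The common operations listed above can be sketched in a few lines. The sample string and the "555-1234" format are assumptions made for illustration; re.fullmatch(), re.split() and re.sub() are standard functions of the re module. The results are stored in variables named res_*, following the naming convention described earlier.

```python
import re

log = "user=alice; id=42, role=admin"  # sample text, assumed

# Search and extract: capture the digits after "id="
res_id = re.search(r"id=(\d+)", log).group(1)

# Validate: fullmatch() succeeds only if the whole string fits the pattern
res_valid = re.fullmatch(r"\d{3}-\d{4}", "555-1234") is not None
res_invalid = re.fullmatch(r"\d{3}-\d{4}", "555-12345") is not None

# Split: break the string at ";" or "," followed by optional spaces
res_fields = re.split(r"[;,]\s*", log)

# Manipulate: replace every run of digits with "#"
res_masked = re.sub(r"\d+", "#", log)

print(res_id)                  # 42
print(res_valid, res_invalid)  # True False
print(res_fields)              # ['user=alice', 'id=42', 'role=admin']
print(res_masked)              # user=alice; id=#, role=admin
```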
Multithreaded Programming in Python
Q. Threads and Processes
Process:
A process is simply a program in execution.
When you run a Python script (python mycode.py), the OS creates a process for it.
Each process has:
o Its own memory space
o Its own resources (files, I/O, etc.)
o At least one thread (the main thread)
Example:
If you open Google Chrome, the OS starts a process for Chrome.
If you open two tabs, Chrome may create multiple threads or even multiple processes (for
isolation).
States of a Process
A process goes through different states during execution:
1. New → Just created.
2. Ready → Waiting to be assigned to CPU.
3. Running → Currently executing.
4. Waiting/Blocked → Waiting for some event (e.g., I/O).
5. Terminated → Finished execution.
Thread:
A thread is an entity within a process that can be scheduled for execution. It is also the smallest unit of
processing that can be performed in an OS (Operating System). In simple words, a thread is a sequence of
instructions within a program that can be executed independently of other code. For simplicity, you
can assume that a thread is simply a subset of a process.
Thread Control Block (TCB)
When a thread is created, the Operating System maintains information about it in a Thread Control Block
(TCB).
The TCB contains:
1. Thread Identifier (TID)
Every thread needs a unique ID so the OS can distinguish it from other threads.
Think of it like a roll number for a student in a classroom.
Example: If a process has 3 threads, the OS may give them IDs 101, 102, 103.
2. Program Counter (PC)
This stores the address of the next instruction the thread will execute.
If a thread is paused and later resumed, the PC tells the CPU where to continue from.
Example: If a thread is running a loop and stopped at line 5, the PC will point to line 5 so it knows
where to restart.
3. Register Set
Registers are small storage locations inside the CPU used to hold temporary data (like variables,
results of calculations, memory addresses).
The thread’s current values in registers are stored here when it is paused (context switching).
Example: If a thread is calculating A + B, the values of A and B might be sitting in registers.
4. Stack
Each thread has its own stack memory.
It is used for:
o Local variables inside functions
o Return addresses (where to go back after a function finishes)
o Function call history
Example: If a thread calls function f1(), which then calls f2(), the stack keeps track so that when f2()
ends, the thread knows to return to f1().
5. Thread State
Tells the OS what the thread is currently doing.
o Ready → waiting to run
o Running → currently using the CPU
o Waiting/Blocked → waiting for I/O or another event
o Terminated → finished execution
Example: If a thread is waiting for data from a file, it goes into Waiting state.
6. Scheduling Information
Used by the OS scheduler to decide which thread runs next.
Includes:
o Priority level (high-priority threads run first)
o CPU time used so far
o Scheduling policies (round-robin, priority scheduling, etc.)
Example: A thread with high priority may be chosen to run before a low-priority one.
7. Pointer to Process Control Block (PCB)
Since a thread belongs to a process, it needs a link back to its Process Control Block (PCB).
PCB contains process-level info like:
o Process ID
o Memory management details
o Open files
Example: If a thread needs to access process memory, the OS follows this pointer to get process
details.
Process vs Thread (Key Differences)
Feature         Process                                    Thread
Definition      A program in execution                     Smallest unit of execution inside a process
Memory          Has its own memory space                   Shares memory with other threads of the same process
Overhead        Heavy (needs more resources)               Lightweight (less overhead)
Communication   Inter-Process Communication (IPC) needed   Direct communication (since threads share memory)
Creation time   Slower                                     Faster
Crash impact    If one process crashes, others are safe    If one thread crashes, it may crash the whole process
Python Threads
Python threads provide a mechanism for achieving concurrency within a single process. They are
lightweight units of execution that share the same memory space and resources as the main program. This
allows for efficient data sharing and communication between different parts of a program.
Multithreading in Python
In Python, the threading module provides a very simple and intuitive API for spawning (creating and
starting a new process or thread) multiple threads in a program. Let us try to understand multithreading
code step-by-step.
Step 1: Import Module
First, import the threading module.
import threading
Step 2: Create a Thread
To create a new thread, we create an object of the Thread class. It takes 'target' and 'args' as keyword
parameters. The target is the function to be executed by the thread, whereas args is a tuple of the
arguments to be passed to the target function. In the lines below, function_name, arg1 and arg2 are
placeholders.
t1 = threading.Thread(target=function_name, args=(arg1, arg2))
t2 = threading.Thread(target=function_name, args=(arg1, arg2))
threading.Thread → creates a thread object
Step 3: Start a Thread
To start a thread, we use the start() method of the Thread class.
t1.start()
t2.start()
Step 4: Wait for the Thread to Finish
Once the threads start, the current program also keeps on executing. In order to stop the execution of the
current program until a thread is complete, we use the join() method.
t1.join()
t2.join()
As a result, the current program will first wait for the completion of t1 and then t2. Once they are
finished, the remaining statements of the current program are executed.
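The four steps above can be put together in one short program. This is a minimal sketch: the function square() and the shared dictionary results are hypothetical names introduced only for illustration.

```python
# Step 1: import the module
import threading

results = {}  # shared by all threads of this process

def square(label, n):
    # Each thread stores its answer under its own key
    results[label] = n * n

# Step 2: create the threads, passing the function and its arguments
t1 = threading.Thread(target=square, args=("t1", 3))
t2 = threading.Thread(target=square, args=("t2", 4))

# Step 3: start the threads
t1.start()
t2.start()

# Step 4: wait for both threads to finish
t1.join()
t2.join()

print(sorted(results.items()))  # [('t1', 9), ('t2', 16)]
```

Because both join() calls come before the print(), the dictionary is complete by the time it is read.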
Creating a Single Thread
Example:
import threading
def hello_world():
    print("Hello, world!")
t = threading.Thread(target=hello_world)
t.start()
Output: Hello, world!
Example:
import threading
import time
def print_numbers():
    for i in range(1,6):
        time.sleep(1)
        print(f'numbers::{i}')
def print_letter():
    for letter in "ABCDE":
        time.sleep(1.5)
        print(f"letters:{letter}")
thread1=threading.Thread(target=print_numbers)
thread2=threading.Thread(target=print_letter)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
Output (lines printed at nearly the same moment may appear in a different order):
numbers::1
letters:A
numbers::2
numbers::3
letters:B
numbers::4
letters:C
numbers::5
letters:D
letters:E
Q. Global Interpreter Lock
The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only
one thread at a time to hold the control of the Python interpreter. This means that in CPython, the
standard interpreter, only one thread executes Python bytecode at any given moment, even when the
program has many threads. For CPU-bound work, the performance of a multi-threaded program is
therefore about the same as that of a single-threaded one. Multithreading is still available in Python,
but because of the GIL, threads cannot run Python code in parallel on several CPU cores.
Since the GIL allows only one thread to execute at a time even in a multi-threaded architecture with more
than one CPU core, the GIL has gained a reputation as an “infamous” feature of Python. The GIL ensures
that memory management is thread-safe, which simplifies the implementation of Python and protects it
from certain types of concurrency issues.
How the GIL Works:
The GIL works by acquiring and releasing a lock around the Python interpreter. A thread must acquire the
GIL whenever it wants to execute Python bytecode. If another thread has already acquired the GIL, the
requesting thread has to wait until it is released. The running thread releases the GIL when it blocks
(for example, on I/O) or when the interpreter periodically asks it to give other threads a turn, allowing
another thread to acquire it.
Why Python Has a GIL:
The GIL was introduced in the early days of Python as a means to simplify memory management and
make the interpreter easier to implement. Here are a few reasons why it was deemed necessary:
Memory Management: Python uses reference counting for memory management. Without the GIL,
reference counting would require more complex locking mechanisms, which could lead to performance
overhead and deadlocks.
Ease of Implementation: The GIL simplifies the implementation of CPython (the standard Python
interpreter), making it easier to develop and maintain. It reduces the need for intricate locking and
threading designs.
Compatibility: A single-threaded model makes it easier for extensions and third-party libraries to work
seamlessly with Python's memory model.
Implications of the GIL:
1. CPU-bound Tasks: In CPU-bound programs (tasks that require heavy computation), the GIL can be
a bottleneck. Since only one thread can execute Python code at a time, multi-threading may not lead
to performance gains. For CPU-bound tasks, utilizing multi-processing (using the multiprocessing
module) is often a better approach, as it spawns separate processes that can run on different CPU
cores.
2. I/O-bound Tasks: The GIL is less of an issue for I/O-bound tasks (tasks that spend a lot of time
waiting for input/output operations). Threads release the GIL while waiting for I/O operations,
allowing other threads to run. This makes threading a viable option for applications that are
I/O-heavy, such as web servers.
3. Concurrent Programming: With the introduction of asynchronous programming, many developers
have shifted towards using async and await constructs, which allow for writing concurrent code that
doesn't rely on threads. This can help avoid GIL-related issues while still achieving concurrency.
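The I/O-bound case can be observed with a small timing experiment. This is a rough sketch, not a benchmark: fake_download() is a hypothetical function, and time.sleep() stands in for a blocking I/O call during which the GIL is released.

```python
import threading
import time

def fake_download(seconds):
    # Placeholder for a blocking I/O operation (network, disk, ...)
    time.sleep(seconds)

start = time.perf_counter()

# Three "downloads" of 0.2 s each, run in three threads
threads = [threading.Thread(target=fake_download, args=(0.2,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed = time.perf_counter() - start
print(f"Three 0.2 s waits took about {elapsed:.2f} s")
```

On a typical machine the total is close to 0.2 s rather than 0.6 s, because the waits overlap. A loop of pure computation in place of the sleep would not be expected to speed up in the same way.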
Advantages of the GIL
Simplifies memory management and makes it easier to write thread-safe code.
Protects the interpreter's internal state (such as reference counts) from race conditions; data shared by your own threads may still need locks.
Allows for efficient execution of I/O-bound tasks through thread-based concurrency.
Disadvantages of the GIL
Limits the benefits of multithreading for CPU-bound tasks.
It can introduce overhead and degrade performance in certain scenarios.
Requires alternative approaches, such as multiprocessing or asynchronous programming, for
optimal performance.
Thread Handling Modules in Python
Python's standard library provides two main modules for managing threads: _thread and threading.
Q. Thread Module:
The _thread module, also known as the low-level thread module, has long been a part of Python's standard
library (it was named thread in Python 2 and renamed _thread in Python 3). It offers a basic API for
thread management, supporting concurrent execution of threads within a shared global data space. The
module includes simple locks (mutexes) for synchronization purposes.
Starting a New Thread Using the _thread Module:
The start_new_thread() function of the _thread module provides a basic way to create and start new threads.
It offers a fast and efficient way to create new threads on both Linux and Windows.
The call returns immediately, and the new thread starts executing the specified function with the
given arguments. When the function returns, the thread terminates.
Syntax: _thread.start_new_thread(function, args), where args must be a tuple
Example: This example demonstrates how to use the _thread module to create and run threads. Each
thread runs the print_name function with different arguments. The time.sleep(0.5) call keeps the main
program alive long enough for the threads to finish. It does not actually wait for them, so it is
sufficient here only because the threads are very short.
import _thread
import time
def print_name(name, *args):
    print(name, args)
# start two threads
_thread.start_new_thread(print_name, ("Python", 1))
_thread.start_new_thread(print_name, ("Python", 1, 2))
# wait so threads can finish
time.sleep(0.5)
Output: Python (1,)
Python (1, 2)
Q. Threading Module
The threading module builds upon the lower-level _thread module to provide a
higher-level and more comprehensive threading API. It offers powerful tools for managing threads,
making it easier to work with threads in Python applications.
Functions in threading Module
threading.active_count()
Returns the number of thread objects that are currently active.
threading.current_thread()
Returns the thread object representing the caller’s thread of control.
threading.enumerate()
Returns a list of all thread objects that are currently active.
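The three functions above can be tried out while a worker thread is still running. This is a minimal sketch: worker() and the name "worker-1" are assumptions made for illustration, and the exact count depends on what else is running in the interpreter.

```python
import threading
import time

def worker():
    time.sleep(0.2)  # keep the thread alive long enough to be observed

t = threading.Thread(target=worker, name="worker-1")
t.start()

current_name = threading.current_thread().name  # the caller's own thread
count = threading.active_count()                # at least 2 while worker-1 runs
names = [th.name for th in threading.enumerate()]

print(current_name)
print(count)
print(names)

t.join()
```

In a plain script the caller is the main thread, so the first line printed is normally MainThread and the list contains both MainThread and worker-1.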
The Thread Class
The Thread class implements threading in Python.
Important methods of the Thread class:
run()
The entry point for the thread. Defines the code that the thread will execute.
start()
Starts a thread by calling the run() method in a separate thread of control.
join([timeout])
Waits for the thread to terminate. Optionally, a timeout can be given.
is_alive()
Checks whether a thread is still executing.
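getName()
Returns the name of a thread. Deprecated since Python 3.10; read the name attribute instead.
setName()
Sets the name of a thread. Deprecated since Python 3.10; assign to the name attribute instead.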
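The methods above can be seen together by subclassing Thread and overriding run(). This is a minimal sketch: the Countdown class and its attribute n are hypothetical, introduced only for illustration.

```python
import threading
import time

class Countdown(threading.Thread):
    def __init__(self, n):
        super().__init__(name="countdown")  # sets the thread's name
        self.n = n

    def run(self):
        # Entry point: this code executes in the new thread after start()
        while self.n > 0:
            time.sleep(0.05)
            self.n -= 1

t = Countdown(3)
alive_before = t.is_alive()    # False: not started yet
t.start()                      # calls run() in a separate thread
alive_during = t.is_alive()    # True: run() is still counting down
t.join(timeout=0.01)           # gives up waiting after 0.01 s
t.join()                       # waits until run() has returned
alive_after = t.is_alive()     # False: the thread has terminated

print(t.name, alive_before, alive_during, alive_after)
```

Calling start() rather than run() directly is what creates the separate thread of control; calling run() directly would simply execute the loop in the calling thread.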
Starting a New Thread Using the threading Module
The threading module provides the Thread class, which is used to create and manage threads.
Steps to Create and Start a Thread
1. Define a function that the thread should execute.
2. Create a Thread object using the Thread class by passing:
the function (target)
arguments (args)
3. Call the start() method on the thread object to begin execution.
4. (Optional) Call the join() method to wait for the thread to finish before proceeding.
Example:
import threading
# Step 1: Define the function
def print_name(name, *args):
    print(name, args)
# Step 2: Create Thread objects
thread1 = threading.Thread(target=print_name, args=("Python Class", 1))
thread2 = threading.Thread(target=print_name, args=("Python Class", 1, 2))
# Step 3: Start threads
thread1.start()
thread2.start()
# Step 4: Wait for threads to complete
thread1.join()
thread2.join()
print("Threads are finished...exiting")