Python Unit-3
Python Regex
A regular expression is a sequence of characters with a specialized
syntax that we can use to find or match other characters or groups
of characters. Regular expressions, or regex for short, are widely
used in the UNIX world.
The re module in Python gives full support for Perl-style regular
expressions. It raises the re.error exception whenever an error
occurs while compiling or using a regular expression.
We'll go over the main functions used to work with regular
expressions: re.match(), re.search(), and re.findall(). But first, a
minor point: many characters have a special meaning when used in a
regular expression.
re.match()
Python's re.match() function checks for a match of a regular
expression pattern only at the beginning of the string being
searched. If the pattern matches there, a match object is returned.
If the pattern occurs only later in the string, including on a
subsequent line, re.match() returns None.
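The behaviour described above can be sketched as follows. The two-line sample string is an assumption made for illustration. Note that even with the re.MULTILINE flag, re.match() only tries the very start of the string:

```python
import re

# Hypothetical two-line sample text
text = "first line\nsecond line"

at_start = re.match(r"first", text)    # pattern sits at index 0
later = re.match(r"second", text)      # pattern only appears on line 2
later_multiline = re.match(r"second", text, re.MULTILINE)

print(at_start.group())    # first
print(later)               # None
print(later_multiline)     # None
```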
Consider the patterns ".w*" and ".w*?" used in the re.match()
example further below. The "." matches any single character, and
"w*" matches zero or more occurrences of the letter "w". The
trailing "?" in ".w*?" makes the repetition non-greedy, so it
matches as few "w" characters as possible.
Matching Characters
Most letters and characters simply match themselves. The regular
expression check, for instance, will match exactly the string check.
(A case-insensitive mode can be enabled, which would let this RE
match Check or CHECK as well.)
There are some exceptions to this general rule; certain characters
are special metacharacters that don't match themselves. Rather, they
signal that something out of the ordinary should be matched, or they
affect other parts of the RE by repeating them or changing their
meaning.
Here's the list of the metacharacters:
. ^ $ * + ? { } [ ] \ | ( )
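As a small illustration of literal matching, case-insensitive matching, and metacharacters, here is a sketch using only the standard re module. The sample strings are made up for the example:

```python
import re

# Ordinary characters match themselves, and matching is case-sensitive by default.
plain = bool(re.search(r"check", "please check this"))
upper = bool(re.search(r"check", "PLEASE CHECK THIS"))
upper_ignorecase = bool(re.search(r"check", "PLEASE CHECK THIS", re.IGNORECASE))
print(plain, upper, upper_ignorecase)      # True False True

# "." is a metacharacter meaning "any character", so "3.5" also matches "3x5".
dot_as_meta = bool(re.fullmatch(r"3.5", "3x5"))
# A backslash (or re.escape) turns the metacharacter into a literal dot.
dot_as_literal = bool(re.fullmatch(r"3\.5", "3x5"))
print(dot_as_meta, dot_as_literal)         # True False
print(re.escape("3.5"))                    # 3\.5
```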
Repeating Things
Being able to match varying sets of characters is the first thing
regular expressions can do that isn't already possible with the
methods available on strings. However, if that were the only
additional capability of regexes, they wouldn't be much of an
advance. We can also specify that portions of the RE must be
repeated a certain number of times.
The first metacharacter we'll examine for repeating things is *.
Instead of matching the literal character '*', * specifies that the
preceding character can be matched zero or more times, rather than
exactly once.
ba*t, for example, matches 'bt' (zero 'a' characters), 'bat' (one 'a'
character), 'baaat' (three 'a' characters), and so on.
Repetitions such as * are greedy: the matching engine tries to
repeat the preceding element as many times as possible. If later
parts of the pattern then fail to match, the engine backs up and
retries with fewer repetitions.
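The repetition rules above can be tried out with a short sketch. The word list and the "<a><b>" sample are assumptions chosen for illustration:

```python
import re

# ba*t: 'b', then zero or more 'a', then 't'
pattern = re.compile(r"ba*t")
matches = {w: bool(pattern.fullmatch(w)) for w in ["bt", "bat", "baaat", "bit"]}
print(matches)   # {'bt': True, 'bat': True, 'baaat': True, 'bit': False}

# Greedy: a* first takes all three 'a's, then gives one back so that
# the following literal 'a' and 't' can still match.
backtracked = re.match(r"ba*at", "baaat").group()
print(backtracked)   # baaat

# Greedy versus non-greedy repetition on the same text
text = "<a><b>"
greedy = re.match(r"<.*>", text).group()    # as much as possible
lazy = re.match(r"<.*?>", text).group()     # as little as possible
print(greedy)   # <a><b>
print(lazy)     # <a>
```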
The following example uses the re.match() function:
import re

line = "Learn Python through tutorials on javatpoint"
# re.match() only tries the pattern at the start of the string: "."
# consumes "L", "w*" matches nothing, and the required space then
# fails against "e", so no match object is returned.
match_object = re.match(r'.w* (.w?) (.w*?)', line, re.M | re.I)

if match_object:
    print("match object group : ", match_object.group())
    print("match object 1 group : ", match_object.group(1))
    print("match object 2 group : ", match_object.group(2))
else:
    print("There isn't any match!!")
Output:
There isn't any match!!
re.search()
The re.search() function scans through the string looking for the
first location where the regular expression pattern matches. Unlike
re.match(), it checks the whole string, not just the beginning. If
the pattern matches, re.search() returns a match object; otherwise
it returns None.
To use search(), we first import the re module. The pattern and the
string to be searched are then passed to the re.search() call.
The following example uses the re.search() function:
import re

line = "Learn Python through tutorials on javatpoint"

# Each ".*" is greedy, so the last two words end up in groups 1 and 2.
search_object = re.search(r' .*t? (.*t?) (.*t?)', line)
if search_object:
    print("search object group : ", search_object.group())
    print("search object group 1 : ", search_object.group(1))
    print("search object group 2 : ", search_object.group(2))
else:
    print("Nothing found!!")
Output:
search object group : Python through tutorials on javatpoint
search object group 1 : on
search object group 2 : javatpoint
The next example contrasts re.match() and re.search() on the same
string:
import re

line = "Learn Python through tutorials on javatpoint"

# "through" is not at the start of the string, so re.match() fails.
match_object = re.match(r'through', line, re.M | re.I)
if match_object:
    print("match object group : ", match_object.group())
else:
    print("There isn't any match!!")

# re.search() scans the whole string and finds the pattern.
search_object = re.search(r' .*t? ', line, re.M | re.I)
if search_object:
    print("search object group : ", search_object.group())
else:
    print("Nothing found!!")
Output:
There isn't any match!!
search object group : Python through tutorials on
re.findall()
The findall() function is used to find all occurrences of a
pattern. The search() function, on the other hand, returns only
the first occurrence that matches. In a single call, findall()
scans the whole string, including every line, and returns a list of
all non-overlapping matches.
If we have a piece of text and want every occurrence of a pattern
in it, we use Python's re.findall() function. It searches the
entire string passed to it.
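A minimal sketch of the difference between search() and findall(), reusing the sample line from the earlier examples. The pattern t\w* (a "t" followed by any word characters) is an assumption chosen for illustration:

```python
import re

line = "Learn Python through tutorials on javatpoint"

# search() stops at the first match
first = re.search(r"t\w*", line).group()
# findall() returns every non-overlapping match as a list of strings
every = re.findall(r"t\w*", line)

print(first)   # thon
print(every)   # ['thon', 'through', 'tutorials', 'tpoint']
```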
Using the re module isn't always the best choice. If we're only
searching for a fixed string or a single character class, and we're
not using any re features such as the IGNORECASE flag, the full
power of regular expressions may not be needed. Strings offer
several methods for working with fixed strings, and they're
usually much faster because the implementation is a single small C
loop optimized for the purpose, rather than the larger, more
generalized regular expression engine.
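To make the point concrete, here is a sketch of fixed-string tasks done with plain string methods, next to the equivalent re calls. The sample line and the replacement text are illustrative assumptions:

```python
import re

line = "Learn Python through tutorials on javatpoint"

# Fixed-string tasks: plain string operations are enough.
has_python = "Python" in line
position = line.find("through")
replaced = line.replace("javatpoint", "the web")

# The equivalent regular-expression calls give the same answers here.
has_python_re = bool(re.search(r"Python", line))
position_re = re.search(r"through", line).start()
replaced_re = re.sub(r"javatpoint", "the web", line)

print(has_python, position, replaced)
```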
Symbol | Name | Uses | Unicode | Keyboard
_ | Underscore | 1) Used in place of the space key when spaces are restricted in writing or typing. | U+005F | Shift + -
{ | Open curly brace | 1) Used to open a code block in some programming languages. 2) Also used in mathematics to denote sets. | U+007B | Shift + [
" | Double quotation marks or inverted commas | 1) Used in grammar to denote the direct speech spoken by a person. 2) Used in mentioning the attributes in HTML. | U+0022 | Shift + '
' | Single quotation mark or apostrophe | 1) Used to denote a quotation mark in English text. 2) Denotes omitted characters and contractions. | U+0027 | '
< | Less than or open angle bracket | Used in logical statements and mathematics to denote the relationship between two values; the lesser value is placed to the left of the symbol and the greater value to the right. | U+003C | Shift + ,
> | Greater than or close angle bracket | Used in logical statements and mathematics to denote the relationship between two values; the lesser value is placed to the right and the greater value to the left. | U+003E | Shift + .
Python Multithreading
Multithreading is a technique in Python programming for running
multiple threads concurrently; the CPU switches rapidly between the
threads (called context switching). Threads share the data space of
the main thread within a process, which makes sharing information
and communicating with other threads easier than it is between
separate processes. Multithreading aims to perform multiple tasks at
the same time, which can improve the performance, speed, and
responsiveness of an application.
Note: The Python Global Interpreter Lock (GIL) allows only a
single thread to run at a time, even if the machine has multiple
processors.
Thread modules
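Python offers two modules for working with threads: the low-level
thread module and the higher-level threading module. Since Python 3
the thread module is considered obsolete; it has been renamed
_thread and is kept only for backward compatibility.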
Syntax:
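A minimal sketch of the low-level interface, assuming the standard _thread.start_new_thread(function, args) call. The worker function, the results list, and the lock used to wait for the thread are illustrative choices, not part of these notes:

```python
import _thread

results = []

# A lock used only as a "finished" signal: the main thread takes it,
# and the worker releases it when done.
done = _thread.allocate_lock()
done.acquire()

def worker(name):
    # Hypothetical worker: records a message, then signals completion.
    results.append("hello from " + name)
    done.release()

# args must be a tuple, even for a single argument
_thread.start_new_thread(worker, ("low-level thread",))

done.acquire()   # blocks until the worker has released the lock
print(results)   # ['hello from low-level thread']
```

In practice the higher-level threading module described next is usually preferable, since it provides join() instead of hand-made signalling.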
Thread Class Methods
Method | Description
start() | Starts the thread's activity. It must be called at most once per thread object so that the execution of the thread can begin.
run() | Defines the thread's activity; it can be overridden in a class that extends the Thread class.
join() | Blocks the calling code until the thread terminates.
Follow the steps below to use the threading module for
multithreading in Python:
1. Import the threading module
To create a new thread, first import the threading module, as shown.
Syntax:
import threading
The threading module provides the Thread class, which is
instantiated to create a Python thread.
2. Declaration of the thread parameters: The Thread() class takes
the target function, args, and kwargs as parameters.
o target: The function that is executed by the thread.
o args: The tuple of positional arguments passed to the target
function.
o kwargs: A dictionary of keyword arguments passed to the target
function.
For example:
import threading

def print_hello(n):
    print("Hello, how old are you ", n)

# args must be a tuple, hence the trailing comma
t1 = threading.Thread(target=print_hello, args=(18,))
In the above code, we passed the print_hello() function as the
target parameter. print_hello() takes one parameter, n, whose value
is supplied through the args parameter.
3. Start a new thread: To start a thread, call the start() method
on the Thread object. start() can be called only once for each
thread object; a second call raises a RuntimeError.
Syntax:
t1.start()
t2.start()
4. Join method: The join() method of the Thread class halts the
calling (main) thread and waits until the thread object has finished
executing. When the thread object has completed, execution of the
main thread resumes.
Joinmethod.py
import threading

def print_hello(n):
    print("Hello, how old are you? ", n)

t1 = threading.Thread(target=print_hello, args=(20,))
t1.start()
t1.join()  # wait for t1 to finish before continuing
print("Thank you")
Output:
Hello, how old are you? 20
Thank you
When the above program is executed, the join() method halts the
execution of the main thread and waits until the thread t1 has
finished. Once t1 has finished, the main thread resumes its
execution.
Note: If we do not use the join() method, the main thread does not
wait for t1, so the order in which the two print statements run is
not guaranteed. In this short program the thread's print usually
appears first, but that should not be relied on.
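The effect of join() can be made visible with a small sketch. The slow_worker function, the 0.2-second sleep that stands in for real work, and the events list are assumptions made for illustration:

```python
import threading
import time

events = []

def slow_worker():
    time.sleep(0.2)                  # simulate work that takes a while
    events.append("worker done")

t = threading.Thread(target=slow_worker)
t.start()
events.append("main continues")      # runs while the worker is still sleeping
t.join()                             # the main thread now waits for the worker
events.append("after join")

print(events)   # ['main continues', 'worker done', 'after join']
```

Anything placed after t.join() is guaranteed to run once the worker has finished; anything between start() and join() runs concurrently with it.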
5. Synchronizing Threads in Python
Thread synchronization is a mechanism that ensures no two threads
simultaneously execute a particular segment of the program that
accesses shared resources. Such a segment is termed a critical
section. A race condition occurs when two or more threads access
and change shared data at the same time; synchronization avoids it
by letting only one thread into the critical section at a time.
Let's write a program to use the threading module in Python
Multithreading.
Threading.py
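A minimal sketch of such a program, offered as an illustration and not as the original Threading.py listing. It assumes a shared counter updated by four threads; the names state, counter_lock, and add_many are hypothetical. A threading.Lock guards the critical section so the updates cannot interleave:

```python
import threading

state = {"count": 0}                 # shared resource
counter_lock = threading.Lock()

def add_many(times):
    for _ in range(times):
        with counter_lock:           # critical section: one thread at a time
            state["count"] += 1

# Create four threads that all update the same counter
threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["count"])   # 40000
```

The with statement acquires the lock on entry and releases it on exit, even if the body raises an exception.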
What is Process?
A process is an instance of a program that is being
executed. When we run a program, it does not execute directly;
the system goes through a series of steps to execute it, and
carrying out these steps is what is known as a process.
A process can create other processes to perform multiple tasks at a
time; the created processes are known as clone or child processes,
and the main process is known as the parent process. Each
process has its own memory space and does not share it with
other processes. A process is known as an active entity. In memory,
a typical process consists of stack, heap, data, and text (code)
sections.
A process in OS can remain in any of the following states:
o NEW: A new process is being created.
o READY: A process is ready and waiting to be allocated to a
processor.
o RUNNING: The program is being executed.
o WAITING: Waiting for some event to happen or occur.
o TERMINATED: Execution finished.
Features of Process
o Each time we create a process, we need to make a separate
system call to the OS. On UNIX-like systems, the fork() function
creates the process.
o Each process exists within its own address or memory space.
o Each process is independent and treated as an isolated
process by the OS.
o Processes need IPC (Inter-process Communication) in order to
communicate with each other.
o Because processes do not share memory, they do not require the
kind of synchronization that threads do.
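The parent and child relationship can be sketched with the standard subprocess module, used here instead of fork() so that the example also runs on systems without fork(). The child program is a one-line script made up for illustration:

```python
import os
import subprocess
import sys

parent_pid = os.getpid()

# Launch a child Python process that prints its own process ID.
child_code = "import os; print(os.getpid())"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,   # the child's output comes back through a pipe (IPC)
    text=True,
    check=True,
)
child_pid = int(result.stdout.strip())

print("parent:", parent_pid, "child:", child_pid)
```

The two process IDs differ because the child is a separate process with its own address space; the only data it passes back here travels through the pipe.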
What is Thread?
A thread is a subset of a process and is also known as a
lightweight process. A process can have more than one thread, and
these threads are managed independently by the scheduler. All the
threads within one process are related to each other. Threads
share some common information with their peer threads, such as the
data segment, code segment, and open files. Each thread, however,
has its own registers, stack, and program counter.
How does thread work?
As discussed, a thread is a subprocess or an execution
unit within a process. A process can contain one or more threads.
A thread works as follows:
o When a process starts, the OS assigns memory and resources
to it. Each thread within a process shares the memory and
resources of that process only.
o Threads are mainly used to improve the processing of an
application. On a single processor core, only one thread is
executed at a time, but fast context switching between threads
gives the illusion that they are running in parallel.
o If a single thread executes in a process, the process is known
as single-threaded; if multiple threads execute concurrently, it
is known as multithreading.
Types of Threads
There are two types of threads, which are:
1. User Level Thread
As the name suggests, user-level threads are managed entirely in
user space, and the kernel has no information about them.
They are faster and easy to create and manage.
The kernel treats all these threads as a single process and handles
them as one process only.
User-level threads are implemented by user-level libraries, not
by system calls.
2. Kernel-Level Thread
Kernel-level threads are handled by the operating system and
managed by its kernel. These threads are slower than user-level
threads because context information is managed by the kernel. To
create and implement a kernel-level thread, we need to make a
system call.
Features of Thread
o Threads share data, memory, resources, files, etc., with their
peer threads within a process.
o One system call is capable of creating more than one thread.
o Each thread has its own stack and registers.
o Threads can directly communicate with each other as they
share the same address space.
o Threads need to be synchronized in order to avoid unexpected
scenarios.
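The sharing described in the list above can be sketched as follows. The shared list, the record function, and the thread names are illustrative assumptions. All three threads write to the same list object because they live in one address space, and a lock keeps the access synchronized:

```python
import threading

shared = []                          # visible to every thread in the process
shared_lock = threading.Lock()

def record(name):
    with shared_lock:                # synchronize access to the shared list
        shared.append(name)

threads = [threading.Thread(target=record, args=(f"thread-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))   # ['thread-0', 'thread-1', 'thread-2']
```

The result is sorted before printing because the order in which the threads run is not guaranteed.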
Process | Thread
Each process is treated as a new process by the operating system. | The operating system takes all the user-level threads as a single process.
If one process gets blocked by the operating system, then the other process can continue the execution. | If any user-level thread gets blocked, all of its peer threads also get blocked because OS takes all of them as a single process.
Context switching between two processes takes much time as they are heavy compared to thread. | Context switching between the threads is fast because they are very lightweight.
The data segment and code segment of each process are independent of the other. | Threads share data segment and code segment with their peer threads; hence are the same for other threads also.
The operating system takes more time to terminate a process. | Threads can be terminated in very little time.
New process creation is more time taking as each new process takes all the resources. | A thread needs less time for creation.
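Python manages memory with reference counting: every object keeps
a count of the references that point to it, and the object's memory
is released when that count drops to zero. For example:
import sys

a = []
b = a
# In CPython this typically returns 3: the names a and b, plus the
# temporary reference held by the argument of getrefcount() itself.
sys.getrefcount(a)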
The main concern with the reference count variable is that it can be
corrupted when two or more threads try to increase or decrease
its value simultaneously. This is known as a race condition. If it
occurs, it can cause leaked memory that is never released, or memory
that is released while a reference still exists, which may crash the
Python program or introduce bugs.
The GIL avoids this situation. One alternative would be to add a
lock to every data structure shared across threads so that they are
not changed inconsistently; the GIL instead places a single lock on
the interpreter, which is a simple way to get thread-safe memory
management. A thread must hold this lock to execute Python code.
This keeps single-threaded programs fast, since only one lock needs
to be managed, and it prevents deadlocks, since there is only one
lock. However, it also makes any CPU-bound Python program
effectively single-threaded.
The following examples show the effect of the GIL on a CPU-bound
program. The first runs the countdown in a single thread.
Example - 1:
import time

COUNT = 100000000

def countdown(num):
    # CPU-bound loop: pure computation, no I/O
    while num > 0:
        num -= 1

start_time = time.time()
countdown(COUNT)
end_time = time.time()

print('Time taken in seconds -', end_time - start_time)
Output:
Time taken in seconds - 7.422671556472778
Now we modify the above code to split the work between two threads.
Example - 2:
import time
from threading import Thread

COUNT = 100000000

def countdown(num):
    while num > 0:
        num -= 1

# Each thread counts down half of the total
thread1 = Thread(target=countdown, args=(COUNT // 2,))
thread2 = Thread(target=countdown, args=(COUNT // 2,))

start_time = time.time()
thread1.start()
thread2.start()
thread1.join()
thread2.join()
end_time = time.time()
print('Time taken in seconds -', end_time - start_time)
Output:
Time taken in seconds - 6.90830135345459
As we can see, both versions took roughly the same time to finish.
The GIL prevented the CPU-bound threads from executing in parallel
in the second version.
Related Modules
The table below lists some of the modules you may use when
programming multithreaded applications. The names are those used in
Python 2; in Python 3, thread became _thread, Queue became queue,
SocketServer became socketserver, and mutex was removed.
Threading-Related Standard Library Modules
Module | Description
thread | Basic, lower-level thread module
threading | Higher-level threading and synchronization objects
Queue | Synchronized FIFO queue for multiple threads
mutex | Mutual exclusion objects
SocketServer | TCP and UDP managers with some threading control
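As an illustration of the synchronized FIFO queue from the table, here is a sketch using the Python 3 module name queue. The consumer function, the squaring task, and the None sentinel are assumptions made for the example:

```python
import queue
import threading

tasks = queue.Queue()     # thread-safe FIFO queue
results = []

def consumer():
    while True:
        item = tasks.get()           # blocks until an item is available
        if item is None:             # sentinel value: no more work
            break
        results.append(item * item)

worker = threading.Thread(target=consumer)
worker.start()

for n in range(5):                   # the main thread acts as the producer
    tasks.put(n)
tasks.put(None)                      # tell the consumer to stop
worker.join()

print(results)   # [0, 1, 4, 9, 16]
```

With a single consumer the results keep the order in which the items were queued; with several consumers that order would no longer be guaranteed.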