Application Based Programming in Python_ACE_INTL - Copy
Python
APTECH LIMITED
Contact E-mail: [email protected]
Edition 1 – 2023
Preface
This book is the result of a concentrated effort of the Design Team, which is
continuously striving to bring you the best and the latest in Information Technology.
The process of design has been a part of the ISO 9001 certification for Aptech-IT
Division, Education Support Services. As part of Aptech’s quality drive, this team
does intensive research and curriculum enrichment to keep it in line with industry
trends.
Design Team
Table of Contents
Sessions
Python is known for its simplicity, efficiency, and readability. Its elegant syntax,
which is similar to English sentences, makes it easier to write code and allows
developers to focus on solving logical errors in the code rather than on
syntax errors. Python also requires programmers to write less code than
many other programming languages. It supports modularity and code
reuse. This simplicity reduces the overhead of code maintenance.
Python also facilitates rapid application development across multiple
platforms.
The Python interpreter and comprehensive standard library are freely available
in both source and binary formats, enabling widespread distribution across
major platforms.
With its versatility and compatibility, Python finds extensive use in Web
development, software development, mathematics, and system scripting. Its
simple syntax and support for procedural, object-oriented, and functional
programming make it a preferred choice for developers.
This versatile tool combines code execution, text, and visualizations in a single
document called a notebook. These notebooks consist of cells that can be
executed individually, allowing users to experiment with code, visualize data,
and document their workflow seamlessly.
Jupyter Notebook
The site automatically detects the operating system and displays the
appropriate version to download.
2. Under Download the latest version for Windows, click Download Python
3.11.4.
The .exe file is downloaded.
9. Press Enter.
The installed version is displayed as shown in Figure 1.4.
jupyter notebook
16. To execute a Python command, in the box, type the command as:
The renamed file appears in the list on the Jupyter homepage as shown
in Figure 1.12.
To open the notebook at a later point in time, click the file in the list.
1.3 Keywords
Python keywords are categorized into different groups based on their usage,
which helps organize and understand their purpose. Table 1.1 describes the
categories of keywords.
Note that the keywords may change over time as Python evolves.
It is always recommended to refer to official documentation or
trusted sources for the most up-to-date information on Python
keywords and their usage.
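Because the keyword list depends on the installed Python version, it can be checked directly. The following is a minimal sketch that uses the standard library keyword module; the name marks is only an illustrative identifier.

```python
import keyword

# kwlist holds the keywords of the Python version that is running.
print(len(keyword.kwlist))
print(keyword.kwlist)

# iskeyword reports whether a given name is reserved.
print(keyword.iskeyword("for"))     # True
print(keyword.iskeyword("marks"))   # False
```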
1.4 Identifiers
There are some rules that the users must follow when specifying names for
identifiers to avoid errors in code. Figure 1.13 shows these rules.
Although Python does not limit the length of an identifier name, it is
recommended to keep names short so that they are easy to remember
and use within the code. The Python Enhancement Proposal 8 (PEP 8) style
guide does not set a maximum identifier length; it recommends limiting each
line of code to 79 characters, so very long names quickly become impractical. In
addition, use meaningful names so that it is easy to identify what information
the identifiers are storing. By convention, class names start with an
uppercase letter and other identifiers start with lowercase letters.
When writing a program, the values to be used in the program or the values
that are calculated in the program are stored in the memory. How can one
access these values from memory? To do that, names are assigned to the
locations where each of these values is stored. These names are known as
variables.
Code Snippet 1:
student_name = "Larrissa"
marks = 90
However, Python does allow variables to be declared with specific types using
casting.
For example, consider these variables as shown in Code Snippet 2:
Code Snippet 2:
student_name = str("Larissa")
roll_no = int(3)
marks = float(90.5)
In this case, student_name will store the string value "Larissa", roll_no will
store the integer value 3, and marks will store the floating-point value 90.5. Here,
string, integer, and floating point are data types that specify the type of data
stored in the variables.
In this case, variables will be assigned values in the order specified. That is,
marks1 will be assigned the value 90; marks2 will be assigned the value 75,
and marks3 will be assigned the value 85.
In this case, all three variables, marks1, marks2, and marks3, will be assigned
the value 90.
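The two forms of multiple assignment described here can be sketched as follows. The variable names come from the text; the helper name ordered is only an assumption used to keep the first result for display.

```python
# Assign different values to several variables in one statement.
marks1, marks2, marks3 = 90, 75, 85
ordered = (marks1, marks2, marks3)
print(ordered)                    # (90, 75, 85)

# Assign the same value to several variables in one statement.
marks1 = marks2 = marks3 = 90
print(marks1, marks2, marks3)     # 90 90 90
```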
The rules defined for naming identifiers apply to names specified for
variables.
Scope of Variables
The scope of variables specifies where in the program those variables are
accessible. Variables can have a local scope or a global scope. A variable
with global scope is accessible throughout the entire program, while local
variables are accessible only within the function in which they are
defined.
For example, consider the code in Figure 1.14. When executed in Jupyter
Notebook, this code gives an error.
To run the code successfully, replace the last print statement with the
following function call:
total_marks()
Execute the code once again. The output will appear as shown in Figure 1.15.
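The code in the figures is not reproduced here, so the following is only a sketch of the same idea. The names bonus, total, and error_seen are assumptions made for illustration.

```python
bonus = 5                 # global variable: visible throughout the program

def total_marks():
    total = 90 + bonus    # local variable: exists only inside the function
    print(total)
    return total

result = total_marks()    # prints 95

error_seen = False
try:
    print(total)          # total is local to total_marks, so this fails
except NameError as error:
    error_seen = True
    print("Error:", error)
```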
Consider a scenario in which the sum of two values has to be calculated and
printed. The two values to use must be specified by the user. In this case, the
developer can use the input function to get the values from the user. After
the calculation, the developer can use the print function to display the
output on the screen.
input(prompt)
In this syntax, prompt refers to the message that will be displayed for the user
to enter the required value. The value entered by the user at the prompt is always
returned as a string. To convert it into a number, a function such as int or float must be used.
print(msg)
In this syntax, msg refers to the message or values to be printed on the screen.
When the user enters the first number as 10 and presses Enter, the prompt for
the second number appears as shown in Figure 1.17.
Consider that the user enters the second number as 10 and presses Enter.
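The scenario can be sketched as a small function. The function name add_two_numbers and its read parameter are assumptions; read defaults to the built-in input function so the function can also be tried without typing.

```python
def add_two_numbers(read=input):
    """Read two whole numbers with the given reader and return their sum."""
    first = int(read("Enter the first number: "))     # input returns a string
    second = int(read("Enter the second number: "))
    total = first + second
    print("The sum is", total)
    return total

# In a notebook cell, call add_two_numbers() and type the values when prompted.
```

If both values are entered as 10, the function prints that the sum is 20.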
1.7 Statements
Simple statements in Python are those lines of code that are written on a single
line. In Python, a simple statement can be a line of code that assigns a value
to a variable. For example, consider the code in Code Snippet 3.
Code Snippet 3:
student_name = "Larrissa"
marks = 90
In this case, there are two simple statements. The first statement assigns the
value Larrissa to the variable student_name and the second statement
assigns the value 90 to the variable marks.
Code Snippet 4:
intro_msg = "Welcome to a \
world of opportunities that \
help you shape your life"
intro_msg = ("Welcome to a "
             "world of opportunities that "
             "help you shape your life")
Table 1.3 lists the different types of statements that are used in Python.
All comments are denoted by the hash (#) symbol. For multi-line comments,
each line of the comment must begin with #. Triple quotes, single (''') or
double ("""), can also be used for multi-line comments.
These symbols are used at the beginning and the end of the multi-line
comment.
Triple single and double quotes are also used to enclose text values in
the code. The text within the quotes is treated as a value if it is
assigned to a variable. If the text within the quotes is not assigned
to any variable, Python evaluates it and discards it, so it behaves like a comment.
PEP 8 recommends that comments be concise and focused. Each line of
a comment should contain a maximum of 72 characters. If a comment
exceeds this limit, it must be split across multiple lines.
1.9 Indentation
Most programming languages use curly braces to indicate the start and end
of a block of code, even if the code is not indented. However, Python does
not use any kind of braces. It uses indentation to indicate the start and end
of code blocks. If the indentation is incorrect, Python either raises an
IndentationError or groups the statements into the wrong block, leading to incorrect results.
Statement 1
Statement 2
a. return
b. continue
c. break
d. pass
3. Which of the following options are the implicit line continuations used to
split a statement in Python?
a. /
b. ( )
c. [ ]
d. { }
1 a, c
2 d
3 b, c, d
4 d
5 a, b
1. Install Python.
2. Install and open Jupyter Notebook.
3. Create a Python notebook. Consider that you have declared two
variables num1 as 18 and num2 as 10. Perform the given tasks:
a. Write a single-line comment on “Python program to multiply
two numbers”.
b. Write an expression statement in Python to multiply the two
variables num1 and num2 and add a constant value, 100. Store the
result of the expression statement in a variable, product_result.
c. Write an inline comment for the expression statement created in
b.
d. Display the result, product_result.
Learning Objectives
When writing code, developers use variables to store values. Several types of
values can be stored in the variables, such as whole numbers, decimal
numbers, text, dates, and so on. These are termed data types. Python
supports several data types. The values stored in these variables are used to
perform different types of operations, such as mathematical, logical,
comparison, and so on. Python supports a host of operators that can be used
to perform these operations. Like other programming languages, Python also
supports flow control statements. These statements are used to determine
which code to execute or skip, based on specific conditions.
In this session, different types of data types in Python will be dealt with.
Data types help to determine the type of value or data a variable can hold.
Each type of data requires varying space in the memory. A data type helps
reserve the required space for a value in the memory.
In most programming languages, when declaring a variable, the data type for
the variable must also be specified. However, Python behaves differently. The
data type is not required to be specified for the variables; Python identifies the
data type based on the value stored in the variable. Therefore, care must be
taken when assigning values to a variable. If different types of data are
assigned to one variable, the last value assigned will determine the data type.
Python also allows data type to be specified if required.
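A short sketch of this behavior, using the built-in type function to show which data type Python has chosen. The variable names are illustrative.

```python
value = 90                   # Python infers int from the assigned value
print(type(value))           # <class 'int'>

value = "ninety"             # the last value assigned decides the type
print(type(value))           # <class 'str'>

marks = float(90)            # the type can also be requested explicitly
print(marks, type(marks))    # 90.0 <class 'float'>
```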
Numeric data types are assigned to variables in which the developer wants
to store numbers. These numbers can be integers, floating point numbers, or
complex numbers.
This data type is used to store integers in variables. Integers are whole numbers.
They can be positive, negative, or zero, and have no fractional part. Some
examples of integers are 5, 25, 300, -80, and -60. A value can be converted to
an integer explicitly using the built-in int function.
Some instances where this data type may find its use are:
Figure 2.1 shows the usage of the int data type in Python.
This data type is used to store decimal numbers in variables. These numbers
can be positive, negative, or zero, and include a fractional part. Some examples
of decimal numbers are 5.5, 25.363E2, 300.50, -80.75, and -60.35. The
letter 'E' indicates a power of 10, so 25.363E2 is 2536.3. A value can be
converted to a floating-point number using the built-in float function.
Complex numbers have two parts: one real and the other imaginary. They can
be represented as p ± nj, where p and n are real numbers while j is the
imaginary unit. Examples of complex numbers are 3-8j, 9j, -5j, and 2-21j.
A complex number can be created using the built-in complex
function.
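The three numeric types can be tried out as follows. This is a minimal sketch; the variable names count, price, and point are assumptions.

```python
count = int("25")               # int built from a string of digits
price = float("25.363E2")       # E2 means "times 10 to the power 2"
point = complex(3, -8)          # real part 3, imaginary part -8

print(count, type(count))       # 25 <class 'int'>
print(price, type(price))       # 2536.3 <class 'float'>
print(point, type(point))       # (3-8j) <class 'complex'>
print(point.real, point.imag)   # 3.0 -8.0
```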
List
Tuple
Tuples are similar to lists and can be used to store ordered lists of items. The
difference between the two is that the items of a tuple cannot be changed
once it is created. Tuples allow duplicate values to be stored. Tuples are
represented by parentheses (). Examples of tuples are ("apple",
"cabbage", "tomatoes", "potatoes") and ("Larissa Smith", 45,
"123 Main Avenue, somecity, 12345", "123-456-789").
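The following sketch shows that tuple items can be read but not changed. The variable name items and the replacement value are assumptions.

```python
items = ("apple", "cabbage", "tomatoes", "potatoes")
print(items[1])          # cabbage
print(len(items))        # 4

changed = True
try:
    items[0] = "carrot"  # tuples do not support item assignment
except TypeError as error:
    changed = False
    print("Error:", error)
```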
Range
range is a data type that represents a sequence of integers.
It is usually used in loops to keep track of an increasing or
decreasing counter. By default, the sequence starts from 0. For example, range(10)
produces the values from 0 to 9.
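A range object can be converted to a list to see the values it produces. The sketch below uses illustrative variable names.

```python
first_ten = list(range(10))        # stop value only: 0 up to 9
evens = list(range(2, 11, 2))      # start 2, stop before 11, step 2
countdown = list(range(5, 0, -1))  # a negative step counts down

print(first_ten)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(evens)        # [2, 4, 6, 8, 10]
print(countdown)    # [5, 4, 3, 2, 1]
```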
Mapping data type is used for variables that store data in key-value pairs.
Python supports only one built-in mapping data type, which is dictionary. An example
of dictionary data is {"name": "Larissa", "age": 45, "address": "123
Main Avenue, somecity, 12345", "phone": "123-456-789"}. In this
example, name, age, address, and phone are the keys. The data Larissa, 45,
123 Main Avenue, somecity, 12345, and 123-456-789 are the values.
A dictionary can also be created using the built-in dict function.
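The dictionary from the example can be created and used as shown in this sketch. The variable name person and the updated age are assumptions.

```python
person = {
    "name": "Larissa",
    "age": 45,
    "address": "123 Main Avenue, somecity, 12345",
    "phone": "123-456-789",
}

print(person["name"])        # look up a value by its key: Larissa
person["age"] = 46           # change the value stored under a key

for key, value in person.items():
    print(key, "->", value)
```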
A variable of Boolean data type can take two values: True or False. It
represents the truth of the given expression. For example, 100 > 200 is False,
but 100 < 200 is True. In Python, a value can be converted to a Boolean using
the built-in bool function.
2.1.6 Set
After a set is created, the items in the set cannot be changed. However, items
can be added to or removed from a set. The set cannot have duplicate
values. If there is a duplicate value, only one value is considered. Note that 1
and True are considered as duplicate values. Similarly, 0 and False are
considered as duplicates. A set can be created from a list, tuple, or string using
the built-in set function.
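The behavior of duplicates can be checked with a short sketch. The variable names and colour values are assumptions.

```python
colours = set(["red", "green", "red"])   # the duplicate "red" is dropped
print(len(colours))                      # 2

colours.add("blue")                      # items can be added
colours.discard("green")                 # and removed
print(sorted(colours))                   # ['blue', 'red']

mixed = {1, True, 0, False}              # 1/True and 0/False are duplicates
print(len(mixed))                        # 2
```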
2.2 Operators
Operator   Description
+          Used to add two operands
-          Used to subtract two operands
*          Used to multiply two operands
/          Used to divide the two operands
%          Used to divide two operands and return the remainder
**         Used to raise the first operand to the power of the second operand
//         Used to divide two operands and round the result down to the nearest integer
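Each arithmetic operator can be tried on a pair of sample values. The values 17 and 5 are assumptions chosen so that the division is not exact.

```python
a, b = 17, 5

print(a + b)      # 22
print(a - b)      # 12
print(a * b)      # 85
print(a / b)      # 3.4
print(a % b)      # 2
print(a ** 2)     # 289
print(a // b)     # 3
print(-17 // 5)   # -4, because the result is rounded down
```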
Operator   Description
==         This operator checks whether both operands are equal. It
           returns True if they are equal and False if they are not equal.
!=         This operator checks whether the operands are not equal. It returns
           True if they are not equal and False if they are equal.
>          This operator checks whether the first operand is greater than
           the second operand. It returns True if the first operand is
           greater than the second operand; else it returns False.
<          This operator checks whether the first operand is less than the
           second operand. It returns True if the first operand is less than
           the second operand; else it returns False.
>=         This operator checks whether the first operand is greater than
           or equal to the second operand. It returns True if the first
           operand is greater than or equal to the second operand; else
           it returns False.
<=         This operator checks whether the first operand is less than or
           equal to the second operand. It returns True if the first operand
           is less than or equal to the second operand; else it returns
           False.
Operator   Description
and        Performs an AND operation and returns True only if both the
           logical expressions evaluate to True
or         Performs an OR operation and returns True if any of the logical
           expressions evaluate to True
not        Reverses the result of the operand
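Logical operators are usually combined with comparisons. The sketch below assumes a hypothetical rule in which a student passes with at least 40 marks and at least 75 percent attendance.

```python
marks = 75
attendance = 60

passed = marks >= 40 and attendance >= 75       # both must be True
needs_review = marks < 40 or attendance < 75    # one True is enough

print(passed)          # False
print(needs_review)    # True
print(not passed)      # True
```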
Operator   Description   Example (Decimal, Binary)
For the bitwise NOT (~) operator, all the bits are inverted, including the
sign. For an integer x, the result is -(x + 1), so a positive integer will
result in a negative integer. For example, ~5 is -6.
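The bitwise operators can be checked with two small numbers. The table rows are not reproduced in this text, so the operators and the values 12 and 10 below are chosen only for illustration; bin is the built-in function that shows the binary form.

```python
a, b = 12, 10                  # binary 1100 and 1010

print(a & b, bin(a & b))       # 8 0b1000    AND
print(a | b, bin(a | b))       # 14 0b1110   OR
print(a ^ b, bin(a ^ b))       # 6 0b110     XOR
print(~a)                      # -13         NOT, same as -(a + 1)
print(a << 1)                  # 24          shift left by one bit
print(a >> 2)                  # 3           shift right by two bits
```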
You can use assignment operators to assign values to variables. These values
can also be computed values resulting from an arithmetic or logical operation.
Table 2.5 lists various assignment operators available in Python.
Operator   Description
=          Assigns the value specified on the right of the operator to the
           variable specified on the left
+=         Adds the operands specified on the right and left and assigns
           the result to the variable on the left
-=         Subtracts the operand specified on the right of the operator
           from the operand specified on the left and assigns the result to
           the variable on the left
*=         Multiplies the operands specified on the right and left and
           assigns the result to the variable on the left
/=         Divides the operand specified on the left of the operator by the
           operand specified on the right and assigns the result to the
           variable on the left
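The assignment operators can be followed step by step on one variable. The name total and the values are assumptions.

```python
total = 10      # =   assigns 10
total += 5      # +=  total is now 15
total -= 3      # -=  total is now 12
total *= 2      # *=  total is now 24
total /= 4      # /=  total is now 6.0 (division gives a float)

print(total)    # 6.0
```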
Python also includes bitwise assignment operators. These operators are listed
in Table 2.6.
Operator   Description
&=         Performs bitwise AND operation on the operands and assigns
           the result to the variable on the left
|=         Performs bitwise OR operation on the operands and assigns the
           result to the variable on the left
Operator   Description
in         This operator checks whether the specified value or variable
           exists in the given sequence. It returns True if the value or
           variable exists in the sequence; else, it returns False.
not in     This operator checks whether the specified value or variable
           does not exist in the given sequence. It returns True if the value
           or variable does not exist in the sequence; else, it returns False.
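A brief sketch of both membership operators, using a hypothetical list of student names.

```python
students = ["Angela", "Ben", "Maria"]

print("Ben" in students)          # True
print("Ravi" in students)         # False
print("Ravi" not in students)     # True
print("py" in "python")           # True, strings are sequences too
```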
These operators compare the memory location of the two specified values or
variables to determine whether they are the same or not. Table 2.8 lists two
identity operators supported in Python.
Operator   Description
is         This operator checks whether the memory location referred to
           by the variables specified on the right and left is the same. It
           returns True if the referred memory location is the same; else, it
           returns False.
is not     This operator checks whether the memory location referred to
           by the variables specified on the right and left is not the same.
           It returns True if the referred memory location is not the same;
           else, it returns False.
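The difference between identity and equality can be seen with lists. This is a minimal sketch with illustrative names.

```python
first = [1, 2, 3]
second = first          # second refers to the same list object
third = [1, 2, 3]       # equal contents, but a separate object

print(first is second)      # True
print(first is third)       # False
print(first is not third)   # True
print(first == third)       # True, == compares values, not locations
```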
A program usually executes the code from top to bottom. If developers want
to alter the flow of execution or skip some lines of code depending on
specified conditions, they can use control flow statements. Python supports
three types of control flow statements:
if Statement
The if statement is used when a particular set of code must be executed only
if the specified condition is met; else the set of code must be ignored. The
syntax for the if statement is:
if <condition>:
    <code to be executed if condition is met>
In Figure 2.20, the first print statement is executed because the condition a >
90 is met. However, the second print statement is not executed because the
condition a < 90 is not met.
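The figure is not reproduced here. A sketch of the behavior it describes, assuming a has the value 95 and using a hypothetical list named printed to record which messages were shown, might look like this:

```python
a = 95
printed = []

if a > 90:
    printed.append("a is greater than 90")   # condition met, so this runs

if a < 90:
    printed.append("a is less than 90")      # condition not met, skipped

for line in printed:
    print(line)
```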
if-else Statement
if <condition>:
    <code to be executed if condition is met>
else:
    <code to be executed if condition is not met>
if-elif-else Statement
if <condition1>:
    <code to be executed if condition1 is met>
elif <condition2>:
    <code to be executed if condition2 is met>
elif <condition3>:
    <code to be executed if condition3 is met>
...
else:
    <code to be executed if none of the conditions are met>
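The syntax can be illustrated with a small function that classifies a temperature. The function name describe_temperature and the thresholds are assumptions chosen only for this sketch.

```python
def describe_temperature(celsius):
    """Return a label for the given temperature."""
    if celsius >= 30:
        return "hot"
    elif celsius >= 20:
        return "warm"
    elif celsius >= 10:
        return "cool"
    else:
        return "cold"

for reading in [35, 22, 12, 3]:
    print(reading, describe_temperature(reading))
```

Only the first condition that is met is executed, so the order of the conditions matters.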
match-case Statement
match <expression>:
    case <value1>:
        <code to be executed if expression matches value1>
    case <value2>:
        <code to be executed if expression matches value2>
    case <value3>:
        <code to be executed if expression matches value3>
    ...
    case _:
        <code to be executed if none of the case values match>
for Statement
The for statement is used to repeat a set of code for each value in a sequence
such as a list, tuple, range, set, or dictionary. For example, consider that there
while Statement
The while statement is used to repeat a set of code statements as long as the
specified condition evaluates to true. The loop stops when the specified
condition evaluates to false. For example, consider that a developer wants to
identify the odd and even numbers between 0 and 10 to print appropriate
messages. The developer can use the while statement to execute the code
repeatedly as long as the value of the condition variable is less than or equal to 10.
while <condition>:
    <code to be executed>
The specified condition must evaluate to false at some point in the code
execution. Otherwise, the code will just go on executing and the execution will
not come out of the loop, thereby creating an infinite loop. The loop must then
be stopped manually, resulting in the program not executing completely.
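The odd and even example can be sketched as follows. The names number and labels are assumptions; the update at the end of the loop body is what eventually makes the condition false.

```python
number = 0
labels = []

while number <= 10:
    if number % 2 == 0:
        labels.append(f"{number} is even")
    else:
        labels.append(f"{number} is odd")
    number += 1     # without this update the loop would never end

for label in labels:
    print(label)
```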
While a program is running, control may have to be transferred to some other
part of the program based on the occurrence of some event. In such
cases, transfer statements are used. These statements can be used with
iterative or loop statements and functions.
break Statement
For example, consider that the developer is using a for loop for iterating
through a list of student names and printing Hello <student name> for each
student name. The developer wants to terminate the loop when it
encounters the name, Ben. To do so, the developer can use the break
statement, as shown in Figure 2.26.
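The figure is not reproduced here; a sketch of the same idea with a hypothetical list of names is shown below. The list greeted is an assumption used to record who was greeted.

```python
students = ["Angela", "Ravi", "Ben", "Maria"]
greeted = []

for student in students:
    if student == "Ben":
        break                   # leave the loop immediately
    print("Hello", student)
    greeted.append(student)

print(greeted)                  # ['Angela', 'Ravi']
```

Neither Ben nor any name after Ben is greeted, because break ends the whole loop.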
continue Statement
For example, consider that the developer is using a for loop for iterating
through a list of student names and printing Hello <student name> for each
student name. The developer wants to skip the name Angela from the list and
proceed with the next name.
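A sketch of this scenario follows, again with a hypothetical list of names and a list named greeted to record the result.

```python
students = ["Ravi", "Angela", "Ben", "Maria"]
greeted = []

for student in students:
    if student == "Angela":
        continue                # skip the rest of this iteration only
    print("Hello", student)
    greeted.append(student)

print(greeted)                  # ['Ravi', 'Ben', 'Maria']
```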
pass Statement
If the developer wants to reserve some space in the program code to add the
required code sometime later, the developer can use the pass statement. The
pass statement ensures that no syntax error occurs and the code runs
successfully even if a block of code has not been completed. For example,
consider that the developer is planning to include an if statement but will do
it at a later point in time. In this case, the developer can use the pass statement
as shown in Figure 2.28.
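A minimal sketch of the pass statement as a placeholder is shown below. The variable marks and the function calculate_bonus are hypothetical.

```python
marks = 95

if marks > 90:
    pass        # placeholder: the real handling will be added later

def calculate_bonus():
    pass        # an empty function body also needs a statement

print("Program continues")
```

Without pass, the empty if block and the empty function body would each cause a syntax error.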
⮚ Data types help to determine the type of value or data a variable can
hold.
⮚ Data types in Python can be categorized into numeric, string, sequence,
dictionary, boolean, and set.
⮚ Operators help to perform specific operations on variables and values,
known as operands.
⮚ Arithmetic, relational, assignment, logical, membership, identity, and
bitwise are some of the operators.
⮚ Control flow statements help in controlling the flow of program
execution.
⮚ The control flow statements supported by Python are conditional,
iterative, and transfer statements.
⮚ The break, continue, and pass statements are used to transfer the
program execution to some part of the program based on the
occurrence of some event.
a. Tuple
b. Set
c. List
d. Dictionary
a. Tuple
b. Set
c. List
d. Dictionary
4. Which of the following codes can be inserted on line 3 to get the output
as True?
1 num_list = [11, 15, 21, 29, 50, 70]
2 number = 12
3 //insert code here
a. 2 4 6 8
b. 4 6
c. 2 8
d. 2 4 6
1 a, c, d
2 b, d
3 c
4 a, d
5 d
1. Write Python code to create two lists, list1 and list2, where
list1 = ["car", "cycle", "bus", "car", "scooter"] and list2 is
empty. Add the elements from list1 to list2, such that list2
contains only the unique elements (remove duplicates). Print the
elements of list2.
Write a Python code to create a list named mark_list with the marks
35, 75, 86, and 98. Use conditional and iterative statements to print the
grade of the students based on the marks given in the mark_list. The
output should consist of mark and grade.
(ii).
var_num = 0
while var_num < 5:
    var_num = var_num+1
    if var_num == 2:
        break
    print(var_num)
Learning Objectives
3.1 Functions
Apart from the built-in functions provided by Python, developers can create
customized functions that perform specific tasks. Such functions are called
user-defined functions. Python provides the def keyword to define a
user-defined function. The syntax to create a function is:
def function_name(parameter1, parameter2):
    """docstring"""
    function_body
    return statement
Let us look at how a simple Python function named message is created. This
function displays the text 'Welcome to Basic Python programming lab'.
The function is called or invoked from another part of the program. The
function call statement includes only the name of the function followed by
parentheses. This function call does not pass any parameters to the function.
Figure 3.1 shows the code and the output to create and call a function.
Functions in Python can be made more versatile when used with parameters.
Parameters are variables or values passed to a function when calling a
function. The code inside the function uses these parameters to accomplish
the desired task.
In this example, the function call statement passes the values 'Richard' and
'Basic Python and Machine Learning' to the function. The name parameter
is assigned the value ‘Richard’ and the course_name parameter is assigned
the value ‘Basic Python and Machine Learning’.
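The figure is not reproduced here. A sketch of such a function, using the parameter names from the text and a hypothetical function name enroll, could be:

```python
def enroll(name, course_name):
    """Display a welcome line for a student and a course."""
    print("Welcome", name, "to the course", course_name)

# The two values are passed to name and course_name, in that order.
enroll("Richard", "Basic Python and Machine Learning")
```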
After executing the block of code inside a function, the return statement
passes the values obtained as the result of the execution of the function to the
caller.
Let us look at how a function with a return value is created, called, and
executed. Consider a function named multiply with a, b, and c as its
parameters. This function calculates the product of a, b, and c and stores the
Figure 3.3 shows the code and the output that returns a value.
In this example, the function call statement passes three arguments to the
function. These arguments provide the values of a, b, and c as 2, 4, and 5,
respectively. The function calculates the product of these values and passes
the calculated value back to the caller using the return statement.
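The multiply function described above can be sketched as follows. The local name product and the variable result are assumptions.

```python
def multiply(a, b, c):
    """Return the product of the three values."""
    product = a * b * c
    return product          # pass the result back to the caller

result = multiply(2, 4, 5)
print(result)               # 40
```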
Scope of a variable refers to the area of a program where the variable can be
referenced and accessed. Two types of variables in Python with respect to
their scope in a function are:
Local variable
Global variable
Variables that are declared inside a function are visible and accessible only
inside the function. Such variables are called local variables. They cannot be
accessed from outside the function. Local variables are created when the
function is called and are destroyed when the function completes execution.
Figure 3.4: Error Thrown While Accessing a Local Variable Outside its Scope
When the code in the main program tries to access the input2 variable
outside the func_Add function, an error is thrown. Figure 3.4 shows that the
program can access input2 inside the function, whereas the program throws
an error when it tries to access input2 outside the function.
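The exact code of the figure is not available here, so the body of func_Add below is an assumption. The sketch shows the same effect: input2 can be used inside the function but not outside it.

```python
def func_Add(input1):
    input2 = 10                  # local variable
    return input1 + input2       # accessible inside the function

print(func_Add(5))               # 15

outside_error = None
try:
    print(input2)                # not accessible outside the function
except NameError as error:
    outside_error = error
    print("Error:", error)
```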
Variables that are declared outside any block of code are called global
variables. These variables can be accessed from any part of the program.
Consider the function named func_Add1 which is similar to func_Add except
that the variable named input2 is declared outside the function.
If a variable with the same name as the global variable is declared inside a
function, then the local variable takes precedence inside that function and
hides the global variable. The value of the global variable itself is not changed.
In the previous example, consider that input2 is declared
both outside the function and inside the function with different values.
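The two cases can be compared side by side. The function bodies and the name func_Add2 are assumptions made for this sketch.

```python
input2 = 10                      # global variable

def func_Add1(input1):
    return input1 + input2       # reads the global input2

def func_Add2(input1):
    input2 = 100                 # local variable with the same name
    return input1 + input2       # uses the local value

print(func_Add1(5))              # 15
print(func_Add2(5))              # 105
print(input2)                    # 10, the global value is unchanged
```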
In Figure 3.7, the main program calls the MyFunc function with the arguments
arg1, arg2, ... argN. The control is transferred to the function to
execute the code block inside it. The values of the arguments are received by the parameters
paramtr1, paramtr2, paramtr3, ... paramtrN. These parameters hold the
values of arguments. paramtr1 takes the value of arg1, paramtr2 takes the
value of arg2, and so on. The four types of arguments in Python are as follows:
Code Snippet 1:
def MyFunc_Default(empid, empname, department='Research'):
    print("The employee details are ", empid, ",",
          empname, ",", department)
Based on the requirement, the function call for this function can either specify
or not specify the department name in the argument list. If value for the
argument is not specified, the function will use the default value, ‘Research’,
defined in the function definition. If a value is specified, the function will use the
specified value. Code Snippet 2 lists the code to create function calls for both
cases:
Code Snippet 2:
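The original calls are not available in this text. The following sketch repeats the function from Code Snippet 1 and shows one call of each kind; the employee IDs, names, and the department 'Finance' are hypothetical values.

```python
def MyFunc_Default(empid, empname, department='Research'):
    print("The employee details are ", empid, ",",
          empname, ",", department)

# First call: department is omitted, so the default value is used.
MyFunc_Default(101, "Maria")

# Second call: department is specified, so the default is ignored.
MyFunc_Default(102, "Ravi", "Finance")
```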
The first call statement passes the value of arguments for the first two
parameters and does not pass the value for the department. The function
takes the default value, ‘Research’. The second call statement passes the
The order of arguments in a function call must match with the order of
parameters in the function definition. Such an arrangement of arguments is
called positional arguments. If there is a mismatch in the order of arguments
between the function definition and function call, the function might provide
incorrect outputs. Consider a function named MyFunction that takes name
and age of a person as input parameters and displays this information. Code
Snippet 3 lists the function code to perform this:
Code Snippet 3:
Code Snippet 4:
Figure 3.9 shows the outputs of both function calls. It can be observed that
the output of the second function call is not as expected.
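The snippets are not reproduced in this text. A sketch of the same situation, with an assumed function body and sample values, is:

```python
def MyFunction(name, age):
    print("Name:", name)
    print("Age:", age)

# Arguments in the same order as the parameters: output is as expected.
MyFunction("Clayman Saw", 32)

# Arguments in the wrong order: the name is shown as 32 and the age
# as Clayman Saw, because values are matched by position only.
MyFunction(32, "Clayman Saw")
```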
Consider that the function named MyFunction takes two parameters as input:
name and age of a person. This function is called with keyword arguments
where the values of the parameters are specified using their respective names.
The value 'Clayman Saw' is passed for the name parameter and 32 is passed
for the age parameter.
Figure 3.10 shows the code for the function and its output.
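A sketch of a call with keyword arguments follows; the function body is an assumption.

```python
def MyFunction(name, age):
    print("Name:", name)
    print("Age:", age)

# Each value is matched to a parameter by name.
MyFunction(name="Clayman Saw", age=32)

# The order no longer matters when names are given.
MyFunction(age=32, name="Clayman Saw")
```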
Keyword arguments whose count is not known in advance are called arbitrary
keyword arguments. The argument in the function definition is defined with two
asterisk symbols preceding it. Consider the example in the previous topic of
arbitrary positional arguments. Let the developer pass keyword arguments to
the marks function.
Figure 3.13 shows the code and the output of this function.
Note that the values for subject name and marks are assigned in key-value
pair in the function call. In the first function call, the name ‘Catherine’ is
passed with three subjects and their respective marks. In the second function
call, the name ‘Josephine’ is passed with four subjects and their respective
marks.
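The figure is not reproduced here. The sketch below assumes that marks takes the student name first and collects the subjects in a parameter named subject_marks; the subjects and scores are hypothetical.

```python
def marks(name, **subject_marks):
    """Display the marks passed as keyword arguments."""
    print("Marks of", name)
    # subject_marks is a dictionary of the keyword arguments.
    for subject, score in subject_marks.items():
        print(" ", subject, ":", score)

marks("Catherine", Maths=85, Physics=78, Chemistry=90)
marks("Josephine", Maths=92, Physics=81, Chemistry=77, Biology=88)
```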
def show(**kwargs):
    for i in kwargs:
        print(i)
show(name="Alexander", age=20)
a. name
age
b. Alexander
20
c. name=Alexander
age=20
d. name=Alexander
def fun1(age):
    return age + 5
fun1(5)
print(age)
a. 5
b. 10
c. NameError
d. TypeError
def fun1_num(x):
    x=x+1
    x=100
    print(x)
    x=x+1
fun1_num(200)
a. 201
b. 101
c. 202
d. 100
1 a, b, d
2 a, c
3 a
4 c
5 d
2. Write a Python function that accepts three values from the console. The
function must return the difference between the first two values and the
product of the last two values.
4. Write a Python function to find and display only those words which have
more than four characters from the list named words. Consider that the
list words = ["Python", "Java", "Ruby", "Perl", "JavaScript"].
(i)
x=20
def outer_fun():
    x = 10
    def inner_fun():
        y = 20
        print(x + y)
    inner_fun()
    print(x)
outer_fun()
(ii)
def function1(var1,var2=5):
    var3=var1*var2
    return var3
var1=3
print(function1(var1=5,var2=6))
print(function1(var1,var2=6))
print(function1(var1,var2=3))
Learning Objectives
Functions differ in the way they are defined. For example, some functions can
call themselves until a specific condition is met. Such functions are termed
recursive functions. Certain concise functions can be anonymous with a single
expression in their body. This type of function is called a lambda function.
Functions can be part of programs or modules. Modules are collections of
related variables, classes, and functions that can be imported as and when
necessary. Modules are often grouped along with similar functions and
distributed as packages. This makes it easier for developers to import a
package and use the specific function only when required.
A very effective way of solving problems is by breaking them down into similar
smaller subproblems. This technique is efficient in searching, sorting, and
traversing data structures. Even mathematical series can be calculated using
this breaking-down technique. While doing so, to further simplify the problem,
the same technique is applied repeatedly until the problem cannot be further
broken down.
To understand this better, consider calculating the sum of the series
$a^n + a^{n-1} + a^{n-2} + a^{n-3} + a^{n-4} + \dots + a^1$, given the values of a and n.
The solution to this problem can be calculated as shown in Figure 4.1:
This is a classic case of recursion. As seen in Figure 4.1, the sum is broken
down into a value (nth term) plus sum of the series up to the previous term
(n-1). In each step, the series is broken down one step further. This repetition of
calculation can be coded using recursion. The recursiveness ends when the
power becomes one.
Let us now see the general syntax for coding recursion in Python.
The main program calls the recursive function. Inside the function, after the
processing steps are executed in the function body, a condition is checked. If
the condition is not satisfied, then the function calls itself with a reduced set of
input. Otherwise, the control returns to the main program. As long as the
condition is not satisfied, the function keeps calling itself.
Figure 4.3 shows the code and the output of the sum of the series example.
First, the main program makes the function call, SeriesSum(4,6). Then, the
recursive function SeriesSum calls itself five more times to complete the
calculation. Finally, the control returns to the main program and the answer is
printed.
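The recursion described above can be sketched as follows. The function name SeriesSum and the call SeriesSum(4,6) come from the text; the parameter names a and n and the exact body are assumptions made for illustration.

```python
def SeriesSum(a, n):
    """Return a^n + a^(n-1) + ... + a^1 using recursion."""
    # Base case: the power has reached one, so the recursion stops.
    if n == 1:
        return a
    # Recursive case: the nth term plus the sum of the series up to term n-1.
    return a ** n + SeriesSum(a, n - 1)

result = SeriesSum(4, 6)
print("Sum of the series =", result)  # 5460
```

With n = 6, the function is entered once by the main program and five more times by itself (for n = 5, 4, 3, 2, and 1).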
Let us create a lambda function to multiply two numbers. Figure 4.5 shows the
working of this lambda function in Jupyter environment.
Note that (lambda x,y:x*y) is the lambda function. (4,5) after the
function is the set of values passed to call the function. Thus, a lambda
function can be created and called in the same expression.
Figure 4.6 shows the working of this lambda function in Jupyter environment.
Here, the variable product is assigned the return value of the lambda function
which takes x and y as parameters. When the lambda function is invoked by
calling product(4,5), the result gets printed. Note that there is no explicit
return statement in lambda function.
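Both ways of using the lambda function can be sketched side by side. The variable name immediate is an assumption; product, x, and y follow the text.

```python
# Form 1: define the lambda and call it immediately with the values (4, 5).
immediate = (lambda x, y: x * y)(4, 5)
print(immediate)  # 20

# Form 2: assign the lambda to a name and call it later through that name.
product = lambda x, y: x * y
print('The product of 4, 5 =', product(4, 5))
```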
Code Snippet 1 shows the code for a regular function equivalent to the
product lambda function.
Code Snippet 1:
def Product(x,y):
    return x*y
print('The product of 4, 5 =', Product(4,5))
The filter function uses two arguments. The first argument is the condition to
be checked for and the second argument is a sequence from which data
must be filtered. The filter function then returns an iterator over the values
that satisfy the condition; this iterator can be converted to a list when required.
filter(condition, sequence)
For example, the condition can be a lambda function that checks for the
divisibility of its argument by 3. This allows filtering the values in the sequence
which are divisible by 3.
Figure 4.7 shows the code and the output of the filter function that takes a
lambda function as an argument.
% is the modulus operator that gives the remainder when x is divided by 3. If the
remainder is zero, it means that x is perfectly divisible by 3. The numbers that
are divisible by 3 are filtered from the list, Given_list. After filtering, the
resultant numbers are sorted and printed. The lambda function which checks
for divisibility is used as an argument in the filter function.
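A minimal sketch of this filtering step is shown here. The name Given_list comes from the text; the sample values and the name divisible_by_3 are assumptions.

```python
Given_list = [12, 7, 9, 20, 3, 15]  # sample values (assumed)

# The lambda is the condition; filter keeps only values for which it is True.
# sorted() consumes the iterator returned by filter and returns a sorted list.
divisible_by_3 = sorted(filter(lambda x: x % 3 == 0, Given_list))
print(divisible_by_3)  # [3, 9, 12, 15]
```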
The map function uses two arguments. The first argument is the action to be
performed on each value of a sequence and the second argument is the
sequence.
map(functionality, sequence)
Figure 4.8 shows the code and the output of the map function that takes a
lambda function as an argument.
The expression x*3 in the lambda function gives the product of x and 3. These
resulting numbers are added to a list, result, and the result is then printed.
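The same idea can be sketched in a few lines. The name result follows the text; the input list numbers is an assumption.

```python
numbers = [1, 2, 3, 4]  # sample values (assumed)

# map applies the lambda to every value; list() collects the results.
result = list(map(lambda x: x * 3, numbers))
print(result)  # [3, 6, 9, 12]
```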
The reduce function uses two arguments. The first argument is the action to be
applied cumulatively to the values of a sequence and the second argument is
the sequence. This function reduces the sequence to a single value by applying
the specified functionality to the first two values, then to that result and the
next value, and so on until the sequence is exhausted.
reduce(functionality, sequence)
Functionality is a function of two arguments, typically a lambda function, that
combines the accumulated result with the next value in the sequence. The
reduce function must be imported from the functools module.
Figure 4.9 shows the code and the output of the reduce function that takes a
lambda function as an argument.
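As a small illustration of how reduce collapses a sequence into one value, consider summing a list. The list numbers and the name total are assumptions.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]  # sample values (assumed)

# reduce computes ((((1 + 2) + 3) + 4) + 5) and returns the single result.
total = reduce(lambda x, y: x + y, numbers)
print(total)  # 15
```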
● Built-in module
● User-defined module
Python offers a rich library support that can be imported readily into the code.
These ready-to-use modules are called built-in modules. They are part of the
Python standard library and are available with Python installation. The
datetime, math, and sys modules are some of the built-in modules.
Importing a Module
The built-in modules offer a variety of functionalities such as file handling, string
manipulation, date and time handling, and mathematical operations.
Developers must import these modules to make them available in their
programs. The syntax to import all the functions in a module is:
import <modulename>
Consider that the developer wants to make use of the sqrt function that
calculates the square root of a number. This function is available in the math
module. Figure 4.10 shows the code to import and utilize this function in a
program.
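A minimal sketch of importing the whole module and calling sqrt through the module name is shown here. The input value 49 and the name root are assumptions.

```python
import math

# After a plain import, functions are referenced with the module name as prefix.
root = math.sqrt(49)
print(root)  # 7.0
```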
Consider that the developer wants to import the gcd function from the math
module. Figure 4.11 shows the code to import and utilize the gcd function in a
program.
The gcd function takes two numbers as arguments and calculates the greatest
common divisor of the two numbers.
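Importing a single function can be sketched with the from ... import form. The input values and the name divisor are assumptions.

```python
from math import gcd

# Only gcd is imported, so it is called without the math. prefix.
divisor = gcd(48, 18)
print(divisor)  # 6
```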
import <modulename1>, <modulename2>, ...
Consider that the developer wants to import two modules—math and random
to calculate the area of a shape. The shape for which area must be calculated
is chosen at random by the choice function in the random module.
The pi constant in the math module provides the value of pi and the pow
function calculates the power of a number. The choice function in the random
module picks a random value from the provided list.
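One possible shape for such a program is sketched here. The list of shapes, the value 3, and the names shape, size, and area are all assumptions chosen for illustration.

```python
import math, random

# Pick the shape at random from an assumed list of two shapes.
shape = random.choice(["circle", "square"])
size = 3  # radius of the circle or side of the square (assumed)

if shape == "circle":
    area = math.pi * math.pow(size, 2)
else:
    area = math.pow(size, 2)

print("Shape:", shape, "Area:", area)
```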
For ease of referring to a module in the code, the module can be referred to
using another name. The syntax for renaming the module is:
import <modulename> as <newname>
Consider that the developer wants to import and rename the random module
as rand. Figure 4.13 shows the code to perform this.
Here, the random module is imported as rand. Any further reference to the
random module must be made by addressing it as rand. The randint function
of the random module picks a number in the specified range.
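A minimal sketch of the renamed import follows. The range 1 to 10 and the name number are assumptions.

```python
import random as rand

# randint includes both end points, so the result lies between 1 and 10.
number = rand.randint(1, 10)
print(number)
```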
Python provides another method to import all the functions of a module. The
first method to import a module using the import <modulename> command
was covered earlier in this session. The syntax to import all the functions in a
module is:
from <modulename> import *
Though both these methods provide the same result, the most convenient one
is to import all the functions using the asterisk (*). In this method, for each
reference of a function, the developer does not have to mention the name of
the module as a prefix. Consider that the developer wants to use several
functions from the math module. Figure 4.14 shows the code to perform this.
Note that the developer does not have to refer to any of the functions with the
module name as its prefix. For example, the factorial function is referred to
as factorial and not as math.factorial.
Python offers the facility for developers to create their own modules. Such
modules are called user-defined modules. Developers can include functions,
classes, and variables inside these modules.
Creating a Module
def FindMax( x, y ):
    if x > y:
        return x
    return y
def FindMin( x, y ):
    if x < y:
        return x
    return y
def FindDigitSum(n):
    total = 0
    for digit in str(n):
        total += int(digit)
    return total
Variables in a Module
Modules consist of not only functions but also variables such as arrays,
dictionaries, and objects. Once defined in a module, these variables are
readily available and can be used in a program.
This code must be saved as vehiclelist.py in the same folder as the main
program and imported into the main program. Figure 4.16 shows the code to
perform this.
Reloading a Module
reload(<modulename>)
The modulename parameter is the name of the module that was already
imported and must be reloaded. The reload function is found in the
importlib module. Hence, the importlib module must be imported into the
main program.
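A minimal sketch of reloading is shown here. It reloads the standard json module only because that module is always available; in practice, the module being reloaded would usually be a user-defined module that was edited after it was first imported.

```python
import importlib
import json

# reload re-executes the module's code and returns the module object.
reloaded = importlib.reload(json)
print(reloaded.__name__)  # json
```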
The dir function lists all the attributes and methods of an object. The syntax of
this function is:
dir(<objectname>)
Here, the object name is not specified. Hence, the dir function lists all the
attributes in the current object. This function can be used to inspect objects
while debugging programs.
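The two ways of calling dir can be sketched as follows. The names names and local_names are assumptions.

```python
import math

# With an argument, dir lists the attributes of that object.
names = dir(math)
print("sqrt" in names)  # True

# Without an argument, dir lists the names defined in the current scope.
local_names = dir()
print("math" in local_names)  # True
```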
4.4 Packages
As the name suggests there can be packages for specific purposes such as
Employeedata_package which deals with information about employees of an
organization. All modules, functions, and variables declared inside the
package help in retrieving or storing data pertaining to employees. Creating
packages in Python improves code readability and scalability.
The first step is to create a folder named Calpackage. Inside this folder, an
empty file named __init__.py is created. This is to indicate that Calpackage
is a package and all modules under this package can be imported. Module
files Largethree.py and Evenodd.py are created and stored under the
Calpackage folder.
Figure 4.20 shows the code for the module, Largethree.py. The function,
maximum_two, takes two arguments and returns the larger of the two.
The function, maximum_three, takes three arguments and returns the largest
among the three.
Figure 4.22 shows the folder structure of the created package and the folders
under it.
Once a package is installed, the functions in the modules that are part of the
package can be executed by importing the package.
Figure 4.23 shows the code to import the Largethree module from the
Calpackage. After importing the module, the code uses the maximum_three
function from the module.
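The whole sequence, from creating the package to importing it, can be sketched in one script. The names Calpackage, Largethree, maximum_two, and maximum_three come from the text; the function bodies, the temporary folder, and the sample values are assumptions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the package in a temporary folder so that the sketch is self-contained.
base = Path(tempfile.mkdtemp())
pkg = base / "Calpackage"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # empty file marks the folder as a package
(pkg / "Largethree.py").write_text(
    "def maximum_two(a, b):\n"
    "    return a if a > b else b\n"
    "\n"
    "def maximum_three(a, b, c):\n"
    "    return maximum_two(maximum_two(a, b), c)\n"
)

# Make the temporary folder importable, then import the module from the package.
sys.path.insert(0, str(base))
importlib.invalidate_caches()
from Calpackage import Largethree

largest = Largethree.maximum_three(12, 45, 7)
print(largest)  # 45
```

In a real project, the Calpackage folder would normally sit next to the main program, so the sys.path line would not be required.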
li = [13,28,29,37,38,41]
//insert code
print(final_list)
Which option should be inserted in the given code to get the output as
[13, 29, 37, 41]
import numcal
//insert code
a. print("Product is:",Num_product(20,45))
b. print("Product is:",numcal.Num_product(20))
c. print("Product is:",Num_product(20,45).numcal)
d. print("Product is:",numcal.Num_product(20,45))
import colorlist
//insert code
print("my color is:", likecolor)
a. likecolor=colorlist.colors[2]
b. likecolor=colorlist.colors(3)
c. likecolor=colors[2]
d. likecolor=colors.colorlist[3]
a. __inits__.py
b. __init_core__.py
c. __init__.py
d. __core_init__.py
1 a, b, c
2 b
3 d
4 a
5 c
5. Import a function called shuffle from the module named random and
shuffle the elements in the given lists. Given that num_list = [1, 3, 5,
9] and word_list = ['rose', 'lily', 'jasmine', 'daisy'].
File and exception handling are two important aspects of any programming
language. File handling allows you to read and write data from and to files,
such as text files, XLS files, CSV files, and JSON files. Exception handling allows
you to handle errors and exceptions that may occur during the execution of a
program, such as syntax, logical, and runtime errors. In this session, you will learn
how to use the built-in functions and keywords for file and exception handling
in Python.
Input and output of files in Python depend on the type of file. There are two
types of files in Python:
When opening a file, you must specify the mode in which you wish to open it.
Table 5.1 lists various access options available when opening a file in Python.
Mode    Description
x       Opens a new file and throws an error if the name already exists
r       Opens the read-only version of an existing file, with the pointer at the beginning
w       Opens the write-only version of a file
t       Opens a file in text mode
b       Opens a file in binary mode
a       Opens an existing file in the append mode to add content while retaining the existing content
r+      Opens the read-and-write version of an existing file, with the pointer at the beginning
rb      Opens the read-only version of a file in binary format, with the pointer at the beginning
For all the write access modes, if a file with the same name exists, the file is
overwritten; otherwise, a new file is created. For all the append access modes,
if no file exists with the name provided, a new file is created.
The programming language Python is frequently used for data analytics and
includes several built-in file operations. Major file operations include creating a
file, writing data into the file, reading data from the file, and closing the file
when no longer required.
Create a file
The filepointer is just a variable that points to the position of the cursor in
the opened file. The initial position is 0. The file path can be of one of two
types:
Absolute Path The whole directory list required to find the file is
contained in an absolute path.
To create a file, you must pass the file name and access mode to the open
method. The purpose of opening a file is specified through the access
mode.
Figure 5.1 shows the Code Snippet to open a file using the x access mode.
Note that the file created is stored in a file pointer variable. This variable is
Figure 5.2 shows the folder structure which lists the newly created file.
Figure 5.3 shows the code to create a file using the w access mode. The
string ‘Welcome to Python programming Lab’ is written to the file using
the write method. Finally, the file is closed.
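The difference between the x and w access modes can be sketched as follows. The file name welcomepython.txt and the text come from this session; the temporary folder and the names folder, path, and already_exists are assumptions used to keep the example self-contained.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "welcomepython.txt")

fp = open(path, "x")   # "x" creates the file; it must not exist yet
fp.close()

already_exists = False
try:
    open(path, "x")    # a second attempt fails because the file now exists
except FileExistsError:
    already_exists = True

fp = open(path, "w")   # "w" creates the file or overwrites an existing one
fp.write("Welcome to Python programming Lab")
fp.close()
print(already_exists)  # True
```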
You can open a file in read-only mode using the r access mode in the open
method. This will allow you only to view the contents of the file. For example,
the command to open the welcomepython.txt file in read-only mode is:
fp = open("welcomepython.txt", "r")
Once a file is open, the read method can be used to return the desired
number of bytes.
Figure 5.5 shows the code to open a file in read-only mode and display its
contents.
In the code, the read method is used to read the contents of the file and
the print function is used to display them.
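A minimal sketch of reading part of a file and then the rest of it follows. The file is first created in a temporary folder so that the example runs on its own; the names path, first_part, and rest are assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "welcomepython.txt")
fp = open(path, "w")
fp.write("Welcome to Python programming Lab")
fp.close()

fp = open(path, "r")
first_part = fp.read(7)  # reads only the first seven characters
rest = fp.read()         # reads everything from the current position onward
fp.close()

print(first_part)  # Welcome
print(rest)
```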
You can use the open method with the w access mode to open a file and
write data into it. The file will start with the cursor or file pointer at the
Once a file is open, the write method can be used to add data to it.
Stream position and the file access mode determine where the newly
added text will be placed in the file. For example, with the access mode as
w, the new text will be added at the start of the file after its original content
is erased. If the access mode is a, then the text will be added at the end of
the file, which is the current stream position.
Consider the welcomepython.txt file that has already been created and
contains the text 'Welcome to Python programming Lab'. Figure 5.6
shows the code and the output to open the welcomepython.txt file in
write mode and replace its contents.
Now, consider the welcomepython.txt file that has the text, ‘Today's
session is File Handling in Python.’. Let us add the text ‘ Next
session is Exception handling in Python.’ at the end of this file.
Figure 5.7 shows the code and the output to append the new text.
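The effect of the w and a access modes on existing content can be sketched as follows. The text strings come from this session; the temporary folder and the names path and content are assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "welcomepython.txt")

fp = open(path, "w")
fp.write("Welcome to Python programming Lab")
fp.close()

fp = open(path, "w")   # "w" erases the original content before writing
fp.write("Today's session is File Handling in Python.")
fp.close()

fp = open(path, "a")   # "a" keeps the content and writes at the end
fp.write(" Next session is Exception handling in Python.")
fp.close()

fp = open(path, "r")
content = fp.read()
fp.close()
print(content)
```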
The with statement along with the open method can be used to open a
file.
The generic syntax to use the with statement is:
Figure 5.8 shows the code and the output to create a new file with the same
text.
The with open statement opens the welcomepython.txt file in read mode
and copies the contents of this file into the sessionpython.txt file.
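A minimal sketch of such a copy using the with statement is shown here. The file names come from the text; the temporary folder and the variable names are assumptions. A file opened with the with statement is closed automatically when the block ends.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "welcomepython.txt")
dst = os.path.join(folder, "sessionpython.txt")

with open(src, "w") as fp:
    fp.write("Today's session is File Handling in Python.")

# Open the source for reading and the target for writing in one statement.
with open(src, "r") as source, open(dst, "w") as target:
    target.write(source.read())

with open(dst, "r") as fp:
    copied = fp.read()

print(copied)
print(fp.closed)  # True
```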
Using the + access mode in Python, you can open a file to carry out many
operations at once. Both reading and writing options in the file are enabled
when you use the r+ access mode.
Figure 5.9 shows the code and the output to open the sessionpython.txt file in r+
mode and add more content to it.
Note that the code uses the seek method to move to the starting point of
the file.
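Reading and writing through one file handle can be sketched as follows. The sample text and the names path, original, and updated are assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sessionpython.txt")
with open(path, "w") as fp:
    fp.write("Session one.")

with open(path, "r+") as fp:
    original = fp.read()        # reading moves the pointer to the end
    fp.write(" Session two.")   # so this text is added after the old content
    fp.seek(0)                  # go back to the start before reading again
    updated = fp.read()

print(updated)  # Session one. Session two.
```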
Python offers a wide range of methods that you can use with the filepointer
handle to manipulate the file object.
readlines Method
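Each line in the file is returned as a list item by the readlines method. To
restrict the amount of data returned, you can use the optional hint argument.
No further lines are read once the total size of the lines read so far exceeds
hint; whole lines are always returned.
<filepointer>.readlines(hint)
Parameter    Description
hint         Optional. Specifies the approximate number of bytes to be returned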
Figure 5.10 shows the code and the output for the readlines method.
writelines Method
<filepointer>.writelines(list)
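The writelines and readlines methods can be sketched together. The list contents, the hint value of 8, and the variable names are assumptions. Note that writelines does not add line breaks on its own, so each string ends with \n.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "languages.txt")
lines = ["Python\n", "Java\n", "Ruby\n"]

with open(path, "w") as fp:
    fp.writelines(lines)          # writes every string in the list, in order

with open(path, "r") as fp:
    all_lines = fp.readlines()    # one list item per line

with open(path, "r") as fp:
    some_lines = fp.readlines(8)  # stops once the lines read exceed the hint

print(all_lines)
print(some_lines)
```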
truncate Method
The truncate method in Python allows you to resize a file to the specified
number of bytes. If the number of bytes is not specified, then by default the
file will be truncated at the current position of the cursor. That is, the
contents of the file after the current position of the cursor will be removed
from the file.
<filepointer>.truncate(size)
Figure 5.12 shows the code and the output for the truncate method.
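A minimal sketch of resizing a file with truncate follows. The size of 7 and the names path and content are assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "welcomepython.txt")
with open(path, "w") as fp:
    fp.write("Welcome to Python programming Lab")

with open(path, "r+") as fp:
    fp.truncate(7)    # keep only the first seven bytes of the file
    fp.seek(0)
    content = fp.read()

print(content)  # Welcome
```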
tell Method
The tell method returns the numerical value of the current position of the
file pointer.
<filepointer>.tell()
Figure 5.13 shows the code and the output for the tell method.
Figure 5.13: Knowing the Current File Position Using tell Method
seek Method
The seek method lets you change the current position in a file stream. This
method can be used to go to a specified position in the file. Once done,
the contents of the file from that position can be read or new data can be
written to the current position. Alternatively, the file can also be truncated
from the current position.
The seek method moves the file pointer to the specified location and
returns the new position. Offset provides the location to which the current
file pointer position must move. The syntax for the seek method is:
<filepointer>.seek(offset)
Figure 5.14: Changing the Current File Position Using seek Method
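The tell and seek methods can be sketched together. The offset of 11 and the variable names are assumptions; 11 is the index at which the word Python starts in the sample text.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "welcomepython.txt")
with open(path, "w") as fp:
    fp.write("Welcome to Python programming Lab")

with open(path, "r") as fp:
    start = fp.tell()       # position right after opening the file
    new_pos = fp.seek(11)   # move to offset 11; seek returns the new position
    word = fp.read(6)       # read six characters from that position

print(start, new_pos, word)  # 0 11 Python
```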
The main function of the pickle module is to serialize and de-serialize Python
object structures. When you must move Python objects from one system to
another, pickling and unpickling become crucial. Figure 5.15 depicts the
process of pickling and unpickling.
The pickle module offers two methods: dump to write binary data and load
to read binary data.
dump Method
To use the dump method, you must import the pickle module. You can
open a file in binary write mode and dump data into it. The dump method
converts the data object provided into a binary stream before storing it in
the file pointed to by the file object.
dump(data_object, file_object)
Here, file_object is the file handle and data_object is the object that
must be written to it.
Figure 5.16 shows the code to create a file with a list object serialized into a
byte stream.
load Method
The pickle module contains the load method to read data from byte
data_object = load(file_object)
Figure 5.17 shows the code to open a file, read the binary stream from it,
and deserialize the binary streams to store in a list object.
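A round trip through dump and load can be sketched as follows. The names data_object and file_object follow the syntax shown above; the file name marks.pkl, the list contents, and the name restored are assumptions.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "marks.pkl")
data_object = [78, 92, 65]

# Pickling: open the file in binary write mode and serialize the list.
with open(path, "wb") as file_object:
    pickle.dump(data_object, file_object)

# Unpickling: open the file in binary read mode and rebuild the list.
with open(path, "rb") as file_object:
    restored = pickle.load(file_object)

print(restored)  # [78, 92, 65]
```

The restored list is equal to the original but is a separate object. Files from untrusted sources should not be unpickled, because loading a pickle can execute code.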
Exceptions in Python are objects that derive from the BaseException class.
Every exception object in Python includes the type of error, a message that
clearly states information about the error, and the state of the object when the
error occurred.
Python offers the try and except blocks to catch and handle exceptions,
respectively. The syntax for exception handling is:
try:
    #<statements that may throw an exception>
except <exception_name>:
    #<statements to handle the exception>
Here, the try block includes the statements that may throw an error during
execution. If such an error occurs, then the statements in the except block will
be executed to handle the error. You can display a user-friendly message or
For example, if the try block contains a statement that divides a number and
the divisor evaluates to zero, then the ZeroDivisionError occurs. Figure 5.19
shows the code and the output for a case where the ZeroDivisionError
exception occurs.
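A minimal sketch of catching this exception follows. The variable names and values are assumptions.

```python
numerator = 10
divisor = 0

try:
    result = numerator / divisor   # raises ZeroDivisionError
except ZeroDivisionError:
    result = None
    print("Can't divide by zero")
```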
Exception            Description
KeyboardInterrupt    Occurs when the user hits the interrupt key, Ctrl+C or Delete
MemoryError          Occurs when an operation runs out of memory
Given the wide range of built-in exceptions, Python facilitates the handling of
multiple exceptions for a single try block. However, if an exception occurs, only
one exception will be handled.
Consider an example where the user enters marks obtained by a student and
the total number of subjects. The code calculates the average marks obtained
by the student using the total number of subjects given by the user as the
denominator value. Code Snippet 1 shows the code for doing this.
except NameError:
    print("NameError occurred. Please define the variable or functions correctly.")
except ZeroDivisionError:
    print("Can't divide by zero")
Figure 5.21 shows the output of code in Code Snippet 1 with specific inputs.
Since variable avg is incorrectly written as avge in the code, even if the user
enters proper data, NameError occurs. This exception is caught by the first
except block that says except NameError: and the message “NameError
occurred. Please define the variable or functions correctly.” is
displayed. The code can be corrected by modifying print("Average",
avge) as print("Average", avg).
Now, if the input provided for the number of subjects is zero, then the next
exception is caught as shown in Figure 5.22.
except <exception_name>:
    <statements to handle the exception>
else:
    <statements to be executed if there is no exception>
Consider the same example discussed for multiple except blocks. Code
Snippet 2 shows the code with the addition of the else block.
Code Snippet 2:
try:
    sub1 = int(input("Enter value of subject1:"))
    sub2 = int(input("Enter value of subject2:"))
    num_sub = int(input("Enter total number of subjects:"))
    avg = (sub1 + sub2) / num_sub
    print("Average", avg)
except NameError:
    print("NameError occurred. Please define the variable or functions correctly")
except ZeroDivisionError:
    print("Can't divide by zero")
else:
    print("Program executed successfully")
Figure 5.24 shows the output for the execution of the else block.
The resources used in the try and the except blocks must be freed before
the execution continues to the next line of code after the try, except, and else
blocks. Such a block of code that must be executed whether an exception
occurred or not can be added to the finally block. Figure 5.25 shows the
flow of execution within the exception blocks.
The finally block is executed after the try, except, and else blocks. If an
exception is not handled by the code, then the unhandled exception is
re-thrown after the execution of the finally block.
try:
    <statements that may throw an exception>
except <exception_name>:
    <statements to handle the exception>
else:
    <statements to be executed if there is no exception>
finally:
    <statements to be executed whether there is an exception or not>
Consider the same example discussed for the try, except, and else blocks.
Code Snippet 3 shows the code with the addition of the finally block.
Code Snippet 3:
try:
    sub1 = int(input("Enter value of subject1:"))
    sub2 = int(input("Enter value of subject2:"))
    num_sub = int(input("Enter total number of subjects:"))
    avg = (sub1 + sub2) / num_sub
    print("Average", avg)
except NameError:
    print("NameError occurred. Please define the variable or functions correctly")
except ZeroDivisionError:
    print("Can't divide by zero")
else:
    print("Program executed successfully")
finally:
    print("try-except block is completed.")
Figure 5.26: Using finally Block with try and except Blocks
The finally block will be executed even if an exception has not occurred.
Figure 5.27 shows the output for the execution of else and finally blocks.
Figure 5.27: Using finally Block with try and else Blocks
Note that the unhandled exception is thrown after the execution of the
finally block.
Often when validating user input, you might want to show a custom error
message in case of inappropriate input. For example, consider that in the
previous example, you must restrict the number of subjects entered to two. If
the user enters any integer value other than two, you want to raise an
exception that displays the message "Total number of subjects should
be 2". The raise statement lets you raise an exception. The syntax for the
raise statement is:
The raise statement has a single argument that indicates an exception should
be raised. This argument can either be a class that is inherited from the
Exception class or an exception object.
Code Snippet 4 shows the code that raises an exception if the user entered
any number other than two.
Code Snippet 4:
The code creates a function named average. This function accepts three
parameters. The first two parameters are the subject marks and the third
parameter is the number of subjects. Figure 5.29 shows the output for this code.
The first call to the function executes normally without raising the exception.
The second and the third calls to the function raise the exception and display
the error message.
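One way such a function might look is sketched here. The function name average, its three parameters, and the error message come from the text; the parameter names, the sample marks, the use of the generic Exception class, and the messages list are assumptions.

```python
def average(sub1, sub2, num_sub):
    # Raise an exception when the number of subjects is anything other than 2.
    if num_sub != 2:
        raise Exception("Total number of subjects should be 2")
    return (sub1 + sub2) / num_sub

messages = []
# The first call succeeds; the second and third calls raise the exception.
for marks in [(80, 90, 2), (80, 90, 3), (80, 90, 1)]:
    try:
        print("Average", average(*marks))
    except Exception as error:
        messages.append(str(error))
        print(error)
```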
You can build a custom exception and use it to validate data. Consider that
you want to validate that the subject marks entered by the user are positive
values. If a user enters a negative value, then you can raise a custom
exception such as InvalidMarksError. Such exceptions defined by the
developer and raised through code are called user-defined exceptions or
custom exceptions.
You must define a new class to define a custom exception. The new custom
exception class must derive from the Exception class instead of the
BaseException class.
class <CustomError>(Exception):
    <statements>
    pass
try:
    ...
except CustomError:
    ...
Note that the user-defined CustomError class inherits from the Exception
class. The pass statement is a placeholder that does nothing; it is required
only when the class body would otherwise be empty.
Code Snippet 5 shows the code for a custom exception. This exception is raised
if the user enters a negative value for the marks.
Code Snippet 5:
class InvalidMarksError(Exception):
    "Raised when a negative value is entered for the marks"
    pass
try:
    sub1 = int(input("Enter value of subject1:"))
    sub2 = int(input("Enter value of subject2:"))
    # Raise the custom exception if either of the marks is negative
    if sub1 < 0 or sub2 < 0:
        raise InvalidMarksError
    else:
        print("Valid numbers")
except InvalidMarksError:
    print("Exception occurred: Invalid Mark")
1. Which of the following file modes allows you to open the binary version of a
file to write/read data and replace current data?
a. wb
b. wb+
c. ab+
d. ab
2. Which of the following options allows you to create an empty text file named
f1.txt, provided f1.txt does not exist?
a. fp = create('f1.txt', 'w')
b. fp = open('f1.txt', 'b')
c. fp = open('f1.txt', 'x')
d. fp = create('f1.txt', 't')
3. Which of the following statements are true about opening a file using the
with statement?
4. Which of the following options is the correct syntax of the seek method?
a. file.seek(list)
b. file.seek(hint)
c. file.seek(size)
d. file.seek(offset)
a. try
b. except
c. finally
d. else
1 b
2 c
3 a, b
4 d
5 c
2. Consider that you are accessing an element list1[5] from the list
list1 which is out of range. Use exception handling in Python to
handle the IndexError exception. Given that the list, list1 = [11,
77, 88, 33, 66].
This session will provide you with a solid understanding of regular expressions,
their practical usage in Python, and the ability to apply them effectively for
text manipulation and pattern matching.
Regular Expressions, often called regex, serve as a powerful tool for defining
patterns used in searching, manipulating, and replacing strings. With regex,
you can efficiently match and find specific strings or sets of strings using a
specialized syntax known as pattern.
By leveraging regex, you can save considerable time in various scenarios that
involve text processing and manipulation such as:
So, what are regular expressions made of? How are they used? Let us explore.
[Figure: Components of regular expressions: metacharacters, special sequences, and character classes]
However, if you want to locate a string starting with a specific set of characters
or a string that contains a given pattern, then you must use metacharacters. In
Python, metacharacters are special characters that impact the interpretation
of regular expressions that include them. They do not match themselves, but
rather signify specific rules. Special characters such as |, +, and * are
considered metacharacters, also referred to as operators, signs, or symbols in
regular expressions.
Table 6.2 lists some of the special sequences in Python regular expressions with
their purpose. These special sequences are formed using a \ (backslash)
followed by a character as shown in the table.
Special Sequence    Description
\A                  Matches pattern only at the start of the string
import re
The functions in the re module must be supplied with arguments such as the
search pattern, the string to be searched and optional flags. These optional
flags provide additional control when applying regular expression patterns. The
use of optional flags allows for the utilization of different features and syntax
variations. This enables developers to customize the behavior of regular
expression operations based on their specific requirements.
Consider that you want to search for a word within a string using regular
expressions. You can include the re.I (where I stands for ignore case) flag as
Table 6.4 lists some of the optional flags available for regular expression
methods in Python.
The re.findall method from the re module lets you search the target string
with a regex pattern. This method scans the entire string and retrieves all the
matches encountered, returning them as a list for further processing and
analysis. The syntax for the findall method is:
In this syntax:
• pattern is the regular expression pattern to find in the string
• string is the target string
The re.findall method scans the target string from left to right, as specified
in the regular expression pattern, and returns all matches encountered in the
order they were found. Consider that you want to find all the numbers in a
string. The regex pattern that you will use to do this is \d+.
Code Snippet 1 shows the code that uses the findall method to search for
all the numbers in the given string.
Code Snippet 1:
import re
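A minimal sketch of this kind of search is shown here, using a sample string that is an assumption rather than the original input. The pattern \d+ comes from the text.

```python
import re

text = "Order 66 shipped 3 items on day 21"  # sample string (assumed)

# \d+ matches one or more consecutive digits; findall returns every match.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '3', '21']
```

Note that the matches are returned as strings, not as integers.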
Code Snippet 2:
import re
Code Snippet 2 displays, as a list, the two three-letter words that each
appear at the beginning of a new line.
You can use the findall method to search for a sequence that starts with a
specific text followed by zero or more characters. To do this, you must make
use of the asterisk (*) metacharacter. In Python, when * appears within a
pattern, it signifies that the preceding expression or character should repeat 0
Code Snippet 3 shows the code that uses the findall method to match a
sequence that starts with re followed by zero or more characters.
Code Snippet 3:
import re
The code in Code Snippet 3 displays the string starting from ‘re’ up to a white
space character at the end.
The finditer method returns an iterator that yields matching objects if the
search is successful. In cases where no match is found, the method still returns
an iterator, but does not yield any match objects.
Consider that you want to search for all vowels in a given string. The regex
pattern to do this is r'[aeiou]'. Code Snippet 4 shows the code that uses the
finditer method to find all the vowels in the string 'Computer Languages'.
Code Snippet 4:
import re
s = 'Computer Languages'
vowelmatch = re.finditer(r'[aeoui]', s)
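The iterator returned by finditer is typically consumed in a loop. A short sketch, in which the name found and the choice to collect each vowel with its start index are assumptions:

```python
import re

s = 'Computer Languages'

# Each item yielded by finditer is a match object with group() and start().
found = [(m.group(), m.start()) for m in re.finditer(r'[aeiou]', s)]
print(found)
# [('o', 1), ('u', 4), ('e', 6), ('a', 10), ('u', 13), ('a', 14), ('e', 16)]
```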
The search method returns a match object that contains two elements:
• The tuple object contains the start and end index of a successful match
• An actual matching value that you can retrieve using a group method
If the search method is unable to find the desired pattern or if the pattern does
not exist within the target string, the method returns None.
Consider that you want to search for all vowels in a given string. The regex
pattern to do this is r'[aeiou]'. Code Snippet 5 shows the code that uses the
finditer method to find all the vowels in the string 'Computer Languages'.
Code Snippet 5:
import re
s = 'Computer Languages'
vowelmatch = re.finditer(r'[aeoui]', s)
Consider an example to search for a word in the given string. Code Snippet 6
lists the code to search for 'Java' in the given string: 'Among the programming
languages, JAVA is a high-level and class-based language. Java
is an object-oriented program.'. The re.I or re.IGNORECASE flag can
be used with the search method to enable case-insensitive searching of the
regex pattern.
Code Snippet 6:
import re
The group method of the match object is used to display the word found. To
match any character instead of a specific word, you can use the ‘.’
metacharacter in the regular expression. Code Snippet 7 shows the code for
searching any character in a given string.
import re
The code matches and returns the first character. If you add a '+' sign to the
'.' metacharacter as in re.search(r'.+', strn), then the search will be
made for one or more repetitions of the same pattern. Here it will be for any
character. Figure 6.9 shows the output for the same.
If you want the ‘.’ to match the newline character as well, use
the re.DOTALL or re.S flag as an argument to the search method. Thus, the
search method call will be re.search(r'.+', strn, re.S). Figure 6.10
shows the output for the same.
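The three variations can be compared side by side. In this sketch the two-line value of strn is a hypothetical string chosen only to make the effect of the newline visible.

```python
import re

# Hypothetical two-line string used only for illustration
strn = 'Python\nFlask'

one_char = re.search(r'.', strn)          # first character only
one_line = re.search(r'.+', strn)         # stops before the newline
all_lines = re.search(r'.+', strn, re.S)  # '.' also matches the newline

print(repr(one_char.group()))
print(repr(one_line.group()))
print(repr(all_lines.group()))
```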
When the regular expression pattern matches zero or more characters at the
beginning of the string, the re.match method returns a match object. This
match object contains information about the starting and ending positions of
the match, as well as the actual matched value.
Consider that you want to find the four-letter word at the beginning of the
given string. Code Snippet 8 shows the code for this search.
Code Snippet 8:
import re
if result is not None:
    print("Match word: ", result.group())
The match method returns None if there is no match. In that case, calling the
group method raises an error, because None has no such method. Therefore, the
code in Code Snippet 8 checks that the result is not None before calling the
group method.
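A self-contained sketch of this guarded use of match follows. The helper name first_four_letter_word, the pattern, and both target strings are assumptions made for illustration and may differ from those used in Code Snippet 8.

```python
import re

def first_four_letter_word(text):
    # match only looks at the beginning of the string
    result = re.match(r'\w{4}\b', text)
    # Guard against None before calling group()
    if result is not None:
        return result.group()
    return None

starts_with_four = first_four_letter_word('Java is popular')
starts_with_six = first_four_letter_word('Python is popular')
print(starts_with_four)
print(starts_with_six)
```

The second call returns None because the first word has six letters, so the word boundary \b is not found after four characters.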
Python regex provides the sub and subn methods, which allow for searching
and replacing patterns in a string. With these methods, you can replace one
or more occurrences of a regex pattern in the target string with a specified
substitute string. The syntax for the sub method is:
re.sub(pattern, replacement, string, count=0, flags=0)
In this syntax:
• pattern is the regular expression pattern to find in the string
• replacement is the string that is to be inserted at every occurrence of
the matched pattern
• string is the target string
• count is the number of occurrences that must be replaced, the default
of which is zero indicating all the occurrences will be replaced
• flags refers to optional regex flags, the default of which is zero
indicating no flags are applied
Pattern, replacement, and string are essential arguments in the search and
replace operations using regular expressions. However, count and flags are
optional arguments that can be utilized for additional customization.
The sub method returns a new string by replacing the occurrences of the
pattern in the original string with the specified replacement string. If the pattern
is not found, the original string is returned without any changes.
Consider you want to replace all the whitespaces in a string with colon (:).
Code Snippet 9 shows the code for this replacement.
import re
print(res_str)
If you change the sub method to subn, the method returns a tuple that contains
the new string and the number of replacements made. Figure 6.13 shows the
output of Code Snippet 9 after changing res_str = re.sub(r"\s", ":", target_str)
to res_str = re.subn(r"\s", ":", target_str).
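The difference between sub, subn, and the count argument can be sketched as follows. The value of target_str is a hypothetical string, and the names res_tuple and first_only are illustrative.

```python
import re

# Hypothetical target string
target_str = 'Python regex is powerful'

# Replace every whitespace character with a colon
res_str = re.sub(r'\s', ':', target_str)
print(res_str)

# subn returns a tuple: (new string, number of replacements)
res_tuple = re.subn(r'\s', ':', target_str)
print(res_tuple)

# count limits how many occurrences are replaced
first_only = re.sub(r'\s', ':', target_str, count=1)
print(first_only)
```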
re.compile(pattern, flags=0)
import re
string_pattern = r"\d{4}"
regex_pattern = re.compile(string_pattern)
print(type(regex_pattern))
result = regex_pattern.findall(str1)
print(result)
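A compiled pattern object can be reused for several searches without repeating the pattern string. The sketch below assumes two hypothetical sample strings, order_text and invoice_text, and applies the same compiled pattern to both.

```python
import re

# Hypothetical sample strings
order_text = 'Order 1234 shipped, order 5678 pending'
invoice_text = 'Invoice 42 refers to order 9012'

# Compile once, then reuse the pattern object
regex_pattern = re.compile(r'\d{4}')
print(type(regex_pattern))

orders = regex_pattern.findall(order_text)
invoices = regex_pattern.findall(invoice_text)
print(orders)
print(invoices)
```

The number 42 is not returned because the pattern requires exactly four consecutive digits.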
The split method from the re module allows you to split the string into
substrings based on occurrences of the regex pattern. This split operation results
in a list of separated substrings. The syntax for the split method is:
re.split(pattern, string, maxsplit=0, flags=0)
In this syntax:
• pattern is the regular expression pattern to split the target string
• string is the target string which must be split
• maxsplit is the maximum number of splits to perform, the default of
which is zero indicating that the string is split at every occurrence
• flags refers to optional regex flags, the default of which is zero
indicating no flags are applied
The regular expression pattern and the target string are required parameters
when using the re.split method. However, the maxsplit and flags are
optional parameters that can be used to customize the splitting behaviour.
The re.split method divides the target string at each occurrence of the
regular expression pattern and returns the resulting substrings as a list. If
the pattern is not found in the target string, then the method returns the
unsplit string itself as the only element of the resulting list.
Consider you want to split the given string into the list of words matching the
whitespace character. Here, you must use the \s special sequence in the
regex along with the + metacharacter. Adding the + symbol will split the target
string on one or more occurrences of the whitespace characters. Code
Snippet 11 shows the code to achieve this.
import re
print(listofWords)
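A minimal sketch of splitting on whitespace follows. The value of target_str is a hypothetical string with irregular spacing, and the name unsplit is illustrative.

```python
import re

# Hypothetical target string with irregular spacing
target_str = 'Python  is a\tpopular   language'

# \s+ treats each run of whitespace as a single delimiter
listofWords = re.split(r'\s+', target_str)
print(listofWords)

# When the pattern does not occur, the whole string is the only element
unsplit = re.split(r'\s+', 'Python')
print(unsplit)
```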
import re
target_string = "30-11-1995"
The code uses the maxsplit parameter to show the different result in each
case. Figure 6.16 shows the output of the code in Code Snippet 12.
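The effect of maxsplit can be sketched with the same target string. The hyphen pattern and the names all_parts and first_part are assumptions, as the exact pattern used in Code Snippet 12 may differ.

```python
import re

target_string = "30-11-1995"

# maxsplit=0 (the default) splits on every hyphen
all_parts = re.split(r'-', target_string)
# maxsplit=1 splits on the first hyphen only
first_part = re.split(r'-', target_string, maxsplit=1)

print(all_parts)
print(first_part)
```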
A string can be split in multiple ways using different regex patterns. Consider
that you want to split the given string into words using the \s+ pattern and the
[\b\W\b]+ pattern. In each of these cases, the resulting list will be different.
The \b special sequence in a regex pattern matches the empty string at the
boundary of a word. Note that inside a character class such as [\b\W\b], \b
stands for the backspace character instead. The \W special sequence in a regex
pattern matches any non-word character, that is, any character other than a
letter, digit, or underscore.
Code Snippet 13 shows the code to split the given string into multiple word
boundary delimiters, resulting in a list of alphanumeric or word tokens. It also
shows the same target string split around the whitespace characters.
import re
Figure 6.17: Using the split method with Two Different Patterns
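The two patterns can be compared on one string. In the sketch below, target_str is a hypothetical string and the names by_space and by_nonword are illustrative. Since \b inside a character class means the backspace character, [\b\W\b]+ behaves like \W+ on ordinary text.

```python
import re

# Hypothetical target string
target_str = 'Hello, world! Python-3 rocks.'

# Split around whitespace only: punctuation stays attached to the words
by_space = re.split(r'\s+', target_str)
# Split around runs of non-word characters
by_nonword = re.split(r'[\b\W\b]+', target_str)

print(by_space)
print(by_nonword)
```

The second list ends with an empty string because the target string itself ends with a delimiter.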
Python supports word tokenization using the split method or the findall
method. However, a limitation of using the split method for word tokenization
is that it does not treat punctuation marks as individual tokens.
import re
Figure 6.18 shows the output of the code in Code Snippet 14.
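The limitation can be seen by tokenizing the same text with both methods. The sample text and both patterns below are assumptions chosen for illustration, and they may differ from those in Code Snippet 14.

```python
import re

# Hypothetical sentence
text = "Hello, world! It works."

# split drops the punctuation marks entirely
split_tokens = re.split(r'\W+', text)
# findall can keep each punctuation mark as its own token
findall_tokens = re.findall(r'\w+|[^\w\s]', text)

print(split_tokens)
print(findall_tokens)
```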
You can use the split method of the pattern object returned by the compile
method to tokenize the given text into individual sentences. Code Snippet 15
shows the code to split the given string into multiple sentences.
import re
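A minimal sketch of sentence tokenization with a compiled pattern follows. The sample text, the pattern, and the name sentence_end are assumptions made for this illustration. The pattern simply splits at whitespace that follows a full stop, question mark, or exclamation mark, so it would also split after abbreviations.

```python
import re

# Hypothetical text
text = "Python is simple. Is it fast? Try it!"

# Whitespace that directly follows '.', '?' or '!' ends a sentence
sentence_end = re.compile(r'(?<=[.?!])\s+')
sentences = sentence_end.split(text)

for sentence in sentences:
    print(sentence)
```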
a. #
b. $
c. ^
d. *
a. \d
b. \dg
c. \D
d. \Z
3. Which of the following regex patterns will split the target string on the
occurrence of one or more whitespace characters?
a. \w+
b. \s+
c. \W+
d. \S+
a. re.search
b. re.match
c. re.sub
d. re.subn
a. re.split
b. re.interpret
c. re.compile
d. re.compute
1 b
2 c
3 b
4 a
5 c
Open Jupyter notebook and perform the given tasks using regular
expressions:
c. Using the string str, write a Python program to check whether the
given string str contains the word “times”.
Flask is a compact and lightweight Python Web framework that facilitates the
development of Web applications with the help of useful tools and features. A
Web framework is a collection of libraries and tools that enables you to quickly
create Web applications without the need for writing code from scratch. In this
session, you will learn how to create Web applications in Python using the Flask
framework.
To install Flask, use Python 3.8 or a higher version. You must create virtual
environments to isolate and manage multiple Python projects that use different
versions of Python libraries.
Each virtual environment acts as an independent group of Python libraries for
the respective Python project. Therefore, a Flask framework installed on one
virtual environment will not affect the Flask framework installed on another
virtual environment.
There are various tools for creating virtual environments. One such tool is
virtualenv. To use this tool, you must first install it.
In the syntax, [virtual directory] is the name that you provide for the
virtual environment being created.
Figure 7.3 shows the creation of a virtual environment named env within the
flask_app directory. Note that the virtual environment creation creates a
root directory named env and other installation sub-directories. It generates
a batch file named activate.bat in the path env\Scripts.
Install Flask
Once the virtual environment is activated, the next step is to install Flask.
To install Flask, run the given command within the virtual environment.
Figure 7.5 shows the installation of Flask within env using the pip installer
package.
Consider that you want to build a sample Web application in Python using Flask
that displays a message on the browser.
Code Snippet 1:
if __name__ == '__main__':
    app.run(debug=True)
python sample_app.py
Figure 7.6 shows the execution of the sample_app.py application and the
displayed output.
Code Snippet 1:
Imports the Flask class from the flask package.
from flask import Flask
app = Flask(__name__)
@app.route('/')
def fun_print():
    return "Website created using Python Flask"
Note that the Flask class's route function is a decorator that binds a
URL to a Python function. The route function converts the Python
function into a view function. The view function in turn converts the return
value into an HTTP response to respond to the incoming requests.
app.route(rule, **options)
In the syntax, the rule parameter represents the URL string to be bound
with the Python function. options represents the keyword arguments to be
sent to the underlying Rule object.
Uses the run method to launch the application on the local
development server. By setting debug=True, the code enables the
debug mode to display detailed error messages if the application
encounters any error.
if __name__ == '__main__':
    app.run(debug=True)
All the parameters in the run method are optional. Table 7.1 describes
the parameters of the run method.
<variable_name>
Optionally, you can use a converter to specify the type of the argument as
in <converter:variable_name>. Table 7.2 lists the converter types that can
be included in the URL generation.
Datatype Description
string accepts any text without a slash
int accepts positive integers
Code Snippet 2:
from flask import Flask
app = Flask(__name__)
# <labname> is the variable part of the URL
@app.route('/welcome/<labname>')
def fun_welcome(labname):
    return "Welcome to %s Lab" % labname
if __name__ == '__main__':
    app.run(debug=True)
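A converter can be tried out without starting the development server by using the test client that Flask provides. In the sketch below, the route /session/<int:session_no> and the function fun_session are hypothetical names chosen for illustration.

```python
from flask import Flask

app = Flask(__name__)

# <int:session_no> only matches positive integers and passes an int
@app.route('/session/<int:session_no>')
def fun_session(session_no):
    return "Session %d, next is %d" % (session_no, session_no + 1)

# The test client sends requests without starting a server
client = app.test_client()
ok = client.get('/session/7')
not_found = client.get('/session/seven')

print(ok.status_code, ok.get_data(as_text=True))
print(not_found.status_code)
```

A URL whose variable part does not fit the converter is not routed to the view function at all, so the second request receives a 404 response.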
Consider that a library has three logins namely, Student, Faculty, and Guest.
Code Snippet 3 demonstrates the use of url_for function.
from flask import Flask, redirect, url_for
app = Flask(__name__)

# View functions for the three login types
@app.route('/guest/<gid>')
def fun_guest(gid):
    return ("Welcome to the Department of Computer Science "
            "Library. You have logged in as Guest with id: %s" % gid)

@app.route('/student/<sid>')
def fun_student(sid):
    return ("Welcome to the Department of Computer Science "
            "Library. You have logged in as Student with id: %s" % sid)

@app.route('/faculty/<fid>')
def fun_faculty(fid):
    return ("Welcome to the Department of Computer Science "
            "Library. You have logged in as Faculty with id: %s" % fid)

# Redirects to the view function that matches the login type
@app.route('/Library/<login>/<id>')
def fun_library(login, id):
    if login == 'guest':
        return redirect(url_for('fun_guest', gid=id))
    elif login == 'student':
        return redirect(url_for('fun_student', sid=id))
    elif login == 'faculty':
        return redirect(url_for('fun_faculty', fid=id))
    # Added fallback: a view function must not return None
    return "Unknown login type: %s" % login, 404

if __name__ == '__main__':
    app.run(debug=True)
You can save the file as library_login.py and run the Web application
using the command:
python library_login.py
Figure 7.10, Figure 7.11, and Figure 7.12 show the outcome of Code Snippet 3.
Based on the login value, the fun_library function makes a call to the
corresponding view function fun_guest, fun_student, or fun_faculty using
the if-elif condition. The variables, gid, sid, and fid are passed as
parameters to the respective view function in the url_for method.
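The URL strings that url_for produces can be inspected directly. The following sketch reuses the fun_guest route with a shortened message; the id value g01 and the extra page argument are hypothetical. It assumes that url_for is called inside a request context, which test_request_context provides here.

```python
from flask import Flask, url_for

app = Flask(__name__)

@app.route('/guest/<gid>')
def fun_guest(gid):
    return "Guest %s" % gid

# url_for needs an application or request context to build URLs
with app.test_request_context():
    guest_url = url_for('fun_guest', gid='g01')
    # Arguments that are not part of the rule become query parameters
    extra_url = url_for('fun_guest', gid='g01', page=2)

print(guest_url)
print(extra_url)
```

Building URLs from the function name keeps the redirects working even if the URL rule of a view function is changed later.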
The render_template auxiliary function from Flask allows the use of the Jinja2
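from flask import Flask, render_template
app = Flask(__name__)
# Renders templates/welcome.html for the root URL
@app.route('/')
def homepage():
    return render_template('welcome.html')
if __name__ == "__main__":
    app.run(debug=True)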
<html>
<head>
<style>
h1 {
border: 2px #eee solid;
color: brown;
text-align: center;
padding: 10px;
}
</style>
</head>
<body>
<h1>Welcome to Python Programming Lab</h1>
</body>
</html>
Figure 7.14 shows the output of typing the URL http://127.0.0.1:5000/ in the
browser.
Table 7.3 lists the various HTTP methods available to transfer data between
HTML pages.
Methods Description
When you use an HTML template to render a page, the template can contain
placeholders for variables and expressions that are substituted with their actual
values when the page is rendered. By default, the route method in Flask
responds only to HTTP GET requests. However, this can be changed by passing a
list of HTTP methods through the methods argument of the route decorator
method.
Let us write a Flask script that uses HTTP get method to retrieve user input from
a Web form on the welcome.html page. The data retrieved is then displayed
on the sessiondisplay.html page using HTTP post method.
Here,
In this case, form data sent through a POST or GET request can be retrieved
by using the request object of Flask. The request object has to be
imported as given in Code Snippet 5a.
<html>
<head>
<title>Welcome</title>
<style>
h1 {
border: 2px #eee solid;
color: brown;
text-align: center;
padding: 10px;
}
</style>
</head>
<body>
<form action="/sessiondisplay" method="post">
<h1>Welcome to Python Programming Lab</h1>
<div style="text-align:center">
<label>Enter the current Python programming
session:</label>
<input type="text" name="cursession">
<input type="submit" value="Submit">
</div>
</form>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Current Session</title>
<style>
h3 {
font-family: arial;
font-size: 16pt;
color: blue;
text-align: center;
}
</style>
</head>
<body>
<h3>Current Session in Python Programming Lab
is:{{csession}}</h3>
</body>
</html>
Here,
The HTTP post method is used to specify an action in the welcome.html
file. When the post event occurs, /sessiondisplay is appended to the
URL.
A session name is accepted and passed on to the Flask script through the
input variable, cursession. By giving this variable name of the input as a
parameter to the request.form.get function, you can retrieve the HTML
input from a form. The value received is stored in a variable, cs.
The value of this cs variable is shown in the placeholder for csession in the
sessiondisplay.html file.
Enter the session name and click Submit. The sessiondisplay.html page is
rendered as shown in Figure 7.16.
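The flow from the form field to the template placeholder can be sketched in a single file. This sketch is an assumption-laden stand-in for the original script: it uses render_template_string with an inline copy of the heading from sessiondisplay.html instead of a template file, the view function name fun_display is hypothetical, and the form submission is simulated with the Flask test client.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Inline stand-in for the heading in sessiondisplay.html
DISPLAY_PAGE = ("<h3>Current Session in Python Programming Lab "
                "is:{{csession}}</h3>")

# The methods list lets the route accept POST in addition to GET
@app.route('/sessiondisplay', methods=['GET', 'POST'])
def fun_display():
    # The name attribute of the input field is the key in request.form
    cs = request.form.get('cursession')
    return render_template_string(DISPLAY_PAGE, csession=cs)

# Simulate submitting the form on welcome.html
client = app.test_client()
response = client.post('/sessiondisplay', data={'cursession': 'Flask'})
page = response.get_data(as_text=True)
print(page)
```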
Let us now create a small Web application for an online bookshelf, BooksOnly.
There are three categories of books available in this library as shown in Figure
7.17.
This Web application contains Python scripts including Flask scripts, HTML files,
images, and Cascading Style Sheet (CSS) files. You must create the application
folder structure as shown in Figure 7.18 which contains various files as listed.
Flask can also be used to deliver media files such as text, Portable Document
Formats (PDFs), audio, video, and image files. The /Static folder can be
utilized for storing these files. Here, in the Images folder under Static, you will
store the book images as in /Static/Images/book1.jpeg. You can then use
relative file paths to link to these static files. However, it is recommended that
you construct absolute URL references to static files using the url_for function.
To reference a static file, pass the endpoint name static to the url_for
function along with the keyword parameter filename=, followed by the path of
the file relative to the Static folder.
For example, to refer to an image from the Static folder use the code given
in Code Snippet 6.
Code Snippet 6:
<img src="{{url_for('static', filename='example_image.png')}}">
<html>
<head>
<link rel="stylesheet" href="{{url_for('static',
filename='css/book.css')}}">
</head>
<body>
<form action="/book_cat" method="post">
<h1>Welcome to the Bookshelf</h1>
<div class="content">
<div class="left-col">
<img src="{{url_for('static',
filename='images/book1.jpeg')}}" class=image />
</div>
<div class="right-col">
<button type="submit" value="submit">BOOKSONLY</button>
</div>
</div>
</form>
</body>
</html>
A button, BooksOnly, which when clicked will take you to the next
page, bookcat.html. This is achieved through the HTTP post method
from the action attribute of the HTML form object.
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="{{url_for('static',
filename='css/bookcat.css')}}">
</head>
<body>
<div class="container">
<div class="right">
<h1>Book Categories</h1>
<table align="center" style="width:50%">
<form action="/book_thriller" method="post">
<tr>
<td align="center"><a
href="{{url_for('fun_thrill')}}">Thriller</a></td>
</tr>
</form>
<form action="/book_fiction" method="post">
<tr>
<td align="center">Fiction</td>
</tr>
</form>
<form action="/book_adult" method="post">
<tr>
<td align="center">Young Adults</td>
</tr>
</form>
</table>
</div>
<div class="left">
<img src="{{url_for('static', filename='images/book2.png')}}"
class=image /> </div>
</div>
</body>
</html>
The next page is the book_thriller.html page. Code Snippet 9 lists the
code for this page.
Code Snippet 9:
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="{{url_for('static',
filename='css/bookstyle.css')}}">
</head>
<body>
<form action="/book_back" method="post">
<div class="content">
<div class="container">
<div class="column1">
<h2>Thriller</h2>
<table class="center">
<tr>
<th>Image</th>
<th>Title</th>
<th>Author</th>
<th>Price</th>
<th>Age group</th>
<th>Description</th>
</tr>
<tr>
<td><img src="{{url_for('static',
filename='images/secret.png')}}" class=image /> </td>
<td>The Secret of The House On The Hill</td>
<td>Julia Harris</td>
<td>$10</td>
<td>15+</td>
<td style="width: 500px;height: 100px">Emily
Mayweather, a 24 year old florist, was now on the search for
Similarly, you can easily create the HTML pages for the other two categories.
Let us now create the binding Python script that will make the application work.
Code Snippet 10 lists the code for the app_books.py script file.
from flask import Flask, render_template, request
app = Flask(__name__)
@app.route('/')
def homepage():
    return render_template('books.html')
@app.route("/book_thriller")
def fun_thrill():
    return render_template('book_thriller.html')
if __name__ == "__main__":
    app.run(debug=True)
In this script:
The request object is imported to use both the post and get HTTP
methods in the fun_cat view function.
Now, the application is ready. However, before viewing the output let us add
three CSS pages to the Book_details\static\css folder as mentioned
earlier in Figure 7.19.
Code Snippet 11 shows the book.css code for styling the books.html page.
.image {
display: block;
margin-left: auto;
margin-right: auto;
width: 40%;
}
button {
background-color: #05c46b;
border: 1px solid #777;
border-radius: 2px;
font-family: inherit;
vertical-align: middle;
font-size: 36px;
display: block;
}
.content {
width: 100%;
position: absolute;
top: 10%;
}
.left-col {
margin-top: 5%;
}
.right-col {
float: right;
margin-right: 10%;
margin-top: -5%;
display: flex;
align-items: center;
}
body {
background-color: lightyellow;
}
h1 {
border: 2px #eee solid;
color: brown;
text-align: center;
padding: 10px;
}
Code Snippet 12 shows the bookcat.css code for styling the bookcat.html
page.
table, th, td {
border: 1px solid;
font-size: 30px;
}
body {
background-color: lightblue;
}
.container {
display: grid;
grid-template-columns: 100px 150px 200px;
text-align: center;
}
.column1 {
float: left;
width: 90%;
padding: 10px;
height: auto;
background-color: lightblue;
}
All the files are ready in their respective folders. To run the application, create
the virtual environment and run the activate.bat script in command prompt.
Run the command python app_books.py as shown in Figure 7.20 to start the
application.
Click the BOOKSONLY button to open the Book Categories page shown in Figure
7.22.
Click the Back to Book Categories link to navigate to the Book Categories page
as shown in Figure 7.22.
2. Which of the following options is the default port number on which the
Flask application runs?
a. 127
b. 187
c. 500
d. 5000
a. url_for
b. for_url
c. url_flask
d. url_forFlask
4. Which of the following code will automatically reload the application after
each code modification?
a. app.run(debug = True)
b. app.route(debug = True)
c. app.route(DEBUG)
d. app.run()
5. Which of the following assets are stored in the static folder during Web
development in Python?
a. CSS files
b. JavaScript files
c. Images files
d. HTML files
1 c
2 d
3 b
4 a
5 a, b, c
1. In the command prompt, install the virtual environment and activate it.
Within the activated environment, install Flask.
3. Consider that you have created a Web application in Python Flask for a
school where the students and staff can log in. Write a Python program
to generate dynamic URL bindings in Flask.” On the other hand, if the
user executes the application as localhost/staff, then the
application should redirect to a function to display the message “This
is staff login.”.
Web scraping is the process of extracting data from Web pages for various
purposes that include data collection, price monitoring, and academic
research. Python provides several libraries that can be used for Web Scraping.
This session will provide an overview of Web scraping along with the rules to be
followed while Web scraping. It will explore various Web scraping libraries, their
installation, and the process of implementing Web scraping in Python.
Web scraping is a method used to extract data from Web pages. Web
scraping is also known as Web harvesting and Web data mining. Data from
news Websites, social networking platforms, and e-commerce Websites can
be collected using Web scraping. Web scraping helps in acquiring market
data, monitoring competitor prices, and gathering data for research. Though
these processes can be done manually, automation helps in faster retrieval of
data.
• Static Web scraping: A method of data extraction from Websites whose
content does not change frequently is called static Web scraping.
Python, JavaScript, C++, Java, and Perl are some of the languages available
for writing code to automate Web scraping. Python is the most preferred
language for Web scraping due to its usability, extensive library of modules,
and simple syntax that makes the task of scraping easier. Data science,
corporate intelligence, and investigative reporting are some of the areas that
benefit greatly from Web scraping.
Some of the libraries used for Web scraping are Scrapy, Beautiful Soup, and
Requests. These libraries provide tools for data extraction that are incredibly
quick and effective.
The legality of Web scraping is determined by various factors that include the
location of the scraper, the terms of service of the Website, and the purpose
of scraping. A developer must analyze all these factors and then proceed with
Web scraping if it complies with applicable laws and terms of service.
When scraping Web pages, the ethical and legal factors that must be
considered are:
● Terms of service: To make sure that scraping is not forbidden, it is vital to
check the terms of service of the Website that is aimed to be scraped.
While some Websites forbid scraping, others might allow it with
restrictions. Public Websites may allow scraping while private ones do
not as they may have sensitive data.
● Copyright: Copyright regulations are to be followed for Websites holding
documents or contents that have rights over making copies. Steps to
follow copyright protocols include requesting permission from the
copyright holder or making fair use of the material.
● Effect on Website performance: The Website's performance may be
affected if it is scraped frequently. The Website's servers may become
overloaded or experience other issues if it is excessively scraped.
The robots.txt file is a text file placed on the Website by the Webmasters
and Website owners. This file provides instructions regarding permission for
scraping and the frequency at which scraping is allowed. Web scraping
activities must be performed ethically by adhering to guidelines.
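The rules in a robots.txt file can be checked from Python before scraping. The following sketch uses the urllib.robotparser module of the standard library; the robots.txt content and the example.com URLs are hypothetical and are parsed from a list of lines so that no network access is required.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# can_fetch tells whether the given user agent may request the URL
allowed = parser.can_fetch("*", "http://example.com/books")
blocked = parser.can_fetch("*", "http://example.com/private/data")
# crawl_delay gives the requested pause between requests, in seconds
delay = parser.crawl_delay("*")

print(allowed, blocked, delay)
```

For a live Website, the set_url and read methods of the same class can be used to download the actual robots.txt file instead of parsing a list.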
Libraries are collections of prewritten code that can be imported and used
as is. The popular Web scraping libraries in Python are:
● Beautiful Soup
● Requests
● lxml
● Scrapy
● Selenium
8.2.2 Requests
8.2.3 lxml
lxml is a simple yet effective Python library used for parsing HTML and XML
documents. lxml works best in combination with Requests to scrape data from
Web pages. It also enables usage of Cascading Style Sheets (CSS) and XML
Path Language (XPath) selectors to retrieve data from HTML and XML. It is best
suited for scraping huge databases with structured data and complex
documents.
8.2.4 Scrapy
Web crawling refers to the process where bots browse the World Wide Web to
discover and index Web pages. Scrapy is a framework made for Web crawling
and Web scraping. Web scraping tasks include making HTTP requests, indexing
links, fetching data, and processing the fetched data. The strength and
scalability offered by Scrapy make it ideal for performing various Web scraping
tasks.
8.2.5 Selenium
Consider that you want to extract data from the book_thriller.html Web
page that was built using Code Snippet 9 of Session 7. The extracted data must
be stored in an Excel file. Let us use the libraries, Requests and Beautiful
Soup, to scrape the content.
The first step of Web scraping is to identify the URL of the page to be scraped.
To identify the URL of the Web page to be scraped:
2. Copy the URL of this Web page from the Address bar as shown in Figure
8.3.
Having identified the URL of the Web page to be scraped, the next step is to
inspect the page. Inspection helps you to understand the structure of the HTML
file. In an HTML file, the data is layered within tags. Inspection of the source
code of the HTML file must be done to locate the tag which holds the required
information.
After examining the HTML source, the next step is to write a code to extract the
data. The Requests library of Python offers a variety of predefined functions
that facilitate interaction with Web pages by means of HTTP requests, including
get, post, put, patch, and head requests. HTTP requests can be used to fetch
data from a specified URL or to push data to a server. The get method is
specifically used to fetch information.
Steps to be performed to accomplish this task are:
1. The get method of the Requests library helps to get data from the server
using the URL. Code Snippet 1 lists the code to get data from the
http://127.0.0.1:5000/book_thriller Web page.
import requests
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
print(res_obj.content)
Figure 8.6 shows the code and the result of the execution of the code.
2. To parse this raw HTML code into meaningful information after obtaining
the page's HTML, execute the code in Code Snippet 2.
Code Snippet 2:
import requests
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
print(res_obj.content)
The command executes as shown in Figure 8.7. The status of this request is
displayed as <Response [200]>. The number 200 signifies that the request
operation was successful.
The HTML form of the Web page that was fetched in the previous step must be
arranged in a legible form. The prettify function of the Beautiful Soup library
helps in formatting the HTML document by adding proper indentations,
resulting in a more readable structure. Code Snippet 3 shows the code for
prettifying the HTML.
Code Snippet 3:
import requests
from bs4 import BeautifulSoup
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
soup = BeautifulSoup(res_obj.content, 'html.parser')
print(soup.prettify())
The BeautifulSoup constructor is called with the content of the response
object and the 'html.parser' string, and the result is stored in the soup
object. Executing the prettify function on the soup object makes the HTML
readable. Figure 8.8 shows the output of executing the prettify method.
You can observe from Figure 8.8 that the data is stored in an HTML table,
arranged in rows and columns. The next step is to extract each row from the
HTML table. The find function returns the first matching tag and the find_all
function returns all the matching tags, which helps in achieving this.
Code Snippet 4 shows the code to extract the rows from the HTML code.
Code Snippet 4:
import requests
from bs4 import BeautifulSoup
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
soup = BeautifulSoup(res_obj.content, 'html.parser')
result=soup.find('table').find_all("tr")
print(result)
The <th> tag refers to a header cell of the HTML table, <tr> tag refers to a
row in the HTML table and <td> tag refers to a table cell that contains data.
The code in Code Snippet 4 finds all <tr> tags and extracts each row from the
HTML table. Figure 8.9 shows the output of execution of this code.
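The behaviour of find and find_all can also be tried without a running Web server. The following sketch parses a small inline HTML string that stands in for the fetched page; the table content and the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Small inline table standing in for the page fetched with Requests
html = """
<table>
  <tr><th>Title</th><th>Author</th></tr>
  <tr><td>Book A</td><td>Author A</td></tr>
  <tr><td>Book B</td><td>Author B</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

first_row = soup.find('tr')                    # first matching tag only
all_rows = soup.find('table').find_all('tr')   # every matching tag
header_cells = all_rows[0].find_all('th')
data_cells = all_rows[1].find_all('td')
missing = soup.find('ul')                      # None when nothing matches

print(first_row)
print(len(all_rows), len(header_cells), len(data_cells))
print(missing)
```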
After extracting data from each row of the HTML table, the data in each cell
must be read. This is achieved by iterating over each row and reading each
cell in the row. Scraping is done from the second row as the first row contains
only the headers. The code in Code Snippet 5 performs this task. In this code,
result is a sequence object which holds all the rows. The for loop iterates
over each row, reads all the cells in the row, and prints the data in each cell.
Code Snippet 5:
import requests
from bs4 import BeautifulSoup
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
soup = BeautifulSoup(res_obj.content, 'html.parser')
result = soup.find('table').find_all("tr")
# Skip the header row and print the cells of every data row
for row in result[1:]:
    cells = row.find_all(['td'])
    print(cells)
It can be noted from Figure 8.10 that the data is extracted with tags. The data
must be cleaned up to remove these tags, unwanted spaces, and images.
Code Snippet 6 picks and prints only the text in the cells.
Code Snippet 6:
import requests
from bs4 import BeautifulSoup
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
soup = BeautifulSoup(res_obj.content, 'html.parser')
result = soup.find('table').find_all("tr")
for row in result[1:]:
    cells = row.find_all(['td'])
    # Skip the first cell, which holds only the book image
    celltext = [cell.get_text(strip=True) for cell in cells[1:]]
    print(celltext)
The code in Code Snippet 6 extracted the data in the cells and printed it on
the screen. The data extracted from the Web page must be written into a file.
To store the data in an Excel file, the Openpyxl library must be installed and
imported. A few lines of code must be introduced to Code Snippet 6 to store
the extracted data into an Excel file. The lines of code that must be included
are:
Code Snippet 7 has these steps included in bold along with the code in Code
Snippet 6. To install the Openpyxl library, in the command prompt, run the
command as shown:
Code Snippet 7:
import requests
import openpyxl
from bs4 import BeautifulSoup
# Create a workbook and use its active sheet
excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = "Thriller-books"
sheet.append(['Title', 'Author', 'Price', 'Age group', 'Description'])
res_obj = requests.get('http://127.0.0.1:5000/book_thriller')
print(res_obj)
soup = BeautifulSoup(res_obj.content, 'html.parser')
result = soup.find('table').find_all("tr")
for row in result[1:]:
    cells = row.find_all(['td'])
    # Skip the first cell, which holds only the book image
    celltext = [cell.get_text(strip=True) for cell in cells[1:]]
    sheet.append(celltext)
excel.save("thriller.xlsx")
print("Successfully Scraped and saved the contents")
The Excel file gets created in the current working directory. To find the current
working directory, in the Jupyter notebook, run the command given in Code
Snippet 8.
Code Snippet 8:
import os
os.getcwd()
Open the thriller.xlsx file to see the scraped data from the Web Page.
Figure 8.16 shows the content of the thriller.xlsx file.
1. Which of the following Web scraping libraries enables users to send HTTP
requests to gather information from Web sources?
a. Beautiful Soup
b. Requests
c. HTTP Request
d. lxml
a. Selenium
b. Scrapy
c. Beautiful Soup
d. Playwright
4. Which line of code is used to retrieve information from the given server
using a given URL?
a. res_obj = requests.get(url)
b. res_obj = get_requests(url)
c. res_obj = get_httprequest(url)
d. res_obj = httprequests.get(url)
a. Pythonpyxl
b. Openpyexel
c. Openpyxl
d. Pythonexcel
1 b
2 c
3 a, b, c
4 a
5 c
a. Navigate to the given Website and inspect the page to scrape its
contents.
b. Write Python code to retrieve information from the given URL using
the request library.
c. Write Python code to create a Beautiful Soup object to parse the
raw HTML code which is obtained using the request library.
d. Write Python code to extract the Web page title without the tag.
e. Write a Python code to extract and display the header tags h2
and h3.
f. Write Python code to store the extracted contents namely, the
title and the header tag contents h2 and h3 into an Excel file,
header.xlsx.
This session aims to familiarize you with application development using Tkinter.
It introduces the various widgets available as part of the Tkinter library along
with their properties. It also explains the process of creating a simple
application using the Tkinter widgets.
Tkinter is the standard GUI toolkit for Python. It comes with the standard library of
Python. If Tkinter is not installed, it can be installed using the command:
The library should be imported into the code to make use of its features. You
can then create instances of the classes in the library to facilitate rapid
application development. Tkinter is used to create desktop applications
rather than Web applications.
9.2 Widgets
Widgets are the controls that make up the presentation of a GUI. They are
similar to elements in a Hypertext Markup Language (HTML) page. You can use
widgets in your applications to let users enter details, make choices,
or read information. Thus, widgets make the application easier to use.
Radio buttons, text boxes, and lists are some examples of widgets.
Table 9.1 lists some of the widgets available in Tkinter.
Widget Description
Button Displays buttons in applications that perform the
specified action when clicked
Canvas Provides space to draw shapes, such as lines, ovals,
polygons, and rectangles in applications
Checkbutton Allows the user to select multiple options from a list of
options
Entry Accepts a single-line of text from a user
Here,
Tkinter provides some common properties and methods that are applicable
to all the widgets. Color, font, and dimensions of the widgets can be set using
these common properties. The three common methods included in the Tkinter
geometry manager are pack, grid, and place.
9.2.1 Label
The Label widget in Tkinter serves the purpose of displaying text and images.
A label on a product carries information about the product such as its price,
date of manufacturing, and so on. Similarly, a label widget displays information
on the screen. For example, you can use a label to display the current time on
a screen.
Label_name is the name of the label created. Label is the class used to create
the label, and root is the window that holds the label. Table 9.2 lists the option
values that can be configured for a Label widget.
Option Description
anchor Controls the position of text in the widget where
the default value is CENTER
bd Sets the border width of the widget where the
default is 2 pixels
bitmap Sets the bitmap to the graphical object specified
bg Sets the background color of the widget
cursor Specifies the type of cursor to show when the
mouse is moved over the label
fg Specifies the foreground color of the text written
inside the label
font Specifies the font type of text inside the label
height Specifies the height of the label
image Indicates the image shown as the label
justify Specifies the alignment of multiple lines in the
label
padx Indicates the horizontal padding of the widget
pady Indicates the vertical padding of the widget
relief Indicates the 3D style of border
text Sets the text to display
textvariable Helps in updating text of label by updating the
variable
underline Helps to include an underline for a specific part
of the text
width Sets the width of the widget
wraplength Specifies the maximum line width, in screen units
(pixels by default), beyond which the text is wrapped
1. Import the Tkinter library and create an instance to use it further in the
code. You can do this using the code:
import tkinter as tk
2. Create a GUI application main window. You can do this with the code:
root = tk.Tk()
Here, root is the container variable in which you will create a Label
widget to display the welcome message.
3. Create the Label widget and place it in the container root using the
code:
message = tk.Label(root, text="Hello, World!")
message.pack()
Here, message is the widget name, root is the container, and the text
option of the widget is set to the string to be displayed. Tkinter toolkit uses
the geometry managers to position the widgets on the container. Here,
the pack method of the widget geometry manager is used to place the
label on the container.
4. Enter the main event loop to display the message. This event loop informs
the code to display the widget with the message until you manually
close the window. The code to enter the main event loop is:
root.mainloop()
Code Snippet 1 shows the complete code to create a simple GUI application
to display a welcome message.
import tkinter as tk
root = tk.Tk()
message = tk.Label(root, text="Hello, World!")
message.pack()
root.mainloop()
You can use Jupyter Notebook to run this code. Figure 9.2 shows the output
of execution of the code in Code Snippet 1.
In this example, the pack method does not include any options. However, you
can include options with the pack method to change the display of the
widgets. Table 9.3 lists some of the options that can be used with the pack
method.
Options Description
fill Fills the widgets in horizontal (X) or vertical (Y)
manner where the default is none
side Specifies which side of the container the widget
must be placed as in TOP, BOTTOM, LEFT, or RIGHT
expand Specifies if the widget should expand to fill the extra
space available
Figure 9.3 shows the output of the code if message.pack() in Code Snippet 1
is replaced with message.pack(ipadx=50, ipady=50, expand=True). The
ipadx and ipady options specify the internal padding, in pixels, that is added
to the left and right (ipadx) and above and below (ipady) the widget's content.
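Returning to the Label widget itself, the following is a minimal sketch of the current-time example mentioned in Section 9.2.1, using the textvariable option from Table 9.2. The function names, font, and colors are assumptions chosen for illustration. The tkinter import is placed inside main so that the helper function can be tried without opening a window.

```python
import time


def current_time_text():
    """Return the current local time as HH:MM:SS."""
    return time.strftime("%H:%M:%S")


def main():
    # Imported here so the helper above can be used without a display
    import tkinter as tk

    root = tk.Tk()
    clock_text = tk.StringVar(value=current_time_text())
    # The label redraws itself whenever clock_text changes
    clock = tk.Label(root, textvariable=clock_text, font=("Helvetica", 24),
                     fg="white", bg="black", padx=20, pady=10)
    clock.pack()

    def tick():
        clock_text.set(current_time_text())
        root.after(1000, tick)  # schedule the next update in one second

    tick()
    root.mainloop()
```

Calling main() opens the window. The after method schedules tick again without blocking the event loop, which is why no explicit loop is required.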
9.2.2 Frames
Frames are rectangular holders of the widgets. They are the containers of other
widgets that facilitate the arrangement of widgets in an order.
Frame_name is the name of the frame created. Frame is the class used to create
the frame, and root is the window that holds the frame. Many of the
options listed in Table 9.2 are valid for the Frame widget too. Table 9.4 lists
further option values that can be configured for a Frame widget.
Option               Description
highlightbackground  Sets the color of the focus highlight when the
                     frame does not have the focus
highlightthickness   Sets the thickness of the focus highlight drawn
                     around the border
relief               Sets the type of border of the frame, which is
                     FLAT by default
highlightcolor       Sets the color of the focus highlight when the
                     frame has the focus
Consider that you want to divide the screen into two halves horizontally: top
and bottom. The bottom portion must be further subdivided into two vertical
halves: left and right. The frame arrangement will appear as in Figure 9.4.
Code Snippet 2 lists the code to divide the frames as shown in Figure 9.4 with
each frame displaying a text in different backgrounds.
Code Snippet 2:
import tkinter as tk

root = tk.Tk()
# Top frame
tframe = tk.Frame(root)
tframe.pack()
# Bottom frame, which holds the left and right frames
bframe = tk.Frame(root)
bframe.pack(side=tk.BOTTOM)
lframe = tk.Frame(bframe)
lframe.pack(side=tk.LEFT)
rframe = tk.Frame(bframe)
rframe.pack()
root.mainloop()
Figure 9.5 shows the output of execution of the code in Code Snippet 2.
9.2.3 Checkbutton
The Checkbutton widget allows the user to make single or multiple choices
from a list of options available. For example, users may often choose one or
more hobbies from a list of hobbies when entering their personal details.
The syntax of the Checkbutton widget is:
Checkbutton_name = Checkbutton(root, options)
Checkbutton_name stands for the name of the widget. root is the parent
window. You can use many options to configure the Checkbutton widget and
these options are written as comma-separated key-value pairs. Table 9.5 lists
some of the Checkbutton widget options.
Option Description
activebackground Helps in indicating the
background color of the
Checkbutton when it is under the
cursor
command Helps in calling the scheduled
function when the state of
Checkbutton is changed
activeforeground Helps in indicating the foreground
color of the Checkbutton when it
is under the cursor
disabledforeground Helps in indicating the foreground
color of the Checkbutton text when it is disabled
justify Helps in indicating the way the
multiple text lines are presented
variable Helps in representing the
associated variable to track the
state of the Checkbutton
offvalue Helps in setting the value of the
OFF state to another value which
is 0 by default
Table 9.6 lists some of the methods associated with the Checkbutton widget.
Method Description
invoke Helps in calling the function set in the command option,
as if the user had clicked the Checkbutton
select Helps in turning on the Checkbutton
deselect Helps in turning off the Checkbutton
toggle Helps in switching the Checkbutton between the on and off states
flash Helps in flashing the Checkbutton between active
and normal colors
Consider that, building on the frame layout in Code Snippet 2, you want to
display a list of hobbies in the first frame and a button in the second frame.
When the user selects hobbies and clicks the button, the selected hobbies must
be displayed in the third frame. Code Snippet 3 provides the code to create the
desired interface using a Checkbutton.
import tkinter as tk

root = tk.Tk()
# Top frame for the check buttons; the bottom frame is split into left and right
tframe = tk.Frame(root)
tframe.pack()
bframe = tk.Frame(root)
bframe.pack(side=tk.BOTTOM)
lframe = tk.Frame(bframe)
lframe.pack(side=tk.LEFT)
rframe = tk.Frame(bframe)
rframe.pack()

def show_hobbies():
    # Show the selected hobbies in the right frame
    if read.get() and song.get():
        dmessage = tk.Label(rframe, text="Reading books and Listening to songs",
                            bg="blue", fg="white")
        dmessage.pack()
    elif song.get():
        dmessage = tk.Label(rframe, text="Listening to songs",
                            bg="blue", fg="white")
        dmessage.pack()
    elif read.get():
        dmessage = tk.Label(rframe, text="Reading books",
                            bg="blue", fg="white")
        dmessage.pack()

# Each IntVar holds 1 when its Checkbutton is selected and 0 otherwise
read = tk.IntVar()
song = tk.IntVar()
label1 = tk.Checkbutton(tframe, text="Reading books", variable=read,
                        onvalue=1, offvalue=0)
label1.grid(column=0, row=0, sticky=tk.W)
label2 = tk.Checkbutton(tframe, text="Listening to songs", variable=song,
                        onvalue=1, offvalue=0)
label2.grid(column=1, row=0, sticky=tk.W)
button1 = tk.Button(lframe, text="Click to display selected hobbies.",
                    bg="red", padx=5, pady=5, command=show_hobbies)
button1.pack()
root.mainloop()
You can select any one or both hobbies and then click the button. Figures 9.7,
9.8, and 9.9 show the output for selection of different hobbies.
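One possible variation on the Checkbutton example, shown only as a sketch: the message is built by a separate helper function, and a single Label is updated on each click instead of a new Label being added. The names hobbies_text and result are assumptions, and the layout is simplified to use pack only.

```python
def hobbies_text(read_selected, song_selected):
    """Build the message for the selected hobbies; empty string if none."""
    chosen = []
    if read_selected:
        chosen.append("Reading books")
    if song_selected:
        chosen.append("Listening to songs")
    return " and ".join(chosen)


def main():
    # Imported here so hobbies_text can be tried without a display
    import tkinter as tk

    root = tk.Tk()
    read = tk.IntVar()
    song = tk.IntVar()
    tk.Checkbutton(root, text="Reading books", variable=read).pack(anchor=tk.W)
    tk.Checkbutton(root, text="Listening to songs", variable=song).pack(anchor=tk.W)
    result = tk.Label(root, bg="blue", fg="white")

    def show_hobbies():
        # Update the one existing label rather than adding a new one per click
        result.config(text=hobbies_text(read.get(), song.get()))

    tk.Button(root, text="Click to display selected hobbies.",
              command=show_hobbies).pack()
    result.pack()
    root.mainloop()
```

Calling main() opens the window. Keeping the text logic outside the callback makes it easy to check on its own, for example hobbies_text(1, 0) returns "Reading books".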
Figure 9.7: Reading Books Selected
Figure 9.8: Listening to Songs Selected
In Code Snippet 3:
9.2.4 Radiobutton
Unlike Checkbutton, Radiobutton allows the users to choose only one option
from the available options. For example, you can ask the users to choose one
of the age group ranges from the list of age groups when entering their
personal details.
Radiobutton_name stands for the name of the widget. root is the parent
window. You can use many options to configure the Radiobutton widget and
these options are written as comma-separated key-value pairs. Table 9.8 lists
some of the options available for the Radiobutton Widget.
Options Description
borderwidth Helps in representing the size of the border
Consider that, building on the frame layout in Code Snippet 2, you want to
display a list of age groups in the first frame and a button in the second
frame. When the user selects an age group and clicks the button, an appropriate
message must be displayed. If < 18 is selected, the message must be 'Child or
Young Adult'; otherwise, the message must be 'Adult'. This message must be
displayed in the third frame. Code Snippet 4 provides the code to create the
desired interface using a Radiobutton.
Code Snippet 4:
import tkinter as tk

root = tk.Tk()
tframe = tk.Frame(root)
tframe.pack()
bframe = tk.Frame(root)
bframe.pack(side=tk.BOTTOM)
lframe = tk.Frame(bframe)
lframe.pack(side=tk.LEFT)
rframe = tk.Frame(bframe)
rframe.pack()

def show_ages():
    # Remove any earlier message before showing the new one
    for widgets in rframe.winfo_children():
        widgets.destroy()
    if age.get() == 1:
        dmessage = tk.Label(rframe, text="Child or Young Adult",
                            bg="blue", fg="white")
        dmessage.pack()
    elif age.get() == 2:
        dmessage = tk.Label(rframe, text="Adult", bg="blue", fg="white")
        dmessage.pack()

# Both Radiobuttons share one variable, so only one can be selected
age = tk.IntVar()
label1 = tk.Radiobutton(tframe, text="< 18", variable=age, value=1)
label1.grid(column=0, row=0, sticky=tk.W)
label2 = tk.Radiobutton(tframe, text=">= 18", variable=age, value=2)
label2.grid(column=1, row=0, sticky=tk.W)
# The end of this snippet is not shown in the text; the button below
# (including its label) and the mainloop call are assumed
button1 = tk.Button(lframe, text="Click to display the age group.",
                    bg="red", padx=5, pady=5, command=show_ages)
button1.pack()
root.mainloop()
Figure 9.10 shows the output of execution of the code in Code Snippet 4.
You can select only one age group and then click the button. Figures 9.11 and
9.12 show the output for selection of different age groups.
In Code Snippet 4:
The Entry widget is used to accept inputs such as name, email address, and
so on from the users. It is a small text box which allows the user to type in a
single line of input. The appearance and behavior of the input text can be
changed using options.
Entry_name stands for the name of the widget. root is the parent window
which holds the Entry widget. Table 9.9 lists some of the options available for
the Entry widget.
Options            Description
exportselection    Controls whether the selected text is copied to the
                   clipboard; the default is 1 and setting it to 0
                   turns this off
selectbackground   Sets the background color of the selected text
selectforeground   Sets the font color of the selected text
selectborderwidth  Sets the width of the border for the selected text
width              Sets the width of the widget in characters
textvariable       Links the widget to a variable through which the
                   current text can be read or set
show               Shows a mask character such as * instead of the
                   input text
xscrollcommand     Links the widget to a horizontal scroll bar for use
                   when the entered text is wider than the widget
insertbackground   Sets the color of the insertion cursor
Methods                     Description
delete(first, last=None)    Deletes the specified characters inside the
                            widget
get                         Fetches the Entry widget's current text as a
                            string
icursor(index)              Sets the insertion cursor just before the
                            character at the specified index
index(index)                Returns the numerical position that
                            corresponds to the specified index
select_clear                Clears the selection in the Entry widget
select_present              Checks if some text is selected in the widget
insert(index, s)            Inserts the specified string s before the
                            character at the specified index
select_adjust(index)        Extends the selection to include the
                            character at the specified index
select_from(index)          Sets the selection anchor to the character
                            specified by the index
select_range(start, end)    Selects the characters from start up to, but
                            not including, end
select_to(index)            Selects the characters from the anchor
                            position up to the specified index
xview(index)                Links the Entry widget to a horizontal
                            scrollbar
xview_scroll(number, what)  Scrolls the Entry widget horizontally, where
                            number specifies the number of units to move
                            and what specifies the unit, tk.PAGES or
                            tk.UNITS
Consider that you want to display two labels: Name and e-mail. Against these
labels, you want to accept user inputs for name and e-mail. You also want to
include a button, which when clicked will display the name and e-mail entered
by the user. Code Snippet 5 shows the code to accept user input using the
Entry widget.
import tkinter as tk

frame = tk.Tk()
frame.geometry("400x250")
# Bottom frame that shows the entered details
bframe = tk.Frame(frame)
bframe.pack(side=tk.BOTTOM)

def show_info():
    # Remove any earlier message before showing the new one
    for widgets in bframe.winfo_children():
        widgets.destroy()
    dmessage = tk.Label(bframe, text="Name: " + namevar.get() +
                        " ; e-mail: " + emailvar.get(),
                        bg="blue", fg="white")
    dmessage.pack()

# Variables that hold the text typed into the Entry widgets
namevar = tk.StringVar()
emailvar = tk.StringVar()
name = tk.Label(frame, text="Name").place(x=30, y=50)
nameentry = tk.Entry(frame, textvariable=namevar).place(x=90, y=50)
email = tk.Label(frame, text="e-mail").place(x=30, y=130)
emailentry = tk.Entry(frame, textvariable=emailvar).place(x=90, y=130)
button1 = tk.Button(frame, text="Click to display the name and e-mail.",
                    bg="red", padx=5, pady=5, command=show_info)
button1.place(x=30, y=180)
frame.mainloop()
You can enter the name and e-mail information and then click the button.
Figure 9.14 shows the output with the entered information.
In Code Snippet 5:
Options               Description
x, y                  Define the horizontal and vertical offset in pixels
height, width         Set the height and width of the widget in pixels
anchor                Specifies which part of the widget is positioned at
                      the given coordinates, such as NW or SE
relx, rely            Floating point numbers between 0.0 and 1.0 that
                      give the horizontal and vertical offset as a
                      fraction of the width and height of the parent
                      window
relheight, relwidth   Floating point numbers between 0.0 and 1.0 that
                      give the height and width of the widget as a
                      fraction of the height and width of the parent
                      window
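To illustrate the relative options of the place method, the sketch below centers a label with relx, rely, and anchor. The helper relative_to_pixels is hypothetical. It only approximates the pixel offset that a relative value corresponds to for a given parent size, and the window size is an assumption.

```python
def relative_to_pixels(relx, rely, parent_width, parent_height):
    """Approximate pixel offset for relx/rely in a parent of the given size."""
    return round(relx * parent_width), round(rely * parent_height)


def main():
    # Imported here so the helper above can be tried without a display
    import tkinter as tk

    root = tk.Tk()
    root.geometry("400x250")
    message = tk.Label(root, text="Centered", bg="blue", fg="white")
    # relx and rely of 0.5 refer to the middle of the parent;
    # anchor=tk.CENTER puts the middle of the label at that point
    message.place(relx=0.5, rely=0.5, anchor=tk.CENTER)
    root.mainloop()


print(relative_to_pixels(0.5, 0.5, 400, 250))
```

Unlike fixed x and y values, the relative values keep the label centered when the window is resized.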
The Canvas widget helps in creating graphical images on the screen. Shapes
such as circles, rectangles, or octagons can be drawn on a Canvas
widget. Canvas is used to create complex layouts, charts, and plots.
In the syntax, Canvas_name is the name of the canvas, and the root parameter
denotes the parent window. You can change the layout of the canvas using
many options that are written as comma-separated pairs of key-values.
Table 9.12 lists some of the options available for the Canvas widget.
Options Description
confine Helps in making the canvas non-scrollable
outside the scroll region
Consider the example in Code Snippet 5. Let us add two lines, one above and
one below the Button widget. Code Snippet 6 adds the lines as intended using
the Canvas widget.
Code Snippet 6:
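import tkinter as tk

frame = tk.Tk()
frame.geometry("350x250")
bframe = tk.Frame(frame)
bframe.pack(side=tk.BOTTOM)
canvas = tk.Canvas(frame, bg="brown", height=250, width=350)

def show_info():
    # Remove any earlier message before showing the new one
    for widgets in bframe.winfo_children():
        widgets.destroy()
    dmessage = tk.Label(bframe, text="Name: " + namevar.get() +
                        " ; e-mail: " + emailvar.get(),
                        bg="blue", fg="white")
    dmessage.pack()

namevar = tk.StringVar()
emailvar = tk.StringVar()
# Labels and Entry widgets carried over from Code Snippet 5; these lines are
# not shown in this snippet in the text and are assumed here
name = tk.Label(frame, text="Name").place(x=30, y=50)
nameentry = tk.Entry(frame, textvariable=namevar).place(x=90, y=50)
email = tk.Label(frame, text="e-mail").place(x=30, y=130)
emailentry = tk.Entry(frame, textvariable=emailvar).place(x=90, y=130)
# Line above the button
line1 = canvas.create_line(20, 170, 340, 170, fill='white')
button1 = tk.Button(frame, text="Click to display the name and e-mail.",
                    bg="green", fg="white", padx=5, pady=5,
                    command=show_info)
button1.place(x=60, y=180)
# Line below the button
line2 = canvas.create_line(20, 230, 340, 230, fill='white')
canvas.pack()
frame.mainloop()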
Figure 9.15 shows the output of execution of the code in Code Snippet 6.
You can enter the name and e-mail information and then click the button.
The output with the entered information will appear beneath the canvas.
In Code Snippet 6:
The Canvas widget is used as a frame to draw two lines as shown in Figure
9.15. The create_line method of the canvas widget is used to draw the
lines.
A function named show_info is used in the command option of the
Using the widgets discussed so far, you can easily develop a simple application
that takes user input and displays the data.
9.3.1 Example 1
Example 1 helps take user inputs from widgets such as Entry and Button. The
input entered is then displayed on the canvas as shown in Figure 9.16. The user
must provide the name and author of the book. When the user clicks the Book
Font color button, the color palette appears. The user can choose a color
for the Book Name from this color palette. Similarly, the user can choose a color
for the Author Name by clicking the Author Font color button.
Code Snippet 7:
def get_author():
    # Read the author name and let the user pick a color for it
    e_author = author_var.get()
    color2 = colorchooser.askcolor()[1]
    canvas.create_text(180, 130, text="by", fill="black",
                       font=('Helvetica 10 bold'))
    canvas.create_text(210, 150, text=e_author,
                       fill=color2, font=('Helvetica 10 bold'))

# Labels, buttons, and Entry widgets placed in the lower part of the canvas
name = Label(canvas, text="Book Name").place(x=20, y=250)
author = Label(canvas, text="Author Name").place(x=20, y=280)
namebutton = Button(canvas, text="Book Font color",
                    command=get_name).place(x=80, y=320)
authorbutton = Button(canvas, text="Author Font color",
                      command=get_author).place(x=180, y=320)
e1 = Entry(canvas, textvariable=name_var).place(x=100, y=250)
e2 = Entry(canvas, textvariable=author_var).place(x=100, y=280)
canvas.pack()
root.mainloop()
The code uses widgets such as Canvas, Label, Entry, and Button.
The Canvas widget holds the Entry widgets for taking the names of the
book and author from the user.
The Button widget is used to pick font color and display the entered text
in desired font color.
The functions, get_name and get_author, take input from the Entry
widgets, retrieve the chosen colors from the colorchooser interface
and display the output in the upper part of the screen.
9.3.2 Example 2
import tkinter as tk

root = tk.Tk()
root.geometry("400x280")
name = tk.Label(root, text="Name").place(x=30, y=50)
gender = tk.Label(root, text="Gender").place(x=30, y=100)
qualification = tk.Label(root, text="Qualification").place(x=30, y=150)
# Variables that track the state of the check buttons and radio buttons
chk1 = tk.IntVar()
chk2 = tk.IntVar()
chk3 = tk.IntVar()
radio = tk.IntVar()
name_var = tk.StringVar()

def get_value():
    # Read the current values of all the widgets and print them
    e_name = name_var.get()
    e_gender = str(radio.get())
    e_quali1 = str(chk1.get())
    e_quali2 = str(chk2.get())
    print("Name:" + e_name)
    if e_gender == "1":
        print("Gender: Male")
    else:
        print("Gender: Female")
    if e_quali1 == "1" and e_quali2 == "1":
        print("Qualification: BS, MS")
    elif e_quali1 == "1":
        print("Qualification: BS")
    elif e_quali2 == "1":
        print("Qualification: MS")

e1 = tk.Entry(root, textvariable=name_var).place(x=80, y=50)
e2 = tk.Radiobutton(root, variable=radio, text='Male',
                    value=1).place(x=80, y=100)
e3 = tk.Radiobutton(root, variable=radio, text='Female',
                    value=2).place(x=150, y=100)
e4 = tk.Checkbutton(root, text="BS", variable=chk1,
                    onvalue=1, offvalue=0).place(x=130, y=150)
# The end of this snippet is not shown in the text; the MS check button,
# the button (label and positions included), and mainloop are assumed
e5 = tk.Checkbutton(root, text="MS", variable=chk2,
                    onvalue=1, offvalue=0).place(x=190, y=150)
submit = tk.Button(root, text="Submit", command=get_value).place(x=130, y=200)
root.mainloop()
In Code Snippet 8:
Figure 9.18 shows the output of the code with inputs entered in various widgets.
GUI applications make the user experience with the application smooth.
Tkinter is a library to create a GUI-based application in Python.
Tkinter provides many widgets such as Label, Entry, Checkbutton,
Radiobutton, and Canvas.
Each widget has its own properties which can be specified as options
during declaration.
Methods can be invoked to make some changes in the property values.
The pack, grid, and place methods of the geometry manager
facilitate positioning of the widgets on the frame.
2. Which of the following widgets is used to get a single-line text from the
user?
a. Entry
b. Frame
c. Canvas
d. Label
a. pack
b. place
c. layout
d. grid
a. Draw
b. Canvas
c. Frame
d. Custom
a. Button
b. Canvas
c. Frame
d. Container
1 a, c, d
2 a
3 a, b, d
4 b
5 c
1. Write a Python GUI program to enter your name in a text box. When a
button is clicked, your name should be displayed with a welcome
message.
In this session, you will learn how to connect Python with MySQL.
To connect Python with MySQL, use the MySQL Connector Python module that
allows Python to connect with the MySQL database server.
To install MySQL:
13. In the Accounts and Roles window, in the MySQL Root Password and
Repeat Password text boxes, enter the password. Click the Next
button as shown in Figure 10.8.
Code Snippet 1:
import mysql.connector
conn = mysql.connector.connect(user='root',
password='mysql123', host='localhost')
cursor = conn.cursor()
cursor.execute("CREATE DATABASE Studentdb")
print("Database created Successfully")
conn.close()
Argument   Description
user       The username of the user who interacts with the MySQL server.
password   The password that authenticates the user. Use the password
           that was set during MySQL installation. The name passwd is
           accepted as an alias.
host       The server name or IP address on which the MySQL server is
           running.
database   The name of the database to which the user wants to connect.
The cursor method creates a MySQLCursor object that allows you to interact
with the database. The execute method is used to execute Structured
Query Language (SQL) statements. Here, the statement to create the Studentdb
database is executed. The close method is used to close the connection.
Save the code in Code Snippet 1 in a file, for example create_db.py. Open
the Command Prompt and run the command as:
python create_db.py
2. You will be prompted to enter the password. Enter the password given
when installing the MySQL server as shown in Figure 10.16.
Code Snippet 2:
import mysql.connector
conn = mysql.connector.connect(user='root',
password='mysql123', host='localhost',
database='studentdb')
cursor = conn.cursor()
sql = '''CREATE TABLE IF NOT EXISTS student_info(
    Stud_id CHAR(10) NOT NULL,
    Name CHAR(20),
    Age INT,
    City CHAR(80)
)'''
cursor.execute(sql)
print("Table successfully Created")
conn.close()
python create_table.py
2. To view the tables in the studentdb database, run the commands as:
Show tables;
describe student_info;
The command executes as shown in Figure 10.20.
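The same check can also be made from Python. The sketch below uses the sqlite3 module from the standard library as a stand-in for MySQL so that it runs without a server. It is therefore not MySQL Connector code, and the queries against sqlite_master and PRAGMA table_info are SQLite's counterparts of SHOW TABLES and DESCRIBE. With MySQL Connector, one would presumably pass the SHOW TABLES statement to cursor.execute instead.

```python
import sqlite3

# sqlite3 stands in for MySQL here; ":memory:" creates a temporary database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS student_info(
    Stud_id CHAR(10) NOT NULL,
    Name CHAR(20),
    Age INT,
    City CHAR(80)
)""")

# SQLite's counterpart of SHOW TABLES
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cursor.fetchall()]

# SQLite's counterpart of DESCRIBE: one row per column, name in field 1
cursor.execute("PRAGMA table_info(student_info)")
columns = [row[1] for row in cursor.fetchall()]

print(tables)
print(columns)
conn.close()
```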
Let us design an application using Tkinter that stores student information in the
student_info table. The Tkinter Entry widget can be used to obtain student
details such as student ID, name, age, and city. The Button widget can be
used to perform operations such as create, read, update, and delete on the
student information. Code Snippet 3 lists the code for this application.
import tkinter as tk
from tkinter import ttk, messagebox
import mysql.connector
from tkinter import *
root = tk.Tk()
root.geometry("800x500")
# Entry widgets for the Student ID, Name, Age, and City fields
e1 = tk.Entry(root)
e1.place(x=140, y=10)
e2 = tk.Entry(root)
e2.place(x=140, y=50)
e3 = tk.Entry(root)
e3.place(x=140, y=90)
e4 = tk.Entry(root)
e4.place(x=140, y=130)
python student_design.py
Adding a record
Reading a record
Updating a record
Deleting a record
Refreshing the frame
Add a Record:
Code Snippet 4:
def Add():
    # Read the values entered in the form
    stuid = e1.get()
    stuname = e2.get()
    stuage = e3.get()
    city = e4.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "INSERT INTO student_info (Stud_id,Name,Age,City) VALUES (%s, %s, %s, %s)"
        val = (stuid, stuname, stuage, city)
        mycursor.execute(sql, val)
        mysqldb.commit()
        lastid = mycursor.lastrowid
        messagebox.showinfo("information", "Student Record inserted successfully.")
        # Clear the form for the next entry
        e1.delete(0, END)
        e2.delete(0, END)
        e3.delete(0, END)
        e4.delete(0, END)
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()
The Add function in Code Snippet 4 gets the data that the user inputs in the
form. This data is stored in the student_info table using the INSERT
statement.
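The placeholders in the INSERT statement let the database driver handle quoting, so a value that contains an apostrophe is stored correctly. The following sketch demonstrates this with sqlite3 from the standard library standing in for MySQL. Note that sqlite3 uses ? as the placeholder where MySQL Connector uses %s. The function add_student and the sample record are hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch runs without a server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_info("
             "Stud_id CHAR(10) NOT NULL, Name CHAR(20), Age INT, City CHAR(80))")


def add_student(conn, stuid, name, age, city):
    """Insert one record; return True on success, False on a database error."""
    sql = "INSERT INTO student_info (Stud_id, Name, Age, City) VALUES (?, ?, ?, ?)"
    try:
        conn.execute(sql, (stuid, name, age, city))
        conn.commit()
        return True
    except sqlite3.Error as e:
        print(e)
        conn.rollback()
        return False


# The apostrophe in the name needs no special handling
ok = add_student(conn, "1002", "Sean O'Neil", 15, "Denver")
rows = conn.execute("SELECT Stud_id, Name, Age, City FROM student_info").fetchall()
print(ok, rows)

# A missing Stud_id violates NOT NULL, so the insert is rolled back
failed = add_student(conn, None, "No ID", 15, "Denver")
```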
Delete a Record:
Code Snippet 5:
def Delete():
    # Read the Student ID of the record to delete
    stuid = e1.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "DELETE FROM student_info WHERE Stud_id = %s"
        val = (stuid,)
        mycursor.execute(sql, val)
        mysqldb.commit()
        messagebox.showinfo("information", "Student Record Deleted successfully.")
        # Clear the form
        e1.delete(0, END)
        e2.delete(0, END)
        e3.delete(0, END)
        e4.delete(0, END)
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()

tk.Button(root, text="Delete", command=Delete,
          height=3, width=13).place(x=250, y=160)
The Delete function in Code Snippet 5 gets the Student ID of the student
whose record must be deleted. The user inputs Student ID in the form. The
record is deleted from the student_info table using the DELETE statement.
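A DELETE statement succeeds even when no row matches the given Student ID, so the success message alone does not prove that a record was removed. As a sketch, the cursor's rowcount attribute can be checked after the statement runs. The example uses sqlite3 as a stand-in for MySQL, and delete_student is a hypothetical helper. MySQL Connector cursors also provide a rowcount attribute.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch runs without a server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_info("
             "Stud_id CHAR(10) NOT NULL, Name CHAR(20), Age INT, City CHAR(80))")
conn.execute("INSERT INTO student_info VALUES (?, ?, ?, ?)",
             ("1001", "Richard Franklin", 14, "Boston"))
conn.commit()


def delete_student(conn, stuid):
    """Delete the record with the given ID; return how many rows were removed."""
    cur = conn.execute("DELETE FROM student_info WHERE Stud_id = ?", (stuid,))
    conn.commit()
    return cur.rowcount


missing = delete_student(conn, "9999")   # no such record
removed = delete_student(conn, "1001")   # existing record
print(missing, removed)
```

A return value of 0 could be used to show a "record not found" message instead of the success message.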
Update a Record:
Code Snippet 6:
def Update():
    # Read the values entered in the form
    stuid = e1.get()
    stuname = e2.get()
    stuage = e3.get()
    city = e4.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "UPDATE student_info SET Name= %s, Age= %s, City= %s WHERE Stud_id= %s"
        val = (stuname, stuage, city, stuid)
        mycursor.execute(sql, val)
        mysqldb.commit()
        messagebox.showinfo("information", "Student Record Updated successfully.")
        # Clear the form
        e1.delete(0, END)
        e2.delete(0, END)
        e3.delete(0, END)
        e4.delete(0, END)
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()
Also, the line of code pertaining to the Update button in Code Snippet 3 must
be edited to include command = Update as shown in the code.
tk.Button(root, text="Update",command =
Update,height=3, width= 13).place(x=140, y=160)
The Update function in Code Snippet 6 gets the Student ID, Name, Age, and
City of the student whose record must be updated. The user inputs this data
in the form. The record is updated in the student_info table using the UPDATE
statement.
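The Entry widget always returns text, so the Age value reaches the UPDATE statement as a string. One possible safeguard, sketched below, is to validate it first. The function parse_age and the upper limit of 150 are assumptions made for illustration.

```python
def parse_age(text):
    """Return the age as an int, or None if text is not a whole number from 0 to 150."""
    try:
        age = int(text.strip())
    except ValueError:
        return None
    return age if 0 <= age <= 150 else None


print(parse_age("14"))
print(parse_age("abc"))
```

In the Update function, a None result could trigger a messagebox warning instead of running the statement.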
Read a Record
Code Snippet 7:
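def Read():
    # Read the Student ID of the record to fetch
    stuid = e1.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "SELECT Stud_id,Name,Age,City FROM student_info WHERE Stud_id= %s"
        val = (stuid,)
        mycursor.execute(sql, val)
        records = mycursor.fetchone()
        if records is None:
            # fetchone returns None when no row matches
            messagebox.showinfo("information", "No record found for this Student ID.")
        else:
            # Clear the form before filling in the fetched values
            e1.delete(0, END)
            e2.delete(0, END)
            e3.delete(0, END)
            e4.delete(0, END)
            e1.insert(0, records[0])
            e2.insert(0, records[1])
            e3.insert(0, records[2])
            e4.insert(0, records[3])
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()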
Also, the line of code pertaining to the Read button in Code Snippet 3 must
be edited to include command = Read as shown in the code.
tk.Button(root, text="Read",command =
Read,height=3, width= 13).place(x=140, y=160)
The Read function in Code Snippet 7 gets the Stud_id of the student whose
record must be fetched. The user inputs the Stud_id in the form. The record is
fetched from the student_info table using the SELECT statement and
displayed on the screen.
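When no record matches the Student ID, fetchone returns None rather than a row, and indexing that result raises an error. The sketch below shows both cases, with sqlite3 standing in for MySQL. The helper read_student is hypothetical, and the sample record is the one used later in this session.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch runs without a server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_info("
             "Stud_id CHAR(10) NOT NULL, Name CHAR(20), Age INT, City CHAR(80))")
conn.execute("INSERT INTO student_info VALUES (?, ?, ?, ?)",
             ("1001", "Richard Franklin", 14, "Boston"))
conn.commit()


def read_student(conn, stuid):
    """Return the matching row as a tuple, or None if there is no such student."""
    cur = conn.execute(
        "SELECT Stud_id, Name, Age, City FROM student_info WHERE Stud_id = ?",
        (stuid,))
    return cur.fetchone()


found = read_student(conn, "1001")
missing = read_student(conn, "9999")
print(found)
print(missing)
```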
When the frame is launched, all the records in the student_info table are
displayed in the Treeview widget at the lower half of the frame. After an
update or delete operation is performed on the student_info table, clicking
the Refresh button will fetch the updated records from the table. The show
method in Code Snippet 8 takes care of this.
Code Snippet 8:
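def show():
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    # Remove the rows currently shown in the Treeview
    children = listdisplay.get_children()
    for child in children:
        listdisplay.delete(child)
    mycursor.execute("SELECT Stud_id,Name,Age,City FROM student_info")
    records = mycursor.fetchall()
    # The rest of this snippet is not shown in the text; the lines below are
    # an assumed completion that adds one Treeview row per record
    for record in records:
        listdisplay.insert("", "end", values=record)
    mysqldb.close()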
Code Snippet 9:
import tkinter as tk
from tkinter import ttk, messagebox
import mysql.connector
from tkinter import *
def Add():
    # Read the values entered in the form
    stuid = e1.get()
    stuname = e2.get()
    stuage = e3.get()
    city = e4.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "INSERT INTO student_info (Stud_id,Name,Age,City) VALUES (%s, %s, %s, %s)"
        val = (stuid, stuname, stuage, city)
        mycursor.execute(sql, val)
        mysqldb.commit()
        lastid = mycursor.lastrowid
        messagebox.showinfo("information", "Student Record inserted successfully.")
        # Clear the form for the next entry
        e1.delete(0, END)
        e2.delete(0, END)
        e3.delete(0, END)
        e4.delete(0, END)
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()

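def Read():
    # Read the Student ID of the record to fetch
    stuid = e1.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "SELECT Stud_id,Name,Age,City FROM student_info WHERE Stud_id= %s"
        val = (stuid,)
        mycursor.execute(sql, val)
        records = mycursor.fetchone()
        if records is None:
            # fetchone returns None when no row matches
            messagebox.showinfo("information", "No record found for this Student ID.")
        else:
            # Clear the form before filling in the fetched values
            e1.delete(0, END)
            e2.delete(0, END)
            e3.delete(0, END)
            e4.delete(0, END)
            e1.insert(0, records[0])
            e2.insert(0, records[1])
            e3.insert(0, records[2])
            e4.insert(0, records[3])
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()
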
def Update():
    # Read the values entered in the form
    stuid = e1.get()
    stuname = e2.get()
    stuage = e3.get()
    city = e4.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "UPDATE student_info SET Name= %s, Age= %s, City= %s WHERE Stud_id= %s"
        val = (stuname, stuage, city, stuid)
        mycursor.execute(sql, val)
        mysqldb.commit()
        messagebox.showinfo("information", "Student Record Updated successfully.")
        # Clear the form
        e1.delete(0, END)
        e2.delete(0, END)
        e3.delete(0, END)
        e4.delete(0, END)
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()

def Delete():
    # Read the Student ID of the record to delete
    stuid = e1.get()
    mysqldb = mysql.connector.connect(host="localhost", user="root",
                                      password="mysql123", database="studentdb")
    mycursor = mysqldb.cursor()
    try:
        sql = "DELETE FROM student_info WHERE Stud_id = %s"
        val = (stuid,)
        mycursor.execute(sql, val)
        mysqldb.commit()
        messagebox.showinfo("information", "Student Record Deleted successfully.")
        # Clear the form
        e1.delete(0, END)
        e2.delete(0, END)
        e3.delete(0, END)
        e4.delete(0, END)
        e1.focus_set()
    except Exception as e:
        print(e)
        mysqldb.rollback()
    finally:
        mysqldb.close()

def show():
mysqldb=mysql.connector.connect(host="localhost",user="root",
password="mysql123",database="studentdb")
mycursor=mysqldb.cursor()
children = listdisplay.get_children()
for child in children:
listdisplay.delete(child)
mycursor.execute("SELECT Stud_id,Name,Age,City FROM student_info")
records = mycursor.fetchall()
e1 = tk.Entry(root)
e1.place(x=140, y=10)
e1.focus_set()
e2 = tk.Entry(root)
e2.place(x=140, y=50)
e3 = tk.Entry(root)
e3.place(x=140, y=90)
e4 = tk.Entry(root)
e4.place(x=140, y=130)
show()
root.mainloop()
python student_design.py
In the page that appears, insert 1001 in the Student ID field, Richard
Franklin in the Name field, 14 in the Age field, and Boston in the City field.
Then, click the Add button as shown in Figure 10.22.
Similarly, insert a few more records into the student_info table. To fetch a
record, enter the Student ID and click Read as shown in Figure 10.24.
The fields in this record can be updated by editing the values displayed. For
example, let us update the Age as 14. After updating the value, click the
Update button. Then, click Refresh to view the updated record as shown in
Figure 10.26.
To delete a record, enter the Student ID and click Delete as shown in Figure
10.27.
Click the Delete button. Then, click Refresh to verify that the record is deleted
as shown in Figure 10.28.
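The Add, Update, and Delete handlers all follow one pattern: run a parameterized statement, commit on success, and roll back on failure. The following is a minimal sketch of that pattern, assuming an in-memory SQLite database (the standard sqlite3 module) in place of the MySQL server so that it runs without any setup. The helper name run_statement is hypothetical. Note that sqlite3 uses ? as the placeholder, whereas MySQL Connector uses %s.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for studentdb.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_info "
    "(Stud_id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, City TEXT)"
)

def run_statement(sql, params):
    """Hypothetical helper: execute one statement, then commit or roll back."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)   # values are passed separately from the SQL text
        conn.commit()
        return cursor.rowcount        # number of rows affected
    except sqlite3.Error as err:
        print(err)
        conn.rollback()
        return 0

run_statement("INSERT INTO student_info VALUES (?, ?, ?, ?)",
              (1001, "Richard Franklin", 14, "Boston"))
run_statement("UPDATE student_info SET Age = ? WHERE Stud_id = ?", (15, 1001))
rows = conn.execute(
    "SELECT Stud_id, Name, Age, City FROM student_info").fetchall()
print(rows)
deleted = run_statement("DELETE FROM student_info WHERE Stud_id = ?", (1001,))
print(deleted)
```

Passing the values as a separate tuple, rather than building the SQL string by hand, lets the database driver handle quoting and helps to guard against SQL injection.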
a. Python connector
b. MySQL Connector
c. Connector MySQL Python
d. SQL Python connector
a. connection
b. pythonConnect
c. connect
d. mysqlConnect
1 import mysql.connector
2 conn = mysql.connector.connect(user='root',
password='mysql123', host='localhost')
3 cursor = conn.cursor()
4 # insert code here
5 conn.close()
1 b
2 c
3 a
4 d
5 a
Pandas is a Python package that includes various functions to help you work
with data sets. The different types of data that you can handle using Pandas
include textual data, numerical data, boolean data, datetime, tabular data,
time series, matrices, arrays, and more. The various tasks that you can perform
using Pandas are:
• Loading data from different sources such as Excel, Comma-Separated
Values (CSV), SQL, HTML, and JSON.
• Handling missing data by filling, replacing, or dropping them.
• Merging and joining data from different data sources.
• Reshaping and pivoting data for further data processing or data
summarization.
• Grouping and aggregating data.
• Performing statistical operations on grouped data, such as mean,
median, and standard deviation.
• Presenting data through plots, graphs, and charts.
• Formatting the display of data by applying styles.
To use the classes and functions available in the Pandas library, you must first
install Pandas on your system and then import the library into your code. To
install Pandas, run the following command at the command prompt:
pip install pandas
This command launches pip, the Python package installer, which downloads
and installs Pandas along with the packages it requires. Figure 11.1 shows the
installation of Pandas using the pip installer.
Once Pandas is downloaded and installed, it is ready for use. Next, you must
import it in your code. The syntax for importing Pandas is:
import pandas as <alias_name>
The given syntax imports Pandas with an alias name. Accessing Pandas
functions using its alias name makes your code concise and readable. For
example, the given code imports Pandas with the alias name pd.
import pandas as pd
This way, whenever you want to access one of the functions in Pandas,
instead of typing pandas.<function_name>, you can type
pd.<function_name>.
• Series: A Series is a one-dimensional labeled array that holds data of
any type such as strings and integers.
• DataFrame: A DataFrame is a two-dimensional array such as a table or a
spreadsheet that stores data of different types in rows and columns.
The syntax for creating a Series is:
pandas.Series(data, index, dtype, copy)
In the syntax, data, index, dtype, and copy are the parameters of the Series
function. Table 11.1 describes these parameters.
Parameter    Description
data         Contains data to be stored in the Series object. The data
             can be a list, an array, a scalar value, or a dictionary.
index        Contains the values, typically unique, that act as labels to
             identify the data in the array or list. The index must be of
             the same length as data.
Code Snippet 1 shows the creation of a Series object from a list. The code
passes a list of string values as the data argument and creates an object
named color. The code then displays the values stored in color on the
standard output device. The code lets Pandas create a default index for
the values in the list.
Code Snippet 1:
import pandas as pd
color = pd.Series(['red', 'blue', 'green' ,'yellow',
'White'])
print(color)
Figure 11.3 shows the output of Code Snippet 1. Note that the output
specifies the index or the labels for each of the values in the list. By default,
the index is set to a range of integers from 0 to 4. In addition, the output also
displays the inferred data type of color, which in this case is object.
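Code Snippet 2:
import pandas as pd
# Named user_ages rather than dict to avoid shadowing the built-in dict type
user_ages = {'Richard': 15,
'David': 20,
'William': 25}
user_ser = pd.Series(user_ages)
print(user_ser)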
Figure 11.4 shows the output of Code Snippet 2. Note that the inferred data
type of user_ser is displayed as int64 in the output.
A scalar is a single value. To create a Series object from a scalar value,
you must pass the index argument to the pandas.Series function by
specifying the labels for the data. Pandas then creates a Series object by
repeating the scalar value to match the length of the specified index. For
example, Code Snippet 3 creates a Series object from the scalar value 50 by
passing a list of five integers as the index argument.
Code Snippet 3:
import pandas as pd
num_ser = pd.Series(50, index=[1, 2, 3, 4, 5])
print(num_ser)
Figure 11.5 shows the output of Code Snippet 3. Note that the output repeats
the scalar value 50 five times, matching the length of the specified index.
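The index argument can also be combined with a list. The following is a minimal sketch, assuming illustrative names and scores, in which each value receives a custom label that can then be used to look the value up.

```python
import pandas as pd

# Assumed sample data; the index list has the same length as the data list.
scores = pd.Series([86, 90, 79], index=['Adam', 'Richard', 'William'])

print(scores['Richard'])        # access a value by its label
print(scores.index.tolist())    # list the labels
```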
pandas.DataFrame(data, index, columns, dtype, copy)
In the syntax, data, index, columns, dtype, and copy are the parameters of
DataFrame function. Table 11.2 describes these parameters.
Parameter    Description
data         Specifies the data to be stored in the DataFrame object. The
             data can be a collection such as an ndarray (an
             n-dimensional array), a list, a Series, a dictionary, or
             another DataFrame. Data held in a Microsoft Excel file, a CSV
             file, or a SQL table is first loaded using a reader function
             such as read_csv.
index        Specifies the row labels of the DataFrame. If not specified,
             index defaults to range(n), where n is the length of the
             data.
columns      Specifies the column labels of the DataFrame. If not
             specified, columns default to range(m), where m is the
             number of columns in the data. If the data is a dictionary,
             the keys are used as column labels, by default.
dtype        Specifies the type of data to be stored in the DataFrame. If
             not specified, Pandas infers the type from the given data.
copy         A boolean value that indicates whether to copy the input
             data or not. In older versions of Pandas, the default value
             is False. In recent versions, the default is None, which
             lets Pandas decide based on the type of data.
You can create DataFrames from different data sources such as lists,
dictionaries, CSV files, Excel files, and SQL tables.
You can create a DataFrame object from a single list, multiple lists, or a list of
lists using the pandas.DataFrame function. For example, Code Snippet 4
creates a DataFrame from a list of lists. The code passes a list containing three
lists representing names and genders of people as the data argument. As
each inner list contains two values, the code passes two labels, namely Name
and Gender, as the columns argument. The code then prints the created
DataFrame object to the standard output device.
Code Snippet 4:
import pandas as pd
data = [['Richard','M'],['Emy','F'],['Adam','M']]
df = pd.DataFrame(data,columns=['Name','Gender'])
print (df)
Figure 11.6 shows the output of Code Snippet 4. You can see that the output
displays the data stored in the DataFrame object with the specified column
labels and the default row labels in a tabular format.
Code Snippet 5:
import pandas as pd
data = {'Name':['Richard', 'Emy',
'Adam'],'Gender':['M','F','M']}
df = pd.DataFrame(data)
print (df)
Figure 11.7 shows the output of Code Snippet 5. The output shows the column
labels, the default row labels, and the values in a tabular format. Note that the
keys of the dictionary are used as column labels and their corresponding
values are displayed in the respective columns.
CSV files are comma-separated text files. You can create a DataFrame object
from a CSV file using the pd.read_csv function. This function accepts the path
or URL of a CSV file as an argument, reads the CSV file, and returns its contents
as a DataFrame object.
Code Snippet 6:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df
Figure 11.8 shows the output of Code Snippet 6. The output shows the contents
of the Product_sales.csv file that was loaded into the DataFrame object.
Pandas provides various methods to help you view the data stored in its data
structures. For example, you can use the head and tail methods of the
Series and DataFrame classes to view a small sample of the respective
objects. Alternatively, to view the statistical information of the data, you can
use the describe method.
By default, the head method returns the first five rows. To view a different
number of rows, you can pass the required integer as an argument to the head
method. The syntax for retrieving the first n rows from a DataFrame is:
pandas.DataFrame.head(<n>)
The syntax for retrieving the first n rows from a Series is:
pandas.Series.head(<n>)
Code Snippet 7:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
data_top = df.head()
data_top
Figure 11.9 shows the output of Code Snippet 7. The output shows the first five
rows of the Product_sales.csv file with all the columns and the default index
starting from 0.
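To view the last n rows of a Series or a DataFrame, use the tail method. By
default, the tail method returns the last five rows. To view a different number
of rows, you can pass the required integer as an argument to the tail method.
The syntax for retrieving the last n rows from a DataFrame is:
pandas.DataFrame.tail(<n>)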
The syntax for retrieving the last n rows from a Series is:
pandas.Series.tail(<n>)
Code Snippet 8:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
data_bottom = df.tail()
data_bottom
Figure 11.10 shows the output of Code Snippet 8. The output shows the last five
rows of the Product_sales.csv file from index 9 to 13 with all the columns.
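Passing an integer to head or tail changes the number of rows returned. The following is a minimal sketch, assuming a small in-memory DataFrame with 14 rows as a stand-in for Product_sales.csv; the product names and prices are made up for illustration.

```python
import pandas as pd

# Assumed stand-in data with 14 rows (index 0 to 13).
df = pd.DataFrame({'Product': ['P' + str(i) for i in range(14)],
                   'Sale Price in $': range(10, 150, 10)})

first_three = df.head(3)   # rows with index 0, 1, and 2
last_two = df.tail(2)      # rows with index 12 and 13
print(first_three)
print(last_two)
```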
To view the statistical data of a DataFrame or a Series, you can use the
describe method. This method returns a summary of the statistical information
of the Series or DataFrame provided. The statistical information in the
summary includes the count, mean, standard deviation, minimum, maximum,
and percentiles of the values. The syntax for the describe method is:
pandas.DataFrame.describe()
pandas.Series.describe()
Let us look at an example code that demonstrates the use of the describe
method using a DataFrame object. Code Snippet 9 creates a DataFrame
object called df from the Product_sales.csv file and calls the describe
method to obtain the summary statistics of df. It then prints the summary
statistics on the screen.
Code Snippet 9:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
print(df.describe())
Figure 11.11 shows the output of Code Snippet 9. The output shows the statistics
of only those columns that are of numerical data type and does not include
Product, Brands, and Description columns. This is because the data type of
these columns is not numerical.
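The describe method can also be called on a single column, and the include='all' argument asks Pandas to summarize non-numerical columns as well. The following is a minimal sketch, assuming a small in-memory DataFrame with made-up brand names and prices.

```python
import pandas as pd

# Assumed sample data: one text column and one numerical column.
df = pd.DataFrame({'Brands': ['A', 'B', 'C', 'D'],
                   'Sale Price in $': [10.0, 20.0, 30.0, 40.0]})

numeric_summary = df.describe()                   # numerical columns only
price_summary = df['Sale Price in $'].describe()  # summary of one Series
full_summary = df.describe(include='all')         # text columns included

print(price_summary)
print(full_summary)
```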
To add a column to a DataFrame, specify the new column name between the
[] brackets at the left side of the assignment operator. Then, specify the value
to be assigned to the column at the right side of the assignment operator. The
value can be a list, a dictionary, a Series or another DataFrame, or even a
result of an arithmetic operation.
Let us now look at an example code that demonstrates how to add a new
column to a DataFrame. The newly added column holds a computed value,
the discount percentage derived from the marked price and the sale price.
Code Snippet 10:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df['Discount'] = (df['Marked Price in $'] - df['Sale Price in $'])*100/df['Marked Price in $']
df
Figure 11.12 shows the output of Code Snippet 10. Note that the DataFrame
has an additional column, Discount that shows the discount percentage for
each product.
There may be scenarios, where you want to add a scalar value to the values
in a Series or column of a DataFrame object. For example, consider that a
DataFrame holds a column Sale Price for each product, and you want to
increase the values in Sale Price by a constant for all products. Pandas
allows you to perform such basic arithmetic operations on a Series or a
DataFrame by providing various functions. Some of these functions are:
Code Snippet 11 demonstrates the use of add function. The code reads the
Product_sales.csv to a DataFrame object df. It then uses the df['Sale
Price in $'].add(20) function to add the scalar value of 20 to the values
in the column Sale Price in $ of the DataFrame object df. It displays df
with the updated values in the column Sale Price in $.
Code Snippet 11:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df['Sale Price in $']= df['Sale Price in $'].add(20)
df
Figure 11.13 shows the output of Code Snippet 11. Note that the values in Sale
Price in $ are incremented by 20 for each product.
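Other arithmetic operations follow the same form as add. The following is a minimal sketch, assuming the element-wise methods sub, mul, and div of a Series and a made-up list of prices. Each call returns a new Series and leaves the original unchanged.

```python
import pandas as pd

prices = pd.Series([100, 250, 400])   # assumed sample prices

print(prices.add(20).tolist())   # same result as prices + 20
print(prices.sub(20).tolist())   # same result as prices - 20
print(prices.mul(2).tolist())    # same result as prices * 2
print(prices.div(4).tolist())    # same result as prices / 4
```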
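pandas.DataFrame.sort_values(by, axis=0, ascending=True,
inplace=False, kind='quicksort', na_position='last')
pandas.Series.sort_values(axis=0, ascending=True,
inplace=False, kind='quicksort', na_position='last')
Note that the by parameter applies only to a DataFrame. A Series holds a
single set of values, so there is no label to choose.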
Parameter    Description
by           A string or a list of strings that specifies the labels to
             sort by. If axis is 0 or 'index', then the labels may contain
             column names. If axis is 1 or 'columns', then the labels may
             contain index names.
axis         An integer or a string that specifies the axis to be sorted.
             The default value is 0 or 'index', which indicates sorting
             by rows. If the value is 1 or 'columns', then the data is
             sorted by columns.
Code Snippet 12 demonstrates the use of the sort_values method. The code
reads the data from Product_sales.csv into a DataFrame object df. It then
uses the sort_values method to sort the DataFrame by the Star ratings
column in ascending order. The code modifies the original DataFrame to the
sorted order by specifying True for inplace. In addition, it specifies 'last' for
na_position to place any null values represented by Not a Number (NaN) at
the end of the sorted order. The code then calls df to display the DataFrame
on the standard output device.
Code Snippet 12:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df.sort_values('Star ratings', axis = 0, ascending = True,
inplace = True, na_position ='last')
df
Figure 11.14 shows the output of Code Snippet 12. Note that the DataFrame
df is sorted by Star ratings. As you can see, the row with the index 0 is
displayed last because its value of Star ratings is NaN, which
indicates missing or null data.
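The sort_values method also accepts a list of column labels together with a list of sort directions. The following is a minimal sketch, assuming a small in-memory DataFrame with made-up values, that sorts by rating in descending order, breaks ties by the lower price, and places NaN rows first. As inplace is not set, the original DataFrame keeps its order.

```python
import pandas as pd

# Assumed sample data; None becomes NaN in the numerical column.
df = pd.DataFrame({'Brands': ['A', 'B', 'C', 'D'],
                   'Star Ratings': [4.5, None, 4.1, 4.5],
                   'Sale Price in $': [30, 20, 50, 10]})

sorted_df = df.sort_values(by=['Star Ratings', 'Sale Price in $'],
                           ascending=[False, True],
                           na_position='first')
print(sorted_df)
```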
Truncating is removing data. You can truncate rows or columns of data from a
DataFrame or a Series using the truncate method. The syntax for the
truncate method is:
pandas.DataFrame.truncate(before,after,axis,copy)
pandas.Series.truncate(before,after,axis,copy)
Parameter    Description
before       Specifies the index value before which all rows are
             truncated.
after        Specifies the index value after which all rows are
             truncated.
axis         Specifies the axis to truncate along. The values can be:
             0 or 'index'
             1 or 'columns'
Code Snippet 13 demonstrates the use of the truncate method. The code
reads the data from Product_sales.csv into a DataFrame object df. It then
calls the df.truncate method and sets the before and after parameters of
the method to 5 and 9, respectively. This indicates that all the rows before the
index label 5 and all the rows after the index label 9 should be removed from
the resultant DataFrame. As the code does not specify any value for the axis
parameter, the default value of 0 or 'index' is used, so the truncation is done
along the index, that is, by rows. Similarly, as the code does not specify the
copy parameter, the default value of True is used, and a copy of the truncated
section is returned as a new DataFrame object. The code assigns this newly
returned DataFrame object to the variable newdf. It then calls newdf to display
the DataFrame on the standard output device.
Code Snippet 13:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
newdf = df.truncate(before=5, after=9)
newdf
Figure 11.15 shows the output of Code Snippet 13. As you can see, the output
shows newdf, which contains only the rows with the index labels 5 to 9. The
rows before label 5 and after label 9 are removed, and the original
DataFrame df is unchanged.
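The following is a minimal sketch of the same behavior, assuming a small in-memory DataFrame with ten rows in place of the CSV file. It shows that both boundary labels are kept and that the original DataFrame retains all its rows.

```python
import pandas as pd

df = pd.DataFrame({'value': range(100, 110)})   # index labels 0 to 9

newdf = df.truncate(before=5, after=7)   # keeps labels 5, 6, and 7
print(newdf)
print(len(df))                           # the original still has 10 rows
```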
Filtering data refers to extracting the required subset of data from an entire
dataset. To filter data in a data set, you can read the data into a DataFrame
and then select rows by their labels using loc. Note that loc is an indexer and
takes its input in square brackets, not parentheses:
pandas.DataFrame.loc[<label>]
In general, the input value of loc can be a single label, a list, or an array of
labels. An example of a single label could be 6 or 'Product', where 6 is not
interpreted as an integer position but as a label of the index.
Let us consider Code Snippet 14 that demonstrates the loc method. Code
Snippet 14 reads Product_sales.csv into a DataFrame called df. The code
sets the index_col to 0 indicating that the first column of the CSV file should
be used as the row labels of df. It then uses the df.loc method to extract the
rows that have the index or row label as Earphones and return the rows as a
DataFrame object. The code assigns the newly returned DataFrame object to
the variable find_rec. Finally, the code calls find_rec to print the DataFrame
on the standard output device.
Code Snippet 14:
import pandas as pd
df = pd.read_csv('Product_sales.csv', index_col =0)
find_rec = df.loc["Earphones"]
find_rec
Figure 11.16 shows the output of Code Snippet 14. As you can see, the output
contains the details about the product, Earphones.
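Besides a single label, loc accepts a list of labels, a pair of row and column selections, or a boolean condition. The following is a minimal sketch, assuming a small in-memory DataFrame whose brand names and prices are made up for illustration.

```python
import pandas as pd

# Assumed sample data; product names act as the row labels.
df = pd.DataFrame({'Brands': ['Sony', 'Dell', 'JBL'],
                   'Sale Price in $': [40, 700, 35]},
                  index=['Earphones', 'Laptop', 'Earphones'])

by_label = df.loc['Earphones']                      # all rows with this label
by_list = df.loc[['Laptop'], ['Brands']]            # chosen rows and columns
by_condition = df.loc[df['Sale Price in $'] > 36]   # rows matching a condition

print(by_label)
print(by_list)
print(by_condition)
```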
In Pandas, NaN and None represent missing or null values, where NaN is the
default missing value marker. The reasons for using NaN as the default missing
value marker are for computational speed and ease of use with different
data types such as floating point, boolean, or integer.
Pandas provides you with various functions to detect, fill, or drop NaN values
in a DataFrame or a Series.
As the isnull and notnull functions are complementary (notnull returns the
inverse of isnull), let us look at a sample code demonstrating only the isnull
function. Let us consider the data in the Product_sales.csv file in which the
Star Ratings for the Earphones of the Sony brand is NaN. Figure 11.17 shows
the data in the Product_sales.csv file highlighting the NaN value for Star
Ratings in the row with index 0.
Code Snippet 15 reads this CSV file into a DataFrame object df and prints df
to the standard output device. It then calls the df.isnull function to return a
boolean DataFrame object that shows whether each value in the original
DataFrame object df is null or not. As the code does not assign the output
DataFrame to any variable, the DataFrame directly prints to the standard
output device.
Code Snippet 15:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df
df.isnull()
Figure 11.18 shows the boolean DataFrame that was returned by the
df.isnull function. You can see that this output DataFrame has the same
number of rows and columns as that of the original DataFrame object df,
except that the values are either True or False. Note that the row with index
0 has the value True for Star Ratings because the value of this column is
NaN in the original DataFrame.
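The boolean result of isnull is often summarized rather than read cell by cell. The following is a minimal sketch, assuming a small in-memory DataFrame with made-up values, that counts the missing values in each column and uses notnull to keep only the rows that have a rating.

```python
import pandas as pd
import numpy as np

# Assumed sample data with one missing rating.
df = pd.DataFrame({'Brands': ['Sony', 'JBL', 'Bose'],
                   'Star Ratings': [np.nan, 4.3, 4.6]})

missing_per_column = df.isnull().sum()       # True counts as 1
rated = df[df['Star Ratings'].notnull()]     # rows where a rating exists

print(missing_per_column)
print(rated)
```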
Once you have identified that there are missing values in a DataFrame or a
Series, you can either fill those missing values or drop them. The functions to
fill or drop missing values are:
o fillna: This function replaces missing values with some other values
and returns the object with missing values filled.
o dropna: This function removes rows or columns with missing values
and returns the object with NaN entries dropped from it.
Code Snippet 16:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df.fillna(0)
Figure 11.19 shows the output of Code Snippet 16 that displays the new
DataFrame object returned by the df.fillna function. Note that the NaN
values are replaced with 0.
Code Snippet 17:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df.dropna()
Figure 11.20 shows the output of Code Snippet 17. The output DataFrame does
not have the row with index 0, as this row is dropped because of the missing
value or NaN in Star Ratings. Similarly, the rows with index 3 and 9 are also
dropped.
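Both functions accept arguments that narrow their effect. The following is a minimal sketch, assuming a small in-memory DataFrame with made-up values, that fills a column with its mean, drops rows only when a chosen column is missing, and drops columns that contain any missing value.

```python
import pandas as pd
import numpy as np

# Assumed sample data with a missing rating and a missing description.
df = pd.DataFrame({'Brands': ['Sony', 'JBL', 'Bose'],
                   'Star Ratings': [np.nan, 4.0, 5.0],
                   'Description': ['Wired', None, 'Wireless']})

mean_filled = df['Star Ratings'].fillna(df['Star Ratings'].mean())
rated_only = df.dropna(subset=['Star Ratings'])   # checks only this column
complete_cols = df.dropna(axis=1)                 # drops columns with NaN

print(mean_filled)
print(rated_only)
print(complete_cols)
```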
These functions operate on text data in Series or index objects and can
perform tasks such as converting case, replacing strings, and removing
whitespaces.
Converting Case
Pandas provides the str.lower and str.upper functions to convert text data
to lowercase and uppercase, respectively. Suppose you have a DataFrame
object df with a column Brands that contains string values. You can use the
df["Brands"].str.upper() command to get a new Series with all brands
in uppercase. Code Snippet 18 demonstrates this.
Code Snippet 18:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df["Brands"]= df["Brands"].str.upper()
df
Replacing Strings
Code Snippet 19:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
df["Product"]= df["Product"].str.replace("earphones",
"Headphones", case = False)
df
The boolean value False for the case parameter indicates that the
replacement is not case sensitive. The code assigns the resultant Series object
back to the Product column of df.
Figure 11.22 shows the output of Code Snippet 19. Note that the rows with index
0, 5, and 6 in the Product column, which previously had the value Earphones,
now have the value Headphones.
Removing Whitespaces
Code Snippet 20 demonstrates the use of the strip function. The code
creates the DataFrame object df from Product_sales.csv and prints the
Product column in df to the standard output device. It then calls the
df["Product"].str.strip() function to remove the leading and trailing
spaces from the string values in the Product column. The str.strip function
returns a new Series object with the same index as the original data, but with
the leading and trailing whitespaces removed.
Code Snippet 20:
import pandas as pd
df = pd.read_csv('Product_sales.csv')
print(df.Product)
df['Product'] = df['Product'].str.strip()
print(df.Product)
Figure 11.23 and Figure 11.24 show the output of Code Snippet 20. Figure 11.23
shows the values in the Product column in the original DataFrame object df
with whitespaces. Figure 11.24 shows the updated df, with all the whitespaces
removed from the values in the Product column.
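The string functions return a new Series each time, so they can be chained. The following is a minimal sketch, assuming a small Series of made-up product names, that strips whitespace, converts the text to lowercase, and then replaces one value with another.

```python
import pandas as pd

# Assumed sample data with inconsistent case and stray spaces.
products = pd.Series(['  Earphones', 'LAPTOP ', ' earphones  '])

cleaned = products.str.strip().str.lower()
renamed = cleaned.str.replace('earphones', 'headphones', regex=False)

print(cleaned.tolist())
print(renamed.tolist())
```

Cleaning the case first means that the replacement can use a plain, case-sensitive match.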
a. Series
b. Panel
c. DataFrame
d. FrameSeries
a. bottom
b. tail
c. row_bottom
d. tail_row
a. stat
b. percentile
c. statistics
d. describe
4. Which of the following functions replaces all the NaN values with a
specified value, 99?
a. fillnull
b. fillnan
c. fillna
d. fillNaN
a. isnull
b. is nan
c. notnull
d. notnan
1 c
2 b
3 d
4 c
5 a, c
b. Write Python program to get the first three rows and the last four
rows from the DataFrame.
c. Write a Python program to select only the rows where the student
hobby is Dancing.
d. Write a Python program to select only the rows where the Java
marks are missing.
This session will provide an overview of data visualization and its importance in
conveying information effectively to a wide audience. It will explain the Python
libraries such as Matplotlib and Seaborn used for data visualization. Finally, it
will describe different types of data visualization techniques available in
Python, including bar charts, scatter plots, line graphs, and histograms.
Consider that you are a marketing analyst tasked with understanding customer
behavior for an e-commerce Website. You have been provided with a massive
spreadsheet containing purchase history, demographics, and Website
interaction data for thousands of customers. Trying to decipher insights from
this tabular data will be challenging.
Tables and Comma Separated Value (CSV) files provide only raw data. To
identify meaningful patterns, trends, and relationships or to detect any
correlations between Website engagement and purchasing decisions, you
require a tool that can represent the data pictorially. Data visualization is the
process where raw data is transformed into visual representations.
Data visualization falls within the sphere of data analysis and involves creating
visual depictions of information. These visuals such as pictures, maps, and
graphs help to convey insights of data in a clear way. Through data
visualization, you can quickly grasp an overview of any information. This visual
approach helps the human brain in processing and comprehending the data
provided.
12.2.1 Matplotlib
● Line
● Histogram
● Scatter
● 3D
● Image
● Contour
● Polar
12.2.2 Seaborn
Seaborn stands out as an exceptional Python library tailored for the graphical
representation of statistical data. Seaborn offers an array of color palettes and
visually pleasing styles and helps to create enhanced statistical plots in Python.
Widely embraced for data science and machine learning, the Python Seaborn
library builds upon the visualization capabilities of Matplotlib.
For a given set of data points, Matplotlib provides the option to create various
types of plots. You can create a line plot, custom marker line plot, plot with
custom markers and line styles, or an advanced line plot with labels and a grid.
Let us generate a basic line plot for a set of data points. Consider that the
scores of five students—Adam, Richard, William, Emy, and Linda—in a test
are 86, 90, 79, 78, and 96, respectively. Code Snippet 1 generates a basic line
plot for this data.
Code Snippet 1:
In Code Snippet 1:
Figure 12.2 displays the output of the code in Code Snippet 1. The code will
generate a basic line plot that connects the given names on the x-axis with
their corresponding scores on the y-axis. It helps to visualize the relationship
between the names and the scores.
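A basic line plot of this kind can be sketched as follows. This is an illustrative version, not necessarily the code shown in the original snippet. It assumes the non-interactive Agg backend and a hypothetical output file name so that it runs without a display; in a notebook, you would omit the backend line and call plt.show() instead of plt.savefig.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: render to a file instead of a window
import matplotlib.pyplot as plt

names = ["Adam", "Richard", "William", "Emy", "Linda"]
scores = [86, 90, 79, 78, 96]

# Names go on the x-axis and scores on the y-axis.
lines = plt.plot(names, scores)
plt.savefig("basic_line_plot.png")   # hypothetical file name
```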
Customize a Plot
You can customize various aspects of a line plot using both the fmt string and
keyword arguments in Matplotlib.
Markers are shapes or symbols that highlight the data points in a plot. You
can set the marker style using the marker keyword argument or the marker
character in the fmt string.
Marker    Description
o         Circle
*         Star
.         Point
,         Pixel
x         X
X         X (filled)
+         Plus
Let us generate a custom marker line plot for the same set of data points used
in Code Snippet 1. The code in Code Snippet 2 generates a line plot for this
data with asterisk(*) as the marker.
Code Snippet 2:
Line style refers to the appearance of the line that connects the data points in
a plot. Line style can be customized by setting the type of line or changing the
color of line in the fmt string. Table 12.2 shows the available options for type of
line.
Let us generate a plot with customized marker and line for the same data
points used in the previous sections. The code in Code Snippet 3 generates a
line plot for this data with a circular green marker and dotted green line.
Code Snippet 3:
In Code Snippet 3, the marker and line style are customized using 'o:g'. Here
‘o’ sets the marker style to circle, ‘:’ sets the line style to dotted line and ‘g’
sets the color to green. Figure 12.4 displays the output of the code.
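The 'o:g' format string can be tried out with a sketch such as the following, which reuses the names and scores from the earlier example. It is an illustrative version under the same assumptions as before: the Agg backend and the output file name are chosen only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: render to a file instead of a window
import matplotlib.pyplot as plt

names = ["Adam", "Richard", "William", "Emy", "Linda"]
scores = [86, 90, 79, 78, 96]

# 'o:g' combines a circle marker, a dotted line, and the color green.
(line,) = plt.plot(names, scores, 'o:g')
print(line.get_marker(), line.get_linestyle())
plt.savefig("custom_line_plot.png")   # hypothetical file name
```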
Let us generate a plot with customized marker and line for the data points (2,
3) and (6, 8). For example, consider that you want to create a plot with green
asterisk markers with red edges, sized 20 and connected by a green dotted
line. The code in Code Snippet 4 generates a line plot with the specified
customizations.
Code Snippet 4:
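One way to express these customizations is sketched here, assuming the Matplotlib keyword abbreviations ms (marker size), mec (marker edge color), and mfc (marker face color). The Agg backend and the output file name are assumptions made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: render to a file instead of a window
import matplotlib.pyplot as plt

x = [2, 6]
y = [3, 8]

# '*:g' sets a star marker, a dotted line, and the color green.
# ms is the marker size, mec the edge color, and mfc the face color.
(line,) = plt.plot(x, y, '*:g', ms=20, mec='r', mfc='g')
plt.savefig("custom_marker_plot.png")   # hypothetical file name
```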
Pyplot provides options for enhancing your plots. The options include:
● The xlabel and ylabel functions, which assign labels to the x-axis and
y-axis.
● The title function, which sets a title for your plot.
● The grid function, which enhances visual clarity by adding grid lines to
your plot.
Let us generate a plot where student names are plotted on the x-axis and their
respective marks on the y-axis. The data points are represented using
diamond-shaped markers, connected by dashed lines in red color. The plot is
enhanced with labels for x and y axes, a title indicating Marksheet and grid
lines for improved visual clarity. The code in Code Snippet 5 generates a plot
with these specifications.
Code Snippet 5:
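import matplotlib.pyplot as plt
# Assumed sample data: the names and scores used in Code Snippet 1
x = ["Adam", "Richard", "William", "Emy", "Linda"]
y = [86, 90, 79, 78, 96]
# 'D--r' sets diamond markers, a dashed line, and the color red
plt.plot(x, y,'D--r')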
plt.xlabel("Students")
plt.ylabel("Marks")
plt.title("Marksheet")
plt.grid()
plt.show()
In Code Snippet 5:
The plt.xlabel("Students") statement marks the x-axis with the label
as ‘Students’.
The plt.ylabel("Marks") statement marks the y-axis with the label
as ‘Marks’.
The plt.title("Marksheet") statement gives the plot a title as
‘Marksheet’.
Charts are one of the most common tools used for data visualization. You can
create a variety of charts in Python using Matplotlib. Download the
Product_sales.csv file from Onlinevarsity. Refer to this file for the examples
on various charts.
Line charts are used to represent the relation between two datasets, X and Y,
on different axes.
Let us contrast the sale prices and marked prices of the leading five brands of
a product with high star ratings. You can generate dual-line graphs to facilitate
a comparative analysis of their prices. Code Snippet 6 lists the code for
creating a line chart to compare the sale prices of the brands.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('Product_sales.csv')
dp = df.head()
brand = dp["Brands"]
saleprice = dp['Sale Price in $']
mrpprice=dp['Marked Price in $']
plt.xlabel("Brands")
plt.ylabel("Price in $")
plt.plot(brand,saleprice,marker = '*',label ="Sale Price in $")
plt.plot(brand,mrpprice,marker = 'D',label ="Marked Price in $")
plt.legend()
plt.show()
In Code Snippet 6:
Primarily, a bar chart is used to illustrate the relationship between numeric and
categorical values. A bar graph serves to contrast discrete categories. One
axis of the chart represents a particular category of the columns and another
axis represents the corresponding values or counts of each category.
A bar chart, also referred to as a bar graph, presents categorical data using
rectangular bars. The charts can be created in vertical or horizontal
orientations. The dimensions of the bars including heights or lengths are
proportional to the values that they represent.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Product_sales.csv')
df['Discount'] = (df['Marked Price in $'] - df['Sale Price in $'])*100/df['Marked Price in $']
x_pt = df["Brands"]
y_pt = df['Discount']
plt.barh(x_pt, y_pt)
plt.xlabel("Discount %")
plt.ylabel("Brands")
plt.show()
In Code Snippet 7:
Data is taken from a CSV file named Product_sales.csv.
The code reads the CSV file, calculates the percentage discount for
each product, and then generates a horizontal bar chart using the
plt.barh function.
The y-axis represents the different brands of products, the x-axis indicates
the percentage discount, and each horizontal bar corresponds to a brand's
discount percentage.
Scatter plots illustrate the relationships between two variables. They use dots to
plot and represent various data points. These plots are particularly useful for
showcasing relationships between numeric variables, as they position data
points along both horizontal and vertical axes. This provides insight into the
degree of influence one variable exerts on another.
Let us generate a scatter plot to show the relationship between the sale price
and the marked price of each brand of products. Code Snippet 8 shows the
code for creating a scatter plot.
Code Snippet 8:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('Product_sales.csv')
saleprice = df['Sale Price in $']
mrpprice=df['Marked Price in $']
plt.xlabel("Marked Price in $")
plt.ylabel("Sale Price in $")
plt.grid()
# Pass the x values first so that the data matches the axis labels
plt.scatter(mrpprice, saleprice)
plt.show()
In Code Snippet 8:
Data is taken from a CSV file named Product_sales.csv.
The code reads the CSV file and creates a scatter plot to visualize the
relationship between Sale Price in $ and Marked Price in $ values.
The code adds labels and gridlines for a better understanding of the
data distribution.
12.3.4 Histogram
Parameter    Description
x            Is the array or sequence of arrays to be plotted
bins         Is an optional parameter that can be an integer, a sequence
             of bin edges, or a string such as 'auto'
Code Snippet 9:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Product_sales.csv')
plt.xlabel("Star Ratings Range")
plt.ylabel("Count")
plt.hist(df['Star Ratings'],bins =
[4,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9])
In Code Snippet 9:
Data is taken from a CSV file named Product_sales.csv.
The code reads the CSV file. It takes the data in the Star Ratings
column from the DataFrame and divides it into specified bins, ranging
from 4 to 4.9 with small increments (0.1).
The histogram depicts the frequency or count of star ratings falling within
each bin.
a. plt.plot
b. plt.scatter
c. plt.bar
d. plt.hist
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Subjects': ['Math', 'Science', 'History', 'English',
'Geography'],
'Average_Score': [78, 92, 85, 75, 88]
}
df = pd.DataFrame(data)
x_pts = df["Subjects"]
y_pts = df['Average_Score']
plt.barh(x_pts, y_pts, color='skyblue')
plt.xlabel("Average Score")
plt.ylabel("Subjects")
plt.title("Average Scores of Students")
plt.show()
a. Scatter Plot
b. Histogram
c. Line Chart
d. Horizontal Bar Chart
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Age': [25, 30, 35, 40, 45],
'Height': [160, 170, 165, 175, 180]
}
data_df = pd.DataFrame(data)
# Missing code: Complete the code to create a scatter plot
# Display the scatter plot
plt.show()
a. scatter(data_df['Height'], data_df['Age'])
plt.xlabel("Height (cm)")
plt.ylabel("Age (years)")
plt.title("Height vs Age")
b. plot(data_df['Age'], data_df['Height'])
plt.xlabel("Age (years)")
plt.ylabel("Height (cm)")
plt.title("Height vs Age")
c. scatter(data_df['Age'], data_df['Height'])
plt.xlabel("Age (years)")
plt.ylabel("Height (cm)")
plt.title("Age vs Height")
d. plot(data_df['Height'], data_df['Age'])
plt.xlabel("Height (cm)")
plt.ylabel("Age (years)")
plt.title("Height vs Age")
5. You want to customize your line plot with circle data points connected
using dotted/dashed lines. Which of the following options specifies the
correct fmt string to plot that line in blue color?
a. 'c-:b'
b. 'c-:B'
c. 'o-.b'
d. 'o-.B'
1 a
2 d
3 c
4 d
5 c
3. You work for a furniture retail company that offers a wide range of
furniture products such as tables, chairs, sofas, and cabinets. The
company has a dataset named Furniture_sales.csv containing
information about the marked price of different furniture items. Create
the Furniture_sales.csv file using the dataset given in Table 12.7.
Data collected from sources such as surveys or Web pages is raw and comes in
different formats, such as text, numbers, and images. For example, suppose you
want to analyze the academic performance of students in a particular
university. In that case, you may have to collect raw data from various
sources, such as faculty feedback and performance reports of students. The raw
data can be in different formats such as Portable Document Formats (PDFs) and
online forms. Also, the raw data may contain missing values, errors,
inconsistencies, and duplicate values, which make the data difficult to
comprehend. To make the raw data useful for analysis, you must process it by:
Filling or dropping the missing values.
Removing the errors, inconsistencies, and duplicate values.
Filtering or grouping the data by relevant criteria.
Formatting the data into one single format.
Loading and exploring the dataset is the preliminary step performed in ML. This
step is important because it helps to:
Discover the distribution and trends of the data.
Identify the shape or structure of the data (such as number of rows and
columns it has).
View information about the data.
Identify missing values.
Identify duplicates.
Identify categorical and numerical columns.
To load and explore data for ML, you can use Pandas. For example, let us
consider a dataset that contains the gender-wise academic performance of
students in various courses. The data is based on the number of hours the
students attended each course and the number of hours the students
prepared for each course. Download the student_success.csv dataset file
present under Course Files on OnlineVarsity. Figure 13.1 shows the dataset in
student_success.csv opened in Excel. To analyze this dataset, you must
load the dataset into a Pandas DataFrame and use the functions available in
the Pandas library to perform the analysis.
Code Snippet 1:
import pandas as pd
df=pd.read_csv('student_success.csv')
df
Figure 13.2 shows the DataFrame df printed on the screen. Note that the
output is displayed in a readable and formatted way. This can help in
inspecting and exploring the data visually to identify any issues or patterns in
the data.
After loading the dataset into a DataFrame, you can identify the shape or
dimension of the dataset by using the pandas.DataFrame.shape property.
This property returns the number of rows and columns of the dataset
representing its dimensionality. The code to display the size of a DataFrame is:
df.shape
Figure 13.3 shows the output of the code displaying a tuple containing the
number of rows and number of columns in the dataset in df. The output
indicates that the DataFrame df has 21 rows and 5 columns.
To view brief information about the dataset, you can use the
pandas.DataFrame.info method. This method prints a concise summary of
the DataFrame. The summary first lists the number of columns and the range
index of the DataFrame. It then presents the count of non-null values in each
column and the data type of each column in a table format. Finally, the
summary includes the memory usage of the DataFrame. The code to
summarize the DataFrame is:
df.info()
Figure 13.4 shows the output of the code displaying information about the
dataset stored in the DataFrame df.
To improve the quality and accuracy of the data for ML, it is important to
identify missing values in the data. This will give an understanding of how the
missing values are distributed and related to other variables. The sum method
can be used on the result of the isnull method to display the number of
missing values in each column of the DataFrame df. The return value of the
sum method is a Series object containing the column names and
corresponding counts of null values.
df.isnull().sum()
Figure 13.5 shows the output of the code. The output indicates that there are
two null values in each of the columns Hours_attended and Hours_prepared,
and three null values in Success_exam.
df.duplicated()
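The exploration calls described so far can be tried together on a small table. The following is a minimal sketch that assumes a made-up DataFrame named df_demo in place of student_success.csv; the column values are invented for illustration only. Note that duplicated returns a Boolean Series that is True for every row that repeats an earlier row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for student_success.csv (values are made up)
df_demo = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M', 'M'],
    'Hours_attended': [40.0, 35.0, np.nan, 50.0, 50.0],
    'Success_exam': ['Y', 'No', None, 'Y', 'Y'],
})

print(df_demo.shape)               # tuple of (rows, columns)
missing = df_demo.isnull().sum()   # number of null values per column
print(missing)
dup_mask = df_demo.duplicated()    # True where a row repeats an earlier row
print(dup_mask.sum())              # number of duplicate rows
```

Calling sum on the Boolean Series counts the duplicate rows, in the same way that sum counts the null values returned by isnull.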
Categorical data are data that can be classified into different groups based
on their features or attributes, such as gender, courses, and results. They are
textual in nature. The default type of text data in Pandas is object. Identifying
categorical data is necessary because they must be encoded or transformed
into numerical values. This will enable ML models to perform mathematical
operations on the data, learn from the data, and make predictions.
Code Snippet 2 finds the categorical and numerical columns in the DataFrame
df. The code uses a for loop to enumerate the columns in df and assigns the
list of columns whose dtype is object to the variable categorical_col. It then
enumerates the columns in df again and assigns the list of columns that are
not of object data type to the variable numerical_col.
Code Snippet 2:
df[categorical_col].nunique()
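One possible way to separate the two kinds of columns is sketched below. This is not the original listing: it assumes a small made-up DataFrame named df_demo, and it uses pd.api.types.is_numeric_dtype instead of comparing the dtype with object, so that it also works in Pandas versions where text columns use a dedicated string dtype.

```python
import pandas as pd

# Hypothetical sample data (made up for illustration)
df_demo = pd.DataFrame({
    'Gender': ['F', 'M', 'F'],
    'Courses': ['Math', 'Science', 'Math'],
    'Hours_attended': [40.0, 35.0, 42.0],
})

categorical_col = []
numerical_col = []
for col in df_demo.columns:
    # Numeric columns are numerical; everything else is treated as categorical
    if pd.api.types.is_numeric_dtype(df_demo[col]):
        numerical_col.append(col)
    else:
        categorical_col.append(col)

print('Categorical columns:', categorical_col)
print('Numerical columns:', numerical_col)
# Number of distinct categories in each categorical column
unique_counts = df_demo[categorical_col].nunique()
print(unique_counts)
```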
After loading and exploring the data, the next step is to clean the data. Some
of the tasks involved in data cleaning are:
Handling Missing Data
Splitting the Dataset into Dependent and Independent Variables
To remove duplicates from the dataset in a DataFrame, you can use the
pandas.DataFrame.drop_duplicates method. This method returns the
DataFrame object after removing the duplicates.
Code Snippet 3:
df=df.drop_duplicates()
df
Figure 13.9 shows the output of Code Snippet 3. Note that the rows with indexes
19 and 20 are removed from the DataFrame df.
To make the data in a dataset consistent and usable for different ML models,
it is necessary to have a single representation for values in columns in a dataset.
For example, refer to the DataFrame df shown in Figure 13.9. The values in
column Success_exam in the DataFrame df represent the success or failure
status of students in various courses. However, this column has different kinds
of entries for indicating success and failure. For example, while the row with
index 0 has an entry Y for indicating success, the row with index 2 has the entry
as Yes for the same status. Similar is the case for the failure status too. The row
with index 1 has the entry as No for indicating failure, whereas the row with
index 4 has the entry as N for the same.
To ensure that the success and failure statuses are uniformly represented, Code
Snippet 4 replaces all Yes entries with Y and all No entries with N in column
Success_exam using df["Success_exam"].str.replace('Yes','Y') and
df["Success_exam"].str.replace('No','N'). It then fills the null values in the
column with Y by using the fillna method.
Code Snippet 4:
df["Success_exam"]=df["Success_exam"].str.replace('Yes','Y')
df["Success_exam"]=df["Success_exam"].str.replace('No','N')
# Assign the result back instead of using inplace=True on a column selection
df["Success_exam"]=df["Success_exam"].fillna('Y')
df
Figure 13.10 shows the output of Code Snippet 4. Note that the DataFrame df
has the replaced and consistent set of values in Success_exam.
Outliers are extreme values in a dataset that differ extensively from other data
points within that dataset. Outliers can distort the result of the statistical analysis
and thus might make accurate predictions difficult to arrive at. Therefore, they
should be identified and removed from the dataset. Outliers are generally
identified for numerical columns.
One of the ways to identify outliers is by using the Interquartile Range (IQR)
method. This method divides the respective column values in a dataset into
four equal parts called quartiles. The first quartile Q1 is defined as the 25th
percentile of the data, which is the middle number between the smallest
number and the median of the dataset. The second quartile Q2 is the 50th
percentile of the data, which is the median of the dataset. The third quartile
Q3, which is the 75th percentile of the data, is the middle value between the
median and the highest value of the dataset. The interquartile range is the
difference between the third quartile and the first quartile. The steps to identify
the outliers are:
To calculate the first quartile and the third quartile for a column in a
DataFrame, you can use the pandas.Series.quantile method. This method
calculates the value at a given quantile in a Series object or in a column of
a DataFrame.
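As an illustration of these definitions, the sketch below computes the quartiles, the IQR, and the usual 1.5 x IQR bounds for a small made-up Series named hours. The numbers are invented; only the quantile method and the bound formulas come from this session.

```python
import pandas as pd

# Made-up values; 100 is deliberately far from the rest
hours = pd.Series([10, 12, 14, 16, 18, 20, 100])

q1 = hours.quantile(0.25)        # first quartile (25th percentile)
q3 = hours.quantile(0.75)        # third quartile (75th percentile)
iqr = q3 - q1                    # interquartile range: Q3 minus Q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Values outside the bounds are treated as outliers
outliers = hours[(hours < lower_bound) | (hours > upper_bound)]
print(q1, q3, iqr, lower_bound, upper_bound)
print(outliers)
```

For this data, Q1 is 13 and Q3 is 19, so the IQR is 6 and the bounds are 4 and 28. Only the value 100 falls outside the bounds.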
Code Snippet 5 shows the code to identify the IQR, lower bound, and upper
bound for the Hours_attended column. The code uses the
df['Hours_attended'].quantile method to calculate the Q1 value at
0.25 quantile and Q3 value at 0.75 quantile for Hours_attended in df. The
code calculates the IQR by subtracting Q1 from Q3. It calculates the lower
bound and upper bound by using the respective formula and prints these
values to the standard output device.
Code Snippet 5:
q1 = df['Hours_attended'].quantile(0.25)
q3 = df['Hours_attended'].quantile(0.75)
iqr = q3 - q1
lower_bound=q1 - 1.5 * iqr
upper_bound=q3 + 1.5 * iqr
print('Lower Bound=',lower_bound,'Upper Bound=',upper_bound)
Figure 13.11 shows the output of Code Snippet 5 and displays the lower and
upper bounds for the column Hours_attended in df.
Code Snippet 6 shows the code to remove the outliers from the column
Hours_attended in df. The code retrieves the rows in df that have
Hours_attended values that are either greater than the upper bound or less
than the lower bound by using the df.loc indexer. The code stores the
retrieved rows in a new DataFrame called outliers. It then calls the
df.drop(outliers.index) method to remove the respective rows in df that
have the same index as the rows in outliers. The method returns a new
DataFrame with the outliers removed, which is assigned back to df. The code
ends with df to display the DataFrame on the standard output device.
Figure 13.12 shows the output of Code Snippet 6. Note that the output shows
the DataFrame df with the row with index 22 removed, which had an outlier
value of 180 in the column Hours_attended.
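The removal step can be sketched as follows. This is an illustrative example rather than the original listing; it assumes a small made-up DataFrame named df_demo with a single Hours_attended column.

```python
import pandas as pd

# Hypothetical data; 100 is an outlier
df_demo = pd.DataFrame({'Hours_attended': [10, 12, 14, 16, 18, 20, 100]})

q1 = df_demo['Hours_attended'].quantile(0.25)
q3 = df_demo['Hours_attended'].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Select the rows whose value lies outside the bounds
outliers = df_demo.loc[(df_demo['Hours_attended'] > upper_bound) |
                       (df_demo['Hours_attended'] < lower_bound)]

# drop returns a new DataFrame without the rows that share the outliers' index
df_demo = df_demo.drop(outliers.index)
print(df_demo)
```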
Code Snippet 7 shows the code to calculate the IQR, upper bound, and lower
bound for the column Hours_prepared in df. The code uses the
df['Hours_prepared'].quantile method to calculate the Q1 value at 0.25
quantile and Q3 value at 0.75 quantile for Hours_prepared in df. The code
calculates the IQR by subtracting Q1 from Q3. It calculates the lower bound
and upper bound and prints these values to the standard output device.
Code Snippet 7:
q1 = df['Hours_prepared'].quantile(0.25)
q3 = df['Hours_prepared'].quantile(0.75)
iqr = q3 - q1
lower_bound=q1 - 1.5 * iqr
upper_bound=q3 + 1.5 * iqr
print('Lower bound for hours prepared=',lower_bound,
      'Upper bound for hours prepared=',upper_bound)
Figure 13.13 shows the lower bound and upper bound for Hours_prepared.
Code Snippet 8 shows the code to remove the outliers from the column
Hours_prepared in df. The code retrieves the rows in df that have
Hours_prepared values that are either greater than the upper bound or less
than the lower bound by using the df.loc indexer. The code stores the
retrieved rows in a new DataFrame called outliers. It then calls the
df.drop(outliers.index) method to remove the respective rows in df that
have the same index as the rows in outliers. The method returns a new
DataFrame with the outliers removed, which is assigned back to df. The code
ends with df to display the DataFrame on the standard output device.
Code Snippet 8:
Figure 13.14 shows the output of Code Snippet 8. Note that the output shows
the DataFrame df with the same number of original rows. This indicates that
the column Hours_prepared did not have any outliers.
Missing data in a dataset can be handled by either removing the missing data
or replacing the data with some other values. The replacement values can be
a string or numerical constant, or a derived value, such as the mean, mode, or
median of the column.
Recall that there were two null values in each of the columns Hours_attended
and Hours_prepared, and three null values in Success_exam. Code Snippet 4
replaced the null values in Success_exam with Y to make the column values
consistent.
Code Snippet 9 shows the code to replace the null values in Hours_attended
and Hours_prepared. The code replaces the NaN values in these columns with
the mean of the respective column. The code calls
df['Hours_attended'].fillna(df['Hours_attended'].mean()) to
replace the missing values in the Hours_attended column with the mean
of that column, and does the same for the Hours_prepared column.
Code Snippet 9:
# Assign the result back instead of using inplace=True on a column selection
df['Hours_attended']=df['Hours_attended'].fillna(df['Hours_attended'].mean())
df['Hours_prepared']=df['Hours_prepared'].fillna(df['Hours_prepared'].mean())
df
Figure 13.15 shows the output of Code Snippet 9. Note that the rows with NaN
entries in Hours_attended and Hours_prepared are replaced with the mean
values of the respective column data.
Code Snippet 10 retrieves the values from the columns that are the
independent variables in df and stores them as a two-dimensional array called
x. The code first selects the Gender, Courses, Hours_attended, and
Hours_prepared columns from df. It then uses the values attribute to retrieve
the values of the selected columns as a numpy array. The code assigns the
numpy array to the variable x and prints x to the standard output device.
Code Snippet 10:
x=df[['Gender','Courses','Hours_attended','Hours_prepared']].values
x
Figure 13.16 shows the value of numpy array x. Note that the array has the same
number of rows as df with four columns corresponding to the selected
independent variables.
Code Snippet 11 selects the column Success_exam from df, which is the
dependent variable in df. The code retrieves the values of the column and
stores them as a two-dimensional array called y. The code prints y to the
standard output device.
Code Snippet 11:
y=df[['Success_exam']].values
y
Encoding is the process of converting data from one form to another form for
the purpose of storage or processing. In ML, the categorical data should be
converted into numerical data by using appropriate encoding methods. The
reason is that categorical data are text values, whereas ML models perform
mathematical operations and therefore require numerical values as input.
There are several ways to encode categorical data. Two of the most used
methods are as follows:
One-hot encoding
Label encoding
Figure 13.20 shows the numpy array x with each of the unique categories in its
second column Courses transformed into a binary vector of three columns
appended to its end.
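A one-hot encoding step of this kind could look like the following sketch. It is an assumption about the approach, not the original listing: it uses the OneHotEncoder class from sklearn.preprocessing on a small made-up array named x_demo, removes the original text column, and appends the binary columns at the end with numpy.hstack.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: Gender, Courses, Hours_attended (values are made up)
x_demo = np.array([
    ['F', 'Math', 40.0],
    ['M', 'Science', 35.0],
    ['F', 'History', 42.0],
], dtype=object)

encoder = OneHotEncoder()
# Encode only the second column (index 1); toarray converts the sparse result
courses_encoded = encoder.fit_transform(x_demo[:, [1]]).toarray()

# Remove the original text column and append the binary columns at the end
x_encoded = np.hstack([np.delete(x_demo, 1, axis=1), courses_encoded])

print(encoder.categories_[0])   # category order used for the new columns
print(x_encoded)
```

Each row has exactly one 1 among the new columns, which marks the course that the row belongs to.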
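Code Snippet 13:
from sklearn.preprocessing import LabelEncoder
# Label-encode the first column of x
labelencoder_x = LabelEncoder()
x[:,0] = labelencoder_x.fit_transform(x[:,0])
x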
Figure 13.21 shows the output of the execution of Code Snippet 13. Note that
the first column is label-encoded with numerical values 0 or 1.
The code:
1. Creates an instance of the LabelEncoder class and assigns it to the
variable labelencoder_y.
2. Uses the numpy.ravel method to flatten the two-dimensional array y of
shape (n_samples, 1) into a one-dimensional array of shape (n_samples,),
where n_samples is the number of samples in the input array. The reason
is that LabelEncoder, as well as some of the ML models, expects the
target variable as a one-dimensional array.
3. Uses labelencoder_y.fit_transform to fit and transform the flattened
values of y into a one-dimensional array of numerical labels.
4. Assigns the result of labelencoder_y.fit_transform(y.ravel()) to y,
thus overwriting the original values of y with the numerical labels.
Code Snippet 14:
labelencoder_y = LabelEncoder()
y=labelencoder_y.fit_transform(y.ravel())
y
After encoding the data, the next step is to split the data into training and test
sets. The reason is that training and testing the ML model on the same data
might make the model learn only specific patterns rather than the relationships
between the underlying data. This can lead to the poor performance of the
ML model when it is provided with new data that has different patterns.
Splitting the data into training and test sets allows us to simulate the
performance of the model on new data and evaluate how well the model
generalizes to new data. The training set contains data that ML models learn
from. The test set contains data on which ML models are evaluated.
Code Snippet 15 shows the code to split the independent and dependent
variables, x and y into training and test sets. The code:
1. Imports the train_test_split function from the
sklearn.model_selection module. The train_test_split function
is a tool for splitting data into training and test sets.
2. Uses the train_test_split function that takes an input array, x and
an output array, y as arguments and returns four subsets: x_train,
x_test, y_train, and y_test. These subsets represent the input and
output variables for the training and test sets, respectively.
3. Specifies the test_size parameter of the train_test_split function
as 0.2. This means that 20% of the data will be used for the test set and
80% for the training set.
4. Specifies the random_state parameter of the train_test_split
function as 0, which means that the data will be shuffled before splitting
in a reproducible way. This ensures that the results are consistent across
different runs of the code.
Code Snippet 15:
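A split of this kind can be sketched as follows. The arrays x_demo and y_demo are made up for illustration and stand in for the x and y arrays prepared earlier; the test_size and random_state values are the ones described in the steps.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and 10 labels
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# 20% of the rows go to the test set; random_state makes the shuffle repeatable
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0)

print(x_train.shape, x_test.shape)   # 8 training rows and 2 test rows
print(y_train.shape, y_test.shape)
```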
The code to print the value of x_train to the standard output device is:
x_train
The code to print the value of x_test to the standard output device is:
x_test
The code to print the value of y_train to the standard output device is:
y_train
The code to print the value of y_test to the standard output device is:
y_test
Code Snippet 16 applies feature scaling to preprocess the training and test
sets of the input variables using the same scaling parameters. This is done to
ensure that the training and test sets have a similar range and distribution. The
code:
1. Imports the StandardScaler class from the sklearn.preprocessing
module, which is a tool for standardizing features by removing the mean
and scaling to unit variance.
2. Creates an instance of the StandardScaler class and assigns it to the
variable sc_x.
3. Uses sc_x.fit_transform(x_train) to fit and transform the x_train
array. This method calculates the mean and standard deviation of each
feature in x_train. It then uses them to center and scale each feature
value by subtracting the mean and dividing by the standard deviation.
The result is a new array of standardized features that has zero mean
and unit variance along each column and it is assigned back to
x_train.
4. Uses the sc_x.transform(x_test) to transform the x_test array. This
method uses the same mean and standard deviation that it calculated
from x_train, so that the test set is scaled with the same parameters as
the training set.
Figure 13.27 shows the feature scaled array of x_train.
Figure 13.28 shows the feature scaled array of x_test. Note that both the
arrays have similar range and distribution.
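The scaling steps can be sketched with a small numeric example. The arrays x_train_demo and x_test_demo are made up; the point of the sketch is that the scaler is fitted on the training data only and then reused for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features (values are made up)
x_train_demo = np.array([[10.0, 1.0], [20.0, 2.0], [30.0, 3.0], [40.0, 4.0]])
x_test_demo = np.array([[25.0, 2.5]])

sc_x = StandardScaler()
# fit_transform learns the mean and standard deviation from the training data
x_train_scaled = sc_x.fit_transform(x_train_demo)
# transform reuses those training statistics for the test data
x_test_scaled = sc_x.transform(x_test_demo)

print(x_train_scaled.mean(axis=0))   # approximately 0 for every column
print(x_train_scaled.std(axis=0))    # approximately 1 for every column
print(x_test_scaled)
```

Because the test row equals the training mean of each column, it is scaled to zero in both columns.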
As the training and test sets are the basis for training, testing, and improving
the ML models, it is necessary to save these datasets. By saving the datasets,
they can be reused for various purposes. Datasets can be compared with
other datasets, updated with new data, or shared among researchers and
developers who work on related ML models.
13.7.1 Save x_train, x_test, y_train, and y_test
To save the training and test sets, you can use the Python library joblib.
joblib can store Python objects on disk and is particularly suited to objects
that contain large numpy arrays.
Code Snippet 17 shows the code to save the training and test sets using the
joblib.dump function. The code first imports the joblib library. It then uses
the joblib.dump function to serialize the x_train, x_test, y_train, and
y_test numpy array objects. The serialized objects are saved as files in the
current working directory with the names filextrain.pkl, filextest.pkl,
fileytrain.pkl, and fileytest.pkl, respectively.
Code Snippet 17:
import joblib
joblib.dump(x_train,'filextrain.pkl')
joblib.dump(x_test,'filextest.pkl')
joblib.dump(y_train,'fileytrain.pkl')
joblib.dump(y_test,'fileytest.pkl')
The extension .pkl denotes pickle files that are used to store
Python objects. The numpy array objects in the .pkl files can be
later loaded into the memory using the joblib.load function.
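The save and load round trip can be checked with a short sketch. The file name demo_array.pkl and the temporary folder are assumptions made for this example, so that nothing is written to the current working directory.

```python
import os
import tempfile

import joblib
import numpy as np

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# Write to a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_array.pkl')   # hypothetical file name
    joblib.dump(arr, path)          # serialize the array to disk
    restored = joblib.load(path)    # read it back into memory

print(np.array_equal(arr, restored))
```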
2. You gathered data on stock market trends for your machine learning
project. The dataset includes both historical prices and categorical
data. To use this data effectively for training your model, what step
should you perform after loading and exploring the dataset?
a. Noisy data
b. Missing values
c. Unusable format
d. Overfitting
1 d
2 b
3 c
4 b
5 d
A dataset of animals is stored in the machine. Suppose a new input in the form
of a cow arrives. Based on the data stored, the machine checks whether the
input is a cat, a dog, a snake, or a cow, and then decides that it is a cow. This
can be considered as a form of supervised learning.
Supervised learning finds its use in many places including healthcare, finance,
and educational sectors. It is based on stored input data for which the correct
output is already known, and on new data for which the output must be
predicted. The supervised learning algorithm learns a mapping between the
inputs and the known outputs, and uses this mapping to predict the result for
new inputs accurately.
The two types of supervised learning are Regression Learning and Classification
Learning.
Figure 14.3 shows the possibilities of output for a binary classification prediction
model.
[Figure: process flow with the steps Training the Model, Predicting the Test Data, and Find the Accuracy of the Model]
Training
The training and test data are taken in the form of pickle files with extension
.pkl. Download the filextrain.pkl file present under Course Files on
OnlineVarsity. This file contains the training data of independent variables.
Code Snippet 1 shows how to load this data.
Code Snippet 1:
import joblib
x_train = joblib.load('filextrain.pkl')
x_train
Similarly, Code Snippet 2 loads the training data for the dependent variables
(y_train).
Code Snippet 2:
import joblib
y_train = joblib.load('fileytrain.pkl')
y_train
Code Snippet 3 shows how to load the test data for the independent variables
(x_test).
Code Snippet 3:
import joblib
x_test = joblib.load('filextest.pkl')
x_test
Code Snippet 4 shows how to load the test data for the dependent variables
(y_test).
Code Snippet 4:
import joblib
y_test = joblib.load('fileytest.pkl')
y_test
The data loaded can now be used to train the model. The
LogisticRegression class, which belongs to the sklearn library, provides
methods to train the model and to make predictions. Code Snippet 5 shows the
code to train the model. The training data x_train and y_train are given as
arguments to the fit method of the classifier_model object.
Code Snippet 5:
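Training of this kind can be sketched as follows. This is a minimal example and not the original listing: the arrays x_train_demo and y_train_demo are made up, and random_state=0 is an assumption used only to keep the result repeatable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one scaled feature and a binary label
x_train_demo = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y_train_demo = np.array([0, 0, 0, 1, 1, 1])

classifier_model = LogisticRegression(random_state=0)
# fit learns the relationship between the features and the labels
classifier_model.fit(x_train_demo, y_train_demo)

# predict applies the trained model to unseen feature values
y_pred_demo = classifier_model.predict(np.array([[-1.8], [1.8]]))
print(y_pred_demo)
```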
Code Snippet 6 shows the creation of the prediction array. The predict method
predicts the values of the dependent variable (y_pred) based on the values of
the independent variables (x_test). The values of y_pred are then displayed.
Code Snippet 6:
y_pred=classifier_model.predict(x_test)
y_pred
The next step is to evaluate the model. Metrics are used to evaluate the results
of prediction against the actual values. Here, y_test is the actual values and
y_pred is the predicted values. To test the accuracy of the predicted model,
the predicted values must be compared with the actual values. Figure 14.11
depicts the process flow of determining the accuracy of the prediction model.
Value            Meaning
True Positive    Actual and predicted values both show the data point as 1, which means that the predicted value and the actual value are both positive
True Negative    Actual and predicted values both show the data point as 0, which means that the predicted value and the actual value are both negative
False Positive   Actual value of the data point is 0 and the predicted value is 1, which means that the predicted value is positive but the actual value is negative
False Negative   Actual value of the data point is 1 and the predicted value is 0, which means that the predicted value is negative but the actual value is positive
The confusion_matrix function takes the actual test values (y_test) and the
predicted values (y_pred) of the dependent variable as input and generates the
confusion matrix. Code Snippet 7 helps in creating the confusion matrix.
Code Snippet 7:
from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix : \n", confmat)
To interpret the accuracy from the confusion matrix, you must add the true
positive and true negative scores and divide the sum by the total number of
predictions. In the confusion matrix in Figure 14.13, the true positive and true
negative each have the value 2. Thus, the total score is 4 (2+2). The total
number of values in the y_pred and y_test arrays is also 4. Therefore, the
prediction model is 100% accurate. Note that the rest of the cells in the matrix
have the value 0 because none of the 4 values was predicted incorrectly.
Code Snippet 8:
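The accuracy calculation can be sketched as follows. This is an illustrative example, not the original listing: the arrays y_test_demo and y_pred_demo are made up to mirror the four-value case described above, and accuracy_score from sklearn.metrics is used as one common way to obtain the same figure.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical actual and predicted labels (all four predictions are correct)
y_test_demo = np.array([1, 0, 1, 0])
y_pred_demo = np.array([1, 0, 1, 0])

confmat = confusion_matrix(y_test_demo, y_pred_demo)
# For binary labels, ravel returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confmat.ravel()

# Accuracy is the share of correct predictions among all predictions
accuracy_manual = (tp + tn) / confmat.sum()
accuracy = accuracy_score(y_test_demo, y_pred_demo)
print(confmat)
print(accuracy_manual, accuracy)
```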
14.3.1 Clustering
Clustering involves dividing a group into subgroups based on similarities, so
that the members of each subgroup are homogeneous. Each subgroup differs
from the others in some ways. This classification helps businesses in targeting
their audiences.
Once classified, the attributes and trends of the groups are learned by the
machine itself without any external inputs or input-output mapping. The
clustering algorithm takes the raw data and classifies it into clusters or
subgroups based on patterns, appearances, and behavior. Figure 14.15 shows
the clusters formed from a general population. Clusters formed include Males,
Females, Children, Seniors, and Physically Challenged people.
Business entities target the clusters based on their preferences. For example,
children prefer toys and gaming instruments. Women prefer clothing and
accessories.
14.3.2 Association
Associative learning is learning about the behavior of the clusters. Here, the
clusters and their behavior are studied. The preferences inferred for one
cluster help in predicting the preferences of a similar cluster. For example,
when the target customers are children, their buying preferences are studied.
When a customer from a similar cluster arrives, products are displayed on the
assumption that similar products are likely to be the preferred purchases.
Customer 1 and Customer 2 bought chocolates, toys, and gadgets. The model
associates this to Customer 3 and predicts purchase preferences.
In this learning model, the number of clusters is defined by the K-means
algorithm. The K-means algorithm creates K clusters for n observations. For
each value of K, a different number of clusters is created. The aim is to try
different values of K and find the best K value to get optimal results. This
clustering algorithm is very useful in image segmentation, customer
segmentation, species clustering, anomaly detection, and clustering languages.
import numpy as nm
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('monthlysavings.CSV')
df
x = df.iloc[:, [2,3]].values
x
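The KMeans class has a fit method that trains the model for a given value of
K. A measure such as the inertia (the within-cluster sum of squares) obtained
for each value of K can be added to a list by using the append method of the
list. These values are then plotted as points in a graph. The graph can be
observed to arrive at the elbow point. Code Snippet 11 uses the fit and append
methods to plot the graph.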
In Figure 14.19, the elbow point is at 3 and so K=3. Thus, there must be three
major cluster types.
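The elbow calculation can be sketched without the plot. The example below is an assumption about the approach, not the original listing: it generates three made-up groups of points, fits KMeans for K from 1 to 6, and collects the inertia_ value of each fitted model in a list.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three made-up groups of points around well-separated centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
x_demo = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

inertias = []
for k in range(1, 7):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(x_demo)
    # inertia_ is the within-cluster sum of squared distances for this K
    inertias.append(kmeans.inertia_)

for k, value in zip(range(1, 7), inertias):
    print(k, round(value, 1))
```

The inertia drops sharply up to K=3 and only slightly afterwards. When these values are plotted against K, this bend is the elbow point.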
Code Snippet 12 shows the code to scale the data using the min-max
normalization. The fit_transform method in the MinMaxScaler class does
the scaling of data.
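Min-max normalization rescales each column so that its smallest value becomes 0 and its largest value becomes 1. A minimal sketch, assuming a small made-up array named x_demo:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with two columns on very different scales
x_demo = np.array([[1000.0, 10.0], [2000.0, 20.0], [3000.0, 40.0]])

scaler = MinMaxScaler()
# Each column is rescaled as (value - column minimum) / (column maximum - column minimum)
x_scaled = scaler.fit_transform(x_demo)
print(x_scaled)
```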
The model must be trained with the scaled data. Code Snippet 13 shows the
code to train the model.
kmeans = KMeans(n_clusters=3)
y_predict= kmeans.fit_predict(x)
y_predict
Figure 14.22 shows the clusters and the centroids which are the center point
for each cluster.
The input is the initial state from which the model starts. The output can be
any of the various ways to achieve the target. The model gets trained by
learning or unlearning a path based on the output. Learning and unlearning
occur continuously. The solution is based on the optimum result.
Positive lessons from the output of an event are studied. This strengthens the
behavior that led to the positive outcome, so that the model repeats it. It has
a positive effect on the model. In Figure 14.23, the dog decides the path to
take to reach its goal. The path shown by the green arrow is the correct path.
Negative lessons are also learnt from the output, and this too increases the
strength of the model because the model learns which behavior to avoid. A
typical example of negative reinforcement is leaving home for work early to
avoid traffic jams. In Figure 14.23, the path to the hurdle is shown in red
color, and the model learns not to take this route to achieve the target.
a. Unsupervised learning
b. Reinforcement learning
c. Supervised learning
d. Semi-supervised learning
a. Clustering
b. Regression
c. Classification
d. Reinforcement learning
1 c
2 a
3 d
4 b
5 c
The word ‘Linear’ comes from the word, ‘Line’. In this method, the data points
for the independent and dependent variables are plotted on the X and Y axes,
respectively, in a graph. Then, a line is drawn to find the best possible
relationship between the two variables. The line is drawn in such a way that
the data points are as close to the line as possible.
$$y = a_0 + a_1 x + E$$
Variable    Meaning
y           Dependent variable (value on the line of regression)
a0          Intercept of the line
a1          Linear regression coefficient (slope of the line)
x           Independent variable
E           Random error
Figure 15.1 helps in understanding the Line of Regression. Here, y is the
dependent variable and x is the independent variable. The intercept a0 is the
value of y when x is 0. The coefficient a1, which can be a negative or positive
number, is an important factor in defining the slope of the line. It determines
how much y changes for each unit change in x. E is the error factor, which is
possible in any statistical analysis.
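The intercept and the coefficient can be estimated from data points by a least-squares fit. The sketch below uses numpy.polyfit on made-up points that lie exactly on the line y = 1 + 2x, so the error term is zero and the fitted values can be checked by hand.

```python
import numpy as np

# Made-up data lying exactly on the line y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x

# A degree-1 fit returns the slope first and then the intercept
a1, a0 = np.polyfit(x, y, 1)
y_hat = a0 + a1 * x   # values on the fitted line of regression

print('Intercept a0 =', round(a0, 3))
print('Coefficient a1 =', round(a1, 3))
```

With real data, the points do not lie exactly on a line, and the fit returns the values of a0 and a1 that keep the points as close to the line as possible.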
The metrics that are commonly used to evaluate linear regression are:
When there is no linear relationship between the input and output variables,
the graph obtained is not a line but a curve. In such cases, linear regression is
not enough to capture the complexity of the relationship between the
variables, and polynomial regression is used instead:
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$$
Variable    Meaning
y           Dependent (output) variable
b0          0th coefficient
b1          1st coefficient
bn          nth coefficient
x           Input variable
Figure 15.2 shows a model of the polynomial regression curve. The trend is not
linear. It goes downward in the initial stage and then picks up momentum after
some time. The coefficients determine the path of the curve. The relationship
between the dependent and independent variables defines the degree of the
polynomial (denoted by n). A higher degree of the polynomial can be used to
fit the data more closely. However, if the degree is too high, the
polynomial regression curve can overfit the training data. This, in turn,
results in new data not being represented correctly.
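The following short sketch evaluates the polynomial equation for a given set of coefficients. It does not fit a model. The coefficient values and the function name predict_poly are hypothetical, chosen so that the curve first goes down and then rises.

```python
# Evaluate y = b0 + b1*x + b2*x^2 + ... + bn*x^n for given coefficients.

def predict_poly(coeffs, x):
    """coeffs[i] is b_i, the coefficient of x raised to the power i."""
    return sum(b * x ** i for i, b in enumerate(coeffs))

# Hypothetical degree-2 polynomial: b0 = 10, b1 = -6, b2 = 1.
coeffs = [10, -6, 1]
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [predict_poly(coeffs, x) for x in xs]
print(ys)  # [10, 5, 2, 1, 2, 5, 10]
```

Here the degree n is 2 because the list holds three coefficients. Adding more coefficients raises the degree and lets the curve bend more often.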
There are a few evaluation metrics that assess the accuracy of the model.
These metrics provide insights into how the model fits the data. These insights
help in comparing different models.
Some of the metrics used for the evaluation of polynomial regression are:
A decision tree starts with a question posed on a dataset. Based on the answer
to the question, which is usually Yes or No, the dataset is split into smaller units.
Then, questions are posed to these smaller units, which are further split into
smaller units. This process continues until a decision is taken.
In Figure 15.3, the root node is indicated by the blue box. This node is also
known as the parent node. Branches or child nodes, indicated by the orange
boxes, stem out of the root, based on the decision taken. These child nodes
may again become a parent node for decisions to be taken from there,
thereby, creating sub-trees. The leaf nodes are the end nodes, and they are
indicated by the green boxes. These are the outcomes or results of the decision
tree.
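The structure of root, child, and leaf nodes can be sketched with nested dictionaries. The questions, outcomes, and the function name decide in this example are hypothetical. A real decision tree learns its questions from the data.

```python
# A tiny hand-built decision tree (hypothetical questions and outcomes).
tree = {
    "question": "temperature_above_30",   # root (parent) node
    "yes": {
        "question": "is_weekend",         # child node that is also a parent
        "yes": "high sales",              # leaf node
        "no": "medium sales",             # leaf node
    },
    "no": "low sales",                    # leaf node
}

def decide(node, answers):
    """Walk from the root to a leaf by following the Yes/No answers."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node

result = decide(tree, {"temperature_above_30": True, "is_weekend": False})
print(result)  # medium sales
```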
The algorithm used to implement the decision tree is called the Classification
and Regression Tree algorithm (CART). The two methods used to arrive at the
decision in a decision tree are Splitting and Pruning. Splitting is the process of
dividing the root node into child nodes and splitting child nodes into further sub
The accuracy of the decision-tree regression lies in the selection of the right
attribute at the root node or at the various child nodes of the tree. The selection
of the attribute is complex because a dataset can have n attributes. Not all of
these n attributes will be important to the decision that must be
made. Randomly selecting an attribute can lead to bad results.
Therefore, to aid the selection of the right attribute, some solutions have been
identified. These solutions include:
Using these solutions, the value for each attribute can be calculated. Based
on these values, the attributes can be sorted. Then, the attributes can be
placed in the tree in descending order of their values. That is, the attribute with
the highest value can be placed at the root node and the lower values can
be placed at different levels of child nodes. In case of information gain, the
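As an illustration of how a value can be calculated for an attribute, the following sketch computes entropy, the Gini index, and information gain for two candidate splits. It assumes these are the measures meant here. The labels and split contents are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: 0 for a pure group, 1 for a 50:50 two-class group."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 0 for a pure group, 0.5 for a 50:50 two-class group."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels at a node, and two ways of splitting them.
parent = ["yes"] * 4 + ["no"] * 4
split_a = [["yes"] * 4, ["no"] * 4]                # separates the classes
split_b = [["yes", "yes", "no", "no"]] * 2         # separates nothing

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(gain_a, gain_b)
```

The attribute that produces split_a has the higher information gain, so it would be placed nearer the root than the attribute that produces split_b.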
In general, the more decision trees that are used, the more accurate the
prediction is, with a lower risk of overfitting than a single tree. The concept
of random-forest regression can be depicted as shown in Figure 15.4.
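The idea of combining many trees can be sketched in plain Python. This is a simplified illustration and not a full random-forest implementation: each tree is reduced to a single split (a stump), each stump is trained on a random sample drawn with replacement, and the forest prediction is the average of the stump predictions. The data rows and all function names are hypothetical.

```python
import random
import statistics

def train_stump(sample):
    """Fit a one-split regression tree (a stump) to (x, y) pairs."""
    threshold = statistics.median(x for x, _ in sample)
    left = [y for x, y in sample if x <= threshold]
    right = [y for x, y in sample if x > threshold]
    overall = statistics.mean(y for _, y in sample)
    left_mean = statistics.mean(left) if left else overall
    right_mean = statistics.mean(right) if right else overall
    return lambda x: left_mean if x <= threshold else right_mean

def train_forest(rows, n_trees, seed=0):
    """Train each stump on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(train_stump(sample))
    return forest

def predict(forest, x):
    """The forest prediction is the average of all tree predictions."""
    return statistics.mean(tree(x) for tree in forest)

# Hypothetical (temperature, sales) rows.
rows = [(20, 10), (22, 12), (25, 15), (28, 19), (30, 22), (33, 26)]
forest = train_forest(rows, n_trees=25)
print(round(predict(forest, 31), 2))
```

Because each stump sees a slightly different sample, its individual errors differ, and averaging tends to smooth them out.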
In Figure 15.5, there are two classes, one indicated by the blue circles and the
other indicated by the green circles. When a new data point comes in, it can be
placed in the blue or green group based on its attributes. The hyperplane maintains
maximum distance from the classes. The support vectors are the boundaries of
the datapoints on both sides of the hyperplane. The distance between the
hyperplane and the support vector is called the margin. The distance between
the two support vectors is the maximum margin. The blue circle on the side of
the green circles is called an outlier. The SVM algorithm does not take outliers
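The margin can be illustrated with a short calculation. In this sketch, the hyperplane is given rather than learned, and the weights, points, and function names are hypothetical. The distance of a point from the hyperplane w.x + b = 0 is |w.x + b| divided by the length of w.

```python
import math

def signed_score(w, b, point):
    """Positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b

def distance_to_hyperplane(w, b, point):
    """Perpendicular distance from the point to the hyperplane w.x + b = 0."""
    return abs(signed_score(w, b, point)) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 = 6 and two small groups of points.
w, b = (1.0, 1.0), -6.0
blue_points = [(4, 4), (5, 6)]
green_points = [(2, 2), (1, 2)]

# The margin is the distance to the closest points (the support vectors).
margin = min(distance_to_hyperplane(w, b, p) for p in blue_points + green_points)

# A new data point is placed in a group based on its side of the hyperplane.
new_point = (5, 3)
group = "blue" if signed_score(w, b, new_point) > 0 else "green"
print(round(margin, 3), group)
```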
Figure 15.6 shows sales data for the first 15 days of a summer month at a
particular ice cream shop.
Step 1: The first step is to load the data. The data in the CSV file is loaded by
executing the code in Code Snippet 1.
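# Import the libraries used in this example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# The raw string (r"...") keeps the backslashes in the Windows path intact
df = pd.read_csv(r"C:\SalesData\sales_icecream.csv")
df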
In this code, the pandas, numpy, and matplotlib libraries are imported and
the data is read into the dataframe (df) using the function read_csv.
Step 2: The next step is to pre-process the data, where the data is cleaned of
duplicate data, missing data, and outliers.
In the sales data used for this example, there are no missing or duplicate data.
There are also no outliers. So, proceed to the next step.
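For datasets that do need cleaning, the checks for duplicate and missing data might look like the following sketch. The DataFrame values are hypothetical and deliberately contain one duplicate row and one missing value. The ice-cream sales data does not have these problems.

```python
import pandas as pd

# Hypothetical rows with one duplicate and one missing temperature.
df = pd.DataFrame({
    "Temperature (Celsius)": [30, 32, 32, 35, None],
    "Ice-cream Sales (kg)": [20, 24, 24, 30, 31],
})

# Count fully duplicated rows and missing cells.
duplicate_rows = int(df.duplicated().sum())
missing_values = int(df.isnull().sum().sum())
print(duplicate_rows, missing_values)  # 1 1

# Remove duplicate rows, then rows with missing values.
cleaned = df.drop_duplicates().dropna()
print(len(cleaned))  # 3
```

Outliers are usually checked separately, for example by inspecting the output of df.describe() or a plot of the data.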
Step 3: The third step is to separate the dependent and independent variables.
In this case, Temperature (Celsius) is the independent variable and
Ice-cream Sales (kg) is the dependent variable.
Code Snippet 2:
x = df[['Temperature (Celsius)']]
y = df['Ice-cream Sales (kg)']
x
In this code, the values of Temperature (Celsius) are loaded into variable
x and the values of Ice-cream Sales (kg) are loaded into variable y. Then,
the values stored in variable x are printed.
Step 4: After the data is separated into independent and dependent variables,
it should be split for training and testing purposes. Generally, the data is split in
the ratio of 70:30, where 70% of the data is used for training purposes and 30%
of the data is used for testing purposes. The data can be split into the required ratio
using Code Snippet 3.
In this code, four variables are being created, two for the training dataset and
two for the testing dataset. The testing dataset size is specified as 0.3, which
refers to 30%. The remaining will be used for the training dataset. Then, the
training dataset of the x variable is printed.
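A split of this kind can be sketched with the train_test_split function of scikit-learn. This is an illustration under stated assumptions: the 15 rows are synthetic stand-ins for the sales data, and random_state=0 is an assumed setting that only makes the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 15 rows of sales data (values are made up).
df = pd.DataFrame({
    "Temperature (Celsius)": list(range(25, 40)),
    "Ice-cream Sales (kg)": [2 * t - 30 for t in range(25, 40)],
})
x = df[["Temperature (Celsius)"]]
y = df["Ice-cream Sales (kg)"]

# test_size=0.3 reserves 30% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)
print(len(x_train), len(x_test))  # 10 5
```

With 15 rows, a test size of 0.3 gives five test rows and ten training rows.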
Step 5: Next, the model must be trained with dependent and independent
data. The function fit() of LinearRegression helps in training the model
with data.
The model is trained using the code in Code Snippet 4.
Code Snippet 4:
This code trains the model and prints the intercept and coefficient. In the linear
equation, y=a0+a1x+E, a0 is the intercept, a1 is the coefficient, and E is the error term.
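Training with fit() and reading the intercept and coefficient might look like the following sketch. The training values are hypothetical and lie exactly on the line y = 2x - 30, so the expected result is easy to check. The variable name sim_linear follows the name used in the prediction step.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data that lies exactly on y = 2x - 30.
x_train = pd.DataFrame({"Temperature (Celsius)": [25, 28, 30, 33, 36]})
y_train = pd.Series([20, 26, 30, 36, 42], name="Ice-cream Sales (kg)")

# Create the model and train it with the training data.
sim_linear = LinearRegression()
sim_linear.fit(x_train, y_train)

# intercept_ corresponds to a0 and coef_ holds a1 (one value per input column).
print("Intercept (a0):", sim_linear.intercept_)
print("Coefficient (a1):", sim_linear.coef_)
```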
Step 6: The next step is to predict using the test data. Now that the model is
trained with the training data, it can be used to make predictions for the data
in x_test, which is the independent test dataset. Prediction can be made using the
code in Code Snippet 5.
Code Snippet 5:
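# Predict the dependent variable for the test inputs
y_pred = sim_linear.predict(x_test)
# Place the actual and predicted values side by side for comparison
sim_linear_diff = pd.DataFrame({'Actual value': y_test,
                                'Predicted value': y_pred})
sim_linear_diff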
This code makes the prediction and stores the results of the prediction in the
y_pred variable. This variable is then compared with the data in the y_test
variable, which is the actual test data for the dependent variable. In this way,
the model is tested for its accuracy.
Step 7: Next, the data must be plotted to find the best-fit line of regression. This
can be done using the code in Code Snippet 6.
plt.scatter(x_test,y_test,s=40,color='green')
plt.scatter(x_test,y_pred,s=40,color='red')
plt.plot(x_test, y_pred, 'blue')
plt.xlabel('Temperature (Celsius)')
plt.ylabel('Ice-cream Sales (kg)')
plt.show()
The code plots the test data as green dots and the predicted values as red dots.
The blue line is the line of regression, which is also called the line of best fit.
There are five data points in green. There seem to be only four data
points in red because two of the predicted values are the same, so
their dots overlap.
Code Snippet 7:
The R squared value of 87.30% indicates that the model explains about 87% of
the variation in the actual values.
The Mean Absolute Error value indicates the average absolute difference between
the predicted values and the actual values. This value should be low. The lower the
Mean Absolute Error value, the more accurate the prediction made by the model.
A perfect predictor will have a Mean Absolute Error of 0.
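Both metrics can be calculated by hand, as the following sketch shows. The actual and predicted values are hypothetical. The sklearn.metrics module provides the equivalent functions r2_score and mean_absolute_error.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 minus (unexplained variation divided by total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical actual and predicted values.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, round(r2, 3))  # 2.5 0.948
```

In this example, the predictions are off by 2.5 units on average, and the model explains about 95% of the variation in the actual values.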
a. Supervised learning
b. Reinforcement learning
c. Unsupervised learning
d. Semi-supervised learning
3. You are a data scientist working for a car dealership. Your job is to
analyze the factors that influence the prices of used cars in your region.
You have collected data on various attributes such as the age of the
car, mileage, brand, and the number of previous owners. You intend to
use linear regression to build a model that predicts the prices of used
cars based on these attributes. What type of linear regression would be
most appropriate for your analysis?
a. Entropy
b. Information gain
c. Gini index
d. Random selection
1 a
2 a
3 b
4 c
5 d
Figure 3 shows that the application takes input from the user through the
Welcome page. The user enters the information on the Welcome page and
clicks the Get Gardening Tips button. The application uses the data entered to
plant_data Database
The plant_data Database has a single table named plant. Table 1 shows the
structure of the plant table. This table will have an id column as the primary
key.
Let us create the plant_data database and the plant table in MySQL. Then,
the data in Table 2 must be inserted into the plant table.
Code Snippet 1:
Code Snippet 2:
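The following sketch illustrates the table and the lookup that the application performs. It is not the MySQL script of the book. It uses the built-in sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs without a server. The column types and the sample row are assumptions, and sqlite3 uses ? as the placeholder where the MySQL driver uses %s.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name

# Assumed structure: id as primary key plus the columns the application queries.
conn.execute(
    """
    CREATE TABLE plant (
        id INTEGER PRIMARY KEY,
        Plant_Name TEXT,
        Location TEXT,
        Climate TEXT,
        Watering_Reminder TEXT,
        Fertilizing_Reminder TEXT,
        Other_Care_Reminder TEXT,
        Tips TEXT
    )
    """
)

# Hypothetical sample row (placeholder values, not the data of Table 2).
conn.execute(
    "INSERT INTO plant (Plant_Name, Location, Climate, Watering_Reminder, "
    "Fertilizing_Reminder, Other_Care_Reminder, Tips) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Sample Plant", "Indoor plant", "Dry", "sample watering note",
     "sample fertilizing note", "sample care note", "sample tip"),
)

# Parameterized lookup on the three values entered by the user.
rows = conn.execute(
    "SELECT Plant_Name, Tips FROM plant "
    "WHERE Location = ? AND Climate = ? AND Plant_Name = ?",
    ("Indoor plant", "Dry", "Sample Plant"),
).fetchall()
print(rows[0]["Plant_Name"], "-", rows[0]["Tips"])
```

Passing the user input as parameters, rather than building the SQL string by hand, protects the query against SQL injection.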
Code Snippet 3 lists the code for the Welcome.html page. This page uses a text
box for Location and drop-down lists for Climate and Preferred plants.
<!DOCTYPE html>
<html>
<head>
<title>Gardening Application</title>
<link rel="stylesheet" href="{{url_for('static',
filename='CSS/styles.css')}}">
</head>
<body>
<div>
<h1>Welcome to Blossom Buddy!</h1>
<fieldset>
<legend>Enter Your Plant Info:</legend>
<form action="/tips" method="post">
<label for="location">Location:</label>
<!--Placeholder is used for Location-->
<input type="text" name="location" id="location"
placeholder="Indoor plant or Outdoor plant"
pattern="^[A-Za-z\s]+$"
title="Please use letters and spaces only">
<br>
<!-- <label for="climate">Climate:</label>
<input type="text" name="climate" id="climate">
<br> -->
<!--Drop down used for Climate-->
<label for="climate">Climate:</label>
<select name="climate" id="climate">
<option value="Select">Select climate</option>
<option value="Dry">Dry</option>
<option value="Hot">Hot</option>
<option value="Humid">Humid</option>
<br>
<script>
</script>
</body>
</html>
Code Snippet 4:
body {
align-items: center;
background:
url("https://cdn.pixabay.com/photo/2016/02/13/16/34/flowers-1198159_1280.jpg");
background-attachment: fixed;
background-repeat: no-repeat;
background-size: cover;
display: grid;
margin: 0;
place-items: center;
width: 300px; /* Set the width of the container */
height: 200px; /* Set the height of the container */
background-position: right; /* Crop from the right side */
}
fieldset {
/* background-color: #bfe5c6; */
font-size: 20px;
width: 550px;
margin-left: 500px;
margin-bottom: 5px;
}
legend {
background-color: rgb(209, 216, 211);
color: rgb(80, 3, 66);
padding: 10px 20px;
margin: 10px 20px;
}
input,select {
margin: 30px;
width: 200px;
height: 30px;
}
.form {
display: flex;
align-items: center;
justify-content: space-between;
flex-direction: column;
padding: 0 3rem;
height: 100%;
text-align: center;
font-size: 40px;
}
h1{
align-items: center;
margin-left: 500px;
width: 400px;
}
label {
display: inline-block;
width: 150px;
}
button{
font-size: 20px;
position: absolute;
background-color:#3b0338;
color: #fff;
border:none;
border-radius:15px;
padding:15px;
min-height:10px;
min-width: 100px;
}
input.button{
float: right;
}
body{
/*For background*/
background:
url("https://cdn.pixabay.com/photo/2012/03/02/00/37/background-20823_1280.jpg");
background-attachment: fixed;
background-repeat: no-repeat;
background-size: cover;
display: grid;
height: 100%;
margin: 0;
place-items: center;
font-size: 20px;
}
.table_div{
/*Table we used to fetch the information from MySQL*/
border: 1px;
padding:150px;
float: right;
margin: 5px;
margin-top: 50px;
}
</style>
<body>
<div class="table_div">
<table style="border:1px solid black;margin-left:auto;margin-right:auto;">
<caption class="caption_table">Growing Green: A Guide to Plant Care 🌿</caption>
<tr>
<th>Plant name</th>
<th>Tips</th>
<th>Watering_Reminder</th>
<th>Fertilizing_Reminder</th>
</tr>
{% for tip in plantdet %}
<tr>
<td>{{ tip.Plant_Name }}</td>
<td>{{ tip.Tips }}</td>
<td>{{ tip.Watering_Reminder}}</td>
<td>{{ tip.Fertilizing_Reminder }}</td>
</tr>
{% endfor %}
</table>
app = Flask(__name__)
mysql = MySQL(app)

        try:
            # Use a dictionary cursor so that columns can be read by name
            cursor = mysql.connection.cursor(MySQLdb.cursors.DictCursor)
            # Parameterized query: user input is passed separately from the SQL
            cursor.execute(
                'SELECT Plant_Name, Watering_Reminder, '
                'Fertilizing_Reminder, Other_Care_Reminder, Tips FROM plant '
                'WHERE Location = %s AND Climate = %s AND Plant_Name = %s',
                (location, climate, plant_name,)
            )
            plantdet = cursor.fetchall()
            cursor.close()
        except Exception as e:
            # Handle database errors (e.g., connection error)
            error_message = f"Database error: {str(e)}"
            return render_template("Welcome.html",
                                   error_message=error_message)
    else:
        # Handle the case when the form is not submitted
        return render_template("Welcome.html")

if __name__ == "__main__":
    app.run(debug=True)
Before running the code, change the password for MySQL server in
the given line of the Python script.
app.config['MYSQL_PASSWORD'] = 'De!!9373'
Figure 4 shows the folder structure for the application. Place the HTML files in
the templates folder. Place the CSS script in the static folder. The Python
script will remain in the root folder of the application.
python BlossomBuddy.py
Find the optimal number of clusters and train the given dataset using
the K-means clustering algorithm to group the patients into different
clusters.
a. Load the dataset and extract the independent variables.
b. Find the optimal number of clusters in K-Means using the
elbow method.
c. Apply feature scaling using Min-Max scaling.
d. Train the model using the K-Means algorithm.
e. Visualize the cluster using different colors.