2025 - Course Kit & Lesson Plan - Business Analytics For Decision Making
making with the responsibility to protect privacy, avoid bias, and ensure transparency and fairness.
Organizations must adopt ethical frameworks and practices to ensure that their analytics initiatives create
positive value without causing harm to individuals, communities, or society at large.
UNIT II
Introduction to Python
Python was developed by Guido van Rossum and first released in the year 1991.
Python is a high-level programming language that combines features of procedure-oriented programming
languages like C and object-oriented programming languages like Java.
FEATURES OF PYTHON
Simple
Python is a simple programming language because it uses English like sentences in its programs.
Easy to learn
Python uses very few keywords. Its programs use very simple structure.
Open source
Python can be freely downloaded from www.python.org website. Its source code can be read, modified and
can be used in programs as desired by the programmers.
High level language
High level languages use English words to develop programs. These are easy to learn and use. Like COBOL,
PHP or Java, Python also uses English words in its programs and hence it is called high level programming
language.
Dynamically typed
In Python, we need not declare the variables. Depending on the value stored in the variable, Python
interpreter internally assumes the datatype.
Platform independent
A Python program is first translated into byte code, which runs on any machine that has a Python Virtual
Machine (PVM). Hence, Python programs are not dependent on any particular computer or operating system.
We can use Python on Unix, Linux, Windows, Macintosh, Solaris, OS/2, Amiga, AROS, AS/400 and almost all
other operating systems. This makes Python an ideal programming language for any network or the Internet.
Portable
When a program yields same result on any computer in the world, then it is called a portable program.
Python programs will give same result since they are platform independent.
Procedure and Object oriented
Python is a procedure oriented as well as object oriented programming language. In procedure oriented
programming languages (e.g. C and Pascal), the programs are built using functions and procedures. But in
object oriented languages (e.g. C++ and Java), the programs use classes and objects.
An object is anything that exists physically in the real world. An object contains behavior. This behavior is
represented by its properties (or attributes) and actions. Properties are represented by variables and actions
are performed by methods. So, an object contains variables and methods.
A class represents common behavior of a group of objects. It also contains variables and methods. But a
class does not exist physically.
A class can be imagined as a model for creating objects. An object is an instance (physical form) of a class.
Interpreted
First, Python compiler translates the Python program into an intermediate code called byte code. This byte
code is then executed by PVM. Inside the PVM, an interpreter converts the byte code instructions into
machine code so that the processor will understand and run that machine code.
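The compile-then-run flow described above can be observed from inside Python. The following is a minimal sketch that uses the built-in compile() and exec() functions and the standard dis module; the source text and variable names are only illustrative.

```python
import dis

# Compile a small piece of source text into a code object that holds byte code.
source = "total = 2 + 3"
code_obj = compile(source, "<example>", "exec")

# The raw byte code is stored as a bytes object on the code object.
raw = code_obj.co_code
print(type(raw))

# dis.dis() prints a readable listing of the byte code instructions.
dis.dis(code_obj)

# exec() hands the byte code to the interpreter (PVM) for execution.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])   # 5
```

The exact instructions printed by dis.dis() differ between Python versions, so the listing should be read as an illustration rather than a fixed format.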
Extensible
There are other flavors of Python where programs from other languages can be integrated into Python. For
example, Jython is useful to integrate Java code into Python programs and run on JVM (Java Virtual
Machine). Similarly IronPython is useful to integrate .NET programs and libraries into Python programs and
run on CLR (Common Language Runtime).
Embeddable
Several applications are already developed in Python which can be integrated into other programming
languages like C, C++, Delphi, PHP, Java and .NET. It means programmers can use these applications for
their advantage in various software projects.
Huge library
Python has a big library that contains modules which can be used on any Operating system.
Scripting language
A scripting language uses an interpreter to translate the source code into machine code on the fly (while
running). Generally, scripting languages perform supporting tasks for a bigger application or software.
Python is considered a scripting language because it is interpreted and is used on the Internet to support
other software.
Database connectivity
A database represents software that stores and manipulates data. Python provides interfaces to connect its
programs to all major databases like Oracle, Sybase, SQL Server or MySQL.
Scalable
A program would be scalable if it could be moved to another Operating system or hardware and take full
advantage of the new environment in terms of performance.
Core Libraries in Python
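The huge library of Python contains several small applications (or small packages) which are already
developed and immediately available to programmers. This is what the phrase ‘batteries included’ refers to.
Some packages, such as argparse, come with Python itself; others, such as numpy and pandas, are third-party
packages that are installed separately (for example with pip or Anaconda). Some interesting packages are
given here: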
• argparse is a package that represents command-line parsing library.
• boto is Amazon web services library.
• CherryPy is an object-oriented HTTP framework.
• cryptography offers cryptographic techniques for the programmers.
• Fiona reads and writes geospatial data files.
• jellyfish is a library for doing approximate and phonetic matching of strings.
• matplotlib is a library for plotting graphs and charts (data visualization).
• mysql-connector-python is a driver written in Python to connect to MySQL database.
• numpy is a package for processing arrays of single or multidimensional type.
• pandas is a package for powerful data structures for data analysis, time series and statistics.
• Pillow is a Python imaging library.
• pyquery represents jquery-like library for Python.
• scipy is the scientific library to do scientific and engineering calculations.
• Sphinx is the Python documentation generator.
• sympy is a package for Computer algebra system (CAS) in Python.
• w3lib is a library of web related functions.
• whoosh contains fast and pure Python full text indexing, search and spell checking library.
To know the entire list of packages included in Python, one can visit:
https://www.pythonanywhere.com/batteries_included/
Python Virtual Machine (PVM) is a software that contains an interpreter that converts the byte code into
machine code.
PVM is most often called Python interpreter. The PVM of PyPy contains a compiler in addition to the
interpreter. This compiler is called Just In Time (JIT) compiler which is useful to speed up execution of the
Python program.
Step 4) Then click on ‘Just Me’ radio button for installing your individual copy.
Step 5) It will show a default directory to install. Click on ‘Next’.
Step 6) In the next screen, select the checkbox ‘Create start menu shortcuts’. Also, unselect other
checkboxes.
Step 7) The installation starts in the next screen. We should wait for the installation to complete.
Step 10) In the final screen, do not check the checkboxes and then click on “Finish”.
Note: Once the installation is completed, we can find a new folder by the name “Anaconda3(64-bit)” created
in Windows 10 applications which can be seen by pressing Windows “Start” button. When we click on this
folder, we can find several icons including “Jupyter Notebook” and “Spyder”.
Step 5) We can enter code in the next cell and so on. In this manner, we can run the program as blocks of
code, one block at a time. When input is required, it will wait for your input to enter, as shown in the
following screen. The blue box around the cell indicates command mode.
Step 6) Type the program in the cells and run each cell to see the results produced by each cell.
Note: To save the program, click on Floppy symbol below the “File” menu. Click on “Insert” to insert a new
cell either above or below the current cell. The programs in Jupyter are saved with the extension “.ipynb”
which indicates Interactive Python Notebook file. This file stores the program and other contents in the form
of JSON (JavaScript Object Notation). Click on ‘Logout’ to terminate Jupyter. Then close the server window
also.
Step 7) To reopen the program, first enter into Jupyter Notebook Home Page. In the “Files” tab, find out the
program named “first.ipynb” and click on it to open it in another page.
Step 8) Similarly, to delete the file, first select it and then click on the Delete Bin symbol.
RUNNING A PYTHON PROGRAM
A Python program can be run from 3 environments:
1. Command line window
2. IDLE graphics window
3. System prompt
In IDLE window, click on help -> ‘Python Docs’ or F1 button to get documentation help.
Save a Python program in IDLE and reopen it and run it.
COMMENTS (2 types)
# single line comments
""" or ''' multi line comments (a triple-quoted string that is not assigned to any variable acts as a comment)
Docstrings
If we write strings inside """ or ''' and if these strings are written as first statements in a module, function,
class or a method, then these strings are called documentation strings or docstrings. These docstrings are
useful to create an API documentation file from a Python program. An API (Application Programming
Interface) documentation file is a text file or html file that contains description of all the features of a
software, language or a product.
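A short sketch of a docstring in use is given below. The function add() is a made-up example; __doc__ is the standard attribute in which Python stores the docstring.

```python
def add(x, y):
    """Return the sum of x and y."""
    # This is a single line comment; it is ignored by Python.
    return x + y

# The docstring is stored in the __doc__ attribute of the function.
print(add.__doc__)    # Return the sum of x and y.

# help(add) would display the same text in a formatted form.
```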
DATATYPES
A datatype represents the type of data stored into a variable (or memory).
Built-in datatypes
The built-in datatypes are of 5 types:
• None Type
• Numeric types
• Sequences
• Sets
• Mappings
NOTE:
Binary numbers are represented by a prefix 0b or 0B. Ex: 0b10011001
Hexadecimal numbers are represented by a prefix 0x or 0X. Ex: 0X11f9c
Octal numbers are represented by a prefix 0o or 0O. Ex: 0o145.
bool type: represents any of the two boolean values, True or False.
Ex: a = 10>5 # here a is treated as bool type variable.
print(a) #displays True
NOTE:
1. To convert a float number into integer, we can use int() function. Ex: int(num)
2. To convert an integer into float, we can use float() function.
3. bin() converts a number into binary. Ex: bin(num)
4. oct() converts a number into octal.
5. hex() converts a number into hexadecimal.
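The conversion functions listed above can be tried out as follows. This is a small sketch; the value 25.75 is chosen only for illustration.

```python
num = 25.75
as_int = int(num)           # the fractional part is discarded
as_float = float(as_int)    # back to a float

print(as_int, as_float)     # 25 25.0
print(bin(as_int))          # 0b11001
print(oct(as_int))          # 0o31
print(hex(as_int))          # 0x19

# Numbers written with a prefix are ordinary integers.
print(0b10011001)           # 153
print(0o145)                # 101
```

Note that bin(), oct() and hex() return strings, not numbers.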
STRINGS
str datatype: represents string datatype. A string is enclosed in single quotes or double quotes.
Ex: s1 = "Welcome"
s2 = 'Welcome'
A string occupying multiple lines can be enclosed in triple single quotes or triple double quotes.
Ex: s1 = ''' This is a special training on
Python programming that
gives insights into Python language.
'''
To display a string that itself contains single quotes, enclose it in double quotes or triple quotes.
Ex: s2 = """This is a book 'on Core Python' programming"""
To find length of a string, use len() function.
Ex: s3 = ‘Core Python’
n = len(s3)
print(n) -> 11
We can do indexing, slicing and repetition of strings.
Ex: s = “Welcome to Core Python”
print(s) -> Welcome to Core Python
print(s[0]) -> W
print(s[0:7]) -> Welcome
print(s[:7]) -> Welcome
print(s[1:7:2]) -> ecm
print(s[-1]) -> n
print(s[-3:-1]) -> ho
print(s[1]*3) -> eee
print(s*2) -> Welcome to Core PythonWelcome to Core Python
Remove spaces using rstrip(), lstrip(), strip() methods.
Ex: name = "  Vijay Kumar  "
print(name.strip()) -> Vijay Kumar
We can find substring position in a string using find() method. It returns -1 if not found.
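The strip family of methods and find() can be compared side by side. This is a small sketch using the same strings as the examples above.

```python
name = "  Vijay Kumar  "
stripped = name.strip()     # removes spaces on both sides
left = name.lstrip()        # removes spaces on the left only
right = name.rstrip()       # removes spaces on the right only
print(repr(stripped), repr(left), repr(right))

s = "Welcome to Core Python"
pos = s.find("Core")        # index where the substring starts
missing = s.find("Java")    # substring not present
print(pos, missing)         # 11 -1
```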
CHARACTERS
There is no datatype to represent a single character in Python. Characters are part of str datatype.
Ex:
s = "Hello" # the name str is avoided because it would hide the built-in str type
print(s[0])
H
for i in s: print(i)
H
e
l
l
o
bytearray datatype: same as bytes type but its elements can be modified.
arr = [10,20,55,100,99]
x=bytearray(arr)
x[0]=11
x[1]=21
for i in x: print(i)
11
21
55
100
99
NOTE:
Indexing, slicing and repetition are possible on both bytes and bytearray datatypes. The difference is that
the elements of a bytes object cannot be modified, whereas the elements of a bytearray can.
LISTS
A list is similar to an array that can store a group of elements. A list can store different types of elements and
can grow dynamically in memory. A list is represented by square brackets [ ]. List elements can be modified.
Ex:
lst = [10, 20, 'Ajay', -99.5]
print(lst[2])
Ajay
To create an empty list.
lst = [] # then we can append elements to this list as lst.append('Vinay')
NOTE:
Indexing, slicing and repetition are possible on lists.
print(lst[1])
20
print(lst[-3:-1])
[20, 'Ajay']
lst = lst*2
print(lst)
[10, 20, 'Ajay', -99.5, 10, 20, 'Ajay', -99.5]
We can use len() function to find the no. of elements in the list.
n = len(lst) -> 8 (after the repetition above; 4 for the original list)
The del statement deletes an element at a particular position.
del lst[1] -> deletes 20
remove() will remove a particular element. clear() will delete all elements from the list.
lst.remove('Ajay')
lst.clear()
We can update the list elements by assignment.
lst[0] = 'Vinod'
lst[1:3] = 10, 15
max() and min() functions return the biggest and smallest elements, provided the elements can be compared
with each other (for example, when all of them are numbers).
max(lst)
min(lst)
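The list operations above can be followed step by step in one place. This is a sketch; the comments show the list after each statement.

```python
lst = [10, 20, 'Ajay', -99.5]

lst.append('Vinay')    # [10, 20, 'Ajay', -99.5, 'Vinay']
del lst[1]             # [10, 'Ajay', -99.5, 'Vinay']
lst.remove('Ajay')     # [10, -99.5, 'Vinay']
lst[0] = 'Vinod'       # ['Vinod', -99.5, 'Vinay']
lst[1:3] = 10, 15      # ['Vinod', 10, 15]
print(lst, len(lst))

# max() and min() need elements that can be compared with each other.
nums = [10, 20, -99.5]
print(max(nums), min(nums))   # 20 -99.5
```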
TUPLES
A tuple is similar to a list but its elements cannot be modified. A tuple is represented by parentheses ( ).
Indexing, slicing and repetition are possible on tuples also.
Ex:
tpl=( ) # creates an empty tuple
tpl=(10, ) # with only one element – comma needed after the element
tpl = (10, 20, -30, "Raju")
print(tpl)
(10, 20, -30, 'Raju')
tpl[0]=-11 # error
print(tpl[0:2])
(10, 20)
tpl = tpl*2
print(tpl)
(10, 20, -30, 'Raju', 10, 20, -30, 'Raju')
NOTE: len(), count(), index(), max(), min() functions are same in case of tuples also.
We cannot use append(), extend(), insert(), remove(), clear() methods on tuples.
To sort the elements of a tuple, we can use sorted() function. It returns a new list and needs elements that
can be compared with each other (for example, all numbers).
sorted(tpl) # returns the elements in ascending order
sorted(tpl, reverse=True) # returns the elements in descending order
To convert a list into tuple, we can use tuple() function.
tpl = tuple(lst)
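The following sketch shows that a tuple rejects modification and that sorted() returns a new list. An all-numeric tuple is assumed here so that the elements can be compared.

```python
tpl = (10, 20, -30, 5)

# Assigning to an element of a tuple raises TypeError.
try:
    tpl[0] = -11
    modified = True
except TypeError:
    modified = False
print(modified)                    # False

asc = sorted(tpl)                  # new list in ascending order
desc = sorted(tpl, reverse=True)   # new list in descending order
print(asc, desc)

# tuple() converts the sorted list back into a tuple.
sorted_tpl = tuple(asc)
print(sorted_tpl)                  # (-30, 5, 10, 20)
```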
RANGE DATATYPE
range represents a sequence of numbers. The numbers in the range cannot be modified. Generally, range is
used to repeat a for loop for a specified number of times.
Ex: we can create a range object that stores from 0 to 4 as:
r = range(5)
print(r[0]) -> 0
for i in r: print(i)
0
1
2
3
4
Ex: we can also mention step value as:
r = range(0, 10, 2)
for i in r: print(i)
0
2
4
6
8
r1 = range(50, 40, -2)
for i in r1: print(i)
50
48
46
44
42
SETS
A set datatype represents an unordered collection of elements. A set does not accept duplicate elements,
whereas a list accepts duplicate elements. A set is written using curly braces { }. A set can be modified by
adding or removing elements.
s = {1, 2, 3, "Vijaya"}
print(s)
{1, 2, 3, 'Vijaya'}
NOTE: Indexing, slicing and repetition are not allowed in case of a set.
To add elements into a set, we should use update() method as:
s.update([4, 5])
print(s)
{1, 2, 3, 4, 5, 'Vijaya'}
To remove elements from a set, we can use remove() method as:
s.remove(5)
print(s)
{1, 2, 3, 4, 'Vijaya'}
A frozenset datatype is same as set type but its elements cannot be modified.
Ex:
s = {1, 2, -1, 'Akhil'} # this is a set
s1 = frozenset(s) # convert it into frozenset
for i in s1: print(i)
1
2
Akhil
-1
NOTE: update() or remove() methods will not work on frozenset.
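The behaviour of sets and frozensets described above can be checked with a short sketch. The values are illustrative; the order in which a set prints its elements is not guaranteed.

```python
s = {1, 2, 3, "Vijaya"}
s.update([4, 5, 3])      # 3 is already present, so it is not added again
s.remove(5)
print(len(s))            # 5

fs = frozenset(s)        # same elements, but cannot be changed

# A frozenset has no remove() method, so this raises AttributeError.
try:
    fs.remove(1)
    changed = True
except AttributeError:
    changed = False
print(changed)           # False
```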
MAPPING DATATYPES
A map indicates elements in the form of key – value pairs. When key is given, we can retrieve the associated
value. A dict datatype (dictionary) is an example for a ‘map’.
d = {10: 'kamal', 11:'Subbu', 12:'Sanjana'}
print(d)
{10: 'kamal', 11: 'Subbu', 12: 'Sanjana'}
keys() method gives keys and values() method returns values from a dictionary.
k = d.keys()
for i in k: print(i)
10
11
12
for i in d.values(): print(i)
kamal
Subbu
Sanjana
To display value upon giving key, we can use as:
Ex: d = {10: 'kamal', 11:'Subbu', 12:'Sanjana'}
d[10] gives 'kamal'
To create an empty dictionary, we can use as:
d = {}
Later, we can store the key and values into d, as:
d[10] = 'Kamal'
d[11] = 'Pranav'
We can update the value of a key, as: d[key] = newvalue.
Ex: d[10] = 'Subhash'
We can delete a key and corresponding value, using the del statement.
Ex: del d[11] will delete the key 11 and its value also.
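The dictionary operations above, run in sequence, look like this. It is a small sketch using the same keys and names as the examples.

```python
d = {}                   # empty dictionary
d[10] = 'Kamal'          # store new key-value pairs
d[11] = 'Pranav'
d[10] = 'Subhash'        # update the value of an existing key
del d[11]                # delete key 11 and its value

print(d)                 # {10: 'Subhash'}
print(list(d.keys()))    # [10]
print(list(d.values()))  # ['Subhash']
print(11 in d)           # False
```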
PYTHON AUTOMATICALLY KNOWS ABOUT THE DATATYPE
The datatype of the variable is decided depending on the value assigned. To know the datatype of the
variable, we can use type() function.
Ex:
x = 15 #int type
print(type(x))
<class 'int'>
x = 'A' #str type
print(type(x))
<class 'str'>
x = 1.5 #float type
print(type(x))
<class 'float'>
x = "Hello" #str type
print(type(x))
<class 'str'>
x = [1,2,3,4]
print(type(x))
<class 'list'>
x = (1,2,3,4)
print(type(x))
<class 'tuple'>
x = {1,2,3,4}
print(type(x))
<class 'set'>
Literals in Python
A literal is a constant value that is stored into a variable in a program.
a = 15
Here, ‘a’ is the variable into which the constant value ‘15’ is stored. Hence, the value 15 is called ‘literal’.
Since 15 indicates integer value, it is called ‘integer literal’.
Ex: a = 'Srinu' → here 'Srinu' is called string literal.
Ex: a = True → here, True is called Boolean type literal.
User-defined datatypes
The datatypes which are created by the programmers are called ‘user-defined’ datatypes. For example, an
array, a class, or a module is a user-defined datatype. We will discuss these datatypes in the later
chapters.
Constants in Python
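A constant is similar to a variable but its value is not meant to be modified or changed in the course of the
program execution. For example, the value of pi (approximately 3.14159, or roughly 22/7) is a constant.
Python has no special syntax to enforce constants; by convention, they are written using caps as PI.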
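A minimal sketch of the naming convention follows. The function circle_area() is a hypothetical helper; math.pi is the value of pi provided by the standard math module.

```python
import math

PI = math.pi     # written in caps to signal "do not change this"

def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return PI * radius ** 2

print(circle_area(1))

# 22/7 is only an approximation of pi.
difference = abs(22 / 7 - math.pi)
print(difference > 0.001)   # True
```

Nothing stops a program from assigning a new value to PI; the capital letters are only a signal to other programmers.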
Assignment operators
To assign right side value to a left side variable.
Operator   Example   Meaning
=          z = x+y   Assignment operator, i.e. x+y is stored into z.
+=         z += x    Addition assignment operator, i.e. z = z+x.
-=         z -= x    Subtraction assignment operator, i.e. z = z-x.
*=         z *= x    Multiplication assignment operator, i.e. z = z*x.
/=         z /= x    Division assignment operator, i.e. z = z/x.
%=         z %= x    Modulus assignment operator, i.e. z = z%x.
**=        z **= y   Exponentiation assignment operator, i.e. z = z**y.
//=        z //= y   Floor division assignment operator, i.e. z = z//y.
Ex:
a=b=c=5
print(a,b,c)
5 5 5
a,b,c=1,2,'Hello'
print(a,b,c)
1 2 Hello
x = [10,11,12]
a,b,c = 1.5, x, -1
print(a,b,c)
1.5 [10, 11, 12] -1
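The compound assignment operators from the table can be traced on a single variable. This is a sketch; the starting value 10 is arbitrary and each comment shows the value of z after that statement.

```python
z = 10
z += 5     # 15
z -= 3     # 12
z *= 2     # 24
z /= 4     # 6.0  (division always gives a float)
z %= 4     # 2.0
z **= 3    # 8.0
z //= 3    # 2.0
print(z)   # 2.0
```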
Ex:
1<2<3<4 will give True
1<2>3<4 will give False
Logical operators
Logical operators are useful to construct compound conditions. A compound condition is a combination of
more than one simple condition. 0 is False, any other number is True.
x = 1, y = 2
Operator   Example   Meaning                                                              Result
and        x and y   And operator. If x is False, it returns x, otherwise it returns y.   2
or         x or y    Or operator. If x is False, it returns y, otherwise it returns x.    1
not        not x     Not operator. If x is False, it returns True. If x is True it        False
                     returns False.
Ex:
x=1; y=2; z=3
if(x<y or y>z):
    print('Yes')
else:
    print('No')
# displays Yes
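The rule that ‘and’ and ‘or’ return one of their operands, rather than always True or False, can be confirmed with a short sketch using the same values x = 1 and y = 2.

```python
x, y = 1, 2

r_and = x and y      # x is treated as True, so y is returned
r_or = x or y        # x is treated as True, so x is returned
r_not = not x        # not always gives a bool value
print(r_and, r_or, r_not)    # 2 1 False

# With 0 (treated as False) on the left side:
print(0 and y)       # 0
print(0 or y)        # 2
```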
Boolean operators
Boolean operators act upon ‘bool’ type values and they provide ‘bool’ type result. So the result will be again
either True or False.
x = True, y = False
Operator   Example   Meaning                                                          Result
and        x and y   Boolean and operator. If both x and y are True, then it          False
                     returns True, otherwise False.
or         x or y    Boolean or operator. If either x or y is True, then it           True
                     returns True, else False.
not        not x     Boolean not operator. If x is True, it returns False,            False
                     else True.
INPUT AND OUTPUT
print() function for output
The examples below assume a = 1 and b = 2 until a is reassigned.
Example                                    Output
print()                                    Blank line
print("Hai")                               Hai
print("This is the \nfirst line")          This is the
                                           first line
print("This is the \\nfirst line")         This is the \nfirst line
print('Hai'*3)                             HaiHaiHai
print('City='+"Hyderabad")                 City=Hyderabad
print(a, b)                                1 2
print(a, b, sep=",")                       1,2
print(a, b, sep='-----')                   1-----2
print("Hello")                             Hello
print("Dear")                              Dear
print("Hello", end='')                     HelloDear
print("Dear", end='')
a=2                                        You typed  2 as input
print('You typed ', a, 'as input')
NOTE: 'You typed ' ends with a space and print() adds one more space between arguments, so two spaces
appear before 2.
%i, %f, %c, %s can be used as format strings.
name='Linda'; sal=12000.50
Example                                                   Output
print('Hai', name, 'Your salary is', sal)                 Hai Linda Your salary is 12000.5
print('Hai %s, Your salary is %.2f' % (name, sal))        Hai Linda, Your salary is 12000.50
print('Hai {}, Your salary is {}'.format(name, sal))      Hai Linda, Your salary is 12000.5
print('Hai {0}, Your salary is {1}'.format(name, sal))    Hai Linda, Your salary is 12000.5
print('Hai {1}, Your salary is {0}'.format(name, sal))    Hai 12000.5, Your salary is Linda
Positional arguments
These are the arguments passed to a function in correct positional order. Here, the number of arguments and
their positions in the function definition should match exactly with the number and position of the arguments
in the function call.
def attach(s1, s2): # function definition
attach('New', 'York') # positional arguments
Keyword arguments
Keyword arguments are arguments that identify the parameters by their names.
def grocery(item, price): # function definition
grocery(item='Sugar', price=50.75) # key word arguments
Default arguments
We can mention some default value for the function parameters in the definition.
def grocery(item, price=40.00): # default argument is price
grocery(item='Sugar') # default value for price is used
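The three kinds of arguments can be seen together in the sketch below. Only the function headers come from the notes above; the function bodies are assumptions written for illustration.

```python
def attach(s1, s2):
    """Join two strings (illustrative body)."""
    return s1 + s2

def grocery(item, price=40.00):
    """Describe an item; price falls back to 40.00 when omitted."""
    return 'Item = %s, Price = %.2f' % (item, price)

# Positional arguments: the order decides which parameter gets which value.
print(attach('New', 'York'))               # NewYork
print(attach('York', 'New'))               # YorkNew

# Keyword arguments: the names decide, so the order does not matter.
print(grocery(item='Sugar', price=50.75))  # Item = Sugar, Price = 50.75
print(grocery(price=88.00, item='Oil'))    # Item = Oil, Price = 88.00

# Default argument: price is not passed, so 40.00 is used.
print(grocery(item='Sugar'))               # Item = Sugar, Price = 40.00
```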
ARRAYS
To work with arrays, we use numpy (numerical python) package.
For complete help on numpy: https://docs.scipy.org/doc/numpy/reference/
An array is an object that stores a group of elements (or values) of the same datatype. A numpy array has a
fixed size once it is created; functions such as numpy.append() return a new array instead of growing the
existing one.
NOTE: We can use for loops to display the individual elements of the array.
To work with numpy, we should import that module, as:
import numpy
import numpy as np
from numpy import *
Single dimensional (or 1D ) arrays
A 1D array contains one row or one column of elements. For example, the marks of a student in 5 subjects.
Creating single dimensional arrays
Creating arrays in numpy can be done in several ways. Some of the important ways are:
• Using array() function
• Using linspace() function
• Using logspace() function
• Using arange() function
• Using zeros() and ones() functions.
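Each of the functions listed above can be tried as follows. This is a sketch that assumes the numpy package is installed; the values passed are illustrative.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])   # from a list
b = np.linspace(0, 10, 5)            # 5 evenly spaced values from 0 to 10
c = np.logspace(1, 3, 3)             # 10**1, 10**2, 10**3
d = np.arange(2, 11, 2)              # like range(): 2, 4, 6, 8, 10
e = np.zeros(3)                      # three 0.0 values
f = np.ones(3, dtype=int)            # three 1 values of int type

for arr in (a, b, c, d, e, f):
    print(arr)
```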
Ex:
numpy.sort(arr)
numpy.max(arr)
numpy.sqrt(arr)
Aliasing the arrays
If ‘a’ is an array, we can assign it to ‘b’, as:
b=a
This is a simple assignment that does not make any new copy of the array ‘a’. It means, ‘b’ is not a new
array and memory is not allocated to ‘b’. Also, elements from ‘a’ are not copied into ‘b’ since there is no
memory for ‘b’. Then how to understand this assignment statement? We should understand that we are
giving a new name ‘b’ to the same array referred by ‘a’. It means the names ‘a’ and ‘b’ are referencing same
array. This is called ‘aliasing’.
‘Aliasing’ is not ‘copying’. Aliasing means giving another name to the existing object. Hence, any
modifications to the alias object will reflect in the existing object and vice versa.
Viewing and Copying arrays
We can create another array object that looks at the same elements as an existing array. This is done by
view() method. The view is a new array object, but it does not get its own copy of the elements: the original
array and the view share the same data in memory. Hence, if the newly created array is modified, the original
array will also be modified, since the elements in both the arrays are like mirror images.
We can create a view of ‘a’ as:
b = a.view()
Viewing is often called ‘shallow copying’: a new array object is created, but the elements in the view, when
modified, will also modify the elements in the original array. So, both the arrays will act as one and the same.
Suppose we want both the arrays to be independent and modifying one array should not affect another array,
we should go for ‘deep copying’. This is done with the help of copy() method. This method makes a
complete copy of an existing array and its elements. When the newly created array is modified, it will not
affect the existing array or vice versa. There will not be any connection between the elements of the two
arrays.
We can create a copy of ’a’ as:
b = a.copy()
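The difference between aliasing, viewing and copying can be checked with a short experiment. This is a sketch that assumes numpy is installed; the array contents and variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
alias = a          # another name for the same array object
view = a.view()    # new array object whose elements mirror those of a
dup = a.copy()     # independent array with its own elements

a[0] = 99          # modify the original array

print(alias is a)  # True: alias and a are the same object
print(view is a)   # False: the view is a separate object
print(alias[0])    # 99
print(view[0])     # 99: the change shows through the view
print(dup[0])      # 1: the copy is not affected
```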
Multi-dimensional arrays (2D, 3D, etc)
They represent more than one row and more than one column of elements. For example, marks obtained by a
group of students each in five subjects.
Creating multi-dimensional arrays
We can create multi dimensional arrays in the following ways:
• Using array() function
• Using ones() and zeros() functions
• Using eye() function
• Using reshape() function discussed earlier
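The ways of creating 2D arrays listed above can be sketched as follows, assuming numpy is installed. The marks used are invented sample values.

```python
import numpy as np

marks = np.array([[50, 60, 70],
                  [65, 75, 85]])          # 2 rows, 3 columns
zeros = np.zeros((2, 3))                  # 2x3 array of 0.0
ones = np.ones((2, 3), dtype=int)         # 2x3 array of 1
identity = np.eye(3)                      # 3x3 array with 1.0 on the diagonal
reshaped = np.arange(6).reshape(2, 3)     # 0..5 arranged as 2 rows, 3 columns

print(marks.shape, marks.ndim)            # (2, 3) 2
print(reshaped)
```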
INTRODUCTION TO OOPS
Features of OOPS
1. classes and objects
2. encapsulation
3. abstraction
4. inheritance
5. polymorphism
Self variable
‘self’ is a default variable that contains the memory address of the instance of the current class. So, we can
use ‘self’ to refer to all the instance variables and instance methods.
Constructor
A constructor is a special method that is used to initialize the instance variables of a class. In the constructor,
we create the instance variables and initialize them with some starting values. The first parameter of the
constructor will be ‘self’ variable that contains the memory address of the instance.
A constructor may or may not have parameters.
Ex:
def __init__(self): # default constructor
    self.name = 'Vishnu'
    self.marks = 900
Ex:
def __init__(self, n='', m=0): # parameterized constructor with 2 parameters
    self.name = n
    self.marks = m
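A constructor only makes sense inside a class, so a complete sketch is given below. The class name Student and the display() method are assumptions added for illustration.

```python
class Student:
    # Parameterized constructor; the default values allow Student() as well.
    def __init__(self, n='', m=0):
        self.name = n
        self.marks = m

    # Instance method that reads the instance variables through self.
    def display(self):
        return 'Name: %s, Marks: %d' % (self.name, self.marks)

s1 = Student()                 # uses the default values
s2 = Student('Vishnu', 900)    # passes values to the constructor
print(s1.display())            # Name: , Marks: 0
print(s2.display())            # Name: Vishnu, Marks: 900
```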
Types of variables
The variables which are written inside a class are of 2 types:
• Instance variables
• Class variables or Static variables
Instance variables are the variables whose separate copy is created in every instance (or object). Instance
variables are defined and initialized using a constructor with ‘self’ parameter. Also, to access instance
variables, we need instance methods with ‘self’ as first parameter. Instance variables can be accessed as:
obj.var
Unlike instance variables, class variables are the variables whose single copy is available to all the instances
of the class. If we modify a class variable through the class (for example, inside a class method), the
modified value is seen by all the instances. A class method contains first parameter by default as ‘cls’ with
which we can access the class variables. For example, to refer to the class variable ‘x’, we can use ‘cls.x’.
NOTE: class variables are also called ‘static variables’. class methods are marked with the decorator
@classmethod .
NOTE: class variables can be accessed as: classname.var or obj.var
Namespaces
A namespace represents a memory block where names are mapped (or linked) to objects. A class maintains
its own namespace, called ‘class namespace’. In the class namespace, the names are mapped to class
variables. Similarly, every instance (object) will have its own name space, called ‘instance namespace’. In
the instance namespace, the names are mapped to instance variables.
When we modify a class variable in the class namespace, its modified value is available to all instances.
When we modify a class variable in the instance namespace, then it is confined to only that instance. Its
modified value will not be available to other instances.
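The two cases described above can be seen in a short sketch. The class name Sample and the method modify() are hypothetical names used only for this example.

```python
class Sample:
    x = 10                  # class variable (class namespace)

    @classmethod
    def modify(cls):
        cls.x += 1          # changes the single copy held by the class

s1 = Sample()
s2 = Sample()

Sample.modify()             # modified in the class namespace
print(s1.x, s2.x)           # 11 11

s1.x = 50                   # creates a name x in the namespace of s1 only
print(s1.x, s2.x, Sample.x) # 50 11 11
```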
Types of methods
By this time, we got some knowledge about the methods written in a class. The purpose of a method is to
process the variables provided in the class or in the method. We already know that the variables declared in
the class are called class variables (or static variables) and the variables declared in the constructor are called
instance variables. We can classify the methods in the following 3 types:
Instance methods
(a) Accessor methods
(b) Mutator methods
Class methods
Static methods
An instance method acts on instance variables. There are two types of instance methods.
1. Accessor methods: They read the instance vars. They do not modify them. They are also called getter()
methods.
2. Mutator methods: They not only read but also modify the instance vars. They are also called setter()
methods.
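An accessor and a mutator pair can be sketched as below. The class Student and the method names get_marks() and set_marks() are illustrative assumptions.

```python
class Student:
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks

    # Accessor (getter): only reads the instance variable.
    def get_marks(self):
        return self.marks

    # Mutator (setter): modifies the instance variable.
    def set_marks(self, marks):
        self.marks = marks

s = Student('Raju', 450)
print(s.get_marks())    # 450
s.set_marks(475)
print(s.get_marks())    # 475
```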
PROGRAMS
4. Create getter and setter methods for a Manager with name and salary instance variables.
Static methods
We need static methods when the processing is at class level but we need not involve the class or instances.
Static methods are used when some processing is related to the class but does not need the class or its
instances to perform any work. For example, setting environmental variables, counting the number of
instances of the class or changing an attribute in another class etc. are the tasks related to a class. Such tasks
are handled by static methods. Static methods are written with a decorator @staticmethod above them. Static
methods are called in the form of classname.method().
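Counting the instances of a class, one of the tasks mentioned above, can be sketched with a static method. The class name Widget and the method name instances() are hypothetical.

```python
class Widget:
    count = 0                    # class variable shared by all instances

    def __init__(self):
        Widget.count += 1        # one more instance has been created

    @staticmethod
    def instances():
        # No self or cls parameter; the class is referred to by its name.
        return Widget.count

w1 = Widget()
w2 = Widget()
w3 = Widget()
print(Widget.instances())        # 3
```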
Inner classes
Writing a class within another class is called inner class or nested class. For example, if we write class B
inside class A, then B is called inner class or nested class. Inner classes are useful when we want to sub
group the data of a class.
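A possible use of an inner class is to group the date of birth details of a person. The sketch below uses invented class and variable names.

```python
class Person:
    def __init__(self, name, dd, mm, yy):
        self.name = name
        # The inner class groups the date of birth details together.
        self.dob = self.Dob(dd, mm, yy)

    class Dob:                   # inner (nested) class
        def __init__(self, dd, mm, yy):
            self.dd = dd
            self.mm = mm
            self.yy = yy

        def as_text(self):
            return '%d/%d/%d' % (self.dd, self.mm, self.yy)

p = Person('Karthik', 10, 5, 1988)
print(p.name, p.dob.as_text())   # Karthik 10/5/1988
```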
Encapsulation
Bundling up of data and methods as a single unit is called ‘encapsulation’. A class is an example for
encapsulation.
Abstraction
Hiding unnecessary data from the user is called ‘abstraction’. By default all the members of a class are
‘public’ in Python. So they are available outside the class. To make a variable private, we use double
underscore before the variable. Then it cannot be accessed from outside of the class. To access it from
outside the class, we should use: obj._Classname__var. This is called name mangling.
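Name mangling can be observed with a small sketch. The class Account and its variables are hypothetical examples.

```python
class Account:
    def __init__(self):
        self.owner = 'Ravi'        # public variable
        self.__balance = 1000      # private variable (double underscore)

    def show_balance(self):
        return self.__balance      # accessible inside the class

acc = Account()
print(acc.owner)                   # Ravi

# Direct access from outside the class fails.
try:
    acc.__balance
    visible = True
except AttributeError:
    visible = False
print(visible)                     # False

# The mangled name still reaches the variable.
print(acc._Account__balance)       # 1000
```

Because the mangled name remains reachable, a double underscore discourages outside access rather than strictly preventing it.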
Inheritance
Creating new classes from existing classes in such a way that all the features of the existing classes are
available to the newly created classes – is called ‘inheritance’. The existing class is called ‘base class’ or
‘super class’. The newly created class is called ‘sub class’ or ‘derived class’.
Sub class object contains a copy of the super class object. The advantage of inheritance is ‘reusability’ of
code. This increases the overall productivity of the organization.
Syntax: class Subclass(Baseclass):
Constructors in inheritance
In the previous programs, we have inherited the Student class from the Teacher class. All the methods and
the variables in those methods of the Teacher class (base class) are accessible to the Student class (sub
class). The constructors of the base class are also accessible to the sub class.
When the programmer writes a constructor in the sub class, then only the sub class constructor will get
executed. In this case, super class constructor is not executed. That means, the sub class constructor is
replacing the super class constructor. This is called constructor overriding.
super() method
super() is a built-in function which is useful to call the super class constructor or methods from the sub class.
super().__init__() # call super class constructor
super().__init__(arguments) # call super class constructor and pass arguments
super().method() # call super class method
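Constructor overriding and the use of super() can be sketched with the Teacher and Student classes mentioned above. The variables and the role() method are assumptions made for illustration.

```python
class Teacher:
    def __init__(self, name):
        self.name = name

    def role(self):
        return 'Teacher'

class Student(Teacher):
    # This constructor overrides the Teacher constructor.
    def __init__(self, name, marks):
        super().__init__(name)     # run the super class constructor as well
        self.marks = marks

    # This method overrides role() but still reuses the super class version.
    def role(self):
        return 'Student derived from ' + super().role()

s = Student('Anil', 820)
print(s.name, s.marks)   # Anil 820
print(s.role())          # Student derived from Teacher
```

Without the call to super().__init__(name), the variable name would never be created in the Student object.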
Types of inheritance
There are two types:
1. Single inheritance: deriving sub class from a single super class.
Syntax: class Subclass(Baseclass):
2. Multiple inheritance: deriving sub class from more than one super class.
Syntax: class Subclass(Baseclass1, Baseclass2, … ):
NOTE: ‘object’ is the super class for all classes in Python.
Polymorphism
poly + morphos = many + forms
If something exists in various forms, it is called ‘Polymorphism’. If an operator or method performs various
tasks, it is called polymorphism.
Ex:
Duck typing: Calling a method on any object without knowing the type (class) of the object.
Operator overloading: same operator performing more than one task.
Method overloading: same method performing more than one task.
Method overriding: executing only sub class method in the place of super class method.
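Three of the forms listed above are sketched below: duck typing, operator overloading and method overriding. All class and function names are invented for the example.

```python
# Duck typing: call_talk() works with any object that has a talk() method.
class Duck:
    def talk(self):
        return 'Quack, quack!'

class Human:
    def talk(self):
        return 'Hello, hi!'

def call_talk(obj):
    return obj.talk()

print(call_talk(Duck()), call_talk(Human()))

# Operator overloading: + is given a meaning for Marks objects.
class Marks:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Marks(self.value + other.value)

total = Marks(40) + Marks(35)
print(total.value)         # 75

# Method overriding: the sub class version of sound() is executed.
class Animal:
    def sound(self):
        return 'Some sound'

class Dog(Animal):
    def sound(self):
        return 'Bark'

print(Dog().sound())       # Bark
```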
ABSTRACT CLASSES AND INTERFACES
An abstract method is a method whose action is redefined in the sub classes as per the requirement of the
objects. Generally abstract methods are written without body since their body will be defined in the sub
classes anyhow. But it is possible to write an abstract method with body also. To mark a method as abstract, we
should use the decorator @abstractmethod. On the other hand, a concrete method is a method with body.
An abstract class is a class that generally contains some abstract methods. PVM cannot create objects to an
abstract class.
Once an abstract class is written, we should create sub classes and all the abstract methods should be
implemented (body should be written) in the sub classes. Then, it is possible to create objects to the sub
classes.
A meta class is a class that defines the behavior of other classes. Any abstract class should be derived from
the helper class ABC (whose meta class is ABCMeta) that belongs to ‘abc’ module. So import this module, as:
from abc import ABC, abstractmethod
(or) from abc import *
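A short sketch of an abstract class with one abstract method and one concrete method. The names Shape and Square are hypothetical.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Abstract method: sub classes must supply the body."""

    def describe(self):
        # concrete method: has a body and is inherited as-is
        return "Area = " + str(self.area())


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


try:
    Shape()  # objects cannot be created from an abstract class
except TypeError as e:
    error_message = str(e)
    print("Cannot instantiate:", error_message)

sq = Square(3)
print(sq.describe())
```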
Interfaces in Python
We learned that an abstract class is a class which contains some abstract methods as well as concrete
methods also. Imagine there is a class that contains only abstract methods and there are no concrete methods.
It becomes an interface. This means an interface is an abstract class but it contains only abstract methods.
None of the methods in the interface will have body. Only method headers will be written in the interface.
So an interface can be defined as a specification of method headers. Since, we write only abstract methods in
the interface, there is possibility for providing different implementations (body) for those abstract methods
depending on the requirements of objects. In Python, we have to use abstract classes as interfaces.
Since an interface contains methods without body, it is not possible to create objects of an interface. In this
case, we can create sub classes where we can implement all the methods of the interface. Since the sub
classes will have all the methods with body, it is possible to create objects of the sub classes. The flexibility
lies in the fact that every sub class can provide its own implementation for the abstract methods of the
interface.
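The idea that every sub class supplies its own implementation can be sketched as below. The interface name Database and the sub classes Oracle, Sybase and Incomplete are hypothetical, used only to illustrate the point.

```python
from abc import ABC, abstractmethod


class Database(ABC):
    """Interface: method headers only, no concrete methods."""

    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def disconnect(self):
        pass


class Oracle(Database):
    def connect(self):
        return "Connected to Oracle"

    def disconnect(self):
        return "Disconnected from Oracle"


class Sybase(Database):
    def connect(self):
        return "Connected to Sybase"

    def disconnect(self):
        return "Disconnected from Sybase"


class Incomplete(Database):
    def connect(self):  # disconnect() is not implemented
        return "Connected"


# the same call works on every implementation of the interface
messages = [db.connect() for db in (Oracle(), Sybase())]
print(messages)

try:
    Incomplete()  # still has an abstract method, so no object can be created
except TypeError:
    incomplete_rejected = True
    print("Incomplete cannot be instantiated")
```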
EXCEPTIONS
An exception is a runtime error which can be handled by the programmer. That means if the programmer can
guess an error in the program and he can do something to eliminate the harm caused by that error, then it is
called an ‘exception’. If the programmer cannot do anything in case of an error, then it is called an ‘error’
and not an exception.
All exceptions are represented as classes in Python. The exceptions which are already available in Python
are called ‘built-in’ exceptions. The base class for all built-in exceptions is ‘BaseException’ class. From
BaseException class, the sub class ‘Exception’ is derived. From Exception class, the error classes (such as
‘ArithmeticError’ and ‘LookupError’) and the ‘Warning’ class are derived. Python 2 grouped the error classes
under ‘StandardError’, but that class was removed in Python 3.
All errors are defined as sub classes of Exception. An error should be compulsorily handled, otherwise the
program terminates at the point where the error occurs. Similarly, all warnings are derived as sub classes from
‘Warning’ class. A warning represents a caution and even though it is not handled, the program will execute.
So, warnings can be neglected but errors cannot be neglected.
Just like the exceptions which are already available in Python language, a programmer can also create his
own exceptions, called ‘user-defined’ exceptions. When the programmer wants to create his own exception
class, he should derive his class from ‘Exception’ class and not from ‘BaseException’ class. In the Figure,
we are showing important classes available in Exception hierarchy.
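A minimal sketch of a user-defined exception derived from ‘Exception’, followed by a few checks on the built-in hierarchy. The class InsufficientBalanceError and the function withdraw() are hypothetical names used only for illustration.

```python
class InsufficientBalanceError(Exception):  # derive from Exception, not BaseException
    def __init__(self, balance, amount):
        super().__init__("balance " + str(balance) + " is less than " + str(amount))
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientBalanceError(balance, amount)
    return balance - amount


try:
    withdraw(1000, 1500)
except InsufficientBalanceError as e:
    message = str(e)
    print("Error:", message)

# built-in classes follow the same hierarchy
print(issubclass(ZeroDivisionError, ArithmeticError))
print(issubclass(ArithmeticError, Exception))
print(issubclass(Exception, BaseException))
print(issubclass(UserWarning, Warning))
```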
Exception handling
The purpose of handling the errors is to make the program robust. The word ‘robust’ means ‘strong’. A
robust program does not terminate in the middle. Also, when there is an error in the program, it will display
an appropriate message to the user and continue execution. Designing the programs in this way is needed in
any software development. To handle exceptions, the programmer should perform the following 3 tasks:
Step 1: The programmer should observe the statements in his program where there may be a possibility of
exceptions. Such statements should be written inside a ‘try’ block. A try block looks as follows:
try:
    statements
The advantage of the try block is that even if some exception arises inside it, the program need not be
terminated. When PVM understands that there is an exception, it jumps into the matching ‘except’ block.
Step 2: The programmer should write the ‘except’ block where he should display the exception details to the
user. This helps the user to understand that there is some error in the program. The programmer should also
display a message regarding what can be done to avoid this error. An except block looks as follows:
except exceptionname:
    statements # these statements form handler
The statements written inside an except block are called ‘handlers’ since they handle the situation when the
exception occurs.
Step 3: Lastly, the programmer should perform clean up actions like closing the files and terminating any
other processes which are running. The programmer should write this code in the finally block. A finally
block looks as follows:
finally:
    statements
The specialty of the finally block is that the statements inside it are executed irrespective of
whether there is an exception or not. This ensures that all the opened files are properly closed and all the
running processes are properly terminated. So, the data in the files will not be corrupted and the user is on
the safe side.
However, the complete exception handling syntax will be in the following format:
try:
    statements
except Exception1:
    handler1
except Exception2:
    handler2
else:
    statements
finally:
    statements
The ‘try’ block contains the statements where there may be one or more exceptions. The subsequent ‘except’
blocks handle these exceptions. When ‘Exception1’ occurs, ‘handler1’ statements are executed. When
‘Exception2’ occurs, ‘handler2’ statements are executed and so forth. If no exception is raised, the
statements inside the ‘else’ block are executed. Whether an exception occurs or not, the code
inside the ‘finally’ block is always executed.
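The order in which the blocks run can be checked with a small sketch. The function divide() and the list named log are hypothetical helpers that record which blocks were executed.

```python
def divide(a, b):
    log = []  # records which blocks ran, in order
    result = None
    try:
        result = a / b
    except ZeroDivisionError:
        log.append("except ZeroDivisionError")
    except TypeError:
        log.append("except TypeError")
    else:
        log.append("else")  # runs only when no exception was raised
    finally:
        log.append("finally")  # runs in every case
    return result, log


print(divide(10, 2))
print(divide(10, 0))
print(divide(10, "x"))
```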
FILES IN PYTHON
A file represents storage of data. A file stores data permanently so that it is available to all the programs.
Types of files in Python
In Python, there are 2 types of files. They are:
• Text files
• Binary files
Text files store the data in the form of characters. For example, if we store employee name “Ganesh”, it will
be stored as 6 characters and the employee salary 8900.75 is stored as 7 characters. Normally, text files are
used to store characters or strings.
Binary files store entire data in the form of bytes, i.e. a group of 8 bits each. For example, a character is
stored as a byte and an integer is stored in the form of 8 bytes (on a 64 bit machine). When the data is
retrieved from the binary file, the programmer can retrieve the data as bytes. Binary files can be used to store
text, images, audio and video.
Opening a file
We should use open() function to open a file. This function accepts ‘filename’ and ‘open mode’ in which to
open the file.
file_handler = open("file name", "open mode", buffering)
Ex: f = open("myfile.txt", "w")
Here, the ‘file name’ represents a name on which the data is stored. We can use any name to reflect the
actual data. For example, we can use ‘empdata’ as file name to represent the employee data. The file ‘open
mode’ represents the purpose of opening the file. The following table specifies the file open modes and their
meanings.
File open mode | Description
w | To write data into the file. If any data is already present in the file, it is deleted and the present data is stored.
r | To read data from the file. The file pointer is positioned at the beginning of the file.
a | To append data to the file. Appending means adding at the end of existing data. The file pointer is placed at the end of the file. If the file does not exist, it will create a new file for writing data.
w+ | To write and read data of a file. The previous data in the file will be deleted.
r+ | To read and write data into a file. The previous data in the file will not be deleted. The file pointer is placed at the beginning of the file.
a+ | To append and read data of a file. The file pointer will be at the end of the file if the file exists. If the file does not exist, it creates a new file for reading and writing.
x | Open the file in exclusive creation mode. The file creation fails if the file already exists.
The above Table represents file open modes for text files. If we attach ‘b’ for them, they represent modes for
binary files. For example, wb, rb, ab, w+b, r+b, a+b are the modes for binary files.
A buffer represents a temporary block of memory. ‘buffering’ is an optional integer used to set the size of
the buffer for the file. If we do not mention any buffering integer, then the default buffer size used is 4096 or
8192 bytes.
Closing a file
A file which is opened should be closed using close() method as:
f.close()
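The effect of some of the open modes can be observed with a short sketch. It assumes a throwaway folder created with the tempfile module so that no real file is touched. The file name myfile.txt is reused from the example above.

```python
import os
import tempfile

# work inside a temporary directory that is deleted at the end
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")

    f = open(path, "w")   # 'w' creates the file (or empties an existing one)
    f.write("first\n")
    f.close()

    f = open(path, "a")   # 'a' adds at the end of the existing data
    f.write("second\n")
    f.close()

    f = open(path, "r+")  # 'r+' opens for reading and writing, old data is kept
    content = f.read()
    f.close()

    try:
        open(path, "x")   # 'x' fails because the file already exists
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    f = open(path, "w")   # opening again with 'w' deletes the previous data
    f.close()
    size_after_w = os.path.getsize(path)

print(repr(content), exclusive_failed, size_after_w)
```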
Files with characters
To write a group of characters (string), we use: f.write(str)
To read a group of characters (string), we use: str = f.read()
PROGRAMS
25. Create a file and store a group of chars.
26. Read the chars from the file.
Files with strings
To write a group of strings into a file, we need a loop that repeats: f.write(str+"\n")
To read all strings from a file, we can use: str = f.read()
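Writing and reading characters and strings might look like the following sketch. The file name demo_strings.txt, its location in the system's temporary folder and the sample names are assumptions made for illustration. The variable is called text rather than str so that the built-in str is not hidden.

```python
import os
import tempfile

# hypothetical file inside the system's temporary folder
fname = os.path.join(tempfile.gettempdir(), "demo_strings.txt")

# write a group of characters (one string)
f = open(fname, "w")
f.write("Hello Python")
f.close()

f = open(fname, "r")
chars = f.read()  # read all characters at once
f.close()

# write a group of strings, one per line, using a loop
names = ["Ganesh", "Anil", "Gaurav"]
f = open(fname, "w")  # 'w' replaces the earlier contents
for name in names:
    f.write(name + "\n")
f.close()

f = open(fname, "r")
text = f.read()  # one string containing every line
f.close()
lines = text.splitlines()  # split it back into separate strings

os.remove(fname)  # clean up the demo file
print(chars)
print(lines)
```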
Knowing whether a file exists or not
The operating system (os) module has a sub module by the name ‘path’ that contains a method isfile(). This
method can be used to know whether a file that we are opening really exists or not. For example,
os.path.isfile(fname) gives True if the file exists otherwise False. We can use it as:
import os, sys
if os.path.isfile(fname): # if file exists,
    f = open(fname, 'r') # open it
else:
    print(fname+' does not exist')
    sys.exit() # terminate the program
with statement
‘with’ statement can be used while opening a file. The advantage of with statement is that it will take care of
closing a file which is opened by it. Hence, we need not close the file explicitly. In case of an exception also,
‘with’ statement will close the file before the exception is handled. The format of using ‘with’ is:
with open("filename", "openmode") as fileobject:
Ex: writing into a file
# with statement to open a file
with open('sample.txt', 'w') as f:
    f.write('I am a learner\n')
    f.write('Python is attractive\n')
Ex: reading from a file
# using with statement to open a file
with open('sample.txt', 'r') as f:
    for line in f:
        print(line, end='') # each line already ends with '\n'
Data Science
To work with data science, we need the following packages to be installed:
C:\> pip install pandas
C:\> pip install xlrd (to extract data from older .xls Excel sheets)
C:\> pip install openpyxl (to extract data from .xlsx Excel sheets)
C:\> pip install matplotlib
Data plays an important role in our lives. For example, a chain of hospitals holds data related to the medical
reports and prescriptions of its patients. A bank holds the transaction details of thousands of customers. Share
market data represents minute to minute changes in the share values. In this way, the entire world revolves
around huge volumes of data.
Every piece of data is precious as it may affect the business organization which is using that data. So, we
need some mechanism to store that data. Moreover, data may come from various sources. For example in a
business organization, we may get data from Sales department, Purchase department, Production department,
etc. Such data is stored in a system called ‘data warehouse’. We can imagine data warehouse as a central
repository of integrated data from different sources.
Once the data is stored, we should be able to retrieve it based on some pre-requisites. A business company
may want to know how much it spent in the last 6 months on purchasing raw material or
how many items were found defective in its production unit. Such data cannot be easily retrieved from the huge
data available in the data warehouse. We have to retrieve the data as per the needs of the business
organization. This is called data analysis or data analytics where the data that is retrieved will be analyzed to
answer the questions raised by the management of the organization. A person who does data analysis is
called ‘data analyst’.
Once the data is analyzed, it is the duty of the IT professional to present the results in the form of pictures or
graphs so that the management will be able to understand it easily. Such graphs will also help them to
forecast the future of their company. This is called data visualization. The primary goal of data visualization
is to communicate information clearly and efficiently using statistical graphs, plots and diagrams.
Data science is a term used for techniques to extract information from the data warehouse, analyze it and
present the necessary data to the business organization in order to arrive at important conclusions and
decisions. A person who is involved in this work is called ‘data scientist’. We can find important differences
between the roles of data scientist and data analyst in following table:
Data Scientist | Data Analyst
Data scientist formulates the questions that will help a business organization and then proceeds to solve them. | Data analyst receives questions from the business team and provides answers to them.
Data scientist will have strong data visualization skills and the ability to convert data into a business story. | Data analyst simply analyzes the data and provides information requested by the team.
Proficiency in mathematics, statistics and programming languages like Python and R is needed for a Data scientist. | Proficiency in data warehousing, big data concepts, SQL and business intelligence is needed for a Data analyst.
A Data scientist estimates the unknown information from the known data. | A Data analyst looks at the known data from a new perspective.
Please see the following sample data in the excel file: empdata.xlsx.
CREATING DATA FRAMES
Creating data frames is possible from csv files, Excel files, Python dictionaries, lists of tuples, JSON data, etc.
Creating data frame from .csv file
>>> import pandas as pd
>>> df = pd.read_csv(r"f:\python\PANDAS\empdata.csv")
>>> df
   empid           ename       sal        doj
0   1001      Ganesh Rao  10000.00   10-10-00
1   1002      Anil Kumar  23000.50  3-20-2002
2   1003    Gaurav Gupta  18000.33   03-03-02
3   1004    Hema Chandra  16500.50   10-09-00
4   1005  Laxmi Prasanna  12000.75   08-10-00
5   1006       Anant Nag   9999.99   09-09-99
Data Wrangling
Data wrangling, also known as data munging, is the process of cleaning, transforming, and organizing raw
data into a format that is suitable for analysis. Python is a popular language for data wrangling due to its
powerful libraries and tools. Below is an overview of the key steps and libraries used in data wrangling with
Python:
Key Steps in Data Wrangling
1. Data Collection: Gather data from various sources (e.g., CSV files, databases, APIs, web scraping).
2. Data Cleaning: Handle missing values, remove duplicates, correct inconsistencies, and fix errors.
3. Data Transformation: Reshape, aggregate, or filter data to make it suitable for analysis.
4. Data Integration: Combine data from multiple sources.
5. Data Validation: Ensure data quality and consistency.
6. Data Export: Save the cleaned and transformed data into a usable format (e.g., CSV, Excel,
database).
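The six steps above can be sketched on a tiny data set with pandas. The two in-memory CSV sources, their column names and the threshold of 300 are hypothetical and stand in for real files or databases.

```python
import io
import pandas as pd

# 1. Data collection: two small in-memory CSV sources (hypothetical data)
sales_csv = io.StringIO(
    "empid,region,amount\n1001,North,250\n1002,South,\n1002,South,\n1003,north,400\n"
)
names_csv = io.StringIO(
    "empid,ename\n1001,Ganesh Rao\n1002,Anil Kumar\n1003,Gaurav Gupta\n"
)
sales = pd.read_csv(sales_csv)
names = pd.read_csv(names_csv)

# 2. Data cleaning: remove duplicates, fix inconsistent text, fill missing values
sales = sales.drop_duplicates()
sales["region"] = sales["region"].str.title()
sales["amount"] = sales["amount"].fillna(sales["amount"].mean())

# 3. Data transformation: keep only rows with an amount of at least 300
large = sales[sales["amount"] >= 300]

# 4. Data integration: combine the two sources on the common column
merged = large.merge(names, on="empid", how="left")

# 5. Data validation: count the missing values that remain
missing_count = int(merged.isna().sum().sum())

# 6. Data export: produce CSV text (a file path could be passed to to_csv instead)
csv_text = merged.to_csv(index=False)
print(csv_text)
```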
Python Libraries for Data Wrangling
1. Pandas: The most widely used library for data manipulation and analysis.
o Key features: DataFrames, handling missing data, merging datasets, reshaping data.
o Example: import pandas as pd
2. NumPy: Used for numerical computations and handling arrays.
o Example: import numpy as np
3. OpenPyXL: For working with Excel files.
o Example: from openpyxl import Workbook
4. SQLAlchemy: For interacting with databases.
o Example: from sqlalchemy import create_engine
5. BeautifulSoup and Requests: For web scraping and collecting data from websites.
o Example: from bs4 import BeautifulSoup and import requests
6. PySpark: For handling large-scale data wrangling tasks in distributed environments.
Data Transformation
# Rename columns
df.rename(columns={'old_name': 'new_name'}, inplace=True)
# Concatenate DataFrames
df_concat = pd.concat([df1, df2], axis=0)
Exporting Data
# Save
Visualizing Data
DATA VISUALIZATION USING MATPLOTLIB
Complete reference is available at:
https://matplotlib.org/api/pyplot_summary.html
CREATE DATAFRAME FROM DICTIONARY
>>> empdata = {"empid": [1001, 1002, 1003, 1004, 1005, 1006],
"ename": ["Ganesh Rao", "Anil Kumar", "Gaurav Gupta", "Hema Chandra", "Laxmi Prasanna", "Anant Nag"],
"sal": [10000, 23000.50, 18000.33, 16500.50, 12000.75, 9999.99],
"doj": ["10-10-2000", "3-20-2002", "3-3-2002", "9-10-2000", "10-8-2000", "9-9-1999"]}
>>> import pandas as pd
>>> df = pd.DataFrame(empdata)
TAKE ONLY THE COLUMNS TO PLOT
>>> x = df['empid']
>>> y = df['sal']
DRAW THE BAR GRAPH
Bar chart shows data in the form of bars. It is useful for comparing values.
>>> import matplotlib.pyplot as plt
>>> plt.bar(x, y, label='salary')
<Container object of 6 artists>
>>> plt.xlabel('employee id nos')
Text(0.5,0,'employee id nos')
>>> plt.ylabel('employee salaries')
Text(0,0.5,'employee salaries')
>>> plt.title('XYZ COMPANY')
Text(0.5,1,'XYZ COMPANY')
>>> plt.legend()
>>> plt.show()
Feature Engineering
Feature engineering is the process of creating new features or transforming existing ones to better represent
the underlying problem and improve model performance.
Common Techniques for Feature Engineering
Handling Missing Values:
o Fill with mean, median, or mode.
o Use advanced techniques like KNN imputation.
df['column'] = df['column'].fillna(df['column'].mean())
Scaling and Normalization:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Standardization
scaler = StandardScaler()
df['scaled_column'] = scaler.fit_transform(df[['column']])
# Min-Max Scaling
minmax_scaler = MinMaxScaler()
df['scaled_column'] = minmax_scaler.fit_transform(df[['column']])
Binning:
• Convert continuous variables into discrete bins.
df['binned_column'] = pd.cut(df['continuous_column'], bins=5)
Text Features (TF-IDF):
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df['text_column'])
Feature Selection
Feature selection involves identifying and selecting the most relevant features to improve model
performance and reduce overfitting.
Common Techniques for Feature Selection
1. Filter Methods:
o Use statistical measures to select features.
o Examples: Correlation, Chi-Square, Mutual Information.
# Correlation-based feature selection
correlation_matrix = df.corr(numeric_only=True) # ignore non-numeric columns
relevant_features = correlation_matrix['target'].abs().sort_values(ascending=False)
2. Wrapper Methods:
• Use a machine learning model to evaluate feature subsets.
• Examples: Recursive Feature Elimination (RFE).
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=5)
rfe.fit(X, y)
selected_features = X.columns[rfe.support_]
3. Embedded Methods:
• Features are selected as part of the model training process.
• Examples: Lasso Regression, Decision Trees.
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=0.01)
lasso.fit(X, y)
selected_features = X.columns[lasso.coef_ != 0]
4. Dimensionality Reduction:
• Reduce the number of features while preserving information.
• Examples: PCA, t-SNE.
from sklearn.decomposition import PCA
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)
Feature Engineering and Selection Workflow
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
# Load data
df = pd.read_csv('data.csv')
# Feature Engineering
# Handle missing values
df['column'] = df['column'].fillna(df['column'].mean())
# One-Hot Encoding
df = pd.get_dummies(df, columns=['category_column'])
# Scaling
scaler = StandardScaler()
df['scaled_column'] = scaler.fit_transform(df[['column']])
# Feature Selection
X = df.drop('target', axis=1)
y = df['target']
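The workflow above imports SelectKBest, f_classif and train_test_split but stops after defining X and y. One way the remaining steps might look is sketched below. It is an assumption, not the original continuation, and it uses synthetic data from make_classification in place of data.csv. The feature names f0 to f7 are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.csv: 8 features, 3 of them informative
X_arr, y_arr = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=["f" + str(i) for i in range(8)])
y = pd.Series(y_arr, name="target")

# Hold out a test set so that feature selection only sees the training data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Keep the 3 features with the highest ANOVA F-scores
selector = SelectKBest(score_func=f_classif, k=3)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

selected_features = X.columns[selector.get_support()]
print(list(selected_features))
print(X_train_sel.shape, X_test_sel.shape)
```

Fitting the selector on the training split only, and then applying transform() to the test split, keeps information from the test data out of the selection step.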
Feature Engineering
Feature engineering involves creating new features or transforming existing ones to improve model
performance.
Common Techniques for Feature Engineering
Handling Missing Values:
o Fill missing values with mean, median, mode, or use advanced techniques like KNN
imputation.
Encoding Categorical Variables:
• One-Hot Encoding: Convert categorical variables into binary columns.
• Label Encoding: Convert categories into numerical labels.
Scaling and Normalization:
• Standardization: Scale features to have zero mean and unit variance.
• Min-Max Scaling: Scale features to a specific range (e.g., 0 to 1).
Creating Interaction Features:
• Combine two or more features to create new ones.
Binning:
• Convert continuous variables into discrete bins.
Date/Time Feature Extraction:
• Extract useful information from date/time columns (e.g., day, month, year).
Polynomial Features:
• Create polynomial combinations of features.
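Several of the techniques listed above can be sketched with pandas alone. The data frame, its column names and the bin edges are hypothetical and chosen only to keep the example small.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Chennai"],
    "price": [100.0, 250.0, 400.0, 50.0],
    "qty": [2, 1, 3, 4],
    "order_date": ["2024-01-15", "2024-02-20", "2024-03-05", "2024-12-31"],
})

# Label encoding: map each category to an integer code (alphabetical order)
df["city_code"] = df["city"].astype("category").cat.codes

# One-hot encoding: one binary column per category
dummies = pd.get_dummies(df["city"], prefix="city")

# Interaction feature: combine two columns into a new one
df["revenue"] = df["price"] * df["qty"]

# Binning: convert a continuous variable into 3 labeled ranges
df["price_band"] = pd.cut(
    df["price"], bins=[0, 100, 300, 500], labels=["low", "mid", "high"]
)

# Date/time feature extraction
dates = pd.to_datetime(df["order_date"])
df["month"] = dates.dt.month
df["year"] = dates.dt.year

# Polynomial feature: square of an existing column
df["price_sq"] = df["price"] ** 2

print(df)
print(dummies)
```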
Example: Feature Extraction and Engineering Workflow
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, OneHotEncoder
# Load data
df = pd.read_csv('data.csv')
# Feature Engineering
# Handle missing values
df['column'] = df['column'].fillna(df['column'].mean())
# One-Hot Encoding
df = pd.get_dummies(df, columns=['category_column'])
# Scaling
scaler = StandardScaler()
df['scaled_column'] = scaler.fit_transform(df[['column']])
# Feature Extraction
X = df.drop('target', axis=1)
y = df['target']
Feature Engineering on Numeric Data, Categorical Data, Text Data, & Image Data
Feature engineering is the process of transforming raw data into meaningful features that improve the
performance of machine learning models. The techniques used depend on the type of data (numeric,
categorical, text, or image). Below is a detailed guide on feature engineering for each type of data:
Feature Scaling and Feature Selection are two important preprocessing techniques in machine learning and
data analysis. They play a crucial role in improving model performance, reducing computational complexity,
and ensuring better interpretability of the data.
Feature Scaling
Feature scaling is the process of normalizing or standardizing the range of independent variables (features)
in the dataset. This is particularly important for algorithms that are sensitive to the magnitude of the data,
such as distance-based algorithms or gradient descent-based optimization.
Why is Feature Scaling Important?
• Ensures that all features contribute equally to the model.
• Improves convergence speed for optimization algorithms (e.g., gradient descent).
• Prevents features with larger magnitudes from dominating those with smaller magnitudes.
Common Techniques for Feature Scaling
1. Normalization (Min-Max Scaling):
o Scales features to a fixed range, usually [0, 1].
o Formula: $X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$
o Suitable for algorithms like neural networks and k-nearest neighbors (KNN).
2. Standardization (Z-score Normalization):
o Scales features to have a mean of 0 and a standard deviation of 1.
o Formula: $X_{\text{scaled}} = \frac{X - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard
deviation.
o Suitable for algorithms like linear regression, logistic regression, and support vector machines
(SVM).
3. Robust Scaling:
o Uses the median and interquartile range (IQR) to scale features, making it robust to outliers.
o Formula: $X_{\text{scaled}} = \frac{X - \text{median}}{\text{IQR}}$
4. Max Abs Scaling:
o Scales each feature by its maximum absolute value.
o Suitable for sparse data.
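The four scaling techniques can be worked through by hand on a small list. This sketch uses only the standard library. The sample values are hypothetical, the standard deviation is taken in its population form, and the quartiles use the ‘inclusive’ method of statistics.quantiles, all of which are assumptions of this example.

```python
import statistics

values = [1, 2, 3, 4, 100]  # hypothetical feature with one outlier

# Min-max scaling: (x - min) / (max - min)
lo, hi = min(values), max(values)
minmax = [(x - lo) / (hi - lo) for x in values]

# Standardization: (x - mean) / standard deviation
mu = statistics.mean(values)
sigma = statistics.pstdev(values)
zscore = [(x - mu) / sigma for x in values]

# Robust scaling: (x - median) / IQR
q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust = [(x - med) / (q3 - q1) for x in values]

# Max abs scaling: x / largest absolute value
max_abs = max(abs(x) for x in values)
maxabs = [x / max_abs for x in values]

print(minmax)
print(zscore)
print(robust)
print(maxabs)
```

In the min-max result the outlier squeezes the first four values close to 0, while in the robust result they stay evenly spaced around 0, which illustrates why robust scaling is preferred when outliers are present.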
Feature Selection
Feature selection is the process of selecting a subset of relevant features (variables) to use in model
construction. It helps reduce overfitting, improve model interpretability, and decrease computational costs.
Why is Feature Selection Important?
• Reduces the dimensionality of the dataset, which can improve model performance.
• Removes irrelevant or redundant features, reducing noise.
• Speeds up training and inference times.
Common Techniques for Feature Selection
1. Filter Methods:
o Select features based on statistical measures (e.g., correlation, mutual information, chi-
square).
o Examples:
▪ Correlation coefficient for linear relationships.
▪ Mutual information for non-linear relationships.
▪ Chi-square test for categorical features.
2. Wrapper Methods:
o Use a machine learning model to evaluate the performance of subsets of features.
o Examples:
▪ Forward Selection: Start with no features and add one at a time.
▪ Backward Elimination: Start with all features and remove one at a time.
▪ Recursive Feature Elimination (RFE): Recursively removes the least important
features.
3. Embedded Methods:
o Perform feature selection as part of the model training process.
o Examples:
▪ Lasso (L1 regularization): Penalizes less important features by shrinking their
coefficients to zero.
▪ Ridge (L2 regularization): Reduces the impact of less important features but does not
eliminate them.
▪ Tree-based methods: Feature importance scores from decision trees, random forests,
or gradient boosting.
4. Dimensionality Reduction:
o Transform features into a lower-dimensional space.
o Examples:
▪ Principal Component Analysis (PCA): Reduces dimensions while preserving variance.
▪ Linear Discriminant Analysis (LDA): Reduces dimensions while preserving class
separability.
▪ t-SNE and UMAP: Non-linear dimensionality reduction for visualization.
Unit-3