Python Unit 5.Notes

The document provides an overview of scientific computing using Python libraries such as SciPy and NumPy, focusing on numerical routines, data manipulation, and analysis. It covers topics like linear algebra, creating and manipulating arrays, arithmetic operations, and slicing techniques. Additionally, it highlights the use of Python's mathematical libraries for solving equations and performing matrix operations.

Uploaded by chithra
Copyright © All Rights Reserved

ACHARIYA

COLLEGE OF ENGINEERING TECHNOLOGY


(Approved by AICTE New Delhi & Affiliated to Pondicherry University)
An ISO 9001: 2008 Certified Institution
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

SCIENTIFIC COMPUTING: Numerical Routines. SciPy and NumPy - Basics, Creating
arrays, Arithmetic, Slicing, Matrix Operations, Special Functions, Random Numbers,
Linear Algebra, Solving Nonlinear Equations, Numerical Integration, Solving ODEs.
Data Manipulation and Analysis – Pandas: Reading Data from Files Using Pandas, Data
Structures: Series and Data Frame, Extracting Information from a Data Frame,
Grouping, and Aggregation.

Numerical Routines: SciPy and NumPy


SciPy is a Python library of mathematical routines. Many of the SciPy routines are Python
“wrappers”, that is, Python routines that provide a Python interface for numerical libraries and
routines originally written in Fortran, C, or C++. Thus, SciPy lets you take advantage of the decades
of work that has gone into creating and optimizing numerical routines for science and engineering.
Because the Fortran, C, or C++ code that Python accesses is compiled, these routines typically run
very fast. Therefore, there is no real downside, and essentially no speed penalty, for using Python in these cases.
We have already encountered one of SciPy’s routines, scipy.optimize.leastsq, for fitting nonlinear
functions to experimental data, which was introduced in the chapter on Curve Fitting. Here we
will provide a further introduction to a number of other SciPy packages, in particular those on special
functions, numerical integration, including routines for numerically solving ordinary differential
equations (ODEs), discrete Fourier transforms, linear algebra, and solving non-linear equations. Our
introduction to these capabilities does not include extensive background on the numerical methods
employed; that is a topic for another text. Here we simply introduce the SciPy routines for performing
some of the more frequently required numerical tasks.
One final note: SciPy makes extensive use of NumPy arrays, so NumPy should always be imported
along with SciPy.

Linear algebra
Python’s mathematical libraries, NumPy and SciPy, have extensive tools for numerically solving
problems in linear algebra. Here we focus on two problems that arise commonly in scientific and
engineering settings: (1) solving a system of linear equations and (2) eigenvalue problems. In
addition, we also show how to perform a number of other basic computations, such as finding the
determinant of a matrix, matrix inversion, and decomposition. The SciPy package for linear
algebra is called scipy.linalg.
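The introduction above mentions eigenvalue problems and matrix decomposition, while the examples that follow cover only determinants, inverses, and linear systems. A minimal sketch of both, using scipy.linalg.eig and scipy.linalg.lu, is given here; it assumes a reasonably recent SciPy, and the matrix is simply the 2x2 example used in the next subsection.

```python
import numpy as np
import scipy.linalg

# The 2x2 matrix from the determinant example, stored as floats
a = np.array([[-2.0, 3.0], [4.0, 5.0]])

# Eigenvalue problem: eig returns the eigenvalues (as complex numbers)
# and a matrix whose columns are the matching eigenvectors
eigenvalues, eigenvectors = scipy.linalg.eig(a)
for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each pair should satisfy a @ v == lam * v, up to rounding
    print(lam.real, np.allclose(a @ v, lam * v))

# LU decomposition: a == p @ l @ u, where p is a permutation matrix,
# l is lower triangular and u is upper triangular
p, l, u = scipy.linalg.lu(a)
print(np.allclose(p @ l @ u, a))   # True
```

For this matrix the eigenvalues are the roots of λ² - 3λ - 22 = 0, that is (3 ± √97)/2, which the loop confirms numerically.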

1. Basic computations in linear algebra


SciPy has a number of routines for performing basic operations with matrices. The determinant of a
matrix is computed using the scipy.linalg.det function:
In [1]: import scipy.linalg
   ...: from numpy import array, dot

In [2]: a = array([[-2, 3], [4, 5]])

In [3]: a
Out[3]: array([[-2,  3],
               [ 4,  5]])

In [4]: scipy.linalg.det(a)
Out[4]: -22.0
The inverse of a matrix is computed using the scipy.linalg.inv function, while the product of two
matrices is calculated using the NumPy dot function:
In [5]: b = scipy.linalg.inv(a)

In [6]: b
Out[6]: array([[-0.22727273,  0.13636364],
               [ 0.18181818,  0.09090909]])

In [7]: dot(a,b)
Out[7]: array([[ 1., 0.],
[ 0., 1.]])
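Because the inverse is stored as rounded floating-point numbers, the product of a matrix and its computed inverse is only approximately the identity matrix, so an exact == comparison can fail. A sketch of a tolerance-based check with np.allclose and np.isclose (our choice of check, not part of the original notes):

```python
import numpy as np
import scipy.linalg

a = np.array([[-2, 3], [4, 5]])
b = scipy.linalg.inv(a)

# Rounding can leave tiny values such as 1e-17 where 0 is expected,
# so compare with a tolerance rather than with ==
product = np.dot(a, b)
print(np.allclose(product, np.eye(2)))   # True

# det(inv(a)) should equal 1 / det(a), here -1/22
print(np.isclose(scipy.linalg.det(b), 1 / scipy.linalg.det(a)))   # True
```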

2. Solving systems of linear equations


Solving systems of equations is nearly as simple as constructing a coefficient matrix and a column
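vector. Suppose you have the following system of linear equations to solve:

$$\begin{aligned}
2x_1 + 4x_2 + 6x_3 &= 4 \\
x_1 - 3x_2 - 9x_3 &= -11 \\
8x_1 + 5x_2 - 7x_3 &= 2
\end{aligned}$$

The first task is to recast this set of equations as a matrix equation of the form $A\mathbf{x} = \mathbf{b}$. In this case,
we have:

$$\begin{pmatrix} 2 & 4 & 6 \\ 1 & -3 & -9 \\ 8 & 5 & -7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 4 \\ -11 \\ 2 \end{pmatrix}$$

Next we construct the array $A$ and vector $\mathbf{b}$ as NumPy arrays: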


In [8]: A = array([[2, 4, 6], [1, -3, -9], [8, 5, -7]])
In [9]: b = array([4, -11, 2])
Finally we use the SciPy function scipy.linalg.solve to find $x_1$, $x_2$, and $x_3$.
In [10]: scipy.linalg.solve(A,b)
Out[10]: array([ -8.91304348, 10.2173913 , -3.17391304])

which gives the results: $x_1 \approx -8.913$, $x_2 \approx 10.217$, and $x_3 \approx -3.174$.


Of course, you can get the same answer by noting that $\mathbf{x} = A^{-1}\mathbf{b}$. Following this approach, we can
use the scipy.linalg.inv function introduced in the previous section:
In [11]: Ainv = scipy.linalg.inv(A)

In [12]: dot(Ainv, b)
Out[12]: array([ -8.91304348, 10.2173913 , -3.17391304])
which is the same answer we obtained using scipy.linalg.solve. Using scipy.linalg.solve is numerically
more stable and faster than computing $A^{-1}\mathbf{b}$ explicitly, so it is the preferred method for solving systems of
equations.
You might wonder what happens if the equations are not all linearly independent. For
example, suppose the matrix $A$ is given by

$$A = \begin{pmatrix} 2 & 4 & 6 \\ 1 & -3 & -9 \\ 1 & 2 & 3 \end{pmatrix}$$

where the third row is a multiple of the first row. Let’s try it out and see what happens. First we
change the bottom row of the matrix and then try to solve the system as we did before.
In [13]: A[2] = array([1, 2, 3])

In [14]: A
Out[14]: array([[ 2,  4,  6],
                [ 1, -3, -9],
                [ 1,  2,  3]])

In [15]: scipy.linalg.solve(A,b)
LinAlgError: Singular matrix

In [16]: Ainv = scipy.linalg.inv(A)
LinAlgError: Singular matrix
Whether we use scipy.linalg.solve or scipy.linalg.inv, SciPy raises an error because the matrix is
singular.
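To detect the problem before calling the solver, you can check the rank of the coefficient matrix, and you can catch the exception instead of letting it stop the program. The sketch below assumes that SciPy raises NumPy's LinAlgError class for a singular matrix (scipy.linalg reuses numpy.linalg.LinAlgError); np.linalg.matrix_rank is one convenient diagnostic, not the only one.

```python
import numpy as np
import scipy.linalg

# Coefficient matrix whose third row is half the first row
A = np.array([[2, 4, 6], [1, -3, -9], [1, 2, 3]])
b = np.array([4, -11, 2])

# Only two of the three rows are linearly independent
print(np.linalg.matrix_rank(A))   # 2

try:
    x = scipy.linalg.solve(A, b)
except np.linalg.LinAlgError as err:
    # Handle the singular system instead of letting the program stop
    print("Cannot solve this system:", err)
```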
Basics of Creating Arrays: Definition & Syntax


Definition:
An array is a collection of elements (values or variables), each identified by an index or a key. Arrays
are used to store multiple values in a single variable, instead of declaring separate variables for each
value.
Arrays are especially useful when you want to work with lists of data, like a series of numbers,
strings, etc.

In Python:
Python does not have a built-in, fixed-type array like the arrays of C or Java. Instead, it
uses:
1. Lists (most commonly used like arrays)
2. Array module (for arrays of uniform type)
3. NumPy arrays (for numerical computations)

1. Python List (used like an array)


Syntax:
my_list = [10, 20, 30, 40]
• my_list[0] → 10 (access first element)
• You can store mixed data types in a list: ["apple", 3.14, 7]

2. Array using Python's array module


Syntax:
import array
my_array = array.array('i', [1, 2, 3, 4])
• 'i' stands for integer type code.
• All elements must be of the same type.
3. NumPy Array (for scientific computing)


Syntax:
import numpy as np
my_array = np.array([1, 2, 3, 4])
• Supports multi-dimensional arrays.
• Offers fast mathematical operations.
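The practical differences among the three options show up as soon as you do arithmetic or mix types. A short comparison, assuming NumPy is installed (the variable names are illustrative):

```python
import array
import numpy as np

values = [1, 2, 3, 4]

# For a list, * repeats the list; it does not multiply each element
print(values * 2)          # [1, 2, 3, 4, 1, 2, 3, 4]

# The array module enforces its type code: 'i' accepts only integers
ints = array.array('i', values)
try:
    ints.append(2.5)
except TypeError as err:
    print("array module rejected a float:", err)

# A NumPy array applies arithmetic to every element at once
np_values = np.array(values)
print(np_values * 2)       # [2 4 6 8]

# A nested list gives a multi-dimensional array (here 2 rows, 2 columns)
grid = np.array([[1, 2], [3, 4]])
print(grid.ndim, grid.shape)   # 2 (2, 2)
```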

Python Arithmetic Operators


Python operators are fundamental for performing mathematical calculations. Arithmetic operators are
symbols used to perform mathematical operations on numerical values. They include addition (+),
subtraction (-), multiplication (*), float division (/), floor division (//), modulus (%), and exponentiation (**).

Operator   Syntax    Description
+          x + y     Addition: adds two operands
-          x - y     Subtraction: subtracts two operands
*          x * y     Multiplication: multiplies two operands
/          x / y     Division (float): divides the first operand by the second
//         x // y    Division (floor): divides the first operand by the second
%          x % y     Modulus: returns the remainder when the first operand is divided by the second
**         x ** y    Power: returns the first operand raised to the power of the second

Addition Operator
In Python, + is the addition operator. It is used to add 2 values.
val1 = 2
val2 = 3
# using the addition operator
res = val1 + val2
print(res)
Output:
5
Subtraction Operator
In Python, - is the subtraction operator. It is used to subtract the second value from the first value.
val1 = 2
val2 = 3
# using the subtraction operator
res = val1 - val2
print(res)
Output:
-1
Multiplication Operator
Python * operator is the multiplication operator. It is used to find the product of 2 values.
val1 = 2
val2 = 3
# using the multiplication operator
res = val1 * val2
print(res)
Output :
6
Division Operator
In Python, the division operators divide the first number (the one on the left) by the second number
(the one on the right) and return the quotient.
There are two types of division operators:
1. Float division
2. Floor division
Float division
The quotient returned by this operator is always a float, even when both operands are integers.
Example:
print(5/5)
print(10/2)
print(-10/2)
print(20.0/2)

Output
1.0
5.0
-5.0
10.0
Integer division (floor division)
The // operator divides the first operand by the second and rounds the quotient down to the nearest
whole number, that is, toward negative infinity. If either operand is a float, the result is a float; if both
are integers, the result is an integer. Rounding down is visible with negative numbers: -5 / 2 is -2.5,
so -5 // 2 is -3, not -2.
Example:
print(10//3)
print (-5//2)
print (5.0//2)
print (-5.0//2)
Output
3
-3
2.0
-3.0
Modulus Operator
The % in Python is the modulus operator. It is used to find the remainder when the first operand is
divided by the second.
val1 = 3
val2 = 2
# using the modulus operator
res = val1 % val2
print(res)
Output:
1
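Floor division and modulus are two halves of the same operation, which explains the sign of % when an operand is negative. A small sketch:

```python
# (a // b) * b + a % b always equals a. Because // rounds toward
# negative infinity, a nonzero result of % takes the sign of b.
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = a // b, a % b
    print(a, b, q, r, q * b + r == a)

# divmod() returns the floor quotient and the remainder together
print(divmod(-7, 2))   # (-4, 1)
```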

Exponentiation Operator
In Python, ** is the exponentiation operator. It is used to raise the first operand to the power of the
second.
val1 = 2
val2 = 3
# using the exponentiation operator
res = val1 ** val2
print(res)
Output:
8
Precedence of Arithmetic Operators in Python
Let us see the precedence and associativity of Python Arithmetic operators.
The operators are listed from highest to lowest precedence.

Operator       Description                                             Associativity
**             Exponentiation                                          right-to-left
%, *, /, //    Modulus, Multiplication, Division, and Floor Division   left-to-right
+, -           Addition and Subtraction                                left-to-right
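A few expressions make the table concrete; the comments show how Python groups each one.

```python
# *, /, //, % bind tighter than + and -
print(2 + 3 * 4)        # 14, not 20
print((2 + 3) * 4)      # 20: parentheses override precedence

# ** is right-associative: 2 ** 3 ** 2 means 2 ** (3 ** 2)
print(2 ** 3 ** 2)      # 512
print((2 ** 3) ** 2)    # 64

# *, /, //, % and +, - are left-associative: grouped from the left
print(7 // 2 * 2)       # (7 // 2) * 2 = 6
print(100 / 10 / 5)     # (100 / 10) / 5 = 2.0
print(10 - 4 - 3)       # (10 - 4) - 3 = 3
```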

Basic Methods for Slicing in Python


Slicing is a core feature in Python that lets you extract portions of sequences such as lists, strings,
and tuples. Python offers two built-in ways to slice sequences without importing anything: the slicing
syntax with colons (:) and the slice() function. It is useful to understand both, since you are likely to
see both in practice.

Using the : Python slicing syntax


The slicing syntax sequence[start:stop:step] is the most common way to access parts of a sequence.
Each parameter—start, stop, and step—controls how the slicing is performed:
• start: Index where the slice begins (inclusive). Defaults to 0 if omitted (for a positive step).
• stop: Index where the slice ends (exclusive). Defaults to the sequence's length if omitted (for a positive step).
• step: The interval between selected elements. Defaults to 1 if omitted; a negative step walks backwards.
Let's try an example:


numbers = [10, 20, 30, 40, 50, 60]
print(numbers[1:4])
# Output: [20, 30, 40]
print(numbers[:3])
# Output: [10, 20, 30]
print(numbers[::2])
# Output: [10, 30, 50]
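Slices also accept negative indices and negative steps, and NumPy arrays extend the same syntax to several dimensions. The sketch below assumes NumPy is available. Note the difference in copying behaviour: a list slice is a new list, while a basic NumPy slice is a view of the original data.

```python
import numpy as np

numbers = [10, 20, 30, 40, 50, 60]

# Negative indices count from the end; a negative step walks backwards
print(numbers[-2:])      # [50, 60]
print(numbers[::-1])     # [60, 50, 40, 30, 20, 10]
print(numbers[4:1:-1])   # [50, 40, 30]

# NumPy arrays take one start:stop:step per dimension, separated by commas
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
print(grid[:2, 1:])      # rows 0-1 and columns 1-2
print(grid[:, 0])        # every row, column 0

# A basic NumPy slice is a view: writing to it changes the original array
corner = grid[:2, :2]
corner[0, 0] = 100
print(grid[0, 0])        # 100

# Call .copy() when you need an independent array
independent = grid[:2, :2].copy()
independent[0, 0] = -1
print(grid[0, 0])        # still 100
```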

Using the Python slice() function


Python’s slice() function offers an alternative: it packages the slicing parameters into a reusable slice
object. Such an object encapsulates the slicing logic and can be applied to multiple sequences.

Syntax and usage


The slice() function follows the format:
slice(start, stop, step)
Here is an example:
# Create a slice object
slice_obj = slice(1, 4)

# Apply to a list
numbers = [10, 20, 30, 40, 50]
print(numbers[slice_obj]) # Output: [20, 30, 40]

# Apply to a string
text = "Python"
print(text[slice_obj]) # Output: "yth"

Advantages of Python slice()


Personally, I like using the slice() function because it allows me to reuse the same slice object across
different sequences, avoiding repetitive slicing logic. It also makes the code easier to read and
maintain.
As you can see in the following example, we define a slice object once and reuse it across multiple
sequences. This eliminates the need to repeatedly specify the same start, stop, and step values. It also
improves maintainability, because changing the slice boundaries in one place automatically updates every
use of that slice.
# Define a reusable slice
my_slice = slice(2, 5)

# Apply to multiple sequences


data_list = [100, 200, 300, 400, 500]
data_string = "SlicingExample"

print(data_list[my_slice]) # Output: [300, 400, 500]


print(data_string[my_slice]) # Output: "ici"
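A slice object stores its start, stop, and step as attributes, and its indices() method shows how they resolve for a sequence of a given length. Because NumPy accepts a tuple of slices for multi-dimensional indexing, slice objects can also be reused with arrays. A short sketch, assuming NumPy is installed:

```python
import numpy as np

my_slice = slice(2, 5)

# A slice object simply stores its three parameters
print(my_slice.start, my_slice.stop, my_slice.step)   # 2 5 None

# indices(length) resolves omitted and negative values for that length
print(slice(None, None, -1).indices(5))   # (4, -1, -1)
print(slice(-3, None).indices(5))          # (2, 5, 1)

# A tuple of slice objects indexes several NumPy dimensions at once
rows, cols = slice(0, 2), slice(1, None)
grid = np.arange(9).reshape(3, 3)   # the values 0..8 in a 3x3 grid
print(grid[rows, cols])             # same as grid[0:2, 1:]
```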

NumPy Matrix Operations


A matrix is a two-dimensional data structure where numbers are arranged into rows and columns. For
example,

[Figure: a 3x3 matrix. Caption: A matrix is a two-dimensional data structure.]

The above matrix is a 3x3 (pronounced "three by three") matrix because it has 3 rows and 3 columns.

NumPy Matrix Operations


Here are some of the basic matrix operations provided by NumPy.

Function       Description
array()        creates a matrix
dot()          performs matrix multiplication
transpose()    transposes a matrix
linalg.inv()   calculates the inverse of a matrix
linalg.det()   calculates the determinant of a matrix
flatten()      transforms a matrix into a 1D array

Create Matrix in NumPy


In NumPy, we use the np.array() function to create a matrix. For example,
import numpy as np
# create a 2x2 matrix
matrix1 = np.array([[1, 3],
[5, 7]])
print("2x2 Matrix:\n",matrix1)
# create a 3x3 matrix
matrix2 = np.array([[2, 3, 5],
[7, 14, 21],
[1, 3, 5]])
print("\n3x3 Matrix:\n",matrix2)
Output
2x2 Matrix:
[[1 3]
[5 7]]
3x3 Matrix:
[[ 2 3 5]
[ 7 14 21]
[ 1 3 5]]
Here, we have created two matrices, a 2x2 matrix and a 3x3 matrix, by passing a list of lists to
the np.array() function.

Perform Matrix Multiplication in NumPy


We use the np.dot() function to perform multiplication between two matrices.
Let's see an example.
import numpy as np
# create two matrices
matrix1 = np.array([[1, 3],
[5, 7]])
matrix2 = np.array([[2, 6],
[4, 8]])
# calculate the dot product of the two matrices
result = np.dot(matrix1, matrix2)
print("matrix1 x matrix2: \n",result)
Output
matrix1 x matrix2:
[[14 30]
[38 86]]
In this example, we have used the np.dot(matrix1, matrix2) function to perform matrix multiplication
between two matrices: matrix1 and matrix2.
Note: A matrix product is defined only when the number of columns of the first matrix equals the
number of rows of the second. For example, if A is (M x N) and B is (N x K), then the product
C = A . B has size (M x K).
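Two related points are easy to trip over: the @ operator (Python 3.5+) computes the same matrix product as np.dot for 2-D arrays, while * multiplies element by element. The sketch also checks the shape rule from the note above; the matrices of ones are illustrative.

```python
import numpy as np

matrix1 = np.array([[1, 3], [5, 7]])
matrix2 = np.array([[2, 6], [4, 8]])

# For 2-D arrays, the @ operator gives the same result as np.dot
print(np.array_equal(matrix1 @ matrix2, np.dot(matrix1, matrix2)))   # True

# * is NOT matrix multiplication: it multiplies matching elements,
# giving [[2, 18], [20, 56]] here
print(matrix1 * matrix2)

# (M x N) @ (N x K) -> (M x K): here (2 x 3) @ (3 x 4) -> (2 x 4)
A = np.ones((2, 3))
B = np.ones((3, 4))
print((A @ B).shape)   # (2, 4)

# If the inner dimensions differ, NumPy raises a ValueError
try:
    np.dot(A, A)   # (2 x 3) times (2 x 3): 3 columns vs 2 rows
except ValueError as err:
    print("ValueError:", err)
```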

Transpose NumPy Matrix


The transpose of a matrix is a new matrix that is obtained by exchanging the rows and columns. For a
2x2 matrix,

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},
\qquad
A^{T} = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}$$
In NumPy, we can obtain the transpose of a matrix using the np.transpose() function. For example,
import numpy as np

# create a matrix
matrix1 = np.array([[1, 3],
[5, 7]])
# get transpose of matrix1
result = np.transpose(matrix1)
print(result)
Output
[[1 5]
[3 7]]
Here, we have used the np.transpose(matrix1) function to obtain the transpose of matrix1.
Note: Alternatively, we can use the .T attribute to get the transpose of a matrix. For example, if we
used matrix1.T in our previous example, the result would be the same.
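The transpose also works for non-square matrices, but a 1-D NumPy array has only one axis, so .T leaves it unchanged. A brief sketch showing the shapes involved:

```python
import numpy as np

# Transposing a 2x3 matrix gives a 3x2 matrix
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.T.shape)        # (3, 2)

# A 1-D array has only one axis, so .T leaves it unchanged
v = np.array([1, 2, 3])
print(v.T.shape)        # (3,)

# To get a column vector, reshape to 3 rows and 1 column
col = v.reshape(-1, 1)
print(col.shape)        # (3, 1)
print(col.T.shape)      # (1, 3)

# Transposing twice returns the original matrix
print(np.array_equal(m.T.T, m))   # True
```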

Calculate Inverse of a Matrix in NumPy


In NumPy, we use the np.linalg.inv() function to calculate the inverse of the given matrix.
However, it is important to note that not all matrices have an inverse. Only square matrices that have a
non-zero determinant have an inverse.
Now, let's use np.linalg.inv() to calculate the inverse of a square matrix.
import numpy as np

# create a 3x3 square matrix


matrix1 = np.array([[1, 3, 5],
[7, 9, 2],
[4, 6, 8]])
# find inverse of matrix1
result = np.linalg.inv(matrix1)
print(result)
Output
[[-1.11111111 -0.11111111 0.72222222]
[ 0.88888889 0.22222222 -0.61111111]
[-0.11111111 -0.11111111 0.22222222]]
Note: If we try to find the inverse of a non-square matrix, NumPy raises an error with the
message: numpy.linalg.LinAlgError: Last 2 dimensions of the array must be square

Find Determinant of a Matrix in NumPy


We can find the determinant of a square matrix using the np.linalg.det() function.
Suppose we have a 2x2 matrix A:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

Then the determinant of A is:

$$\det(A) = ad - bc$$

where a, b, c, and d are the elements of the matrix.
Let's see an example.
import numpy as np

# create a matrix
matrix1 = np.array([[1, 2, 3],
[4, 5, 1],
[2, 3, 4]])
# find determinant of matrix1
result = np.linalg.det(matrix1)
print(result)
Output (approximately)
-5.0
Here, we have used the np.linalg.det(matrix1) function to find the determinant of the square matrix
matrix1. Because the determinant is computed in floating point, the printed value can differ from -5.0
in the last few digits (for example, -4.999999999999999).
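The ad - bc formula can be compared against np.linalg.det directly. In the sketch below, det2 is a hypothetical helper written for illustration only; it is not a NumPy function.

```python
import numpy as np

def det2(matrix):
    """Hypothetical helper: determinant of a 2x2 matrix via ad - bc."""
    (a, b), (c, d) = matrix
    return a * d - b * c

A = np.array([[-2, 3], [4, 5]])
print(det2(A))                                  # -22

# np.linalg.det uses floating point, so compare with a tolerance
print(np.isclose(np.linalg.det(A), det2(A)))    # True

# Linearly dependent rows give a determinant of zero
S = np.array([[1, 2], [2, 4]])
print(det2(S))                                  # 0
print(np.isclose(np.linalg.det(S), 0.0))        # True
```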

Special functions in SciPy


In this section, we look at the special functions in SciPy. They are provided by the scipy.special module
and perform common mathematical operations on the given data. Some of the available functions are:
• cbrt - gives the cube root of the given number
• comb - gives the number of combinations of N items taken k at a time
• exp10 - computes 10 raised to the power of the given number
• exprel - gives the relative error exponential, (exp(x) - 1)/x
• gamma - computes the gamma function, which satisfies z*gamma(z) = gamma(z+1) and gamma(n+1) =
n! for a natural number ‘n’
• lambertw - computes the Lambert W function, that is, the value W(z) such that z = W(z) * exp(W(z)),
for any complex number z
• logsumexp - gives the log of the sum of the exponentials of the input elements
• perm - gives the number of permutations of N items taken k at a time
Let's look at each of these functions in detail.
1. cbrt()
This is used to return the cube root of the given number.
Syntax: cbrt(number)
Example: Program to find the cube root
from scipy.special import cbrt
# cube root of 64
print(cbrt(64))
# cube root of 78
print(cbrt(78))
# cube root of 128
print(cbrt(128))
Output:
4.0
4.272658681697917
5.039684199579493
Example: Program to find cube root in the given array elements.
from scipy.special import cbrt
# cube root of elements in an array
arr = [64, 164, 564, 4, 640]
arr = list(map(cbrt,arr))
print(arr)
Output:
[4.0, 5.473703674798428, 8.26214922566535, 1.5874010519681994, 8.617738760127535]
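One reason to prefer cbrt over the ** operator is negative input: cbrt returns the real cube root, while raising a negative number to the power 1/3 in plain Python produces a complex result. A short sketch:

```python
from scipy.special import cbrt

# cbrt returns the real cube root, also for negative numbers
print(cbrt(-27))          # the real root, -3

# In plain Python, a negative number raised to 1/3 gives a complex
# principal root instead of -3
print((-27) ** (1 / 3))
```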
2. comb()
It returns the number of combinations, that is, the number of ways to choose k items from N items
when the order does not matter.
Syntax: scipy.special.comb(N, k)
where N is the total number of items and k is the number of items chosen.
Example 1:
# import combinations
from scipy.special import comb
# combinations of input 4
print(comb(4,1))
Output:
4.0
Example 2:
# import combinations module
from scipy.special import comb
# combinations of 4
print([comb(4,1),comb(4,2),comb(4,3),
comb(4,4),comb(4,5)])
# combinations of 6
print([comb(6,1),comb(6,2),comb(6,3),
comb(6,4),comb(6,5)])
Output:
[4.0, 6.0, 4.0, 1.0, 0.0]
[6.0, 15.0, 20.0, 15.0, 6.0]
3. exp10()
This function computes 10 raised to the power of the given number, that is, 10**x.
Syntax: exp10(value)
where value is the exponent.
Example: Program to find the power of 10
from scipy.special import exp10
# 10 to the power of 2
print(exp10(2))
Output:
100.0
Example: Program to find the powers of 10 for a range
from scipy.special import exp10
# powers of 10 for exponents
# 1 to 9
for i in range(1, 10):
    print(exp10(i))
Output:
10.0
100.0
1000.0
10000.0
100000.0
1000000.0
10000000.0
100000000.0
1000000000.0
4. exprel()
It is known as the Relative Error Exponential Function and computes (exp(x) - 1)/x. When x is near
zero, exp(x) is very close to 1, so computing exp(x) - 1 directly cancels most significant digits; exprel
avoids this loss of precision. At x = 0 it returns the limiting value 1.
Syntax: scipy.special.exprel(input_data)
Example 1:
# import exprel
from scipy.special import exprel
# calculate exprel of 0
print(exprel(0))
Output:
1.0
Example 2:
# import exprel
from scipy.special import exprel
# list of elements
arr = [0,1,2,3,4,5]
print(list(map(exprel,arr)))
Output:
[1.0, 1.718281828459045, 3.194528049465325, 6.361845641062556, 13.399537508286059,
29.48263182051532]
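The benefit of exprel shows up for very small x, where the straightforward formula loses accuracy to cancellation. A sketch comparing the two at x = 1e-15 (the value of x is an arbitrary illustration):

```python
import numpy as np
from scipy.special import exprel

x = 1e-15

# Naive formula: exp(x) is so close to 1 that exp(x) - 1 keeps only a
# few significant digits, so the quotient is visibly wrong
naive = (np.exp(x) - 1) / x

# exprel evaluates the same quantity without that cancellation
accurate = exprel(x)

print(naive)      # noticeably different from the true value (about 1 + x/2)
print(accurate)   # agrees with 1 + x/2 to about machine precision
```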
5. gamma()
It is known as the Gamma function. It generalizes the factorial, since z*gamma(z) = gamma(z+1) and
gamma(n+1) = n! for a natural number ‘n’.
Syntax: scipy.special.gamma(input_data)
Where, input data is the input number.
Example 1:
# import gamma function
from scipy.special import gamma
print(gamma(56))
Output:
1.2696403353658278e+73
Example 2:
# import gamma function
from scipy.special import gamma
print([gamma(56), gamma(156), gamma(0),
gamma(1), gamma(5)])
Output:
[1.2696403353658278e+73, 4.789142901463394e+273, inf, 1.0, 24.0]
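The two identities in the description above can be checked numerically, along with the well-known value gamma(1/2) = sqrt(pi). A short sketch using the standard math module for comparison:

```python
import math
from scipy.special import gamma

# gamma(n + 1) equals n! for natural numbers n
for n in range(1, 6):
    print(n, gamma(n + 1), math.factorial(n))

# The recurrence z * gamma(z) = gamma(z + 1) also holds for non-integers
z = 2.5
print(math.isclose(z * gamma(z), gamma(z + 1)))        # True

# A well-known non-integer value: gamma(1/2) = sqrt(pi)
print(math.isclose(gamma(0.5), math.sqrt(math.pi)))    # True
```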
6. lambertw()
It is also known as the Lambert W function. It calculates the value W(z) such that z = W(z) *
exp(W(z)) for any complex number z.
Syntax: scipy.special.lambertw(input_data)
Example:
# import lambert function
from scipy.special import lambertw
# calculate W value
print([lambertw(1),lambertw(0),lambertw(56),
lambertw(68),lambertw(10)])
Output:
[(0.5671432904097838+0j), 0j, (2.9451813101206707+0j), (3.0910098540499797+0j),
(1.7455280027406994+0j)]
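Since W(z) is defined by z = W(z) * exp(W(z)), substituting the result back should reproduce z. A sketch of that check; lambertw returns a complex value on its principal branch by default:

```python
import numpy as np
from scipy.special import lambertw

for z in [1, 10, 56]:
    w = lambertw(z)   # complex value on the principal branch
    # By definition, w * exp(w) should reproduce z (up to rounding)
    print(z, w.real, np.isclose(w * np.exp(w), z))
```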
7. logsumexp()
It is known as the Log-Sum-Exp function. It returns the logarithm of the sum of the exponentials of the
input elements, log(sum(exp(x))).
Syntax: scipy.special.logsumexp(input_value)
where, input value is the input data.
Example 1:
from scipy.special import logsumexp
# logsum exp of numbers from
# 1 to 10
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(logsumexp(a))
Output:
10.45862974442671
Example 2:
from scipy.special import logsumexp
# logsum exp of numbers from
# 1 to 10
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# logsum exp of numbers from
# 10 to 15
b = [10, 11, 12, 13, 14, 15]
print([logsumexp(a), logsumexp(b)])
Output:
[10.45862974442671, 15.456193316018123]
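The main reason to use logsumexp rather than writing log(sum(exp(x))) by hand is numerical stability: large inputs overflow exp. A sketch with illustrative inputs of 1000:

```python
import numpy as np
from scipy.special import logsumexp

big = np.array([1000.0, 1000.0])

# Naive evaluation: exp(1000) overflows to inf, so the result is inf
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(big)))

# logsumexp avoids the overflow: log(2 * e**1000) = 1000 + log(2)
stable = logsumexp(big)

print(naive)    # inf
print(stable)   # about 1000.693
```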
8. perm()
The perm function returns the number of permutations, that is, the number of ways to arrange k items
chosen from N items when the order matters.
Syntax: scipy.special.perm(N, k)
where N is the total number of items and k is the number of items chosen.
Example:
# import permutations module
from scipy.special import perm
# permutations of 4
print([perm(4, 1), perm(4, 2), perm(4, 3),
perm(4, 4), perm(4, 5)])
# permutations of 6
print([perm(6, 1), perm(6, 2), perm(6, 3),
perm(6, 4), perm(6, 5)])
Output:
[4.0, 12.0, 24.0, 24.0, 0.0]
[6.0, 30.0, 120.0, 360.0, 720.0]
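By default comb and perm return floating-point values, which can lose exactness for large counts; both accept exact=True to return Python integers. The sketch also checks the relation perm(N, k) = comb(N, k) * k! and compares against math.comb and math.perm (available in Python 3.8 and later):

```python
import math
from scipy.special import comb, perm

# The default result is a float; exact=True returns an exact Python int
print(comb(52, 5))               # a float close to 2598960
print(comb(52, 5, exact=True))   # 2598960
print(perm(6, 3, exact=True))    # 120

# Each choice of k items can be ordered in k! ways,
# so perm(N, k) == comb(N, k) * k!
N, k = 6, 3
print(perm(N, k, exact=True) == comb(N, k, exact=True) * math.factorial(k))   # True

# The standard library offers the same exact counts
print(math.comb(52, 5), math.perm(6, 3))   # 2598960 120
```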
Random Numbers in Python


Python provides a set of functions for generating and manipulating random numbers through
the random module.
Functions in the random module rely on a pseudo-random number generator whose basic function,
random(), returns a random float in the range [0.0, 1.0), that is, 0.0 inclusive and 1.0 exclusive. These
functions are used in games, lotteries, and any other application that requires random number generation.
Let us see an example of generating a random number in Python using the random() function.
import random
num = random.random()
print(num)
Output:
0.30078080420602904

Random number using seed()


Python's random.seed() function initializes the internal state of the random number generator. Seeding
with the same value always produces the same sequence of random numbers, across multiple executions
of the code and on different machines. If no seed is given, Python seeds the generator from the operating
system's randomness source when one is available, and otherwise from the current system time.
# importing "random" for random operations
import random
# using random() to generate a random number
# between 0 and 1
print("A random number between 0 and 1 is : ", end="")
print(random.random())
# using seed() to seed a random number
random.seed(5)
# printing mapped random number
print("The mapped random number with 5 is : ", end="")
print(random.random())
# using seed() to seed different random number
random.seed(7)
# printing mapped random number
print("The mapped random number with 7 is : ", end="")
print(random.random())
# using seed() to seed to 5 again
random.seed(5)
# printing mapped random number
print("The mapped random number with 5 is : ", end="")
print(random.random())
# using seed() to seed to 7 again
random.seed(7)
# printing mapped random number
print("The mapped random number with 7 is : ", end="")
print(random.random())
Output:
A random number between 0 and 1 is : 0.510721762520941
The mapped random number with 5 is : 0.6229016948897019
The mapped random number with 7 is : 0.32383276483316237
The mapped random number with 5 is : 0.6229016948897019
The mapped random number with 7 is : 0.32383276483316237
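Two related tools are useful when you need repeatability: random.getstate() and random.setstate() save and restore the generator partway through a sequence, and random.Random creates an independent generator with its own seed. A short sketch:

```python
import random

# Re-seeding with the same value replays the same sequence
random.seed(5)
first = [random.random() for _ in range(3)]
random.seed(5)
second = [random.random() for _ in range(3)]
print(first == second)            # True

# getstate()/setstate() save and restore the generator mid-sequence
state = random.getstate()
a = random.random()
random.setstate(state)
b = random.random()
print(a == b)                     # True

# A random.Random instance has its own state, independent of the module
rng = random.Random(5)
print(rng.random() == first[0])   # True
```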

Random number using uniform()


The uniform() function generates a random floating-point number between the two numbers given as
its arguments, a lower limit and an upper limit. The lower limit can be returned; the upper limit may or
may not be included, depending on floating-point rounding. The example below also uses shuffle(),
which randomly reorders a list in place.
# Python code to demonstrate the working of
# shuffle() and uniform()
# importing "random" for random operations
import random
# Initializing list
li = [1, 4, 5, 10, 2]
# Printing list before shuffling
print("The list before shuffling is : ", end="")
for i in range(0, len(li)):
    print(li[i], end=" ")
print()
# using shuffle() to shuffle the list in place
random.shuffle(li)
# Printing list after shuffling
print("The list after shuffling is : ", end="")
for i in range(0, len(li)):
    print(li[i], end=" ")
print()
# using uniform() to generate random floating number in range
# prints number between 5 and 10
print("The random floating point number between 5 and 10 is : ", end="")
print(random.uniform(5, 10))
Output:
The list before shuffling is : 1 4 5 10 2
The list after shuffling is : 2 1 4 5 10
The random floating point number between 5 and 10 is : 5.183697823553464
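Note that shuffle() modifies the list in place and returns None. If you need to keep the original order, random.sample() returns a new, shuffled list instead. A brief sketch:

```python
import random

li = [1, 4, 5, 10, 2]

# shuffle() reorders the list in place and returns None
result = random.shuffle(li)
print(result)                                      # None

# sample() with k equal to the length returns a new shuffled list
original = [1, 4, 5, 10, 2]
shuffled_copy = random.sample(original, k=len(original))
print(original)                                    # [1, 4, 5, 10, 2]
print(sorted(shuffled_copy) == sorted(original))   # True

# uniform(5, 10) stays within the two limits
x = random.uniform(5, 10)
print(5 <= x <= 10)                                # True
```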

Random number using choice()


Python’s random.choice() function returns a randomly selected element from a non-empty sequence
such as a list, tuple, or string.
# import random
import random
# prints a random value from the list
list1 = [1, 2, 3, 4, 5, 6]
print(random.choice(list1))
# prints a random item from the string
string = "striver"
print(random.choice(string))
Output:
5
t
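For array work, NumPy has its own random number facilities. A minimal sketch using the Generator returned by np.random.default_rng (the seed 42 and the color list are arbitrary illustrations):

```python
import numpy as np

# default_rng returns a NumPy Generator; a fixed seed makes runs repeatable
rng = np.random.default_rng(42)

floats = rng.random(3)                           # 3 floats in [0.0, 1.0)
dice = rng.integers(1, 7, size=5)                # 5 integers from 1 to 6
picked = rng.choice(["red", "green", "blue"])    # one random element
print(floats, dice, picked)

# A new Generator with the same seed reproduces the same numbers
rng2 = np.random.default_rng(42)
print(np.array_equal(rng2.random(3), floats))    # True
```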

Linear Algebra
Linear algebra is an important topic across a variety of subjects. It allows you to solve problems
related to vectors, matrices, and linear equations. In Python, most of the routines related to this subject
are implemented in scipy.linalg, which offers very fast linear algebra capabilities.
In particular, linear models play an important role in a variety of real-world problems,
and scipy.linalg provides tools to compute them in an efficient way.
In this section, you'll learn how to:
• Study linear systems using determinants and solve problems using matrix inverses
• Interpolate polynomials to fit a set of points using linear systems
• Use Python to solve linear regression problems
• Use linear regression to predict prices based on historical data
This is the second part of a series of tutorials on linear algebra using scipy.linalg, so it assumes you
have already read the first part of the series.

Understanding Vectors, Matrices, and the Role of Linear Algebra


A vector is a mathematical entity used to represent physical quantities that have both magnitude and
direction. It’s a fundamental tool for solving engineering and machine learning problems. So
are matrices, which are used to represent vector transformations, among other applications.
Note: In Python, NumPy is the most widely used library for working with matrices and vectors. It uses a
special type called ndarray to represent them. As an example, imagine that you need to create the
following matrix:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$
With NumPy, you can use np.array() to create it, providing a nested list containing the elements of
each row of the matrix:
Python
In [1]: import numpy as np

In [2]: np.array([[1, 2], [3, 4], [5, 6]])
Out[2]:
array([[1, 2],
       [3, 4],
       [5, 6]])
NumPy provides several functions to facilitate working with vector and matrix computations. You can
find more information on how to use NumPy to represent vectors and matrices and perform operations
with them in the previous tutorial in this series.
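A few of the most common operations on the matrix created above are shown in the short sketch below: its shape, its transpose, a matrix-vector product with the @ operator, and multiplication by a scalar. The vector v = [1, 1] is an arbitrary example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # the 3x2 matrix from above
v = np.array([1, 1])                    # an example vector with two elements

print(A.shape)   # (3, 2): three rows, two columns
print(A.T)       # transpose, shape (2, 3)
print(A @ v)     # matrix-vector product: [3, 7, 11]
print(A.T @ A)   # 2x2 matrix product: [[35, 44], [44, 56]]
print(3 * A)     # multiplying by a scalar acts element-wise
```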
A linear system or, more precisely, a system of linear equations, is a set of equations linearly relating
to a set of variables. Here’s an example of a linear system relating to the variables x₁ and x₂:

Here you have two equations involving two variables. In order to have a linear system, the values that
multiply the variables x₁ and x₂ must be constants, like the ones in this example. It’s common to write
linear systems using matrices and vectors. For example, you can write the previous system as the
following matrix product:

Comparing the matrix product form with the original system, you can notice the elements of
matrix A correspond to the coefficients that multiply x₁ and x₂. Besides that, the values in the right-
hand side of the original equations now make up vector b.
Linear algebra is a mathematical discipline that deals with vectors, matrices, and vector spaces and
linear transformations more generally. By using linear algebra concepts, it’s possible to build
algorithms to perform computations for several applications, including solving linear systems.
When there are just two or three equations and variables, it’s feasible to perform the
calculations manually, combine the equations, and find the values for the variables.
However, in real-world applications, the number of equations can be very large, making it infeasible
to do calculations manually. That's precisely when linear algebra concepts and algorithms come in
handy, allowing you to develop usable applications for engineering and machine learning, for
example.
In Working With Linear Systems in Python With scipy.linalg, you’ve seen how to solve linear systems
using scipy.linalg.solve(). Now you’re going to learn how to use determinants to study the possible
solutions and how to solve problems using the concept of matrix inverses.
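The sketch below makes the matrix form Ax = b concrete. It uses a small hypothetical system, 3x₁ + 2x₂ = 12 and 2x₁ − x₂ = 1, which is chosen only for illustration. It checks the determinant first: a non-zero determinant means A is invertible and the system has exactly one solution, which scipy.linalg.solve() then finds.

```python
import numpy as np
from scipy import linalg

# Hypothetical system, chosen only for illustration:
#   3*x1 + 2*x2 = 12
#   2*x1 -   x2 =  1
A = np.array([[3, 2],
              [2, -1]])   # coefficients that multiply x1 and x2
b = np.array([12, 1])     # right-hand side values

det_A = linalg.det(A)     # 3*(-1) - 2*2 = -7
print("det(A) =", det_A)

if not np.isclose(det_A, 0.0):
    # Non-zero determinant: A is invertible, so the solution is unique
    x = linalg.solve(A, b)
    print("x =", x)       # x1 = 2, x2 = 3
else:
    print("det(A) is 0: the system has no unique solution")
```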
Calculating Inverses and Determinants With scipy.linalg


You can calculate matrix inverses and determinants using scipy.linalg.inv() and scipy.linalg.det().
For example, consider the meal plan problem that you worked on in the previous tutorial of this series.
Recall that the linear system for this problem could be written as a matrix product:

$$\begin{bmatrix} 1 & 9 & 2 & 1 & 1 \\ 10 & 1 & 2 & 1 & 1 \\ 1 & 0 & 5 & 1 & 1 \\ 2 & 1 & 1 & 2 & 9 \\ 2 & 1 & 2 & 13 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 170 \\ 180 \\ 140 \\ 180 \\ 350 \end{bmatrix}$$

Previously, you used scipy.linalg.solve() to obtain the solution 10, 10, 20, 20, 10 for the variables x₁
to x₅, respectively. Because the coefficients matrix A is invertible, you can also multiply both sides of
Ax = b by A⁻¹ to obtain vector x, which contains the solutions for the problem. You have to
calculate x = A⁻¹b, which you can do with the following program:
Python
In [1]: import numpy as np
   ...: from scipy import linalg

In [2]: A = np.array(
   ...:     [
   ...:         [1, 9, 2, 1, 1],
   ...:         [10, 1, 2, 1, 1],
   ...:         [1, 0, 5, 1, 1],
   ...:         [2, 1, 1, 2, 9],
   ...:         [2, 1, 2, 13, 2],
   ...:     ]
   ...: )

In [3]: b = np.array([170, 180, 140, 180, 350]).reshape((5, 1))

In [4]: A_inv = linalg.inv(A)

In [5]: x = A_inv @ b
   ...: x
Out[5]:
array([[10.],
       [10.],
       [20.],
       [20.],
       [10.]])
Here's a breakdown of what's happening:
• In [1] imports NumPy as np, along with linalg from scipy. These imports allow you to
use linalg.inv().
• In [2] creates the coefficients matrix as a NumPy array called A.
• In [3] creates the independent terms vector as a NumPy array called b. To make it a column
vector with five elements, you use .reshape((5, 1)).
• In [4] uses linalg.inv() to obtain the inverse of matrix A.
• In [5] uses the @ operator to perform the matrix product in order to solve the linear
system characterized by A and b. You store the result in x, which is printed.
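linalg.det() is mentioned above but not used in the program. The sketch below reuses the meal plan matrix to show it. A non-zero determinant confirms that A is invertible. The sketch also checks that the inverse-based answer matches linalg.solve(), which solves Ax = b without forming A⁻¹ explicitly.

```python
import numpy as np
from scipy import linalg

A = np.array([
    [1, 9, 2, 1, 1],
    [10, 1, 2, 1, 1],
    [1, 0, 5, 1, 1],
    [2, 1, 1, 2, 9],
    [2, 1, 2, 13, 2],
])
b = np.array([170, 180, 140, 180, 350]).reshape((5, 1))

# A non-zero determinant confirms that A is invertible
det_A = linalg.det(A)
print("det(A) =", det_A)

x_inv = linalg.inv(A) @ b      # x = A^-1 b, as in the program above
x_solve = linalg.solve(A, b)   # solves A x = b directly

print(np.allclose(x_inv, x_solve))   # both approaches agree
print(np.allclose(A @ x_solve, b))   # the solution satisfies every equation
```

In practice, linalg.solve() is generally preferred to multiplying by an explicit inverse, because it is cheaper and numerically more stable. The inverse is still useful when you need A⁻¹ itself.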

Solve a Pair of Nonlinear Equations


A nonlinear equation is an equation in which at least one term is not of degree 1 in the unknowns
(for example x², xy or sin x), so the relationship between its variables cannot be represented by a
straight line when plotted on a graph.

Prerequisite

Install these Python libraries by running the following commands in your terminal:
pip install numpy
pip install scipy
pip install sympy

Solve a Pair of Nonlinear Equations Using Python

Below are some ways by which we can solve a pair of nonlinear equations using Python:
• Using fsolve from scipy.optimize
• Using root from scipy.optimize
• Using minimize from scipy.optimize (Optimization Method)
• Using nsolve from SymPy
• Using Newton's method with NumPy
We will apply each method to these equations:
Equation 1: x² + y² = 25
Equation 2: x² - y = 0
Solve Non-Linear Equations Using fsolve from SciPy
This Python code uses the fsolve function from the scipy.optimize library to find a numerical
solution to a system of nonlinear equations. The equations are defined in the equations function,
where eq1 and eq2 are the two equations rewritten so that their right-hand sides are 0. The initial
guess for [x, y] is set to [1, 1], and fsolve iteratively refines this guess until it converges to a solution.
The final solution is then printed.
from scipy.optimize import fsolve

def equations(v):
    # Each equation is rewritten so that its right-hand side is 0
    x, y = v
    eq1 = x**2 + y**2 - 25
    eq2 = x**2 - y
    return [eq1, eq2]

initial_guess = [1, 1]
solution = fsolve(equations, initial_guess)
print("Solution:", solution)
Output:
Solution: [2.12719012 4.52493781]

Solve a Pair of NonLinear Equations Using root from SciPy

This Python code uses the root function from the scipy.optimize library to solve the same system.
It starts from the initial guess [1, 1], iterates until it converges, and returns a result object whose x
attribute holds the solution, which is then printed.
from scipy.optimize import root

def equations(v):
    # Each equation is rewritten so that its right-hand side is 0
    x, y = v
    eq1 = x**2 + y**2 - 25
    eq2 = x**2 - y
    return [eq1, eq2]

initial_guess = [1, 1]
solution = root(equations, initial_guess)
print("Solution:", solution.x)
Output:
Solution: [2.12719012 4.52493781]
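root() also accepts the Jacobian of the system through its jac parameter, and its result object reports whether it converged (success). Replacing x with -x leaves both equations unchanged, so the system also has the mirror-image solution with negative x. In the sketch below, a starting guess on the other side of the y-axis finds that second solution. The helper name jacobian is chosen here for illustration.

```python
import numpy as np
from scipy.optimize import root

def equations(v):
    x, y = v
    return [x**2 + y**2 - 25, x**2 - y]

def jacobian(v):
    # Partial derivatives of each equation with respect to x and y
    x, y = v
    return np.array([[2 * x, 2 * y],
                     [2 * x, -1.0]])

sol_pos = root(equations, [1, 1], jac=jacobian)
sol_neg = root(equations, [-1, 1], jac=jacobian)   # start with a negative x

print(sol_pos.success, sol_pos.x)
print(sol_neg.success, sol_neg.x)
```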

Solve Non-Linear Equations Using minimize from SciPy

This Python code uses the minimize function from the scipy.optimize library to solve the system as an
optimization problem. The equations are defined as equation1 and equation2, and the objective function
is the sum of their squared residuals, which is zero exactly where both equations hold. Minimizing it
from the initial guess [1, 1] gives the solution, which is printed using result.x.
from scipy.optimize import minimize

# Define the equations
def equation1(x, y):
    return x**2 + y**2 - 25

def equation2(x, y):
    return x**2 - y

# Define the objective function for optimization:
# the sum of squared residuals is 0 only where both equations hold
def objective(xy):
    x, y = xy
    return equation1(x, y)**2 + equation2(x, y)**2

# Initial guess
initial_guess = [1, 1]
# Perform optimization
result = minimize(objective, initial_guess)
solution_optimization = result.x
print("Optimization Method Solution:", solution_optimization)

Output:
Optimization Method Solution: [2.12719023 4.52493776]
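The list at the start of this section also mentions Newton's method with NumPy. The minimal sketch below applies it to the same two equations. Each step solves J(v)·step = -F(v) for the update, where J is the Jacobian worked out by hand, and iteration stops once the step becomes tiny. The names F, J and newton are illustrative and are not part of any library.

```python
import numpy as np

def F(v):
    # The two equations written as F(v) = 0, with v = [x, y]
    x, y = v
    return np.array([x**2 + y**2 - 25, x**2 - y])

def J(v):
    # Jacobian matrix: partial derivatives of each equation w.r.t. x and y
    x, y = v
    return np.array([[2 * x, 2 * y],
                     [2 * x, -1.0]])

def newton(F, J, v0, tol=1e-10, max_iter=50):
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        # Solve J(v) @ step = -F(v) rather than inverting the Jacobian
        step = np.linalg.solve(J(v), -F(v))
        v = v + step
        if np.linalg.norm(step) < tol:
            return v
    raise RuntimeError("Newton's method did not converge")

solution = newton(F, J, [1, 1])
print("Newton's method solution:", solution)
```

Newton's method converges very quickly near a solution, but it needs the derivatives and a reasonable starting guess. The SciPy routines above take care of those details for you.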

Numerical Integration
You will probably encounter many situations in which analytical integration of a function or a
differential equation is difficult or impossible. This section shows how Scientific Python can help
through its high-level mathematical algorithms. You will learn how to develop your own numerical
integration method and how to obtain a specified accuracy. The package scipy.integrate can perform
numerical integration (quadrature) and can solve differential equations.

1. The Basic Trapezium Rule


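SciPy provides three routines for integrating a one-dimensional function from sampled values: trapezoidal
(integrate.trapezoid, called integrate.trapz in older SciPy versions), Simpson (integrate.simpson,
formerly integrate.simps) and Romberg (integrate.romb). The trapezium (trapezoidal) method is
the most straightforward of the three. The simple trapezium formula calculates the integral of a
function f(x) as the area under the curve representing f(x) by approximating it with the sum of
trapeziums:

$$\int_a^b f(x)\,dx \approx h\left[\tfrac{1}{2}f(x_0) + f(x_1) + \dots + f(x_{n-1}) + \tfrac{1}{2}f(x_n)\right], \qquad h = \frac{b-a}{n},\quad x_i = a + ih$$

The area of each trapezium is calculated as width times the average height.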
Example: Evaluate the integral

$$\int_0^1 \frac{x^4(1-x)^4}{1+x^2}\,dx$$

using the basic trapezium rule.


We shall write a small program to evaluate the integral. Of course we have to estimate the number of
trapeziums to use; the accuracy of our method depends on this number.
python code
# the function to be integrated:
def f(x):
    return x ** 4 * (1 - x) ** 4 / (1 + x ** 2)

# define a function to integrate f(x) between 0 and 1 using n trapeziums:
def trap(f, n):
    h = 1 / float(n)                  # width of each trapezium
    intgr = 0.5 * h * (f(0) + f(1))   # the two end points have weight 1/2
    for i in range(1, int(n)):
        intgr = intgr + h * f(i * h)  # interior points have weight 1
    return intgr

print(trap(f, 100))
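The same integral can be computed with SciPy's sample-based routines and compared with its exact value. This particular integral is known to equal 22/7 - π. The minimal sketch below assumes SciPy 1.6 or later, where integrate.trapezoid and integrate.simpson exist under these names. It samples f at 101 equally spaced points, which gives 100 trapezia on [0, 1].

```python
import math

import numpy as np
from scipy.integrate import simpson, trapezoid

def f(x):
    return x ** 4 * (1 - x) ** 4 / (1 + x ** 2)

# Exact value of this particular integral
exact = 22 / 7 - math.pi

# Sample f at 101 equally spaced points on [0, 1] (100 trapezia)
x = np.linspace(0, 1, 101)
y = f(x)                      # f works element-wise on NumPy arrays

approx_trap = trapezoid(y, x=x)
approx_simp = simpson(y, x=x)

print("exact     :", exact)
print("trapezoid :", approx_trap, "error:", abs(approx_trap - exact))
print("simpson   :", approx_simp, "error:", abs(approx_simp - exact))
```

For this integrand, f and its first three derivatives vanish at both x = 0 and x = 1, so the usual leading error terms of the trapezium rule cancel and you may find it unusually accurate here. Do not expect that for general integrands.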

2. Integrating a function with scipy.integrate


Let's look at:

$$\int_0^2 x^4 \ln\left(x + \sqrt{x^2 + 1}\right)dx$$

Our simple integration program would divide the interval 0 to 2 into equally spaced slices and spend the
same time calculating the integrand in each of these slices. If we take a closer look at the integrand
and plot it, we notice that at low x-values the function hardly varies, so our program would waste
time in that region. In contrast, the integrate.quad() routine from SciPy is adaptive,
in the sense that it can adjust the function evaluations to concentrate on the more important regions
(quad is short for quadrature, an older name for integration). Let's see how SciPy can simplify our
work:
python code
from math import log, sqrt
from scipy.integrate import quad

def f(x):
    return x ** 4 * log(x + sqrt(x ** 2 + 1))

# quad returns a tuple: (value of the integral, estimate of the absolute error)
print(quad(f, 0, 2))
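log(x + sqrt(x² + 1)) is the inverse hyperbolic sine asinh(x), so this integral also has a closed form. Integrating by parts and then substituting u = x² + 1 gives it. The minimal sketch below compares quad's estimate and its reported error bound with that exact value. The helper name g is chosen here for the antiderivative term.

```python
import math
from scipy.integrate import quad

def f(x):
    return x ** 4 * math.log(x + math.sqrt(x ** 2 + 1))

value, abs_err = quad(f, 0, 2)   # (integral estimate, estimated absolute error)

# Closed form:
#   (32/5) asinh(2) - (1/5) [u^(5/2)/5 - 2 u^(3/2)/3 + u^(1/2)] from u = 1 to u = 5
def g(u):
    return u ** 2.5 / 5 - 2 * u ** 1.5 / 3 + u ** 0.5

exact = 32 / 5 * math.asinh(2) - (g(5) - g(1)) / 5

print("quad  :", value, "+/-", abs_err)
print("exact :", exact)
```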

3. Integrating ordinary differential equations with odeint


Many physical phenomena are modeled by differential equations: oscillations of simple systems
(spring-mass, pendulum, etc.), fluid mechanics (Navier-Stokes, Laplace's, etc.), quantum mechanics
(Schrödinger’s) and many others. Here we’ll show you how to numerically solve these equations. The
example we shall use in this tutorial is the dynamics of a spring-mass system in the presence of a drag
force.
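Writing Newton's second law for the system, we have to combine the elastic force

$$F_s = -k\,(x - L)$$

with the drag force, whose model for a slowly moving object is

$$F_d = -b\,\frac{dx}{dt}$$

We obtain the formula

$$m\,\frac{d^2x}{dt^2} = -k\,(x - L) - b\,\frac{dx}{dt}$$

or

$$\frac{d^2x}{dt^2} = -\frac{k}{m}\,(x - L) - \frac{b}{m}\,\frac{dx}{dt}$$

where m is the mass, k is the spring constant, b is the drag coefficient and L is the length of the
unstretched/uncompressed spring.

To find an approximate solution to the equation of motion above, we'll have to use a finite difference
approximation for the derivative, which will generate an algorithm for solving the equation. Most
such algorithms are based on first-order differential equations, so it is a good idea to
start by putting our second-order equation in the form of a system of two first-order differential
equations:

$$\frac{dx}{dt} = v, \qquad \frac{dv}{dt} = -\frac{k}{m}\,(x - L) - \frac{b}{m}\,v$$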
To write the numerical integration program, we shall use odeint, which is part of scipy.integrate. A
function call to odeint looks something like this:
python code
scipy.integrate.odeint(func, y0, t, args=())
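Here func(y, t, ...) must return the derivatives of the state vector y at time t, y0 is the initial state, t is the array of times at which you want the solution, and args passes extra parameters to func. The minimal sketch below applies this to the damped spring-mass model m x'' = -k(x - L) - b x'. It writes the model as the first-order system dx/dt = v, dv/dt = -(k/m)(x - L) - (b/m)v. The parameter values and initial conditions are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.integrate import odeint

# Hypothetical parameter values, chosen only for illustration
m = 1.0    # mass
k = 1.0    # spring constant
b = 0.5    # drag coefficient
L = 1.0    # natural (unstretched) length of the spring

def spring_mass(state, t, m, k, b, L):
    # state = [x, v]; return [dx/dt, dv/dt] for the first-order system
    x, v = state
    dxdt = v
    dvdt = -(k / m) * (x - L) - (b / m) * v
    return [dxdt, dvdt]

y0 = [2.0, 0.0]                    # start stretched to x = 2, at rest
t = np.linspace(0, 40, 401)        # times at which the solution is wanted
sol = odeint(spring_mass, y0, t, args=(m, k, b, L))

x, v = sol[:, 0], sol[:, 1]        # one row per time point, one column per variable
print("x at t = 40:", x[-1])       # the oscillation has decayed close to L
```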

Data manipulation and data analysis


Data manipulation and data analysis are distinct but related processes. Data manipulation focuses on
transforming and organizing data to make it usable and meaningful, while data analysis involves
exploring and interpreting that data to extract insights. Data manipulation is often a preparatory step
before analysis, ensuring the data is clean, structured, and ready for further exploration.

Key Differences:
• Focus:
Data manipulation is concerned with the form and structure of the data, while data analysis focuses on
the meaning and implications of the data.
• Techniques:
Data manipulation involves techniques like cleaning, filtering, restructuring, and aggregating
data. Data analysis uses techniques like statistical analysis, modeling, and visualization.
• Purpose:
Data manipulation prepares data for analysis, while data analysis extracts insights and draws
conclusions.
Data Manipulation:
• Definition:
Data manipulation involves organizing, cleaning, and transforming raw data into a usable format.
• Process:
This process can include extracting data, cleaning it, structuring it, filtering information, and
constructing databases.
• Tools:
Various tools, including spreadsheets, SQL, and programming languages like Python (with libraries
like Pandas), are used for data manipulation.

• Examples:
Sorting data alphabetically, aggregating data into categories, or converting data from one format to
another.
Data Analysis:
• Definition:
Data analysis is the process of examining data to identify patterns, trends, and insights.
• Process:
This involves collecting, cleaning, exploring, and interpreting data using various analytical methods.
• Tools:
Statistical software, data visualization tools, and machine learning algorithms are commonly used for
data analysis.
• Examples:
Analyzing sales data to identify trends, using statistical models to predict future outcomes, or
visualizing data to communicate findings.

Relationship:
Data manipulation is a crucial prerequisite for effective data analysis. By preparing data in a usable
format, data analysis can be performed more efficiently and accurately, leading to better insights.

Pandas: Reading Data from Files Using Pandas


Pandas is a very popular Python library that offers a set of functions and data structures that make
data analysis more efficient. The Pandas package is mainly used for data pre-processing tasks
such as data cleaning, manipulation, and transformation, which makes it a very handy tool for data
scientists and analysts. Let's find out how to read and write files using pandas.
We will cover the following sections:
• Data Structures in Pandas
• Writing a File Using Pandas
• Reading a File Using Pandas
• Importing a CSV File into the DataFrame
• Endnotes
Data Structures in Pandas
There are two main types of data structures in Pandas:
• Pandas Series: 1D labeled homogeneous array, size-immutable
• Pandas DataFrame: 2D labeled tabular structure, size-mutable
Mutability refers to whether an object can be changed after it has been created. When we say a value
is mutable, it means that it can be changed.
The DataFrame is one of the most widely used and most important pandas data structures. It stores data
in rows and columns, much like an Excel worksheet.
Let's see how to create a DataFrame using Pandas:
• Importing Pandas Library
• Creating a Pandas DataFrame
Importing Pandas Library
#Importing Pandas Library
import pandas as pd
Creating a Pandas DataFrame
#Creating a Sample DataFrame
data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'age': [27, 32, 23, 41, 37, 31, 49],
    'gender': ['M', 'F', 'F', 'M', 'M', 'M', 'F'],
    'occupation': ['Salesman', 'Doctor', 'Manager', 'Teacher', 'Mechanic', 'Lawyer', 'Nurse']
})

# Display the DataFrame
data
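The outline above lists writing a file, but the reading section assumes the data is already saved. The minimal sketch below covers that missing step. DataFrame.to_csv() writes the table, and index=False stops pandas from storing the row labels 0 to 6 as an extra column. The temporary folder is used only so the example leaves your working directory untouched. In everyday use you would simply write to 'data.csv'.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'age': [27, 32, 23, 41, 37, 31, 49],
    'gender': ['M', 'F', 'F', 'M', 'M', 'M', 'F'],
    'occupation': ['Salesman', 'Doctor', 'Manager', 'Teacher', 'Mechanic', 'Lawyer', 'Nurse'],
})

# Write into a temporary folder; normally: data.to_csv('data.csv', index=False)
path = os.path.join(tempfile.mkdtemp(), 'data.csv')
data.to_csv(path, index=False)   # index=False: don't save the row labels as a column

# Reading the file back gives the same table
df = pd.read_csv(path)
print(df.equals(data))
```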
Reading a File Using Pandas


Once your data is saved in a file, you can load it whenever required using the
pandas read_csv() function:

#Reading the CSV file


df = pd.read_csv('data.csv')
df

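You can also use read_csv() to load other delimited text files (use the sep parameter, described below,
when the values are not separated by commas).
To load an Excel file, use the pandas read_excel() function instead (reading .xlsx files requires an
Excel engine such as openpyxl to be installed):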
#Reading the Excel file


df2 = pd.read_excel('data2.xlsx')
df2

There are various other file formats that you can import your data from. For example:
• pd.read_json()
• pd.read_html()
• pd.read_sql()
Note that this isn't an exhaustive list. There are more formats you can read from, but they
are out of the scope of these notes.
For now, we are going to focus on the most common data format you are going to work with – the
CSV format.
Importing a CSV File into the DataFrame
• File path
• Header
• Column delimiter
• Index column
• Select columns
• Omit rows
• Missing values
• Alter values
• Compress and decompress files

File path
We have seen above how you can read a CSV file into a DataFrame. You can specify the path where
your file is stored when you’re using the .read_csv() function:
#Specifying the file path
df = pd.read_csv('C:/Users/abc/Desktop/file_name.csv')
Header
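You can use the header parameter to specify the row that will be used as column names for your
DataFrame. By default (header='infer'), pandas uses the first row as the header, which is the same as
header=0 when you do not pass your own column names.
When header=None, no row is used as the header: the first line is read as data, and the columns are
labelled 0, 1, 2, and so on: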

df = pd.read_csv('data.csv', header=None)
df

Column delimiter
You can use the sep parameter to specify a custom delimiter for the CSV input. By default, it is a
comma.
#Use tab to separate values


df = pd.read_csv('data.csv', header=None, sep='\t')
df
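The list of options above goes beyond file path, header and delimiter. The minimal sketch below shows the remaining ones on a small in-memory CSV, read through io.StringIO so no file on disk is needed. The data is hypothetical. The sketch assumes "Alter values" refers to the converters parameter, which applies a function to every value of a column while reading. Compression is inferred from the .gz file extension by default.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical CSV contents, in the same shape as data.csv
csv_text = """id,age,gender,occupation
1,27,M,Salesman
2,32,F,Doctor
3,NA,F,Manager
4,41,-,Teacher
"""

# Index column: use 'id' as the row labels instead of 0, 1, 2, ...
df = pd.read_csv(io.StringIO(csv_text), index_col="id")

# Select columns: load only the columns you need
df_cols = pd.read_csv(io.StringIO(csv_text), usecols=["id", "occupation"])

# Omit rows: skip file lines 1 and 2 (line 0 is the header)
df_skip = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

# Missing values: 'NA' is treated as missing by default; also treat '-' as missing
df_na = pd.read_csv(io.StringIO(csv_text), na_values=["-"])

# Alter values while reading: apply a function to each value of a column
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"occupation": str.upper})

# Compress and decompress: write a gzip-compressed CSV, then read it back
path = os.path.join(tempfile.mkdtemp(), "data.csv.gz")
df.to_csv(path)                        # gzip chosen from the '.gz' extension
df_gz = pd.read_csv(path, index_col="id")

print(df)
print(df_gz.equals(df))
```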

Data Structures: Series and Data Frame


Pandas is a widely-used Python library for data analysis that provides two essential data
structures: Series and DataFrame. These structures are potent tools for handling and examining data,
but they have different features and applications.
In this article, we will explore the differences between Series and DataFrames.
A Pandas Series is a one-dimensional array-like object that can hold data of any type
(integer, float, string, etc.). It is labelled, meaning each element has an associated label; together,
the labels form the Series' index. You can think of a Series as a column in a spreadsheet or a single column
of a database table. Series are a fundamental data structure in Pandas and are commonly
used for data manipulation and analysis tasks. They can be created from lists, arrays,
dictionaries, and existing Series objects. Series are also a building block for the more
complex Pandas DataFrame, which is a two-dimensional table-like structure consisting of
multiple Series objects.
Creating a Series data structure from a list, dictionary, and custom index:
import pandas as pd
# Initializing a Series from a list
data = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data)
print(series_from_list)
# Initializing a Series from a dictionary
data = {'a': 1, 'b': 2, 'c': 3}
series_from_dict = pd.Series(data)
print(series_from_dict)
# Initializing a Series with custom index
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series_custom_index = pd.Series(data, index=index)
print(series_custom_index)

Output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
a 1
b 2
c 3
dtype: int64
a 1
b 2
c 3
d 4
e 5
dtype: int64

Key Features of Series data structure:


Indexing:
Each element in a Series has a corresponding index label, which can be used to access or manipulate the
data.
print(series_from_list[0])     # prints 1
print(series_from_dict['b'])   # prints 2
Vectorized Operations:
Series supports vectorized operations, allowing you to perform arithmetic operations on the entire
series efficiently.
series_a = pd.Series([1, 2, 3])
series_b = pd.Series([4, 5, 6])
sum_series = series_a + series_b
print(sum_series)
Output:
0 5
1 7
2 9
dtype: int64
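Vectorized operations match elements by index label, not by position. This only shows up when the two Series have different indexes. The short sketch below uses example data: labels that appear in only one Series give NaN, while the add() method with fill_value treats a missing label as 0 instead.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Elements are paired by label; 'x' and 'w' appear in only one Series -> NaN
total = a + b
print(total)

# add() with fill_value treats a missing label as 0 instead of producing NaN
total_filled = a.add(b, fill_value=0)
print(total_filled)
```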

Extracting Information from Text DataFrames


1. Splitting Text into Columns with str.split()
The str.split() function allows you to split a column of text data into multiple columns based on a
delimiter. For example, suppose we have a DataFrame with a column "Name" containing full names
in the format "First Last". We can split this column into two columns "First" and "Last" using the
following code:

import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith']})


df[['First', 'Last']] = df['Name'].str.split(' ', expand=True)
print(df)
Output:
         Name  First   Last
0    John Doe   John    Doe
1  Jane Smith   Jane  Smith
The expand=True argument returns a DataFrame with one column for each split element, while
expand=False returns a Series in which each value is a list of the split parts.
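Other .str methods extract information from text columns in a similar way. The sketch below shows three of them. str.extract() turns each capture group of a regular expression into a column, and named groups set the column names. str.len() counts characters. str.contains() gives a boolean mask for filtering rows. The Email column is hypothetical data added only for this illustration.

```python
import pandas as pd

# Hypothetical data: the 'Email' column is added only for this illustration
df = pd.DataFrame({
    "Name": ["John Doe", "Jane Smith"],
    "Email": ["john.doe@example.com", "jane@shop.org"],
})

# One column per named capture group: 'user' and 'domain'
parts = df["Email"].str.extract(r"(?P<user>[^@]+)@(?P<domain>.+)")
df = df.join(parts)

df["NameLength"] = df["Name"].str.len()           # number of characters
df["IsSmith"] = df["Name"].str.contains("Smith")  # boolean mask

print(df)
print(df[df["IsSmith"]])                          # keep only the matching rows
```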
Here's a more detailed look at data extraction:


1. Why is data extraction important?
• Enables data analysis:
It allows businesses to access and utilize data from various sources, enabling them to gain insights
and make informed decisions.
• Facilitates data integration:
It helps to consolidate data from different sources into a centralized system.
• Supports data workflows:
It's the starting point for data transformation and loading into databases or data warehouses.
• Builds data quality:
It helps to identify and correct inconsistencies and redundancies in data, improving its overall quality.
2. Types of Data Extraction
• Incremental Extraction:
This extracts only the data that has changed since the last extraction, making it efficient for
monitoring dynamic data.
• Full Extraction:
This extracts all the data from a source, which is useful for creating a baseline dataset or for initial
data analysis.
3. Common Data Sources for Extraction
• Databases:
Extracting data from relational databases (e.g., SQL databases) is a common task.
• Documents:
Extracting data from documents (e.g., PDFs, text files) often involves using techniques like OCR and
NLP.
• Web Pages:
Data scraping from websites is a technique used to extract data from web pages.
• APIs:
Many applications and platforms expose APIs that allow data extraction.
4. Techniques and Tools for Data Extraction
• OCR (Optical Character Recognition): Used to convert scanned documents or images into
editable text.
• NLP (Natural Language Processing): Used to extract and understand unstructured text data,
including techniques like tokenization, named entity recognition, and sentiment analysis.
• Data Scraping: Used to extract data from websites.
• ETL/ELT Tools: Tools like Talend, Matillion, and Stitch provide pre-built connectors and
functionalities for extracting, transforming, and loading data.
• RPA (Robotic Process Automation) tools: Tools like Automation Anywhere can automate
data extraction tasks from various sources.
5. Examples of Data Extraction in Action
• Financial institutions:
Extracting customer transaction data from various systems to track revenue and identify trends.
• E-commerce businesses:
Extracting customer order data to personalize marketing and improve customer service.
• Healthcare organizations:
Extracting patient medical records to analyze treatment outcomes and improve patient care.

Grouping and Aggregating


Aggregation in Pandas
Aggregation means applying a mathematical function to summarize data. It can be used to get a
summary of the columns in a dataset, such as the sum, minimum or maximum of a particular
column. The function used for aggregation is agg(), whose parameter is the function (or list of
functions) we want to apply. Some functions used in aggregation are:
• sum() : Compute sum of column values
• min() : Compute min of column values
• max() : Compute max of column values
• mean() : Compute mean of column
• size() : Compute group sizes (number of rows in each group)
• describe() : Generates descriptive statistics
• first() : Compute first of group values
• last() : Compute last of group values
• count() : Compute count of non-missing column values
• std() : Standard deviation of column
• var() : Compute variance of column
• sem() : Standard error of the mean of column
Creating a Sample Dataset
Let's create a small dataset of student marks in Maths, English, Science and History.
import pandas as pd

df = pd.DataFrame([[9, 4, 8, 9],
[8, 10, 7, 6],
[7, 6, 8, 5]],
columns=['Maths', 'English',
'Science', 'History'])

print(df)
Output:
   Maths  English  Science  History
0      9        4        8        9
1      8       10        7        6
2      7        6        8        5

Now that we have a dataset let’s perform aggregation.


1. Summing Up All Values (sum())
The sum() function adds up all values in each column.
df.sum()
Output:
Maths      24
English    20
Science    23
History    20
dtype: int64

2. Getting a Summary (describe())


Instead of calculating sum, mean, min and max separately we can use describe() which provides all
important statistics in one go.
df.describe()
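Output:
       Maths    English   Science   History
count    3.0   3.000000  3.000000  3.000000
mean     8.0   6.666667  7.666667  6.666667
std      1.0   3.055050  0.577350  2.081666
min      7.0   4.000000  7.000000  5.000000
25%      7.5   5.000000  7.500000  5.500000
50%      8.0   6.000000  8.000000  6.000000
75%      8.5   8.000000  8.000000  7.500000
max      9.0  10.000000  8.000000  9.000000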

3. Applying Multiple Aggregations at Once (agg())


The .agg() function lets you apply multiple aggregation functions at the same time.
df.agg(['sum', 'min', 'max'])
Output:
     Maths  English  Science  History
sum     24       20       23       20
min      7        4        7        5
max      9       10        8        9
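agg() also accepts a dictionary that applies different aggregations to different columns. Cells with no matching aggregation are filled with NaN. In addition, most aggregation methods take axis=1 to work across each row instead of down each column. The sketch below reuses the student-marks dataset; the particular choice of aggregations is only an example.

```python
import pandas as pd

df = pd.DataFrame([[9, 4, 8, 9],
                   [8, 10, 7, 6],
                   [7, 6, 8, 5]],
                  columns=['Maths', 'English', 'Science', 'History'])

# Different aggregations per column; missing combinations become NaN
per_column = df.agg({'Maths': ['min', 'max'], 'English': ['mean']})
print(per_column)

# axis=1 aggregates across each row: total marks per student
totals = df.sum(axis=1)
print(totals)
```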
Grouping in Pandas
Grouping in Pandas means organizing your data into groups based on some columns. Once grouped
you can perform actions like finding the total, average, count or even pick the first row from each
group. This method follows a split-apply-combine process:
• Split the data into groups
• Apply some calculation like sum, average etc.
• Combine the results into a new table.
Let's understand grouping in Pandas using a small bakery order dataset as an example.
import pandas as pd

data = {
    'Item': ['Cake', 'Cake', 'Bread', 'Pastry', 'Cake'],
    'Flavor': ['Chocolate', 'Vanilla', 'Whole Wheat', 'Strawberry', 'Chocolate'],
}
df = pd.DataFrame(data)
print(df)
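Output:
     Item       Flavor
0    Cake    Chocolate
1    Cake      Vanilla
2   Bread  Whole Wheat
3  Pastry   Strawberry
4    Cake    Chocolate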
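The bakery dataset can then be grouped by Item to follow the split-apply-combine process described above. The sketch below splits the orders into one group per item. It then applies size(), nunique() and first() to each group, and pandas combines the results into a Series indexed by item name. The choice of these three calculations is only an example.

```python
import pandas as pd

data = {
    'Item': ['Cake', 'Cake', 'Bread', 'Pastry', 'Cake'],
    'Flavor': ['Chocolate', 'Vanilla', 'Whole Wheat', 'Strawberry', 'Chocolate'],
}
df = pd.DataFrame(data)

grouped = df.groupby('Item')                    # split: one group per distinct Item

orders_per_item = grouped.size()                # number of orders for each item
distinct_flavors = grouped['Flavor'].nunique()  # distinct flavours per item
first_flavor = grouped['Flavor'].first()        # flavour in each group's first row

print(orders_per_item)
print(distinct_flavors)
print(first_flavor)
```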
