Python Unit 5 Notes
Linear algebra
Python’s mathematical libraries, NumPy and SciPy, have extensive tools for numerically solving
problems in linear algebra. Here we focus on two problems that arise commonly in scientific and
engineering settings: (1) solving a system of linear equations and (2) eigenvalue problems. We also
show how to perform a number of other basic computations, such as finding the determinant of a
matrix, inverting a matrix, and decomposing a matrix. The SciPy package for linear algebra is called
scipy.linalg.
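The determinant of a matrix is computed using the scipy.linalg.det function:
In [4]: a = array([[-2, 3],
                   [ 4, 5]])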
In [5]: scipy.linalg.det(a)
Out[5]: -22.0
The inverse of a matrix is computed using the scipy.linalg.inv function, while the product of two
matrices is calculated using the NumPy dot function:
In [6]: b = scipy.linalg.inv(a)
In [6]: b
Out[6]: array([[-0.22727273, 0.13636364],
[ 0.18181818, 0.09090909]])
In [7]: dot(a,b)
Out[7]: array([[ 1., 0.],
[ 0., 1.]])
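Because of floating-point rounding, the product of a matrix and its computed inverse is usually only approximately the identity, so comparing with np.allclose is safer than testing for exact equality. A minimal sketch, assuming the 2 x 2 matrix a = [[-2, 3], [4, 5]] (its determinant, -22, and its inverse agree with the outputs above); it also checks the closed-form inverse of a 2 x 2 matrix:

```python
import numpy as np
import scipy.linalg

# 2x2 matrix consistent with the determinant (-22) and inverse shown above
a = np.array([[-2.0, 3.0],
              [4.0, 5.0]])

det_a = scipy.linalg.det(a)
a_inv = scipy.linalg.inv(a)

# Closed-form inverse of [[p, q], [r, s]] is (1 / det) * [[s, -q], [-r, p]]
p, q = a[0]
r, s = a[1]
formula_inv = np.array([[s, -q], [-r, p]]) / (p * s - q * r)

print(det_a)                               # approximately -22.0
print(np.allclose(a_inv, formula_inv))     # True
print(np.allclose(a @ a_inv, np.eye(2)))   # True: close to the identity
```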
The first task is to recast this set of equations as a matrix equation of the form Ax = b. In this case,
we have:
In [10]: dot(Ainv, b)
Out[10]: array([ -8.91304348, 10.2173913 , -3.17391304])
which is the same answer we obtained using scipy.linalg.solve. Using scipy.linalg.solve is numerically
more stable and faster than computing x = A⁻¹b, so it is the preferred method for solving systems of
equations.
You might wonder what happens if the equations in the system are not all linearly independent. For
example, suppose the matrix A is given by
$$A = \begin{pmatrix} 2 & 4 & 6 \\ 1 & -3 & -9 \\ 1 & 2 & 3 \end{pmatrix},$$
where the third row is a multiple of the first row. Let's try it out and see what happens. First we
change the bottom row of the matrix and then try to solve the system as we did before.
In [11]: A[2] = array([1, 2, 3])
In [12]: A
Out[12]: array([[ 2, 4, 6],
[ 1, -3, -9],
[ 1, 2, 3]])
In [13]: scipy.linalg.solve(A,b)
LinAlgError: Singular matrix
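A program usually should not crash on a singular system. One hedged way to handle it is sketched below: the helper name solve_or_none is hypothetical, and it simply catches the LinAlgError that scipy.linalg.solve raises. np.linalg.matrix_rank confirms that only two rows of the matrix above are independent. The right-hand side b is a made-up example:

```python
import numpy as np
from scipy import linalg

def solve_or_none(A, b):
    """Hypothetical helper: solve A x = b, or return None if A is singular."""
    try:
        return linalg.solve(A, b)
    except np.linalg.LinAlgError:
        return None

# Third row is half of the first row, as in the example above
A = np.array([[2, 4, 6],
              [1, -3, -9],
              [1, 2, 3]], dtype=float)
b = np.array([4.0, -11.0, 2.0])   # hypothetical right-hand side

print(np.linalg.matrix_rank(A))   # 2: only two rows are independent
print(solve_or_none(A, b))        # None

# A nonsingular system for comparison: 2x + y = 3, x + 3y = 5
print(solve_or_none(np.array([[2.0, 1.0], [1.0, 3.0]]), np.array([3.0, 5.0])))
```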
In Python:
Python's core language does not have a fixed-type array like the arrays of C or Java. Instead, it
uses:
1. Lists (most commonly used like arrays)
2. Array module (for arrays of uniform type)
3. NumPy arrays (for numerical computations)
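The three options above can be compared side by side in a short sketch (the values are arbitrary). A list accepts mixed types, array.array accepts only the type given by its type code, and a NumPy array supports element-wise arithmetic:

```python
import array
import numpy as np

# 1. A list can hold values of any type and can grow or shrink
py_list = [1, 2.5, "three"]
py_list.append(4)

# 2. array.array stores values of one type, given by a type code ('i' = signed int)
int_array = array.array("i", [1, 2, 3])
int_array.append(4)

# 3. A NumPy array applies arithmetic to every element at once
np_array = np.array([1, 2, 3, 4])

print(py_list)        # [1, 2.5, 'three', 4]
print(int_array)      # array('i', [1, 2, 3, 4])
print(np_array * 2)   # [2 4 6 8]
```

Note that py_list * 2 would instead repeat the list, which is one reason NumPy arrays are preferred for numerical work.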
Addition Operator
In Python, + is the addition operator. It is used to add 2 values.
val1 = 2
val2 = 3
# using the addition operator
res = val1 + val2
print(res)
Output:
5
Subtraction Operator
In Python, - is the subtraction operator. It is used to subtract the second value from the first value.
val1 = 2
val2 = 3
# using the subtraction operator
res = val1 - val2
print(res)
Output:
-1
Multiplication Operator
Python * operator is the multiplication operator. It is used to find the product of 2 values.
val1 = 2
val2 = 3
# using the multiplication operator
res = val1 * val2
print(res)
Output:
6
ACHARIYA
COLLEGE OF ENGINEERING TECHNOLOGY
(Approved by AICTE New Delhi & Affiliated to Pondicherry University)
An ISO 9001: 2008 Certified Institution
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
Division Operator
In Python, the division operators divide the number on the left by the number on the right and
return the quotient.
There are two types of division operators:
1. Float division
2. Floor division
Float division
The quotient returned by the / operator is always a float, even when both numbers are integers.
Example:
print(5/5)
print(10/2)
print(-10/2)
print(20.0/2)
Output
1.0
5.0
-5.0
10.0
Integer division (Floor division)
The // operator returns the quotient rounded down to the nearest whole number, that is, toward
negative infinity. The type of the result depends on the operands: if either number is a float, the
result is a float; otherwise it is an integer. Because the result is always floored, a negative quotient
such as -2.5 becomes -3, not -2.
Example:
print(10//3)
print (-5//2)
print (5.0//2)
print (-5.0//2)
Output
3
-3
2.0
-3.0
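Floor division is easy to confuse with converting a float quotient to int, which truncates toward zero instead. The two only differ for negative results, as this short sketch shows:

```python
import math

# // floors toward negative infinity; int() of a float truncates toward zero
print(-5 // 2)            # -3
print(int(-5 / 2))        # -2
print(math.floor(-2.5))   # -3
print(math.trunc(-2.5))   # -2

# For positive operands, flooring and truncating agree
print(7 // 2, int(7 / 2))  # 3 3
```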
Modulus Operator
The % in Python is the modulus operator. It is used to find the remainder when the first operand is
divided by the second.
val1 = 3
val2 = 2
# using the modulus operator
res = val1 % val2
print(res)
Output:
1
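The modulus and floor-division operators are linked: for integers a and b (b not zero), a == (a // b) * b + a % b, and the built-in divmod() returns both results at once. Because // floors, the remainder takes the sign of the divisor, as this sketch shows:

```python
for a, b in [(7, 2), (-7, 2), (7, -2)]:
    q, r = divmod(a, b)        # same as (a // b, a % b)
    assert a == q * b + r      # the identity holds in every case
    print(a, b, q, r)
# 7 2 3 1
# -7 2 -4 1
# 7 -2 -4 -1
```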
Exponentiation Operator
In Python, ** is the exponentiation operator. It is used to raise the first operand to the power of the
second.
val1 = 2
val2 = 3
# using the exponentiation operator
res = val1 ** val2
print(res)
Output:
8
Precedence of Arithmetic Operators in Python
Let us see the precedence and associativity of Python Arithmetic operators.
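From highest to lowest, Python applies ** first (grouping right to left), then unary minus, then *, /, // and % (left to right), and finally + and - (left to right); parentheses override this order. A few expressions that show these rules:

```python
print(2 + 3 * 4)      # 14: * before +
print((2 + 3) * 4)    # 20: parentheses first
print(-2 ** 2)        # -4: ** before unary minus, so -(2 ** 2)
print(2 ** 3 ** 2)    # 512: ** groups right to left, 2 ** (3 ** 2)
print(10 - 4 - 3)     # 3: - groups left to right, (10 - 4) - 3
print(100 / 10 / 5)   # 2.0: / groups left to right
print(7 + 10 % 4)     # 9: % before +
```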
Here is an example of a slice object, which stores start, stop and step values that can be applied to any sequence:
# Create a slice object
slice_obj = slice(1, 4)
# Apply to a list
numbers = [10, 20, 30, 40, 50]
print(numbers[slice_obj]) # Output: [20, 30, 40]
# Apply to a string
text = "Python"
print(text[slice_obj]) # Output: "yth"
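A slice object can also carry a step, and None means "use the default", just as an empty position does in the colon syntax. A short sketch reusing the same list and string:

```python
numbers = [10, 20, 30, 40, 50]
text = "Python"

every_other = slice(0, 5, 2)       # start=0, stop=5, step=2
reverse = slice(None, None, -1)    # defaults for start/stop, step=-1

print(numbers[every_other])        # [10, 30, 50]
print(text[reverse])               # nohtyP
print(numbers[1:4] == numbers[slice(1, 4)])  # True: same selection
```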
A matrix is a two-dimensional data structure in which numbers are arranged in rows and columns. In
NumPy, a matrix is represented as a 2-D array. For example, the following code multiplies two 2 x 2
matrices:
import numpy as np
# create two matrices
matrix1 = np.array([[1, 3],
[5, 7]])
matrix2 = np.array([[2, 6],
[4, 8]])
# calculate the dot product of the two matrices
result = np.dot(matrix1, matrix2)
print("matrix1 x matrix2: \n",result)
Output
matrix1 x matrix2:
[[14 30]
[38 86]]
In this example, we have used the np.dot(matrix1, matrix2) function to perform matrix multiplication
between two matrices: matrix1 and matrix2.
Note: We can only take the dot product of two matrices when the number of columns of the first
equals the number of rows of the second. For example, if A is (M x N) and B is (N x K), the product
C = A . B is a matrix of size (M x K).
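The shape rule can be checked directly. In this sketch the matrices are filled with ones (arbitrary values); for 2-D arrays the @ operator computes the same product as np.dot, and mismatched inner dimensions raise a ValueError:

```python
import numpy as np

A = np.ones((2, 3))   # M x N = 2 x 3
B = np.ones((3, 4))   # N x K = 3 x 4

C = A @ B             # same matrix product as np.dot(A, B) for 2-D arrays
print(C.shape)        # (2, 4), that is, M x K

mismatch_error = None
try:
    np.dot(A, A)      # (2 x 3) times (2 x 3): inner sizes 3 and 2 differ
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```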
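Transposed Matrix:
The transpose of a matrix is obtained by swapping its rows and columns:
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad A^{T} = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}$$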
In NumPy, we can obtain the transpose of a matrix using the np.transpose() function. For example,
import numpy as np
# create a matrix
matrix1 = np.array([[1, 3],
[5, 7]])
# get transpose of matrix1
result = np.transpose(matrix1)
print(result)
Output
[[1 5]
[3 7]]
Here, we have used the np.transpose(matrix1) function to obtain the transpose of matrix1.
Note: Alternatively, we can use the .T attribute to get the transpose of a matrix. For example, if we
used matrix1.T in our previous example, the result would be the same.
Output
[[-1.11111111 -0.11111111 0.72222222]
[ 0.88888889 0.22222222 -0.61111111]
[-0.11111111 -0.11111111 0.22222222]]
Note: If we try to find the inverse of a non-square matrix, NumPy raises numpy.linalg.LinAlgError
with the message: Last 2 dimensions of the array must be square
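The code that produced the 3 x 3 inverse printed above is missing from these notes. The matrix [[1, 3, 5], [7, 9, 2], [4, 6, 8]] reproduces that output (it was inferred from the printed inverse, not taken from the original), so the sketch below uses it, checks the result against the identity, and shows the non-square error being caught:

```python
import numpy as np

# Matrix inferred from the printed inverse above (original code not preserved)
matrix1 = np.array([[1, 3, 5],
                    [7, 9, 2],
                    [4, 6, 8]])

result = np.linalg.inv(matrix1)
print(result)
print(np.allclose(matrix1 @ result, np.eye(3)))   # True

# A 2 x 3 matrix is not square, so it has no inverse
non_square = np.array([[1, 2, 3],
                       [4, 5, 6]])
caught = None
try:
    np.linalg.inv(non_square)
except np.linalg.LinAlgError as err:
    caught = err
    print("LinAlgError:", err)
```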
import numpy as np
# create a matrix
matrix1 = np.array([[1, 2, 3],
[4, 5, 1],
[2, 3, 4]])
# find determinant of matrix1
result = np.linalg.det(matrix1)
print(result)
Output
-5.00
Here, we have used the np.linalg.det(matrix1) function to find the determinant of the square matrix matrix1.
5.039684199579493
Example: Program to find the cube root of each element in an array.
from scipy.special import cbrt
# cube root of elements in an array
arr = [64, 164, 564, 4, 640]
arr = list(map(cbrt,arr))
print(arr)
Output:
[4.0, 5.473703674798428, 8.26214922566535, 1.5874010519681994, 8.617738760127535]
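One practical reason to use cbrt() rather than raising to the power 1/3: for a negative input, cbrt returns the real cube root, while Python's ** with a fractional exponent returns a complex number. A short sketch:

```python
from scipy.special import cbrt

print(cbrt(-8))           # -2.0: the real cube root
print((-8) ** (1 / 3))    # a complex number, not -2.0
```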
2. comb()
It is known as combinations. comb(N, k) returns the number of ways to choose k items from N items
when order does not matter ("N choose k").
Syntax: scipy.special.comb(N, k)
Where N is the total number of items and k is the number of items chosen.
Example 1:
# import combinations
from scipy.special import comb
# combinations of input 4
print(comb(4,1))
Output:
4.0
Example 2:
# import combinations module
from scipy.special import comb
# combinations of 4
print([comb(4,1),comb(4,2),comb(4,3),
comb(4,4),comb(4,5)])
# combinations of 6
print([comb(6,1),comb(6,2),comb(6,3),
comb(6,4),comb(6,5)])
Output:
[4.0, 6.0, 4.0, 1.0, 0.0]
[6.0, 15.0, 20.0, 15.0, 6.0]
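exprel()
It is known as the Relative Error Exponential Function: exprel(x) = (exp(x) - 1)/x. If x is near zero,
exp(x) is near 1, so computing exp(x) - 1 directly loses precision; exprel avoids this loss, and at
x = 0 it returns the limiting value 1.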
Syntax: scipy.special.exprel(input_data)
Example 1:
# import exprel
from scipy.special import exprel
# calculate exprel of 0
print(exprel(0))
Output:
1.0
Example 2:
# import exprel
from scipy.special import exprel
# list of elements
arr = [0,1,2,3,4,5]
print(list(map(exprel,arr)))
Output:
[1.0, 1.718281828459045, 3.194528049465325, 6.361845641062556, 13.399537508286059,
29.48263182051532]
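The loss of precision that exprel avoids shows up clearly for a very small x. In this sketch x = 1e-15 is an arbitrary small value: the naive formula suffers from cancellation because exp(x) rounds to a number very close to 1, while exprel stays close to the true value, which is about 1:

```python
import math
from scipy.special import exprel

x = 1e-15
naive = (math.exp(x) - 1) / x   # cancellation: most significant digits are lost
stable = exprel(x)              # accurate near zero

print(naive)    # far from 1 (about 1.11 on typical platforms)
print(stable)   # very close to 1
```

In plain Python, math.expm1(x) / x gives a similarly accurate result.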
5. gamma()
It is known as Gamma function. It is the generalized factorial since z*gamma(z) = gamma(z+1) and
gamma(n+1) = n!, for a natural number ‘n’.
Syntax: scipy.special.gamma(input_data)
Where, input data is the input number.
Example 1:
# import gamma function
from scipy.special import gamma
print(gamma(56))
Output:
1.2696403353658278e+73
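logsumexp()
logsumexp() computes the logarithm of the sum of the exponentials of its input elements,
log(sum(exp(a))), in a way that avoids overflow for large values.
Example: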
from scipy.special import logsumexp
# logsum exp of numbers from
# 1 to 10
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# logsum exp of numbers from
# 10 to 15
b = [10, 11, 12, 13, 14, 15]
print([logsumexp(a), logsumexp(b)])
Output:
[10.45862974442671, 15.456193316018123]
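The advantage of logsumexp() appears when the inputs are large. Conceptually, it uses the identity log(sum(exp(a))) = m + log(sum(exp(a - m))) with m = max(a), so exp never overflows. In this sketch the values are chosen so that exp(1000) overflows in the naive formula:

```python
import numpy as np
from scipy.special import logsumexp

values = np.array([1000.0, 1000.0])

# Naive formula: exp(1000) overflows to inf, so the result is inf
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(values)))

stable = logsumexp(values)

print(naive)    # inf
print(stable)   # about 1000.693, that is, 1000 + log(2)
```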
8. perm()
The perm stands for permutation. perm(N, k) returns the number of ways to arrange k items chosen
from N items when order matters, N!/(N - k)!.
Syntax: scipy.special.perm(N, k)
where N is the total number of items and k is the number of items arranged.
Example:
# import permutations module
from scipy.special import perm
# permutations of 4
print([perm(4, 1), perm(4, 2), perm(4, 3),
perm(4, 4), perm(4, 5)])
# permutations of 6
print([perm(6, 1), perm(6, 2), perm(6, 3),
perm(6, 4), perm(6, 5)])
Output:
[4.0, 12.0, 24.0, 24.0, 0.0]
[6.0, 30.0, 120.0, 360.0, 720.0]
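By default comb() and perm() return floats; passing exact=True returns exact integers. The two counts are related because each choice of k items can be ordered in k! ways, so perm(N, k) = comb(N, k) * k!. The standard library's math.perm and math.comb (Python 3.8+) give the same counts:

```python
import math
from scipy.special import comb, perm

# exact=True returns Python integers instead of floats
print(perm(6, 3, exact=True))   # 120 = 6 * 5 * 4
print(comb(6, 3, exact=True))   # 20

# Each combination of 3 items can be ordered in 3! ways
print(perm(6, 3, exact=True) == comb(6, 3, exact=True) * math.factorial(3))  # True

print(math.perm(6, 3), math.comb(6, 3))  # 120 20
```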
Linear Algebra
Linear algebra is an important topic across a variety of subjects. It allows you to solve problems
related to vectors, matrices, and linear equations. In Python, most of the routines related to this subject
are implemented in scipy.linalg, which offers very fast linear algebra capabilities.
In particular, linear models play an important role in a variety of real-world problems,
and scipy.linalg provides tools to compute them in an efficient way.
In this tutorial, you’ll learn how to:
Study linear systems using determinants and solve problems using matrix inverses
Interpolate polynomials to fit a set of points using linear systems
Use Python to solve linear regression problems
Use linear regression to predict prices based on historical data
This is the second part of a series of tutorials on linear algebra using scipy.linalg, so make sure to
read the first tutorial of the series before continuing.
With NumPy, you can use np.array() to create a matrix, providing a nested list containing the
elements of each row of the matrix:
Python
In [1]: import numpy as np
Here you have two equations involving two variables. In order to have a linear system, the values that
multiply the variables x₁ and x₂ must be constants, like the ones in this example. It’s common to write
linear systems using matrices and vectors. For example, you can write the previous system as the
following matrix product:
Comparing the matrix product form with the original system, you can see that the elements of
matrix A correspond to the coefficients that multiply x₁ and x₂. Besides that, the values on the
right-hand side of the original equations now make up vector b.
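The system used in the original notes is not preserved here, so the sketch below uses a hypothetical two-variable system, 3x₁ + 2x₂ = 12 and 2x₁ - x₂ = 1, to show how the coefficients become the rows of A and the right-hand sides become b:

```python
import numpy as np
from scipy import linalg

# Hypothetical system (not the one from the original notes):
#   3*x1 + 2*x2 = 12
#   2*x1 - 1*x2 = 1
A = np.array([[3, 2],
              [2, -1]])   # one row per equation: coefficients of x1 and x2
b = np.array([12, 1])     # right-hand sides

x = linalg.solve(A, b)
print(x)                       # [2. 3.]
print(np.allclose(A @ x, b))   # True: both equations are satisfied
```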
Linear algebra is a mathematical discipline that deals with vectors, matrices, and vector spaces and
linear transformations more generally. By using linear algebra concepts, it’s possible to build
algorithms to perform computations for several applications, including solving linear systems.
When there are just two or three equations and variables, it’s feasible to perform the
calculations manually, combine the equations, and find the values for the variables.
However, in real-world applications, the number of equations can be very large, making it infeasible
to do calculations manually. That's precisely when linear algebra concepts and algorithms come in
handy, allowing you to develop usable applications for engineering and machine learning, for
example.
In Working With Linear Systems in Python With scipy.linalg, you’ve seen how to solve linear systems
using scipy.linalg.solve(). Now you’re going to learn how to use determinants to study the possible
solutions and how to solve problems using the concept of matrix inverses.
Previously, you used scipy.linalg.solve() to obtain the solution 10, 10, 20, 20, 10 for the variables x₁
to x₅, respectively. But as you’ve just learned, it’s also possible to use the inverse of the coefficients
matrix to obtain vector x, which contains the solutions for the problem. You have to
calculate x = A⁻¹b, which you can do with the following program:
Python
In [1]: import numpy as np
...: from scipy import linalg
In [2]: A = np.array(
...: [
...: [1, 9, 2, 1, 1],
...: [10, 1, 2, 1, 1],
...: [1, 0, 5, 1, 1],
...: [2, 1, 1, 2, 9],
...: [2, 1, 2, 13, 2],
...: ]
...: )
In [3]: b = np.array([170, 180, 140, 180, 350]).reshape((5, 1))
In [4]: A_inv = linalg.inv(A)
In [5]: x = A_inv @ b
   ...: x
Out[5]:
array([[10.],
       [10.],
       [20.],
       [20.],
       [10.]])
Here's a breakdown of what's happening:
In [1] imports NumPy as np, along with linalg from scipy. These imports allow you to
use linalg.inv().
In [2] creates the coefficients matrix as a NumPy array called A.
In [3] creates the independent terms vector as a NumPy array called b. To make it a column
vector with five elements, you use .reshape((5, 1)).
In [4] uses linalg.inv() to obtain the inverse of matrix A.
In [5] uses the @ operator to perform the matrix product in order to solve the linear
system characterized by A and b. You store the result in x, which is then displayed.
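As noted at the start of these notes, scipy.linalg.solve is generally preferred over forming the inverse explicitly. A short sketch that solves the same system both ways and checks that the answers agree and satisfy Ax = b:

```python
import numpy as np
from scipy import linalg

A = np.array([
    [1, 9, 2, 1, 1],
    [10, 1, 2, 1, 1],
    [1, 0, 5, 1, 1],
    [2, 1, 1, 2, 9],
    [2, 1, 2, 13, 2],
])
b = np.array([170, 180, 140, 180, 350]).reshape((5, 1))

x_inv = linalg.inv(A) @ b      # explicit inverse, as above
x_solve = linalg.solve(A, b)   # factorizes A instead of inverting it

print(np.allclose(x_inv, x_solve))   # True: both give 10, 10, 20, 20, 10
print(np.allclose(A @ x_solve, b))   # True: the solution satisfies every equation
```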
Prerequisite
Install these Python libraries using the following commands in your terminal:
pip install numpy
pip install scipy
pip install sympy
Below are some ways by which we can solve a pair of nonlinear equations using Python:
Using fsolve from scipy.optimize
Using root from scipy.optimize
Using minimize from scipy.optimize (Optimization Method)
Using nsolve from SymPy
Using Newton's method with NumPy
We will perform all methods on these equations:
Equation 1: x² + y² = 25
Equation 2: x² - y = 0
Solve Non-Linear Equations Using fsolve from SciPy
This Python code uses the fsolve function from the scipy.optimize library to find a numerical
solution to a system of nonlinear equations. The equations are defined in the equations function,
where eq1 and eq2 represent the two equations rewritten in the form f(x, y) = 0. The initial guess
for [x, y] is set to [1, 1], and fsolve refines this guess iteratively until it converges to a solution.
The final solution is then printed.
from scipy.optimize import fsolve
def equations(vars):
    x, y = vars
    eq1 = x**2 + y**2 - 25
    eq2 = x**2 - y
    return [eq1, eq2]
initial_guess = [1, 1]
solution = fsolve(equations, initial_guess)
print("Solution:", solution)
Output:
Solution: [2.12719012 4.52493781]
Solve Non-Linear Equations Using root from SciPy
This Python code uses the root function from the scipy.optimize library to solve the same system of
equations. Starting from the initial guess [1, 1], root refines the guess iteratively until it converges
to a solution, which is stored in the x attribute of the returned result and then printed.
from scipy.optimize import root
def equations(vars):
    x, y = vars
    eq1 = x**2 + y**2 - 25
    eq2 = x**2 - y
    return [eq1, eq2]
initial_guess = [1, 1]
solution = root(equations, initial_guess)
print("Solution:", solution.x)
Output:
Solution: [2.12719012 4.52493781]
Solve Non-Linear Equations Using minimize from SciPy
This Python code uses the minimize function from the scipy.optimize library. The two equations are
defined as equation1 and equation2, and the objective function measures the combined error of both
equations; minimize searches for the [x, y] that makes this error as small as possible. The initial
guess is set to [1, 1], and the optimized solution is printed using result.x.
from scipy.optimize import minimize
# Define the equations
def equation1(x, y):
    return x**2 + y**2 - 25
def equation2(x, y):
    return x**2 - y
# Define the objective function for optimization
def objective(xy):
    x, y = xy
    # Sum of squared errors: zero only when both equations are satisfied
    return equation1(x, y)**2 + equation2(x, y)**2
# Initial guess for [x, y]
initial_guess = [1, 1]
result = minimize(objective, initial_guess)
print("Optimization Method Solution:", result.x)
Output:
Optimization Method Solution: [2.12719023 4.52493776]
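For this particular system the answer can also be found by hand, which gives a check on the numerical solvers. Substituting y = x² from Equation 2 into Equation 1 gives y + y² = 25, so y = (-1 + √101)/2 (the positive root, because y = x² cannot be negative) and x = ±√y. Starting from [1, 1], the solvers above found the root with positive x:

```python
import math
from scipy.optimize import fsolve

# Closed-form root with x > 0 (y = x**2 must be non-negative)
y_exact = (-1 + math.sqrt(101)) / 2
x_exact = math.sqrt(y_exact)
print(x_exact, y_exact)   # about 2.12719 and 4.52494

def equations(vars):
    x, y = vars
    return [x**2 + y**2 - 25, x**2 - y]

x_num, y_num = fsolve(equations, [1, 1])
print(math.isclose(x_num, x_exact, rel_tol=1e-6),
      math.isclose(y_num, y_exact, rel_tol=1e-6))   # True True
```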
Numerical Integration
You will probably encounter many situations in which analytical integration of a function or a
differential equation is difficult or impossible. In this section we show how Scientific Python can help
through its high-level mathematical algorithms. You will learn how to develop your own numerical
integration method and how to get a specified accuracy. The package scipy.integrate can perform
numerical integration (quadrature) and can solve differential equations.
The area of each trapezium is calculated as width times the average height.
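With n slices of width h = (b - a)/n, adding up the trapezium areas simplifies to h·[f(a)/2 + f(x₁) + ... + f(xₙ₋₁) + f(b)/2], because each interior point is shared by two trapezia. A minimal sketch of this trapezoidal rule; the function name trapezoid_rule and the test integrand x² are illustrative choices, not taken from the notes:

```python
def trapezoid_rule(f, a, b, n):
    """Approximate the integral of f from a to b using n equal-width trapezia."""
    h = (b - a) / n                  # width of each slice
    total = (f(a) + f(b)) / 2        # end points count with weight 1/2
    for i in range(1, n):
        total += f(a + i * h)        # interior points count fully
    return h * total                 # width times the sum of average heights

# Illustrative check: the exact integral of x**2 from 0 to 2 is 8/3
approx = trapezoid_rule(lambda x: x**2, 0, 2, 1000)
print(approx, 8 / 3)
```

The error shrinks roughly like h², so doubling n cuts it by about a factor of four.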
Example: Evaluate the integral:
Our simple integration program divides the interval 0 to 2 into equally spaced slices and spends the
same effort evaluating the integrand in each slice. If we look more closely at the integrand
and plot it, we notice that at low x-values the function hardly varies, so our program wastes
time in that region. In contrast, the integrate.quad() routine from SciPy is adaptive: it adjusts
where the function is evaluated so as to concentrate on the more important regions
(quad is short for quadrature, an older name for integration). Let's see how SciPy can simplify our
work:
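A minimal sketch of quad() on its own. The integrand x²e⁻ˣ is an illustrative choice (the notes' own integrand is not preserved), picked because its exact integral from 0 to 2 is 2 - 10e⁻². quad returns both the estimate and an estimate of its absolute error, and the epsabs and epsrel arguments request a specified accuracy:

```python
import math
from scipy.integrate import quad

# Illustrative integrand (not the one from the original notes)
def f(x):
    return x**2 * math.exp(-x)

# quad returns (estimate, estimated absolute error)
value, abs_err = quad(f, 0, 2)
print(value, abs_err)

# Tighter tolerances can be requested with epsabs and epsrel
value_tight, err_tight = quad(f, 0, 2, epsabs=1e-12, epsrel=1e-12)

exact = 2 - 10 * math.exp(-2)
print(value_tight, exact)
```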
python code
import scipy
from scipy.integrate import quad
from math import *
def f(x):
with the drag force whose model for a slowly moving object is
or
To write the numerical integration program, we shall use odeint, which is part of scipy.integrate. A
function call to odeint looks something like this:
python code
scipy.integrate.odeint(func, y0, t, args=())
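A minimal sketch of an odeint call, assuming a linear drag force F = -b·v for a slowly falling object (an assumption, since the drag model itself is missing from these notes), so that m dv/dt = m g - b v. The mass, drag coefficient and time grid are hypothetical values. This equation has the exact solution v(t) = (m g / b)(1 - e^(-b t / m)), which lets us check the numerical result:

```python
import numpy as np
from scipy.integrate import odeint

def dv_dt(v, t, g, b, m):
    """Right-hand side of m dv/dt = m g - b v (assumed linear drag)."""
    return g - (b / m) * v

g, b, m = 9.81, 0.5, 2.0       # hypothetical parameters (SI units)
t = np.linspace(0, 20, 201)    # times at which the solution is reported
v0 = 0.0                       # the object starts at rest

# args passes the extra parameters to dv_dt; the result has shape (len(t), 1)
v = odeint(dv_dt, v0, t, args=(g, b, m))[:, 0]

v_exact = (m * g / b) * (1 - np.exp(-b * t / m))
print(np.max(np.abs(v - v_exact)))   # small: numerical and exact solutions agree
print(v[-1], m * g / b)              # approaching the terminal velocity m g / b
```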
exploring and interpreting that data to extract insights. Data manipulation is often a preparatory step
before analysis, ensuring the data is clean, structured, and ready for further exploration.
Key Differences:
Focus:
Data manipulation is concerned with the form and structure of the data, while data analysis focuses on
the meaning and implications of the data.
Techniques:
Data manipulation involves techniques like cleaning, filtering, restructuring, and aggregating
data. Data analysis uses techniques like statistical analysis, modeling, and visualization.
Purpose:
Data manipulation prepares data for analysis, while data analysis extracts insights and draws
conclusions.
Data Manipulation:
Definition:
Data manipulation involves organizing, cleaning, and transforming raw data into a usable format.
Process:
This process can include extracting data, cleaning it, structuring it, filtering information, and
constructing databases.
Tools:
Various tools, including spreadsheets, SQL, and programming languages like Python (with libraries
like Pandas), are used for data manipulation.
Examples:
Sorting data alphabetically, aggregating data into categories, or converting data from one format to
another.
Data Analysis:
Definition:
Data analysis is the process of examining data to identify patterns, trends, and insights.
Process:
This involves collecting, cleaning, exploring, and interpreting data using various analytical methods.
Tools:
Statistical software, data visualization tools, and machine learning algorithms are commonly used for
data analysis.
Examples:
Analyzing sales data to identify trends, using statistical models to predict future outcomes, or
visualizing data to communicate findings.
Relationship:
Data manipulation is a crucial prerequisite for effective data analysis. By preparing data in a usable
format, data analysis can be performed more efficiently and accurately, leading to better insights.
Data
You can use the .read_csv() function to load text files as well.
But to load your Excel file, you will use the pandas .read_excel() function:
There are various other file formats that you can import your data from. For example:
.read_json()
.read_html()
.read_sql()
Take note that this isn’t an exhaustive list either. There are more formats you can read from, but they
are out of the scope of this article.
For now, we are going to focus on the most common data format you are going to work with – the
CSV format.
Importing a CSV File into the DataFrame
When you import a CSV file with .read_csv(), you can control the following:
File path
Header
Column delimiter
Index column
Select columns
Omit rows
Missing values
Alter values
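Several of these options can be combined in one .read_csv() call. In this sketch the CSV content is made up, io.StringIO stands in for a file path, and the parameters shown are index_col (index column), usecols (select columns), skiprows (omit rows), na_values (missing values) and converters (alter values):

```python
import io
import pandas as pd

# Made-up CSV text; StringIO lets read_csv treat the string like a file
csv_text = """id,name,score,city
1,Asha,85,Chennai
2,Ravi,missing,Puducherry
3,Meena,92,Madurai
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",                    # use the 'id' column as row labels
    usecols=["id", "name", "score"],   # ignore the 'city' column
    skiprows=[3],                      # skip file line 3 (Meena's row; the header is line 0)
    na_values=["missing"],             # treat this string as a missing value (NaN)
    converters={"name": str.upper},    # transform a column while reading
)
print(df)
```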
File path
We have seen above how you can read a CSV file into a DataFrame. You can specify the path where
your file is stored when you’re using the .read_csv() function:
# Specifying the file path
df = pd.read_csv('C:/Users/abc/Desktop/file_name.csv')
Header
You can use the header parameter to specify which row is used for the column names of your
DataFrame. By default (header='infer'), the first row is used as the header unless you supply column
names with the names parameter.
When header is set to None, no row is used as the header: pandas numbers the columns 0, 1, 2, and
so on, and reads the first row as data:
df = pd.read_csv('data.csv', header=None)
df
Column delimiter
You can use the sep parameter to specify a custom delimiter for the CSV input. By default, it is a
comma.
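The header and sep parameters are often needed together, for example for a semicolon-separated file without a header row. In this sketch the data is made up and io.StringIO stands in for a file; names supplies the column labels that the file lacks:

```python
import io
import pandas as pd

# Made-up semicolon-separated data with no header row
raw = "Asha;85\nRavi;78\n"

# header=None: the first line is data, so the columns are numbered 0, 1
df_numbered = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(df_numbered)

# names= provides column labels for a file without a header row
df_named = pd.read_csv(io.StringIO(raw), sep=";", header=None, names=["name", "score"])
print(df_named)
```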
called an index. You can think of a Series as a column in a spreadsheet or a single column
of a database table. Series are a fundamental data structure in Pandas and are commonly
used for data manipulation and analysis tasks. They can be created from lists, arrays,
dictionaries, and existing Series objects. Series are also a building block for the more
complex Pandas DataFrame, which is a two-dimensional table-like structure consisting of
multiple Series objects.
Creating a Series data structure from a list, dictionary, and custom
index:
import pandas as pd
# Initializing a Series from a list
data = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data)
print(series_from_list)
# Initializing a Series from a dictionary
data = {'a': 1, 'b': 2, 'c': 3}
series_from_dict = pd.Series(data)
print(series_from_dict)
# Initializing a Series with custom index
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series_custom_index = pd.Series(data, index=index)
print(series_custom_index)
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
a 1
b 2
c 3
dtype: int64
a 1
b 2
c 3
d 4
e 5
dtype: int64
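Accessing elements:
You can access an element of a Series by its index label:
print(series_from_list[0])  # prints 1
print(series_from_dict['b'])  # prints 2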
Vectorized Operations:
Series supports vectorized operations, allowing you to perform arithmetic operations on the entire
series efficiently.
series_a = pd.Series([1, 2, 3])
series_b = pd.Series([4, 5, 6])
sum_series = series_a + series_b
print(sum_series)
Output:
0 5
1 7
2 9
dtype: int64
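Arithmetic between two Series matches values by index label, not by position. When a label exists in only one Series the result is NaN, unless a fill_value is supplied. A short sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels 'a' and 'd' appear in only one Series, so their sums are NaN
total = s1 + s2
print(total)        # a: NaN, b: 12.0, c: 23.0, d: NaN

# fill_value=0 treats a missing label as 0 instead
filled = s1.add(s2, fill_value=0)
print(filled)       # a: 1.0, b: 12.0, c: 23.0, d: 30.0
```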
Documents:
Extracting data from documents (e.g., PDFs, text files) often involves techniques like OCR and
NLP.
Web Pages:
Data scraping from websites is a technique used to extract data from web pages.
APIs:
Many applications and platforms expose APIs that allow data extraction.
4. Techniques and Tools for Data Extraction
OCR (Optical Character Recognition): Used to convert scanned documents or images into
editable text.
NLP (Natural Language Processing): Used to extract and understand unstructured text data,
including techniques like tokenization, named entity recognition, and sentiment analysis.
Data Scraping: Used to extract data from websites.
ETL/ELT Tools: Tools like Talend, Matillion, and Stitch provide pre-built connectors and
functionalities for extracting, transforming, and loading data.
RPA (Robotic Process Automation) tools: Tools like Automation Anywhere can automate
data extraction tasks from various sources.
5. Examples of Data Extraction in Action
Financial institutions:
Extracting customer transaction data from various systems to track revenue and identify trends.
E-commerce businesses:
Extracting customer order data to personalize marketing and improve customer service.
Healthcare organizations:
Extracting patient medical records to analyze treatment outcomes and improve patient care.
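import pandas as pd
df = pd.DataFrame([[9, 4, 8, 9],
                   [8, 10, 7, 6],
                   [7, 6, 8, 5]],
                  columns=['Maths', 'English',
                           'Science', 'History'])
print(df)
Output:
   Maths  English  Science  History
0      9        4        8        9
1      8       10        7        6
2      7        6        8        5
The sum() method adds up the values in each column:
print(df.sum())
Output:
Maths      24
English    20
Science    23
History    20
dtype: int64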
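By default sum() works down each column; passing axis=1 works across each row instead, and other aggregations such as mean() follow the same pattern. A short sketch on the same marks table:

```python
import pandas as pd

df = pd.DataFrame([[9, 4, 8, 9],
                   [8, 10, 7, 6],
                   [7, 6, 8, 5]],
                  columns=['Maths', 'English', 'Science', 'History'])

print(df.sum(axis=1))      # total marks per row: 30, 31, 26
print(df.mean())           # average mark per subject (column)
print(df['Maths'].sum())   # a single column: 24
```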
Grouping in Pandas
Grouping in Pandas means organizing your data into groups based on the values in one or more
columns. Once grouped, you can perform operations such as finding the total, average or count for
each group, or even picking the first row from each group. This method follows a split-apply-combine process:
Split the data into groups
Apply some calculation like sum, average etc.
Combine the results into a new table.
Let’s understand grouping in Pandas using a small bakery order dataset as an example.
import pandas as pd
data = {
'Item': ['Cake', 'Cake', 'Bread', 'Pastry', 'Cake'],
'Flavor': ['Chocolate', 'Vanilla', 'Whole Wheat', 'Strawberry', 'Chocolate'],
}
df = pd.DataFrame(data)
print(df)
Output: