Python Data Analysis for Beginners

- NumPy is a Python library used for working with arrays and matrices for numerical computing. - NumPy provides multidimensional arrays and matrices, along with tools to work with these numeric data structures. - Common NumPy functions include np.array() for creating arrays, np.zeros() and np.ones() for creating arrays of zeros or ones, and np.random.rand() for generating random numbers.

Data Analysis with Python (For Beginners)

[email protected]
About CloudxLab

Making learning fun and for life

Videos Quizzes Hands-On Projects Case Studies

Real Life Use Cases

CloudxLab - Playground with Feedback

Playground for hands-on practice. The system evaluates the code automatically
and nudges the user by giving appropriate feedback.

CloudxLab - Online Cloud Based Lab

Cloud-based lab with pre-installed tools and software for practicing AI, Machine Learning, Deep Learning, Data Science, Big Data and related technologies.

● Real-world Experience: the lab setup is exactly the same as the setup in enterprises. Become job ready from Day 1.
● Seamless Experience: no endless downloading/installations. No hardware, permissions or configuration issues.
● Central Dataset: upload your own dataset or use open source datasets available on the lab.
● Any Device Anywhere: connect from ANY browser, SSH, device or operating system.
CloudxLab - Social
We learn better with peers. Social proof and leaderboard
increases engagement and motivation
CloudxLab - Hiring Partners
Dedicated Job Portal → Upgrade career, enhance salary & move
jobs by applying to jobs posted by our hiring partners
CloudxLab - University Partners
Instructors / Authors

● Sandeep Giri: Founder at CloudxLab.com | AI Advisor at Algoworks | Speaker - AI, Machine Learning, Deep Learning, Big Data. Amazon, InMobi, D.E.Shaw. 18+ Years of Exp. in Enterprise Softwares, Machine Learning & Churning Humongous Data.
● Praveen Pavithran: CTO/Co-Founder at Yatis | IOT, ML, Computer Vision, Edge. Cypress Semiconductors, Philips. Multiple patents, conference papers. IIT Bombay Dual Degree.
● Abhinav Singh: Co-Founder, CloudxLab.com | AI, ML & Big Data | Visiting Faculty at SCMHRD. Byjus, HashCube. 9+ Years of Exp. in EdTech, Game Development & Building Product.
What is Python

- Python is an interpreted, high-level language
- First released in 1991 by Guido van Rossum
- It is easy to use and improves engineer productivity
- Libraries for multiple applications
- Django framework for web applications
- We will focus on libraries for Data Analysis

What is NumPy

Stands for "Numeric Python" or "Numerical Python".

● Open Source
● Module of Python
● Provides fast mathematical functions

[Figure: Python, numpy, pandas, matplotlib, scikitlearn and tensorflow]

The complete Machine Learning eco-system.
Why use NumPy ?

● Array-oriented computing
● Efficiently implemented multi-dimensional arrays
● Designed for scientific computation
● Library of high-level mathematical functions

Numpy - Introduction

● NumPy’s main object is the homogeneous multidimensional array
● It is a table of elements
○ usually numbers
○ all of the same type
○ indexed by a tuple of non-negative integers
● In NumPy, dimensions are called axes
● The number of axes is called the rank

Numpy - Introduction

First Dimension / Axis, Len = 3

Second Dimension / Axis, Len = 4

array([[ 0., 0., 0., 0.],
       [ 0., 0., 0., 0.],
       [ 0., 0., 0., 0.]])

The above array has a rank of 2 since it is 2-dimensional.

Creating Numpy arrays
np.array - Creating a NumPy array from a Python list or tuple

NumPy arrays can be created from Python lists or tuples in the following way.

>>> import numpy as np

>>> a = np.array([1, 2, 3])
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array((3, 4, 5))
>>> type(b)
<class 'numpy.ndarray'>

Creating Numpy arrays
np.zeros - An array with all zeros

To create an array with all zeros, the function np.zeros is used.

>>> x = np.zeros( (3,4) )

>>> x
array([[ 0., 0., 0., 0.],
       [ 0., 0., 0., 0.],
       [ 0., 0., 0., 0.]])

Creating Numpy arrays
np.ones - An array with all Ones

To create an array with all ones the function np.ones is used.

>>> np.ones( (3,4), dtype=np.int16 )

array([[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]])

Creating Numpy arrays
np.full - An array with a given value

To create an array with a given shape and a given value np.full

is used.

>>> np.full( (3,4), 0.11 )

array([[ 0.11, 0.11, 0.11, 0.11],
[ 0.11, 0.11, 0.11, 0.11],
[ 0.11, 0.11, 0.11, 0.11]])

Creating Numpy arrays
np.arange - Creating sequence of Numbers

>>> np.arange( 10, 30, 5 )

array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

Creating Numpy arrays
np.linspace - Creating an array with evenly distributed numbers

● Returns an array having a specific number of points
● Evenly distributed between two values
● The maximum value is included, contrary to arange
● Arguments: starting number, ending number, total number of points

>>> np.linspace(0, 5/3, 6)

array([0. , 0.33333333, 0.66666667, 1. , 1.33333333, 1.66666667])
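
The contrast with arange can be checked directly. The following is a minimal sketch (the variable names stepped and spaced are illustrative, not from the slides) comparing how the two functions treat the end value.

```python
import numpy as np

# arange takes a step size and excludes the stop value
stepped = np.arange(0, 10, 2)

# linspace takes a number of points and includes the stop value by default
spaced = np.linspace(0, 10, 6)

print(stepped)  # the stop value 10 is not part of the result
print(spaced)   # the stop value 10 is the last point
```

A rule of thumb that follows from this: arange is convenient when the step matters, linspace when the number of points matters.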

Creating Numpy arrays
np.random.rand - Creating an array with random numbers

Make a 2x3 matrix having random floats between 0 and 1:

>>> np.random.rand(2,3)
array([[ 0.55365951, 0.60150511, 0.36113117],
[ 0.5388662 , 0.06929014, 0.07908068]])

Creating Numpy arrays
np.empty - Creating an empty array

np.empty creates an uninitialised array with a given shape. Its content is not predictable.

>>> np.empty((2,3))
array([[ 0.21288689, 0.20662218, 0.78018623],
       [ 0.35294004, 0.07347101, 0.54552084]])

Important attributes of a NumPy object

NumPy's array class is called ndarray. The important attributes of an ndarray object are:

ndarray.ndim
the number of axes (dimensions) of the array.
[[ 1., 0., 0.],
[ 0., 1., 2.]]

For the above array the value of ndarray.ndim is 2.

Important attributes of a NumPy object

ndarray.shape
the dimensions of the array. This is a tuple of integers
indicating the size of the array in each dimension.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
For the above array the value of ndarray.shape is (2,3)

Important attributes of a NumPy object

ndarray.size
the total number of elements of the array. This is equal to
the product of the elements of shape.
[[ 1., 0., 0.],
[ 0., 1., 2.]]

For the above array the value of ndarray.size is 6.

Important attributes of a NumPy object

ndarray.dtype
Tells the datatype of the elements in the numpy array. All
the elements in a numpy array have the same type.
>>> c = np.arange(1, 5)
>>> c.dtype
dtype('int64')

Important attributes of a NumPy object

ndarray.itemsize
The itemsize attribute returns the size (in bytes) of each
item:
>>> c = np.arange(1, 5)
>>> c.itemsize
8

Reshaping Arrays

The function reshape is used to reshape the numpy array.

The following example illustrates this.

>>> a = np.arange(6)
>>> print(a)
[0 1 2 3 4 5]
>>> b = a.reshape(2, 3)
>>> print(b)
[[0 1 2]
 [3 4 5]]
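
As a small extension of the example above, reshape also accepts -1 for one dimension and infers its length from the total number of elements. This sketch also shows what happens when the requested shape does not fit; the shapes chosen here are only illustrative.

```python
import numpy as np

a = np.arange(6)

# -1 asks NumPy to infer the missing length (3 here, since 6 / 2 = 3)
b = a.reshape(2, -1)
print(b.shape)

# A shape whose product differs from the number of elements is rejected
try:
    a.reshape(4, 2)  # 8 slots for 6 elements
except ValueError as err:
    print("reshape failed:", err)
```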

Indexing and Accessing NumPy arrays

Indexing one dimensional NumPy Arrays

0 1 2 3 4 5 6 Index

>>> a = np.array([1, 5, 3, 19, 13, 7, 3])

>>> a[3]
19
>>> a[2:5] #range
array([ 3, 19, 13])
>>> a[2::2] # How many to jump
array([ 3, 13, 3])
>>> a[::-1] #Go reverse
array([ 3, 7, 13, 19, 3, 5, 1])

Difference with regular Python lists

1. If you assign a single value to an ndarray slice, it is copied across the whole slice:
>>> a = np.array([1, 2, 5, 7, 8])
>>> a[1:3] = -1
>>> a
array([ 1, -1, -1, 7, 8])
----
>>> b = [1, 2, 5, 7, 8]
>>> b[1:3] = -1
TypeError: can only assign an iterable

Difference with regular Python lists

2. ndarray slices are actually views on the same data buffer. If you modify a slice, the original ndarray is modified as well.

>>> a = np.array([1, 2, 5, 7, 8])

>>> a_slice = a[1:5]
>>> a_slice[1] = 1000
>>> a
array([ 1, 2, 1000, 7, 8])
# Original array was modified

Difference with regular Python lists

3. If you want a copy of the data, you need to use the copy method, as in another_slice = a[2:6].copy(). If we modify another_slice, a remains the same.
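
The difference between a view and a copy can be observed side by side. This is a minimal sketch; the names view and copy are illustrative, and np.shares_memory is used only to make the sharing visible.

```python
import numpy as np

a = np.array([1, 2, 5, 7, 8])

view = a[1:4]          # a view: shares its data buffer with a
copy = a[1:4].copy()   # an independent copy of the same elements

copy[0] = -99          # does not touch a
view[0] = 1000         # writes through to a[1]

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```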

Indexing multi dimensional NumPy arrays
Multi-dimensional arrays can be accessed as
>>> b[1, 2] # row 1, col 2
>>> b[1, :] # row 1, all columns
>>> b[:, 1] # all rows, column 1

The following format is used while indexing multi-dimensional arrays:
Array[row_start_index:row_end_index, column_start_index:column_end_index]
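
The slide does not define b, so here is a small runnable sketch that assumes a 3x4 array built with arange and applies each of the indexing forms listed above.

```python
import numpy as np

# Assumed example array: rows 0..2, columns 0..3
b = np.arange(12).reshape(3, 4)

print(b[1, 2])       # single element: row 1, column 2
print(b[1, :])       # row 1, all columns
print(b[:, 1])       # all rows, column 1
print(b[0:2, 1:3])   # rows 0-1, columns 1-2 (end indices are exclusive)
```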

Boolean Indexing

We can also index arrays using an ndarray of boolean values on one axis to specify the indices that we want to access.

>>> a = np.arange(12).reshape(3, 4)
>>> rows_on = np.array([ True, False, True])
>>> a[rows_on , : ] # Rows 0 and 2, all columns
array([[ 0, 1, 2, 3],
       [ 8, 9, 10, 11]])

Linear Algebra with NumPy

Vectors

● A vector is a quantity defined by a magnitude and a direction.

● A vector can be represented by an array of numbers called
scalars.

Vectors

For example, say a rocket is going up at a slight angle: it has a
vertical speed of 5,000 m/s, a slight speed towards the
East of 10 m/s, and a slight speed towards the North of 50 m/s.
The rocket's velocity may be represented by the following
vector:

velocity = (10 m/s East, 50 m/s North, 5,000 m/s up)

Use of Vectors in Machine Learning
● Vectors have many purposes in Machine Learning, most notably to represent observations and predictions.
● For example, say we built a Machine Learning system to classify videos into 3 categories (good, spam, clickbait) based on what we know about them.
● For each video, we would have a vector representing what we know about it, such as:

video = (10.5, 5.2, 3.25, 7.0)

● This vector could represent a video that lasts 10.5 minutes, but only 5.2% of viewers watch for more than a minute, it gets 3.25 views per day on average, and it was flagged 7 times as spam. As you can see, each axis may have a different meaning.
● Based on this vector our Machine Learning system may predict that there is an 80% probability that it is a spam video, 18% that it is clickbait, and 2% that it is a good video. This could be represented as the following vector:

class_probabilities = (0.80, 0.18, 0.02), in the order (spam, clickbait, good)
Representing Vectors in Python

● In Python, a vector can be represented in many ways, the simplest being a regular Python list of numbers.
○ [1,1,1,1]
● Since Machine Learning requires lots of scientific calculations, it is much better to use NumPy's ndarray, which provides a lot of convenient and optimized implementations of essential mathematical operations on vectors.
○ numpy.array([1,1,1,1])
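
One practical reason to prefer ndarray over a list shows up as soon as arithmetic is involved. The sketch below (variable names are illustrative) multiplies both representations by 2 and also computes the vector's Euclidean length with np.linalg.norm.

```python
import numpy as np

as_list = [1, 1, 1, 1]
as_array = np.array([1, 1, 1, 1])

print(as_list * 2)    # list repetition: the list is concatenated with itself
print(as_array * 2)   # element-wise multiplication: each entry is doubled
print(np.linalg.norm(as_array))  # Euclidean length of the vector
```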

Vectorized Operations

● Vectorized operations are far more efficient than loops written in Python to do the same thing
● Let’s test it

Vectorized Operations

Matrix multiplication
1. Using for loops
>>> def multiply_loops(A, B):
...     # C[i, j] is the dot product of row i of A and column j of B
...     C = np.zeros((A.shape[0], B.shape[1]))
...     for i in range(A.shape[0]):
...         for j in range(B.shape[1]):
...             C[i, j] = np.dot(A[i, :], B[:, j])
...     return C

2. Using NumPy's matrix-matrix multiplication operator

>>> def multiply_vector(A, B):
...     return A @ B

Vectorized Operations

Matrix multiplication - Sample data

# Two randomly-generated, 100x100 matrices

>>> X = np.random.random((100, 100))

>>> Y = np.random.random((100, 100))

Vectorized Operations

Matrix multiplication - Loops - timeit

# First, using the explicit loops:
>>> %timeit multiply_loops(X, Y)
4.23 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Result - It took about 4.23 milliseconds (4.23 × 10^-3 seconds) to perform one matrix-matrix multiplication

Matrix multiplication - Vector - timeit

# Second, the NumPy multiplication:
>>> %timeit multiply_vector(X, Y)
46.6 µs ± 346 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Result - 46.6 microseconds (46.6 × 10^-6 seconds) per multiplication

Conclusion - Two orders of magnitude faster
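
%timeit is an IPython/Jupyter magic, so it is not available in a plain Python script. A rough equivalent using the standard timeit module is sketched below. It is an assumption-laden illustration, not the deck's benchmark: the fully explicit triple loop, the 30x30 matrix size, and the repeat count are choices made here to keep the run short, and the measured times depend on the machine.

```python
import timeit
import numpy as np

def multiply_loops(A, B):
    """Matrix product with explicit Python loops (for comparison only)."""
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i, j] += A[i, k] * B[k, j]
    return C

# Small random matrices (size chosen only to keep the loop version quick)
rng = np.random.default_rng(0)
X = rng.random((30, 30))
Y = rng.random((30, 30))

loops_time = timeit.timeit(lambda: multiply_loops(X, Y), number=3)
numpy_time = timeit.timeit(lambda: X @ Y, number=3)
print(f"loops: {loops_time:.4f} s, NumPy: {numpy_time:.6f} s")

# Both approaches should agree on the result
same_result = np.allclose(multiply_loops(X, Y), X @ Y)
print("results match:", same_result)
```

Checking that both versions return the same matrix before comparing their speed guards against timing a function that computes something else.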

Basic Operations on NumPy arrays

Addition in NumPy arrays

Addition can be performed on NumPy arrays as shown below.

They apply element wise.

>>> a = np.array( [20, 30, 40, 50] )

>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a + b
>>> c
array([20, 31, 42, 53])

Subtraction in NumPy arrays

Subtraction can be performed on NumPy arrays as shown

below. They apply element wise.
>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a - b
>>> c
array([20, 29, 38, 47])

Element wise product in NumPy arrays

Element wise product can be performed on NumPy arrays as

shown below.
>>> A = np.array( [[1,1],
... [0,1]] )
>>> B = np.array( [[2,0],
... [3,4]] )
>>> A*B # element wise product
array([[2, 0],
[0, 4]])

Matrix Product in NumPy arrays

Matrix product can be performed on NumPy arrays as shown

below.
>>> A = np.array( [[1,1],
... [0,1]] )
>>> B = np.array( [[2,0],
... [3,4]] )
>>> np.dot(A, B) # matrix product
array([[5, 4],
[3, 4]])

Division in NumPy arrays

Division can be performed on NumPy arrays as shown below. It is applied element wise.

>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange(1, 5)
>>> c = a / b
>>> c
array([ 20. , 15. , 13.33333333, 12.5 ])

Integer Division in NumPy arrays

Integer (floor) division can be performed on NumPy arrays with the // operator as shown below. It is applied element wise.

>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange(1, 5)
>>> c = a // b
>>> c
array([20, 15, 13, 12])

Modulus in NumPy arrays

The modulus operator can be applied on NumPy arrays as shown below. It is applied element wise.

>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange(1, 5)
>>> c = a % b
>>> c
array([0, 0, 1, 2])

Exponents in NumPy arrays

We can raise each element of a NumPy array to a power in the following way. It is applied element wise.

>>> a = np.array( [20, 30, 40, 50] )
>>> b = np.arange(1, 5)
>>> c = a ** b
>>> c
array([ 20, 900, 64000, 6250000])

Conditional Operators on NumPy arrays

Conditional operators are also applied element-wise

>>> m = np.array([20, -5, 30, 40])
>>> m < [15, 16, 35, 36]
array([False, True, True, False], dtype=bool)

>>> m < 25
array([ True, True, False, False], dtype=bool)

To get the elements below 25

>>> m[m < 25]
array([20, -5])

Broadcasting in NumPy arrays

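What is Broadcasting ?

[Figure: adding two arrays of the same shape, [[1, 2], [4, 5]] + [[0, 2], [3, 4]] = [[1, 4], [7, 9]]. What is the result when [[1, 2], [4, 5]] is added to an array that holds only the values 5 and 7? ???]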

What is Broadcasting ?

In general, when NumPy expects arrays of the same shape but

finds that this is not the case, it applies the so-called
broadcasting rules.

Basically there are 2 rules of Broadcasting to remember.

First rule of Broadcasting

[[[1, 3]]] + [5] = [[[6, 8]]]

Shapes: (1, 1, 2) + (1,) = (1, 1, 2)

If the arrays do not have the same rank, then a 1 will be prepended to the shape of the lower-rank array until their ranks match.

First rule of Broadcasting

>>> h = np.arange(5).reshape(1, 1, 5)
>>> h
array([[[0, 1, 2, 3, 4]]])

Let's try to add a 1D array of shape (5,) to this 3D array of shape (1,1,5), applying the first rule of broadcasting.

>>> h + [10, 20, 30, 40, 50] # same as: h + [[[10, 20, 30, 40, 50]]]
array([[[10, 21, 32, 43, 54]]])

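Second rule of Broadcasting

Arrays with a length of 1 along a particular axis act as if they had the length of the array with the largest shape along that axis. On adding a 1D array of shape (3,) to a 2D ndarray of shape (2, 3), NumPy first treats the 1D array as shape (1, 3) (first rule), then repeats it along the first axis (second rule).

>>> k = np.arange(6).reshape(2, 3)
>>> k
array([[0, 1, 2],
       [3, 4, 5]])

>>> k + [100, 200, 300]

array([[100, 201, 302],
       [103, 204, 305]])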
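
To see both rules at work, the sketch below adds a row-shaped and a column-shaped array to the same (2, 3) array, and then tries a shape that cannot be broadcast. The names row and col are illustrative.

```python
import numpy as np

k = np.arange(6).reshape(2, 3)        # shape (2, 3)
row = np.array([100, 200, 300])       # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])        # shape (2, 1)

print(k + row)   # row is repeated down the two rows
print(k + col)   # col is repeated across the three columns

# Shapes that differ on an axis where neither length is 1 cannot be combined
try:
    k + np.array([1, 2])  # shape (2,) becomes (1, 2): 2 does not match 3
except ValueError as err:
    print("cannot broadcast:", err)
```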

Mathematical and statistical
functions on NumPy arrays

Finding Mean of NumPy array elements

The ndarray object has a method mean() which finds the mean
of all the elements in the array regardless of the shape of the
numpy array.

>>> a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])

>>> print("mean =", a.mean())
mean = 6.76666666667

Other useful ndarray methods

Similar to mean there are other ndarray methods which can be

used for various computations.

min - returns the minimum element in the ndarray

max - returns the maximum element in the ndarray
sum - returns the sum of the elements in the ndarray
prod - returns the product of the elements in the ndarray
std - returns the standard deviation of the elements in the
ndarray.
var - returns the variance of the elements in the ndarray.

Other useful ndarray methods
>>> a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])

>>> for func in (a.min, a.max, a.sum, a.prod, a.std, a.var):
...     print(func.__name__, "=", func())

min = -2.5
max = 12.0
sum = 40.6
prod = -71610.0
std = 5.08483584352
var = 25.8555555556

Summing across different axes
We can sum across different axes of a numpy array by
specifying the axis parameter of the sum function.

>>> c=np.arange(24).reshape(2,3,4)
>>> c
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],

[[12, 13, 14, 15],

[16, 17, 18, 19],
[20, 21, 22, 23]]])

Summing across different axes

>>> c.sum(axis=0) # sum across matrices

array([[12, 14, 16, 18],
[20, 22, 24, 26],
[28, 30, 32, 34]])
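
The same array can be summed along its other axes as well. The sketch below continues the example above and also passes a tuple of axes, which sums over several axes at once.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

by_rows = c.sum(axis=1)        # collapse the 3 rows of each matrix, shape (2, 4)
by_cols = c.sum(axis=2)        # collapse the 4 columns of each row, shape (2, 3)
rows_only = c.sum(axis=(0, 2)) # collapse matrices and columns, shape (3,)

print(by_rows)
print(by_cols)
print(rows_only)
```

In each case the axis that is summed over disappears from the shape of the result.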

Transposing Matrices
The T attribute is equivalent to calling transpose() when the
rank is ≥2

>>> m1 = np.arange(6).reshape(2,3)
>>> m1
array([[0, 1, 2],
[3, 4, 5]])
>>> m1.T
array([[0, 3],
[1, 4],
[2, 5]])

Solving a system of linear scalar equations
The solve function solves a system of linear scalar equations,
such as:

2x + 6y = 6
5x + 3y = -9

Solving a system of linear scalar equations
>>> coeffs = np.array([[2, 6], [5, 3]])
>>> depvars = np.array([6, -9])
>>> solution = np.linalg.solve(coeffs, depvars)
>>> solution
array([-3., 2.])

Solving a system of linear scalar equations
Let’s check the solution.

>>> coeffs.dot(solution), depvars

(array([ 6., -9.]), array([ 6, -9]))
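
Because the solution is computed in floating point, a programmatic check is usually done with a tolerance rather than by eye. A minimal sketch using np.allclose:

```python
import numpy as np

coeffs = np.array([[2, 6], [5, 3]])
depvars = np.array([6, -9])
solution = np.linalg.solve(coeffs, depvars)

# Compare coeffs @ solution with depvars up to a small tolerance
is_valid = np.allclose(coeffs @ solution, depvars)
print(solution, is_valid)
```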

References

● NumPy
○ https://docs.scipy.org/doc/

Questions?
https://discuss.cloudxlab.com

Pandas

What is Pandas?

● One of the most widely used Python libraries in Data Science after
NumPy and Matplotlib
● The Pandas library Provides
○ High-performance
○ Easy-to-use data structures and
○ Data analysis tools

Pandas - DataFrame

● The main data structure is the DataFrame

● In memory 2D table

○ Like Spreadsheet with column names and row label

Pandas - Data Analysis

● Many features available in Excel are available programmatically like

○ Creating pivot tables

○ Computing columns based on other columns

○ Plotting graphs

Pandas - Data Structures

● Series objects

○ 1D array, similar to a column in a spreadsheet

● DataFrame objects

○ 2D table, similar to a spreadsheet

● Panel objects (removed from pandas in version 0.25)

○ Dictionary of DataFrames

Pandas - Series Objects

Creating a Series
>>> import pandas as pd
>>> s = pd.Series([2,-1,3,5])

Output -
0 2
1 -1
2 3
3 5
dtype: int64

Pandas - Series Objects

Pass as parameters to NumPy functions

>>> import numpy as np
>>> np.square(s)

Output -
0 4
1 1
2 9
3 25
dtype: int64

Pandas - Series Objects

Arithmetic operation on the series

>>> s + [1000,2000,3000,4000]

Output -
0 1002
1 1999
2 3003
3 4005
dtype: int64

Pandas - Series Objects

Broadcasting
>>> s + 1000

Output -
0 1002
1 999
2 1003
3 1005
dtype: int64

Pandas - Series Objects

Binary and conditional operations

>>> s < 0

Output -
0 False
1 True
2 False
3 False
dtype: bool

Pandas - Series Objects

Index labels - Integer location

>>> s2 = pd.Series([68, 83, 112, 68])
>>> print(s2)

Output -
0 68
1 83
2 112
3 68
dtype: int64

Pandas - Series Objects

Index labels - Set Manually

>>> s2 = pd.Series([68, 83, 112, 68],
index=["alice", "bob", "charles", "darwin"])
>>> print(s2)

Output -
alice 68
bob 83
charles 112
darwin 68
dtype: int64

Pandas - Series Objects

Access the items in series

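● By specifying integer location (on a Series with non-integer labels, recent pandas versions deprecate this form in favour of iloc)

>>> s2[1]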

● By specifying label

>>> s2["bob"]

Pandas - Series Objects

Access the items in series - Recommendations

● Use the loc attribute when accessing by label

>>> s2.loc["bob"]

● Use iloc attribute when accessing by integer location

>>> s2.iloc[1]
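
The two accessors also differ in how they slice, which is easy to overlook. The sketch below rebuilds s2 from the earlier slide and compares a label slice with a position slice.

```python
import pandas as pd

s2 = pd.Series([68, 83, 112, 68],
               index=["alice", "bob", "charles", "darwin"])

print(s2.loc["bob"])    # access by label
print(s2.iloc[1])       # access by integer position

# Label slices include the end label; position slices exclude the end position
by_label = s2.loc["alice":"charles"]
by_position = s2.iloc[0:2]
print(by_label)
print(by_position)
```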

Pandas - Series Objects

Init from Python dict

>>> weights = {"alice": 68, "bob": 83, "colin": 86,
...            "darwin": 68}
>>> s3 = pd.Series(weights)
>>> print(s3)

Output -
alice 68
bob 83
colin 86
darwin 68
dtype: int64

Pandas - Series Objects

Control the elements to include and specify their order

>>> s4 = pd.Series(weights, index = ["colin", "alice"])

>>> print(s4)

Output -
colin 86
alice 68
dtype: int64

Pandas - Series Objects

Automatic alignment

● When an operation involves multiple Series objects

● Pandas automatically aligns items by matching index labels

Pandas - Series Objects

Automatic alignment - example

>>> print(s2+s3)
Output -
alice 136.0
bob 166.0
charles NaN
colin NaN
darwin 136.0
dtype: float64

* Note NaN
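
The NaN entries appear for labels that exist in only one of the two Series. When it is acceptable to treat a missing entry as zero (an assumption that depends on the data), Series.add with fill_value is one way to avoid them. A minimal sketch, rebuilding s2 and s3 from the earlier slides:

```python
import pandas as pd

s2 = pd.Series([68, 83, 112, 68],
               index=["alice", "bob", "charles", "darwin"])
s3 = pd.Series({"alice": 68, "bob": 83, "colin": 86, "darwin": 68})

# The + operator leaves NaN where a label is missing on one side
plain_sum = s2 + s3

# add() with fill_value substitutes 0 for the missing side before adding
filled_sum = s2.add(s3, fill_value=0)

print(plain_sum)
print(filled_sum)
```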

Pandas - Series Objects

Automatic alignment

Do not forget to set the right index labels, else you may get surprising
results
>>> s5 = pd.Series([1000,1000,1000,1000])
>>> print(s2 + s5)
Output-
alice NaN
bob NaN
charles NaN
darwin NaN
0 NaN
1 NaN

Pandas - Series Objects

Init with a scalar

>>> meaning = pd.Series(42, ["life", "universe",
...                          "everything"])
>>> print(meaning)

Output-

life 42
universe 42
everything 42
dtype: int64

Pandas - Series Objects

Series name - A Series can have a name

>>> s6 = pd.Series([83, 68], index=["bob", "alice"],
...                name="weights")
>>> print(s6)

* Here series name is weights

Output-
bob 83
alice 68
Name: weights, dtype: int64

Pandas - Series Objects

Plotting a series

>>> %matplotlib inline

>>> import matplotlib.pyplot as plt
>>> temperatures = [4.4,5.1,6.1,6.2,6.1,6.1,5.7,5.2,4.7,4.1,3.9,3.5]
>>> s7 = pd.Series(temperatures, name="Temperature")
>>> s7.plot()
>>> plt.show()

Pandas - DataFrame Objects

● A DataFrame object represents

○ A spreadsheet,
○ With cell values,
○ Column names
○ And row index labels

● Visualize DataFrame as dictionaries of Series

Pandas - DataFrame Objects

Creating a DataFrame - Pass a dictionary of Series objects

>>> people_dict = {
...     "weight": pd.Series([68, 83, 112],
...                         index=["alice", "bob", "charles"]),
...     "birthyear": pd.Series([1984, 1985, 1992],
...                            index=["bob", "alice", "charles"], name="year"),
...     "children": pd.Series([0, 3], index=["charles", "bob"]),
...     "hobby": pd.Series(["Biking", "Dancing"], index=["alice", "bob"]),
... }

Pandas - DataFrame Objects

Creating a DataFrame

>>> people = pd.DataFrame(people_dict)

>>> people

Pandas - DataFrame Objects

Creating a DataFrame - Important Notes

● The Series were automatically aligned based on their index

● Missing values are represented as NaN
● Series names are ignored (the name "year" was dropped)

Pandas - DataFrame Objects

DataFrame - Access a column

>>> people["birthyear"]

Output -

alice 1985
bob 1984
charles 1992
Name: birthyear, dtype: int64

Pandas - DataFrame Objects

DataFrame - Access the multiple columns

>>> people[["birthyear", "hobby"]]

Output -

Pandas - DataFrame Objects

Creating DataFrame - Include columns and/or rows and

guarantee order

>>> d2 = pd.DataFrame(
people_dict,
columns=["birthyear", "weight", "height"],
index=["bob", "alice", "eugene"]
)
>>> print(d2)

Pandas - DataFrame Objects

DataFrame - Accessing rows

● Using loc
○ people.loc["charles"]
● Using iloc
○ people.iloc[2]
Output -
birthyear 1992
children 0
hobby NaN
weight 112
Name: charles, dtype: object

Pandas - DataFrame Objects

DataFrame - Get a slice of rows

>>> people.iloc[1:3]

Output -

Pandas - DataFrame Objects

DataFrame - Pass a boolean array

>>> people[np.array([True, False, True])]

Output -

Pandas - DataFrame Objects

DataFrame - Pass boolean expression

>>> people[people["birthyear"] < 1990]

Output -

Pandas - DataFrame Objects

DataFrame - Adding and removing columns

>>> # Adds a new column "age"

>>> people["age"] = 2016 - people["birthyear"]

>>> # Adds another column "over 30"

>>> people["over 30"] = people["age"] > 30

>>> # Removes "birthyear" and "children" columns

>>> birthyears = people.pop("birthyear")
>>> del people["children"]

>>> people

Pandas - DataFrame Objects

DataFrame - A new column must have the same number of rows

>>> # alice is missing, eugene is ignored

>>> people["pets"] = pd.Series({

"bob": 0,
"charles": 5,
"eugene":1
})

>>> people

Pandas - DataFrame Objects

DataFrame - Add a new column at a given position using the insert method

>>> people.insert(1, "height", [172, 181, 185])

>>> people

Pandas - DataFrame Objects

DataFrame - Add new columns using assign method

>>> # assign returns a new DataFrame, so keep the result
>>> people = (people
...     .assign(body_mass_index = lambda df: df["weight"] / (df["height"] / 100) ** 2)
...     .assign(overweight = lambda df: df["body_mass_index"] > 25)
... )
>>> people

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame

● Use sort_index method

○ It sorts the rows by their index label
○ In ascending order
○ Reverse the order by passing ascending=False
○ Returns a sorted copy of DataFrame
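
The points above can be checked on a small example. This sketch uses a hypothetical three-row DataFrame (not the people table from the slides) to show that sort_index leaves the original untouched.

```python
import pandas as pd

# Hypothetical data, used only to illustrate sorting by index label
df = pd.DataFrame({"weight": [83, 68, 112]},
                  index=["bob", "alice", "charles"])

sorted_df = df.sort_index()                   # new DataFrame, ascending labels
reversed_df = df.sort_index(ascending=False)  # new DataFrame, descending labels

print(list(df.index))           # original order is unchanged
print(list(sorted_df.index))
print(list(reversed_df.index))
```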

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame

>>> people.sort_index(ascending=False)

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame - inplace argument

>>> people.sort_index(inplace=True)
>>> people

Pandas - DataFrame Objects

DataFrame - Sorting a DataFrame - Sort By Value

>>> people.sort_values(by="age", inplace=True)

>>> people

Pandas - DataFrame Objects

Plotting a DataFrame

>>> people.plot(
kind = "line",
x = "body_mass_index",
y = ["height", "weight"]
)
>>> plt.show()

Pandas - DataFrame Objects

DataFrames - Saving and Loading

● Pandas can save DataFrames to various backends such as

○ CSV
○ Excel (requires openpyxl library)
○ JSON
○ HTML
○ SQL database

Pandas - DataFrame Objects

DataFrames - Saving

Let’s create a new DataFrame my_df and save it in various formats

>>> my_df = pd.DataFrame(

[
["Biking", 68.5, 1985, np.nan],
["Dancing", 83.1, 1984, 3]
],
columns=["hobby","weight","birthyear","children"],
index=["alice", "bob"]
)
>>> my_df

Pandas - DataFrame Objects

DataFrames - Saving

● Save to CSV
○ >>> my_df.to_csv("my_df.csv")
● Save to HTML
○ >>> my_df.to_html("my_df.html")
● Save to JSON
○ >>> my_df.to_json("my_df.json")

Pandas - DataFrame Objects

DataFrames - What was saved?

>>> for filename in ("my_df.csv", "my_df.html", "my_df.json"):
...     print("#", filename)
...     with open(filename, "rt") as f:
...         print(f.read())
...     print()

Pandas - DataFrame Objects

DataFrames - What was saved?

Note that the index is saved as the first column (with no name) in a CSV file

Pandas - DataFrame Objects
DataFrames - What was saved?

Note that the index is saved as <th> tags in HTML

Pandas - DataFrame Objects

DataFrames - What was saved?

Note that the index is saved as keys in JSON

Pandas - DataFrame Objects

DataFrames - Loading

● read_csv # For loading CSV files

● read_html # For loading HTML files

● read_excel # For loading Excel files

Pandas - DataFrame Objects

DataFrames - Load CSV file

>>> my_df_loaded = pd.read_csv("my_df.csv", index_col=0)

>>> my_df_loaded
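
The save and load steps can be tried together without touching the disk. The sketch below is an illustration under one assumption made here: an in-memory io.StringIO buffer stands in for the my_df.csv file, which is a choice for the example and not part of the slides.

```python
import io
import numpy as np
import pandas as pd

my_df = pd.DataFrame(
    [["Biking", 68.5, 1985, np.nan],
     ["Dancing", 83.1, 1984, 3]],
    columns=["hobby", "weight", "birthyear", "children"],
    index=["alice", "bob"],
)

buffer = io.StringIO()   # in-memory stand-in for a file on disk
my_df.to_csv(buffer)
buffer.seek(0)           # rewind before reading

# index_col=0 turns the unnamed first column back into the index
reloaded = pd.read_csv(buffer, index_col=0)
print(reloaded)
```

Without index_col=0, the saved index would come back as an ordinary column and a fresh integer index would be generated.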

Pandas - DataFrame Objects

DataFrames - Overview

● When dealing with large DataFrames, it is useful to get a quick overview

of its content
● Load housing.csv inside dataset directory to create a DataFrame and
get a quick overview

Pandas - DataFrame Objects

DataFrames - Overview

● Let’s understand below methods

○ head()
○ tail()
○ info()
○ describe()

Pandas - DataFrame Objects

DataFrames - Overview - head()

● The head method returns the top 5 rows

>>> housing = pd.read_csv("dataset/housing.csv")

>>> housing.head()

Pandas - DataFrame Objects

DataFrames - Overview - tail()

● The tail method returns the bottom 5 rows

● We can also pass the number of rows we want

>>> housing.tail(n=2)

Pandas - DataFrame Objects

DataFrames - Overview - info()

● The info method prints out the summary of each column's contents

>>> housing.info()

Pandas - DataFrame Objects

DataFrames - Overview - describe()

● The describe method gives a nice overview of the main aggregated

values over each column
○ count: number of non-null (not NaN) values
○ mean: mean of non-null values
○ std: standard deviation of non-null values
○ min: minimum of non-null values
○ 25%, 50%, 75%: 25th, 50th and 75th percentile of non-null values
○ max: maximum of non-null values
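
The handling of missing values is easiest to see on a tiny column. The sketch below uses a hypothetical price column (not the housing dataset) with one NaN and reads individual statistics out of the result of describe().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"price": [100.0, 200.0, np.nan, 400.0]})

summary = df.describe()
print(summary)

# Individual statistics can be read by row label and column name
print(summary.loc["count", "price"])  # NaN is not counted
print(summary.loc["50%", "price"])    # median of the non-null values
```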

References

● Pandas
○ http://pandas.pydata.org/pandas-docs/stable/

Matplotlib

Matplotlib - Overview

● Matplotlib is a Python 2D plotting library

● Produces publication quality figures in a variety of
○ Hardcopy formats and
○ Interactive environments

Matplotlib - Overview

● Matplotlib can be used in

○ Python scripts
○ Python and IPython shell
○ Jupyter notebook
○ Web application servers
○ GUI toolkits

Matplotlib - pyplot Module

● matplotlib.pyplot
○ Collection of functions that make matplotlib work like MATLAB
○ Majority of plotting commands in pyplot have MATLAB analogs with
similar arguments

Matplotlib - pyplot Module - plot()

>>> import matplotlib.pyplot as plt

>>> plt.plot([1,2,3,4])
>>> plt.ylabel('some numbers')
>>> plt.show()
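
When plot() gets a single list, matplotlib treats it as the y values and uses 0, 1, 2, ... as the x values. The sketch below checks this by reading the data back from the line object. The Agg backend is an assumption made here so the code runs without a display; in a notebook or shell it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# plot() returns a list of Line2D objects; we drew one line.
(line,) = plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')

x_values = [float(v) for v in line.get_xdata()]  # generated automatically
y_values = [float(v) for v in line.get_ydata()]  # the list we passed in

print(x_values)
print(y_values)
plt.close()
```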

Matplotlib - pyplot Module - plot()

plot x versus y
>>> import matplotlib.pyplot as plt
>>> plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
>>> plt.ylabel('some numbers')
>>> plt.show()
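
A possible extension of the same plot, shown only as a sketch: a format string to draw red circles instead of a line, an x-axis label, and explicit axis limits. The label text and limits are chosen here for illustration, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')  # 'ro' = red circle markers
plt.xlabel('x')
plt.ylabel('x squared')
plt.axis([0, 6, 0, 20])                      # [xmin, xmax, ymin, ymax]

# Read the settings back from the current axes to confirm them.
ax = plt.gca()
labels = (ax.get_xlabel(), ax.get_ylabel())
x_limits = tuple(float(v) for v in ax.get_xlim())
y_limits = tuple(float(v) for v in ax.get_ylim())

print(labels, x_limits, y_limits)
plt.close()
```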

Matplotlib - pyplot Module - Histogram

>>> import matplotlib.pyplot as plt

>>> x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18, 49, 50, 100]
>>> num_bins = 5
>>> plt.hist(x, num_bins, facecolor='blue')
>>> plt.show()
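
hist() also returns the numbers behind the picture: the count in each bin and the bin edges. The sketch below uses the same data and prints them. With 5 bins between the minimum (4) and the maximum (100), each bin is 19.2 wide. The Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37,
     18, 49, 50, 100]
num_bins = 5

# hist() returns (counts per bin, bin edges, drawn bar objects).
counts, edges, patches = plt.hist(x, num_bins, facecolor='blue')

bin_counts = [int(c) for c in counts]
bin_edges = [float(e) for e in edges]

print(bin_counts)   # how many values fall into each bin
print(bin_edges)    # num_bins + 1 edges, from min(x) to max(x)
plt.close()
```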

References

● Matplotlib
○ https://matplotlib.org/tutorials/index.html
