Python File Semester-4

The document outlines a series of practical programming exercises focused on implementing basic Python libraries such as NumPy, SciPy, Matplotlib, and Pandas. Each exercise includes specific aims, code examples, and explanations of concepts like array creation, indexing, and data manipulation. The document serves as a tutorial for understanding and utilizing these libraries for data analysis and visualization.

Uploaded by

tonyalwyn28

INDEX

SR.NO. PRACTICALS REMARKS

1. WRITE A PROGRAM TO IMPLEMENT BASIC PYTHON LIBRARIES - NUMPY, SCIPY.

2. WRITE A PROGRAM TO IMPLEMENT BASIC PYTHON LIBRARIES - MATPLOTLIB, PANDAS, SCIKIT-LEARN.

3. WRITE A PROGRAM TO CREATE SAMPLES FROM POPULATION.

4. WRITE A PROGRAM TO EVALUATE MEAN, MEDIAN, MODE OF DATASET.

5. WRITE A PROGRAM TO IMPLEMENT CENTRAL LIMIT THEOREM IN DATASET.

6. WRITE A PROGRAM TO IMPLEMENT MEASURE OF SPREAD IN DATASET.

7. WRITE A PROGRAM TO IMPLEMENT PMF, PDF AND CDF.

8. WRITE A PROGRAM TO IMPLEMENT DIFFERENT VISUALIZATION TECHNIQUES ON SAMPLE DATASET.

Sania Dixit(8522244)
Experiment 1

Aim:- Write a program to implement the basic Python libraries NumPy and SciPy.

Code:-
NumPy:- It provides a high-performance multidimensional array object and tools for
working with these arrays. It is the fundamental package for scientific computing with
Python. Its important features include:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities

1. Arrays in NumPy: NumPy's main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed by a tuple
of non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is the rank.
• NumPy's array class is called ndarray. It is also known by the alias array.
Example :
[[ 1, 2, 3],
[ 4, 2, 5]]
Here,
rank = 2 (as it is 2-dimensional or it has 2 axes)
first dimension(axis) length = 2, second dimension has length = 3
overall shape can be expressed as: (2, 3)
2. Array creation: There are various ways to create arrays in NumPy.
• For example, you can create an array from a regular Python list or tuple using
the array function. The type of the resulting array is deduced from the type of the
elements in the sequences.
• Often, the elements of an array are originally unknown, but its size is known.
Hence, NumPy offers several functions to create arrays with initial placeholder
content. These minimize the necessity of growing arrays, an expensive
operation. For example: np.zeros, np.ones, np.full, np.empty, etc.
• To create sequences of numbers, NumPy provides np.arange, a function analogous to
range that returns arrays instead of lists.
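
The creation routines listed above can be tried out as follows. This is a minimal sketch: the shapes and fill values are arbitrary choices made for illustration, and np.linspace is included as a common companion to np.arange although the text above does not mention it.

```python
import numpy as np

# Array from a Python list; the dtype is deduced from the elements
from_list = np.array([1, 2, 3])

# Placeholder arrays of a known shape
zeros = np.zeros((2, 3))            # 2x3 array filled with 0.0
ones = np.ones((2, 3), dtype=int)   # 2x3 array filled with integer 1
filled = np.full((2, 2), 7)         # 2x2 array where every element is 7

# Sequences of numbers
evens = np.arange(0, 10, 2)         # like range(0, 10, 2), but returns an ndarray
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced values from 0.0 to 1.0 inclusive

print(from_list.dtype, zeros.shape, ones.dtype)
print(filled)
print(evens)
print(grid)
```

np.empty is left out of the sketch because its initial contents are arbitrary, so printing it shows nothing useful.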

3. Array Indexing: Knowing the basics of array indexing is important for analysing
and manipulating the array object. NumPy offers many ways to do array indexing.

• Slicing: Just like lists in Python, NumPy arrays can be sliced. As arrays can be
multidimensional, you need to specify a slice for each dimension of the array.
• Integer array indexing: In this method, lists are passed for indexing for each
dimension. One-to-one mapping of corresponding elements is done to construct
a new arbitrary array.
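
The difference between the two indexing styles can be sketched as follows. The array is the same exemplar array used in the program further down; the index lists are arbitrary choices made for illustration.

```python
import numpy as np

arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -7, 4, 2.0]])

# Slicing: one slice per dimension (first 2 rows, every second column)
sliced = arr[:2, ::2]

# Integer array indexing: the row list and the column list are paired
# element by element, selecting positions (0, 3), (1, 2), (2, 1) and (3, 0)
picked = arr[[0, 1, 2, 3], [3, 2, 1, 0]]

print("Sliced:\n", sliced)
print("Picked:", picked)
```

Slicing keeps the two-dimensional layout, while integer array indexing returns a one-dimensional array with one element per index pair.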

4. Basic operations: NumPy provides a plethora of built-in arithmetic functions.

• Operations on a single array: We can use overloaded arithmetic operators to
do element-wise operations on an array to create a new array. In the case of the +=, -=, *=
operators, the existing array is modified.

• Unary operators: Many unary operations are provided as methods
of the ndarray class. These include sum, min, max, etc. These functions can also be
applied row-wise or column-wise by setting an axis parameter.

• Binary operators: These operations apply to arrays element-wise and a new
array is created. You can use all basic arithmetic operators like +, -, *, /, etc. In the
case of the +=, -=, *= operators, the existing array is modified.
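
The distinction between operators that create a new array and operators that modify the existing one can be checked with a short sketch. The variable names (alias, doubled) are hypothetical and chosen only for this illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
alias = a               # second name for the same array object

doubled = a * 2         # binary operator: builds a new array, a is unchanged
a += 10                 # in-place operator: modifies the existing array

print("New array:\n", doubled)
print("Modified array seen through alias:\n", alias)
print("Column-wise sums:", a.sum(axis=0))
print("Row-wise maxima:", a.max(axis=1))
```

Because alias and a refer to the same object, the in-place update is visible through both names, whereas doubled still holds values computed from the original contents.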

Program of NumPy showing implementation of these operations :-

import numpy as np

arr = np.array( [[ 1, 2, 3],[ 4, 2, 5]] )

# Printing type of arr object
print("Array is of type: ", type(arr))
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

#Creating array from list with type float

a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print ("Array created using passed list:\n", a)

# Create an array with random values
e = np.random.random((2, 2))
print ("\nA random array:\n", e)

# An exemplar array
arr = np.array([[-1, 2, 0, 4],[4, -0.5, 6, 0],[2.6, 0, 7, 8],[3, -7, 4, 2.0]])
# Slicing array
temp = arr[:2, ::2]
print ("Array with first 2 rows and alternate columns (0 and 2):\n", temp)

arr = np.array([[1, 5, 6],[4, 7, 2],[3, 1, 9]])

# maximum element of array

print ("Largest element is:", arr.max())
print ("Row-wise maximum elements:",arr.max(axis = 1))
# minimum element of array
print ("Column-wise minimum elements:",arr.min(axis = 0))
# sum of array elements
print ("Sum of all array elements:",arr.sum())
# cumulative sum along each row
print ("Cumulative sum along each row:\n",arr.cumsum(axis = 1))

a = np.array([[1, 2],[3, 4]])

b = np.array([[4, 3],[2, 1]])
# add arrays
print ("Array sum:\n", a + b)
# multiply arrays (elementwise multiplication)
print ("Array multiplication:\n", a*b)
# matrix multiplication
print ("Matrix multiplication:\n", a.dot(b))

Output-

Scipy:-
SciPy is a collection of mathematical algorithms and convenience functions built on
the NumPy extension of Python. It adds significant power to the interactive Python
session by providing the user with high-level commands and classes for manipulating
and visualizing data. With SciPy, an interactive Python session becomes a data-
processing and system-prototyping environment rivaling systems, such as MATLAB,
IDL, Octave, R-Lab, and SciLab.

The additional benefit of basing SciPy on Python is that this also makes a powerful
programming language available for use in developing sophisticated programs and
specialized applications. Scientific applications using SciPy benefit from the
development of additional modules in numerous niches of the software landscape by
developers across the world. Everything from parallel programming to web and database
subroutines and classes have been made available to the Python programmer. All
of this power is available in addition to the mathematical libraries in SciPy.

Spatial Data:-

Spatial data refers to data that is represented in a geometric space.

E.g. points on a coordinate system.
We deal with spatial data problems on many tasks.
E.g. finding if a point is inside a boundary or not.
SciPy provides us with the module scipy.spatial, which has functions for working
with spatial data.

Triangulation:-

A triangulation of a polygon divides the polygon into multiple triangles, from which
we can compute the area of the polygon.
A triangulation with points means creating a surface composed of triangles in which each of
the given points is a vertex of at least one triangle in the surface.
One method to generate these triangulations through points is
the Delaunay() triangulation.
plt.scatter: This function is used to create a scatter plot. It takes two arguments: the x-
coordinates of the points (stored in points[:, 0]) and the y-coordinates of the points
(stored in points[:, 1]).
- points[:, 0]: This extracts the x-coordinates of all points in the points array.
- points[:, 1]: This extracts the y-coordinates of all points in the points array.
- color='r': This parameter specifies the color of the points. In this case, 'r' indicates red.

Code:-

import numpy as np
from scipy.spatial import Delaunay
import matplotlib.pyplot as plt

points = np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1]
])

simplices = Delaunay(points).simplices

plt.triplot(points[:, 0], points[:, 1], simplices)

plt.scatter(points[:, 0], points[:, 1], color='r')

plt.show()
Output:-

Experiment 2

Aim:- Write a program to implement the basic Python libraries Matplotlib, pandas, and scikit-learn.

Code:-
Python Pandas Introduction

Pandas is defined as an open-source library that provides high-performance data
manipulation in Python. The name Pandas is derived from "panel data", an
econometrics term for multidimensional data sets. It is used for data
analysis in Python and was developed by Wes McKinney in 2008.
Data analysis requires lots of processing, such as restructuring, cleaning, or merging.
Different tools are available for fast data processing, such as NumPy,
SciPy, Cython, and Pandas.
Before Pandas, Python was capable of data preparation, but it only provided limited
support for data analysis. So, Pandas came into the picture and enhanced the
capabilities of data analysis. It can perform the five significant steps required for
processing and analysis of data irrespective of the origin of the data, i.e., load,
manipulate, prepare, model, and analyze.
Key Features of Pandas
• It has a fast and efficient DataFrame object with default and customized
indexing.
• It is used for reshaping and pivoting of data sets.
• It can group data for aggregations and transformations.
• It is used for data alignment and handling of missing data.
• It provides time series functionality.
• It processes a variety of data sets in different formats, such as matrix data, heterogeneous
tabular data, and time series.
• It handles multiple operations on data sets, such as subsetting, slicing,
filtering, groupBy, re-ordering, and re-shaping.
• It provides fast performance, and if you want to speed it up even more, you can use
Cython.

Benefits of Pandas
The benefits of pandas over other tools are as follows:
• Data Representation: It represents the data in a form that is suited for data
analysis through its DataFrame and Series.
• Clear code: The clear API of Pandas allows you to focus on the core part
of the code. So, it provides clear and concise code for the user.
Python Pandas Data Structure
Pandas provides two data structures for processing data,
i.e., Series and DataFrame, which are discussed below:
1) Series
It is defined as a one-dimensional array that is capable of storing various data types.
The row labels of a Series are called the index. We can easily convert a list, tuple, or
dictionary into a Series using the Series() constructor. A Series cannot contain multiple
columns. Its main parameter is:
data: It can be any list, dictionary, or scalar value.
Creating Series from Array:
Before creating a Series, we first have to import the numpy module and then use the
array() function in the program.

import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)

Output

Explanation: In this code, we first import the pandas and numpy libraries
under the pd and np aliases. Then we define a variable named "info" that holds
an array of values. We pass the info variable to the Series constructor
and store the result in the variable "a". The Series is printed by calling
print(a).
Python Pandas DataFrame
It is a widely used data structure of pandas and works with a two-dimensional array
with labeled axes (rows and columns). DataFrame is defined as a standard way to
store data and has two different indexes, i.e., row index and column index. It has
the following properties:
• The columns can be of heterogeneous types like int, bool, and so on.
• It can be seen as a dictionary of Series structures where both the rows and
columns are indexed. It is denoted as "columns" in the case of columns and
"index" in the case of rows.
Create a DataFrame using List:
We can easily create a DataFrame in Pandas using list.

import pandas as pd
x = ['Python', 'Pandas']
df = pd.DataFrame(x)
print(df)
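
The "dictionary of Series" view described above can be illustrated with a DataFrame built from a dict. This is a small sketch with hypothetical sample data: the column names, values, and row labels are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column
data = {
    'name': ['Python', 'Pandas'],
    'year': [1991, 2008],
    'is_library': [False, True],
}

# index supplies the row labels; the dict keys supply the column labels
df = pd.DataFrame(data, index=['a', 'b'])

print(df)
print(df.dtypes)          # one dtype per column (heterogeneous columns)
print(type(df['year']))   # a single column is a Series
```

Each column keeps its own data type, and selecting one column returns a Series that shares the row index of the DataFrame.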

Output

The Pandas Series
As described above, a Series is a one-dimensional labeled array that cannot
contain multiple columns. The Series() constructor has the following parameters:
data: It can be any list, dictionary, or scalar value.
index: The index values must be hashable; they are usually unique, although pandas
allows duplicates. The index must be of the same
length as data. If we do not pass any index, the default np.arange(n) will be used.
dtype: It refers to the data type of the Series.
copy: It is used for copying the data.
Creating a Series:
We can create a Series in two ways:
Create an empty Series
Create a Series using inputs.
Create an Empty Series:
We can easily create an empty Series in Pandas, which means it will not have any values.
The syntax that is used for creating an empty Series:
<series object> = pandas.Series()
The example below creates an empty Series object that has no values. The default
dtype of an empty Series depends on the pandas version (float64 in older releases,
object from pandas 2.0 onward), so the example passes the dtype explicitly.
Example

import pandas as pd
x = pd.Series(dtype='float64')
print (x)

Output

Creating a Series using inputs:

We can create Series by using various inputs:
Array
Dict
Scalar value
Creating Series from Array:
Before creating a Series, we first have to import the numpy module and then use the
array() function in the program. If the data is an ndarray, then the passed index must be
of the same length.
If we do not pass an index, then by default an index of range(n) is used, where n
is the length of the array, i.e., [0, 1, 2, ..., len(array) - 1].
Example

import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)

Output

Create a Series from dict

We can also create a Series from a dict. If a dictionary is passed as
input and the index is not specified, then the dictionary keys are used to construct the
index (in insertion order in current pandas versions; older versions sorted the keys).
If an index is passed, then the values corresponding to each label in the index are
extracted from the dictionary.

# import the pandas library
import pandas as pd
import numpy as np
info = {'x' : 0., 'y' : 1., 'z' : 2.}
a = pd.Series(info)
print (a)
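
The case where an index is passed together with a dict can be sketched as follows. The label 'w' is a hypothetical label that is deliberately absent from the dictionary, to show what happens to labels without a matching key.

```python
import pandas as pd

info = {'x': 0., 'y': 1., 'z': 2.}

# Only the labels listed in index are taken from the dictionary;
# 'w' is not a key of info, so its value is missing (NaN)
a = pd.Series(info, index=['y', 'z', 'w'])
print(a)
```

The resulting Series follows the order of the passed index, drops the key 'x' because it is not listed, and holds NaN for 'w'.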

Output

Create a Series using Scalar:

If we take the scalar values, then the index must be provided. The scalar value will be
repeated for matching the length of the index.

# import the pandas library

import pandas as pd
import numpy as np
x = pd.Series(4, index=[0, 1, 2, 3])
print (x)

Output

Accessing data from series with Position:

Once you create the Series type object, you can access its indexes, data, and even
individual elements.
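The data in a Series can be accessed by position, similar to an ndarray. Use .iloc
for positional access, because plain x[0] on a label-indexed Series is deprecated in
recent pandas versions.

import pandas as pd
x = pd.Series([1,2,3],index = ['a','b','c'])
print (x.iloc[0])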
Output

Retrieving Index array and data array of a series object
We can retrieve the index array and data array of an existing Series object by using the
attributes index and values.

import numpy as np
import pandas as pd
x=pd.Series(data=[2,4,6,8])
y=pd.Series(data=[11.2,18.6,22.5], index=['a','b','c'])
print(x.index)
print(x.values)
print(y.index)
print(y.values)

Output

What is Matplotlib?

Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility. Matplotlib was created by John D. Hunter. Matplotlib is open
source and we can use it freely. Matplotlib is mostly written in Python; a few
segments are written in C, Objective-C and JavaScript for platform compatibility.

Installation of Matplotlib

If you have Python and PIP already installed on a system, then installation of
Matplotlib is very easy.

Install it using this command:

C:\Users\Your Name>pip install matplotlib

If this command fails, then use a python distribution that already has Matplotlib
installed, like Anaconda, Spyder etc.

Import Matplotlib

Once Matplotlib is installed, import it in your applications by adding
the import module statement:

import matplotlib

Now Matplotlib is imported and ready to use:

Checking Matplotlib Version

The version string is stored under the __version__ attribute.

import matplotlib

print(matplotlib.__version__)
Pyplot

Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:

import matplotlib.pyplot as plt

Now the Pyplot package can be referred to as plt.

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([0, 6])

ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)

plt.show()

Result:-

Plotting x and y points

The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.

The function takes parameters for specifying points in the diagram.

Parameter 1 is an array containing the points on the x-axis.

Parameter 2 is an array containing the points on the y-axis.

If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and
[3, 10] to the plot function.

Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 8])

ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)

plt.show()

Result:

Plotting Without Line

To plot only the markers, you can use shortcut string notation parameter 'o', which
means 'rings'.

Example

Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 8])

ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, 'o')

plt.show()

Result:

Multiple Points

You can plot as many points as you like; just make sure you have the same number of
points on both axes.

Example

Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to
position (8, 10):

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 2, 6, 8])

ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)

plt.show()

Result:

Default X-Points

If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3,
etc., depending on the length of the y-points.

So, if we take the same example as above, and leave out the x-points, the diagram will
look like this:

Example

Plotting without x-points:

import matplotlib.pyplot as plt

import numpy as np

ypoints = np.array([3, 8, 1, 10, 5, 7])

plt.plot(ypoints)

plt.show()

Result:

Markers

You can use the keyword argument marker to emphasize each point with a specified
marker:

Mark each point with a circle:

import matplotlib.pyplot as plt

import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')

plt.show()

Result:

Experiment -3

Aim:- Write a program to create samples from population.

Code:-
sample() is a built-in function of the random module in Python that returns a
list of a given length containing items chosen from a sequence, i.e. a list, tuple, or string.
It is used for random sampling without replacement. A set must first be converted to a
sequence (for example with sorted() or list()), because passing a set directly raises a
TypeError from Python 3.11 onward.
Syntax : random.sample(sequence, k)
Parameters:
sequence: Can be a list, tuple, or string (or a set converted to one of these).
k: An integer value that specifies the length of the sample.
Returns: A new list of k elements chosen from the sequence.

Code #1: Simple implementation of sample() function.

# By the use of sample() function .

from random import sample

# Prints list of random items of given length

list1 = [1, 2, 3, 4, 5]

print(sample(list1,3))
Output:
[2, 3, 5]

Code #2: Basic use of sample() function.

# the use of sample() function .

# import random
import random
# Prints list of random items of length 3 from the given list.
list1 = [1, 2, 3, 4, 5, 6]
print("With list:", random.sample(list1, 3))

string = "This is a python program"

print("With string:", random.sample(string, 4))

# Prints list of random items of

# length 4 from the given tuple.
tuple1 = ("artificial", "machine", "computer", "science", "portal", "scientist", "btech")

print("With tuple:", random.sample(tuple1, 4))

# Prints list of random items of
# length 3 from the given set.
# The set is converted to a sorted list first, because
# random.sample() requires a sequence from Python 3.11 onward.
set1 = {"a", "b", "c", "d", "e"}
print("With set:", random.sample(sorted(set1), 3))

Output:
With list: [3, 1, 2]
With string: ['i', 's', 'T', 'p']
With tuple: ['artificial', 'portal', 'machine', 'computer']
With set: ['b', 'd', 'c']
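
For comparison, sampling with replacement can be done with random.choices(). The sketch below draws both kinds of sample from a hypothetical population of the integers 1 to 100; the seed value 42 and the sample size 10 are arbitrary choices that only make the run repeatable.

```python
import random

# Hypothetical population: the integers 1 to 100
population = list(range(1, 101))

# A dedicated generator with a fixed seed gives the same samples on every run
rng = random.Random(42)

# Without replacement: every element can be chosen at most once
without_replacement = rng.sample(population, 10)

# With replacement: the same element may appear more than once
with_replacement = rng.choices(population, k=10)

print("Without replacement:", without_replacement)
print("With replacement:", with_replacement)
print("Sample mean:", sum(without_replacement) / len(without_replacement))
print("Population mean:", sum(population) / len(population))
```

A sample drawn without replacement can never be larger than the population, whereas random.choices() accepts any value of k.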

Experiment: 4

Aim:- Write a program to evaluate the mean, median, and mode of a dataset.

Code:-
Mean, median, and mode are fundamental topics of statistics. You can easily calculate
them in Python, with and without the use of external libraries.
These three are the main measures of central tendency. The central tendency lets us
know the “normal” or “average” values of a dataset.

Calculating the Mean in Python

The mean or arithmetic average is the most used measure of central tendency.
Remember that central tendency is a typical value of a set of data.
A dataset is a collection of data, therefore a dataset in Python can be any of the
following built-in data structures:
• Lists, tuples, and sets: a collection of objects
• Strings: a collection of characters
• Dictionary: a collection of key-value pairs
Note: Although there are other data structures in Python,
like queues or stacks, we'll be using only the built-in ones.
We can calculate the mean by adding all the values of a dataset and dividing the result
by the number of values. For example, if we have the following list of numbers:
[1, 2, 3, 4, 5, 6]
The mean or average would be 3.5 because the sum of the list is 21 and its length
is 6. Twenty-one divided by six is 3.5. You can perform this calculation as
follows:
(1 + 2 + 3 + 4 + 5 + 6) / 6 = 21 / 6 = 3.5
In this tutorial, we’ll be using the players of a basketball team as our sample data.

Creating a Custom Mean Function

Let's start by calculating the average (mean) age of the players in a basketball team.
Program:

# calculating the mean in python

# without using the library

def cal_mean(dataset):
    return sum(dataset)/len(dataset)

dataset = [12,2,34,34,54,54,3,23,42,33,4,3,2,3]
print("The Mean of the given dataset is: ",cal_mean(dataset))

Output:

The Mean of the given dataset is: 21.642857142857142

Using mean() from the Python Statistic Module
Calculating measures of central tendency is a common operation for most developers.
That's why Python's statistics module provides diverse functions to calculate
them, along with other basic statistics topics.

Program:

from statistics import mean

print("Mean using the python library or built in function: ",mean(dataset))

Output:

Mean using the python library or built in function: 21.642857142857142

Finding the Median in Python

The median is the middle value of a sorted dataset. It is used, again, to provide a
"typical" value of a given population.
In programming, we can define the median as the value that separates a sequence into
two parts: the lower half and the higher half.
To calculate the median, first, we need to sort the dataset. We could do this
with sorting algorithms or using the built-in function sorted(). The second step is to
determine whether the dataset length is odd or even. Depending on this, one of the
following cases applies:
• Odd: The median is the middle value of the dataset
• Even: The median is the sum of the two middle values divided by two

Creating a Custom Median Function

Let’s implement the above concept into a Python function.
Remember the three steps we need to follow to get the median of a dataset:
• Sort the dataset: We can do this with the sorted() function
• Determine if it’s odd or even: We can do this by getting the length of
the dataset and using the modulo operator (%)
• Return the median based on each case:
• Odd: Return the middle value
• Even: Return the average of the two middle values
That would result in the following function:

dataset_median = [12,2,33,4,45,34,32,56,45,56]

def cal_median(dataset):
    # Work on a sorted copy so that the caller's list is left unchanged
    ordered = sorted(dataset)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # Even length: average of the two middle values
        return (ordered[mid - 1] + ordered[mid]) / 2
    else:
        # Odd length: the single middle value
        return ordered[mid]

print("The median of the given dataset is: ",cal_median(dataset_median))

dataset_median.sort()
print("The sorted dataset is: ",dataset_median)

Output:

The median of the given dataset is: 33.5

The sorted dataset is: [2, 4, 12, 32, 33, 34, 45, 45, 56, 56]

Using median() from the Python Statistic Module

This way is much simpler because we’re using an already existent function from the
statistics module.

from statistics import median

print("The median of the given dataset using python library is: ", median(dataset_median))

Output:
The median of the given dataset using python library is: 33.5
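
Finding the Mode in Python

The aim of this experiment also names the mode, the value that occurs most often in a dataset. A possible sketch in the same style as the mean and median programs is given below; the function name cal_mode is hypothetical, and returning a list of all tied values is one design choice among several.

```python
from collections import Counter
from statistics import mode, multimode

def cal_mode(dataset):
    # Count how often each value occurs
    counts = Counter(dataset)
    highest = max(counts.values())
    # Return every value that reaches the highest count (handles ties)
    return [value for value, count in counts.items() if count == highest]

dataset = [12,2,34,34,54,54,3,23,42,33,4,3,2,3]
print("The mode of the given dataset is: ", cal_mode(dataset))

# The statistics module offers mode() for a single mode
# and multimode() for all values tied for the highest count
print("Mode using the python library: ", mode(dataset))
print("All modes of [1, 1, 2, 2, 3]: ", multimode([1, 1, 2, 2, 3]))
```

In this dataset the value 3 occurs three times, more often than any other value, so both approaches report 3.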

Experiment -5

Aim:- Write a program to implement Central Limit Theorem in dataset.

Code:-
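The Central Limit Theorem can be stated as follows: for a sufficiently large sample size n,
approximately

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

where $\bar{X}$ represents the sampling distribution of the sample mean of samples of size n each,
and $\mu$ and $\sigma$ are the mean and standard deviation of the population respectively
(the second argument of N is the standard deviation of the sample mean).

The distribution of the sample mean tends towards the normal distribution as the sample
size increases.
The histograms produced by the program below show that as we keep on increasing the sample size from 1 to
100, the histogram tends to take the shape of a normal distribution.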

Rule of thumb:
Of course, the term “large” is relative. Roughly, the more “abnormal” the basic
distribution, the larger n must be for normal approximations to work well. The rule
of thumb is that a sample size n of at least 30 will suffice.

Why is this important?

The answer to this question is very simple, as we can often use well developed
statistical inference procedures that are based on a normal distribution such as 68-
95-99.7 rule and many others, even if we are sampling from a population that is not
normal, provided we have a large sample size.
Program for Python implementation of the Central Limit Theorem

import numpy
import matplotlib.pyplot as plt

# sample sizes to compare
num = [1, 10, 50, 100]
# list of sample means, one list per sample size
means = []

# For each sample size j, draw 1000 samples of j random integers
# from -40 to 39 (the upper bound 40 is exclusive), take the mean
# of each sample and append the list of means to means.
for j in num:
    # Fixing the seed so that we get the same result
    # every time the loop is run
    numpy.random.seed(1)
    x = [numpy.mean(
        numpy.random.randint(
            -40, 40, j)) for _i in range(1000)]
    means.append(x)

# plotting all the means in one figure
k = 0
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
for i in range(0, 2):
    for j in range(0, 2):
        # Histogram for each x stored in means
        ax[i, j].hist(means[k], 10, density=True)
        ax[i, j].set_title(label=str(num[k]))
        k = k + 1
plt.show()
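
The theorem also predicts how wide each histogram should be: the standard deviation of the sample means is roughly the population standard deviation divided by the square root of n. The sketch below checks this numerically with the standard library only. It assumes the same population as the program above (integers from -40 to 39) and uses its own random generator, so the numbers will not match the NumPy run exactly.

```python
import random
import statistics

# Assumed population: the integers -40..39, as drawn by randint(-40, 40) in NumPy
low, high = -40, 39
population_sigma = statistics.pstdev(range(low, high + 1))

rng = random.Random(1)   # fixed seed so that the run is repeatable
results = []

for n in (1, 10, 50, 100):
    # 1000 sample means, each computed from n random draws
    sample_means = [
        statistics.fmean(rng.randint(low, high) for _ in range(n))
        for _ in range(1000)
    ]
    observed = statistics.pstdev(sample_means)
    predicted = population_sigma / n ** 0.5
    results.append((n, observed, predicted))
    print(f"n = {n:3d}  observed = {observed:6.2f}  predicted = {predicted:6.2f}")
```

The observed and predicted values should agree to within a few percent, and both shrink as n grows, which is why the histograms become narrower for larger samples.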

Experiment : 6
Aim:- Write a program to implement Measure of Spread in dataset.

Code:-

How To Get Measures Of Spread With Python

Measures of spread tell how spread out the data points are. Some examples of measures
of spread are quantiles, variance, standard deviation and mean absolute deviation.

In this exercise we are going to get the measures of spread using Python.

Quantiles are values that split sorted data or a probability distribution into equal parts.
There are several different types of quantiles; here are some examples:

1. Quartiles - Divide the data into 4 equal parts.

2. Quintiles - Divide the data into 5 equal parts.

3. Deciles - Divide the data into 10 equal parts.

4. Percentiles - Divide the data into 100 equal parts.

Code:-

import numpy as np

# Define a function to calculate quartiles
def calculate_quartiles(data):
    quartiles = np.percentile(data, [25, 50, 75])
    return quartiles

# Define a function to calculate quantiles
def calculate_quantiles(data, q):
    quantiles = np.percentile(data, q)
    return quantiles

# Define a function to calculate percentiles
def calculate_percentiles(data, p):
    percentiles = np.percentile(data, p)
    return percentiles

# Example data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate quartiles
quartiles = calculate_quartiles(data)
print("Quartiles:", quartiles)

# Calculate quantiles (e.g., deciles)
quantiles = calculate_quantiles(data, [10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Quantiles:", quantiles)

# Calculate percentiles
percentiles = calculate_percentiles(data, [10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Percentiles:", percentiles)

Output:-

Quartiles: [3.25 5.5 7.75]

Quantiles: [1.9 2.8 3.7 4.6 5.5 6.4 7.3 8.2 9.1]
Percentiles: [1.9 2.8 3.7 4.6 5.5 6.4 7.3 8.2 9.1]

1. Mean Absolute Deviation

This is the average of the distance between each data point and the mean of the data.

Let's find the mean absolute deviation of the example data.

Code:-

import numpy as np

# Define a function to calculate Mean Absolute Deviation (MAD)

def mean_absolute_deviation(data):
    mean = np.mean(data)
    mad = np.mean(np.abs(data - mean))
    return mad

# Example data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate Mean Absolute Deviation

mad_value = mean_absolute_deviation(data)
print("Mean Absolute Deviation:", mad_value)

Output:-

Mean Absolute Deviation: 2.5
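
2. Variance and Standard Deviation

The introduction to this experiment also lists variance and standard deviation. A short sketch with Python's statistics module, using the same example data, is given below. The variable names are hypothetical; the population versions divide by n and the sample versions divide by n - 1.

```python
from statistics import pvariance, pstdev, variance, stdev

# Same example data as in the programs above
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Population measures: squared deviations from the mean divided by n
pop_var = pvariance(data)
pop_std = pstdev(data)

# Sample measures: squared deviations from the mean divided by n - 1
sample_var = variance(data)
sample_std = stdev(data)

print("Population variance:", pop_var)
print("Population standard deviation:", pop_std)
print("Sample variance:", sample_var)
print("Sample standard deviation:", sample_std)
```

For this data the squared deviations from the mean 5.5 add up to 82.5, so the population variance is 82.5 / 10 = 8.25 and the sample variance is 82.5 / 9.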

Experiment : 7

Aim:- Write a program to implement pmf, pdf and cdf.

Code:-
This code demonstrates how to compute the Probability
Mass Function (PMF) and the Cumulative Distribution Function (CDF) of a
discrete distribution, how to approximate a Probability Density Function (PDF)
numerically, and how to visualize the PMF and CDF. You
can change the lists values and probabilities to explore different
distributions.
1. Defining the Functions:
• pmf(values, probabilities, x): This function returns the probability of the
value x. It looks up x in values and returns the matching entry of
probabilities, or 0 if x is not one of the values.
• pdf(func, x, dx): This function approximates the Probability Density Function
at x as the forward-difference derivative of func, which is expected to be a
cumulative distribution function. The step size dx controls the accuracy of the
approximation.
• cdf(values, probabilities, x): This function returns the Cumulative Distribution
Function at x by adding the probabilities of all values that are less than or
equal to x. The values must be sorted in ascending order, because the loop stops
at the first value larger than x.
2. Plotting: The PMF is drawn with plt.bar() and the CDF with plt.plot(), side by
side in one figure.

Code:-

import numpy as np
import matplotlib.pyplot as plt

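def pmf(values, probabilities, x):
    # Probability of outcome x; 0 if x is not a possible value
    if x in values:
        index = values.index(x)
        return probabilities[index]
    else:
        return 0

def pdf(func, x, dx=1e-6):
    # Approximate the density at x as the forward-difference
    # derivative of func (func is expected to be a CDF)
    return (func(x + dx) - func(x)) / dx

def cdf(values, probabilities, x):
    # P(X <= x); assumes values are sorted in ascending order
    cdf_value = 0
    for i in range(len(values)):
        if values[i] <= x:
            cdf_value += probabilities[i]
        else:
            break
    return cdf_value
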
# Example usage
values = [1, 2, 3, 4, 5]
probabilities = [0.1, 0.2, 0.3, 0.2, 0.2]

# Plot PMF
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.bar(values, probabilities)
plt.title('Probability Mass Function (PMF)')
plt.xlabel('Values')
plt.ylabel('Probability')

# Plot CDF
plt.subplot(1, 2, 2)
cdf_values = [cdf(values, probabilities, val) for val in values]
plt.plot(values, cdf_values, marker='o')
plt.title('Cumulative Distribution Function (CDF)')
plt.xlabel('Values')
plt.ylabel('CDF')

plt.tight_layout()
plt.show()

Output:-
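
The program above defines pdf() but never calls it. The following is a minimal sketch of how it might be used, assuming a standard normal distribution as the continuous example. The helper names normal_cdf and normal_pdf_exact are illustrative and not part of the original program; pdf() is repeated so that the sketch runs on its own.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # CDF of a normal distribution, written with the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf_exact(x, mu=0.0, sigma=1.0):
    # Closed-form normal density, used only to check the approximation
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pdf(func, x, dx=1e-6):
    # Same forward-difference approximation as in the program above
    return (func(x + dx) - func(x)) / dx

for x in (-1.0, 0.0, 1.0):
    approx = pdf(normal_cdf, x)
    exact = normal_pdf_exact(x)
    print(f"x = {x:+.1f}  numerical pdf = {approx:.6f}  exact pdf = {exact:.6f}")
```

The two columns should agree to several decimal places, which shows that differentiating a CDF recovers the corresponding density. A smaller dx reduces the truncation error but increases floating-point round-off.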

Experiment : 8

Aim:- Write a program to implement different visualization techniques on sample dataset.

Code:-

Packages of Data Visualization in Python

A large amount of data is generated every day, and trends and patterns are hard to
spot while the data is in its raw format. Data visualization addresses this by
presenting the data in an organized, pictorial form that is easier to understand,
observe, and analyze. In this tutorial, we will discuss how to visualize data
using Python.
Python provides several libraries for visualizing data, each with different features
and support for various types of graphs. Four such libraries are:
• Matplotlib
• Seaborn
• Bokeh
• Plotly
The examples in this experiment use Matplotlib and Seaborn to plot some of the most
commonly used graphs.

Tips Database

The tips database records the tips given by customers in a restaurant over two
and a half months in the early 1990s. It contains 7 columns: total_bill, tip,
sex, smoker, day, time, and size.
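
To make the column layout concrete, here is a minimal sketch that builds a three-row table with the same seven column names. The rows are made up for illustration and are not taken from tips.csv; the name sample is also an assumption of this sketch.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from tips.csv
sample = pd.DataFrame({
    "total_bill": [18.50, 12.25, 30.00],
    "tip": [3.00, 2.00, 5.50],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "Yes"],
    "day": ["Sun", "Thur", "Sat"],
    "time": ["Dinner", "Lunch", "Dinner"],
    "size": [2, 2, 4],
})

# Column names and the subset of columns that hold numbers
print(sample.columns.tolist())
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print("Numeric columns:", numeric_cols)

# sample.size is a DataFrame attribute (total number of cells), so the
# party-size column has to be accessed with brackets
print("Cells in table:", sample.size)
print("Party sizes:", sample["size"].tolist())
```

The bracket form is why the examples in this experiment write data['size'] rather than data.size.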
Data Visualization
Data visualization is the branch of data analysis that presents data graphically. It
is a good way to communicate inferences drawn from the data.
People generally grasp data faster when it is presented as pictures, maps, and
graphs. Visualization is useful for both large and small data sets, and especially
for large ones, where it is impossible to read, process, and understand every record.
Data Visualization in Python
Data visualization is one of the most widely used parts of data science with Python.
Python includes several plotting libraries, such as Matplotlib and Seaborn, that let
users create highly customized and, in some libraries, interactive plots. Each has
its own set of features for presenting information simply and effectively, and each
supports a variety of graph types. Four of these libraries are:
• Matplotlib
• Seaborn
• Bokeh
• Plotly
The examples in this experiment use Matplotlib and Seaborn.
Useful Python Packages for Visualizations
Matplotlib
Matplotlib is a Python visualization library for 2D plots of arrays, built on the
NumPy library. It works with Python and IPython shells, as well as Jupyter notebooks
and web application servers. Matplotlib includes a wide range of plots, such as
scatter, line, bar, and histogram, that help in exploring trends, behavioral
patterns, and correlations. John Hunter first launched it in 2002.
Before plotting, the tips dataset is loaded with Pandas.
Example:
import pandas as pd

# reading the database

data = pd.read_csv("tips.csv")

# printing the top 10 rows

print(data.head(10))

Output:

Matplotlib

Matplotlib is an easy-to-use, low-level data visualization library that is built on
NumPy arrays. It supports various plots such as the scatter plot, line plot, and
histogram, and provides a lot of flexibility.
To install it, type the command below in the terminal.
pip install matplotlib

Scatter Plot

A scatter plot is used to observe the relationship between two variables, with each
observation drawn as a dot. The scatter() method in the matplotlib library is used
to draw a scatter plot.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database

data = pd.read_csv("tips.csv")

# Scatter plot with day against tip

plt.scatter(data['day'], data['tip'])

# Adding Title to the Plot

plt.title("Scatter Plot")

# Setting the X and Y labels

plt.xlabel('Day')
plt.ylabel('Tip')

plt.show()

Output:

This graph becomes more meaningful if we add colors and vary the size of the
points. We can do this with the c and s parameters of the scatter function,
respectively. We can also show the color bar using the colorbar() method.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database

data = pd.read_csv("tips.csv")

# Scatter plot with day against tip,
# colored by party size and sized by total bill

plt.scatter(data['day'], data['tip'], c=data['size'],
            s=data['total_bill'])

# Adding Title to the Plot

plt.title("Scatter Plot")

# Setting the X and Y labels

plt.xlabel('Day')
plt.ylabel('Tip')

plt.colorbar()

plt.show()

Output:

Line Chart

A line chart represents the relationship between two variables, X and Y, plotted on
different axes. It is drawn using the plot() function. When only one column is
passed, as in the example below, the row index is used for the X-axis.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database

data = pd.read_csv("tips.csv")

# Line chart of tip and size against the row index

plt.plot(data['tip'], label='tip')
plt.plot(data['size'], label='size')

# Adding Title to the Plot

plt.title("Line Chart")

# Setting the X and Y labels

plt.xlabel('Row index')
plt.ylabel('Value')

# Adding the legend
plt.legend()

plt.show()
Output:

Bar Chart

A bar plot or bar chart represents categories of data with rectangular bars whose
lengths or heights are proportional to the values they represent.
It can be created using the bar() method.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database

data = pd.read_csv("tips.csv")

# Bar chart with day against tip

plt.bar(data['day'], data['tip'])

plt.title("Bar Chart")
# Setting the X and Y labels
plt.xlabel('Day')
plt.ylabel('Tip')
# Displaying the plot
plt.show()
Output:
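
Because the example passes one bar per row, bars for the same day are drawn on top of each other, so the visible height for each day is the largest tip on that day. One common alternative is to aggregate first. The following is a minimal sketch that plots the mean tip per day; the six rows are made up for illustration (not taken from tips.csv), and the Agg backend is an assumption chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only
data = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sun"],
    "tip": [2.0, 4.0, 3.0, 1.0, 5.0, 3.5],
})

# One value per category: the mean tip for each day,
# keeping the days in order of first appearance
mean_tip = data.groupby("day", sort=False)["tip"].mean()

fig, ax = plt.subplots()
ax.bar(mean_tip.index.tolist(), mean_tip.values)
ax.set_title("Mean Tip per Day")
ax.set_xlabel("Day")
ax.set_ylabel("Mean tip")

print(mean_tip)
```

Each day now has exactly one bar, and its height is a summary of all the rows for that day rather than whichever bar happens to be tallest.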

Histogram

A histogram represents data grouped into ranges. It is a type of bar plot where
the X-axis represents the bin ranges while the Y-axis gives information about
frequency. The hist() function is used to compute and create a histogram. If we
pass categorical data to a histogram, it will automatically compute the frequency
of that data, i.e. how often each value occurred.
Example:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")

# histogram of total_bills
plt.hist(data['total_bill'])

plt.title("Histogram")
# Displaying the plot
plt.show()

Output:
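
To see the numbers behind a histogram, the binning step can be run on its own. Matplotlib's hist() uses numpy.histogram to bin the data, so the following minimal sketch calls that function directly. The bill amounts and bin edges are made up for illustration.

```python
import numpy as np

# Made-up bill amounts for illustration only
bills = np.array([5, 12, 18, 21, 25, 29, 45])

# Explicit bin edges: 0-10, 10-20, 20-30, 30-40, 40-50
edges = [0, 10, 20, 30, 40, 50]

# counts[i] is the number of values that fall into bin i
counts, bin_edges = np.histogram(bills, bins=edges)

# Each bin includes its left edge and excludes its right edge,
# except the last bin, which includes both edges
for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:>2} to {right:>2}: {n}")
```

The counts are the bar heights that hist() would draw for the same data and bins, and they always add up to the number of values inside the bin range.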

Seaborn
Seaborn is a high-level interface built on top of Matplotlib. It provides
design styles and color palettes that make graphs more attractive.
To install seaborn, type the command below in the terminal.
pip install seaborn

Because Seaborn is built on top of Matplotlib, the two can be used together.
We just invoke the Seaborn plotting function as normal, and then use
Matplotlib's customization functions on the result.
Note: Seaborn comes loaded with datasets such as tips and iris, but for the sake
of this tutorial we will use Pandas for loading these datasets.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("tips.csv")

# draw lineplot
sns.lineplot(x="sex", y="total_bill", data=data)
# setting the title using Matplotlib
plt.title('Title using Matplotlib Function')

plt.show()
Output:

Scatter Plot

A scatter plot is drawn using the scatterplot() method. This is similar to Matplotlib,
but the column names are passed as x and y and the DataFrame is passed through the
additional data argument.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("tips.csv")
sns.scatterplot(x='day', y='tip', data=data)
plt.show()
Output:

With Matplotlib, coloring each point of this plot according to sex takes extra
work. With Seaborn's scatterplot(), it can be done with the help of the hue
argument.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# reading the database

data = pd.read_csv("tips.csv")

sns.scatterplot(x='day', y='tip', data=data, hue='sex')
plt.show()
Output:
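
For comparison, here is a minimal sketch of one way to get a similar per-group coloring in plain Matplotlib: split the table by sex and call scatter() once per group. The four rows are made up for illustration (not taken from tips.csv), and the Agg backend is an assumption chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only
data = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "tip": [2.0, 3.0, 5.0, 3.5],
    "sex": ["Female", "Male", "Male", "Female"],
})

fig, ax = plt.subplots()

# One scatter() call per group; Matplotlib picks a new color for
# each call, and the label feeds the legend
for sex, group in data.groupby("sex"):
    ax.scatter(group["day"], group["tip"], label=sex)

ax.set_xlabel("day")
ax.set_ylabel("tip")
ax.legend(title="sex")

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

The loop, the labels, and the legend call are what the single hue='sex' argument takes care of in Seaborn.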
