Python File Semester-4

The document outlines a series of practical programming exercises focused on implementing basic Python libraries such as NumPy, SciPy, Matplotlib, and Pandas. Each exercise includes specific aims, code examples, and explanations of concepts like array creation, indexing, and data manipulation. The document serves as a tutorial for understanding and utilizing these libraries for data analysis and visualization.

Uploaded by

tonyalwyn28
INDEX

SR.NO.  PRACTICALS  REMARKS

1. WRITE A PROGRAM TO IMPLEMENT BASIC PYTHON LIBRARIES - NUMPY, SCIPY.
2. WRITE A PROGRAM TO IMPLEMENT BASIC PYTHON LIBRARIES - MATPLOTLIB, PANDAS, SCIKIT-LEARN.
3. WRITE A PROGRAM TO CREATE SAMPLES FROM POPULATION.
4. WRITE A PROGRAM TO EVALUATE MEAN, MEDIAN, MODE OF DATASET.
5. WRITE A PROGRAM TO IMPLEMENT CENTRAL LIMIT THEOREM IN DATASET.
6. WRITE A PROGRAM TO IMPLEMENT MEASURE OF SPREAD IN DATASET.
7. WRITE A PROGRAM TO IMPLEMENT PMF, PDF AND CDF.
8. WRITE A PROGRAM TO IMPLEMENT DIFFERENT VISUALIZATION TECHNIQUES ON SAMPLE DATASET.

Sania Dixit(8522244)
Experiment 1

Aim:- Write a program to implement Basic Python Libraries - NumPy, SciPy.

Code:-
NumPy:- It provides a high-performance multidimensional array object and tools for
working with these arrays. It is the fundamental package for scientific computing with
Python. Its important features include:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities

1. Arrays in NumPy: NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type, indexed by a tuple
of non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is the rank.
• NumPy’s array class is called ndarray. It is also known by the alias array.
Example:
[[ 1, 2, 3],
[ 4, 2, 5]]
Here,
rank = 2 (it is 2-dimensional, i.e. it has 2 axes)
first dimension (axis) has length 2, second dimension has length 3
overall shape can be expressed as: (2, 3)
2. Array creation: There are various ways to create arrays in NumPy.
• For example, you can create an array from a regular Python list or tuple using
the array function. The type of the resulting array is deduced from the type of the
elements in the sequences.
• Often, the elements of an array are originally unknown, but its size is known.
Hence, NumPy offers several functions to create arrays with initial placeholder
content. These minimize the necessity of growing arrays, an expensive
operation. For example: np.zeros, np.ones, np.full, np.empty, etc.
• To create sequences of numbers, NumPy provides a function analogous to
range that returns arrays instead of lists: np.arange.
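The creation functions named above can be tried out with a short sketch. The shapes and fill values below are arbitrary choices for illustration, not part of the original practical.

```python
import numpy as np

# Arrays with placeholder content; the shape is passed as a tuple.
zeros = np.zeros((2, 3))            # 2x3 array of 0.0
ones = np.ones((2, 2), dtype=int)   # 2x2 array of integer 1
sevens = np.full((2, 2), 7)         # 2x2 array filled with the value 7

# Sequences of numbers: arange works like range(start, stop, step),
# linspace fixes the number of evenly spaced points instead of the step.
steps = np.arange(0, 10, 2)
grid = np.linspace(0.0, 1.0, 5)

print(zeros.shape, ones.dtype, sevens.tolist())
print(steps, grid)
```

np.empty is left out on purpose: it returns uninitialized memory, so its printed contents are not predictable.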

3. Array Indexing: Knowing the basics of array indexing is important for analysing
and manipulating the array object. NumPy offers many ways to do array indexing.
• Slicing: Just like lists in Python, NumPy arrays can be sliced. As arrays can be
multidimensional, you need to specify a slice for each dimension of the array.
• Integer array indexing: In this method, lists are passed for indexing for each
dimension. One-to-one mapping of corresponding elements is done to construct
a new arbitrary array.
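The difference between slicing and integer array indexing is easiest to see side by side. The following is a minimal sketch; the index lists are chosen only for illustration.

```python
import numpy as np

arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -7, 4, 2.0]])

# Slicing: one slice per dimension (rows 0-1, every second column).
block = arr[:2, ::2]

# Integer array indexing: the two lists are paired element by element,
# so this picks the elements at (0, 3), (1, 2), (2, 1) and (3, 0).
picked = arr[[0, 1, 2, 3], [3, 2, 1, 0]]

print(block)
print(picked)
```

Slicing returns a rectangular sub-array, while integer array indexing can collect elements from arbitrary positions into a new one-dimensional array.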

4. Basic operations: NumPy provides a plethora of built-in arithmetic functions.
• Operations on a single array: We can use overloaded arithmetic operators to
do element-wise operations on an array to create a new array. In the case of the +=, -=, *=
operators, the existing array is modified.
• Unary operators: Many unary operations are provided as methods
of the ndarray class. These include sum, min, max, etc. These functions can also be
applied row-wise or column-wise by setting an axis parameter.
• Binary operators: These operations apply to arrays element-wise and a new
array is created. You can use all basic arithmetic operators like +, -, /, *, etc. In the
case of the +=, -=, *= operators, the existing array is modified.
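The distinction between creating a new array and modifying the existing one can be checked with a small sketch. The variable names here are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
alias = a            # second name for the same array object

b = a + 10           # binary operator: a new array is created, a is unchanged
a += 1               # in-place operator: the existing array is modified

print(b)             # built from the old values of a
print(alias)         # sees the in-place change, because alias is a
```

Because += changes the array in place, every other name that refers to the same array observes the change.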

Program of NumPy showing implementation of these operations :-

import numpy as np

arr = np.array([[1, 2, 3], [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype='float')
print("Array created using passed list:\n", a)

# Create an array with random values
e = np.random.random((2, 2))
print("\nA random array:\n", e)

# An exemplar array
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 8], [3, -7, 4, 2.0]])
# Slicing array
temp = arr[:2, ::2]
print("Array with first 2 rows and alternate columns (0 and 2):\n", temp)

arr = np.array([[1, 5, 6], [4, 7, 2], [3, 1, 9]])

# maximum element of array
print("Largest element is:", arr.max())
print("Row-wise maximum elements:", arr.max(axis=1))
# minimum element of array
print("Column-wise minimum elements:", arr.min(axis=0))
# sum of array elements
print("Sum of all array elements:", arr.sum())
# cumulative sum along each row
print("Cumulative sum along each row:\n", arr.cumsum(axis=1))

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])
# add arrays
print("Array sum:\n", a + b)
# multiply arrays (elementwise multiplication)
print("Array multiplication:\n", a * b)
# matrix multiplication
print("Matrix multiplication:\n", a.dot(b))

Output-

Scipy:-
SciPy is a collection of mathematical algorithms and convenience functions built on
the NumPy extension of Python. It adds significant power to the interactive Python
session by providing the user with high-level commands and classes for manipulating
and visualizing data. With SciPy, an interactive Python session becomes a
data-processing and system-prototyping environment rivaling systems such as MATLAB,
IDL, Octave, R-Lab, and SciLab.

The additional benefit of basing SciPy on Python is that this also makes a powerful
programming language available for use in developing sophisticated programs and
specialized applications. Scientific applications using SciPy benefit from the
development of additional modules in numerous niches of the software landscape by
developers across the world. Everything from parallel programming to web and
database subroutines and classes have been made available to the Python programmer. All
of this power is available in addition to the mathematical libraries in SciPy.

Spatial Data:-

Spatial data refers to data that is represented in a geometric space,
e.g. points on a coordinate system.
We deal with spatial data problems in many tasks,
e.g. finding whether a point is inside a boundary or not.
SciPy provides the module scipy.spatial, which has functions for working
with spatial data.

Triangulation:-

A triangulation of a polygon divides the polygon into multiple triangles, from which
we can compute the area of the polygon.
A triangulation of a set of points creates a surface composed of triangles in which each of
the given points is a vertex of at least one triangle in the surface.
One method to generate such triangulations through points is
the Delaunay() triangulation.
plt.scatter: This function is used to create a scatter plot. Its first two arguments are the
x-coordinates of the points (stored in points[:, 0]) and the y-coordinates of the points
(stored in points[:, 1]).
- points[:, 0]: This extracts the x-coordinates of all points in the points array.
- points[:, 1]: This extracts the y-coordinates of all points in the points array.
- color='r': This parameter specifies the color of the points. In this case, 'r' indicates red.
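Both uses mentioned above, computing an area from the triangles and testing whether a point lies inside a boundary, can be sketched without plotting. This is an illustrative sketch only: the helper triangle_area and the two query points are assumptions added here, and the boundary is taken to be the convex hull of the points.

```python
import numpy as np
from scipy.spatial import Delaunay

points = np.array([[2, 4], [3, 4], [3, 0], [2, 2], [4, 1]])
tri = Delaunay(points)

def triangle_area(a, b, c):
    # Half the absolute cross product of two edge vectors (hypothetical helper).
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

# Each row of tri.simplices holds the three point indices of one triangle.
total_area = sum(triangle_area(*points[s]) for s in tri.simplices)

# find_simplex returns -1 for a point that lies outside every triangle.
inside = tri.find_simplex([3, 2]) >= 0
outside = tri.find_simplex([0, 0]) >= 0

print(total_area, bool(inside), bool(outside))
```

The triangles tile the convex hull of the points, so the summed triangle areas equal the hull area.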

Code:-

import numpy as np
from scipy.spatial import Delaunay
import matplotlib.pyplot as plt

points = np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1]
])

simplices = Delaunay(points).simplices

plt.triplot(points[:, 0], points[:, 1], simplices)


plt.scatter(points[:, 0], points[:, 1], color='r')

plt.show()
Output:-

Experiment 2

Aim:- Write a program to implement Basic Python Libraries - Matplotlib, Pandas, Scikit-learn.

Code:-
Python Pandas Introduction

Pandas is an open-source library that provides high-performance data
manipulation in Python. The name Pandas is derived from "panel data",
an econometrics term for multidimensional data sets. It is used for data
analysis in Python and was developed by Wes McKinney in 2008.
Data analysis requires a lot of processing, such as restructuring, cleaning, or merging.
Several tools are available for fast data processing, such as NumPy,
SciPy, Cython, and Pandas.
Before Pandas, Python was capable of data preparation, but it provided only limited
support for data analysis. Pandas filled that gap and enhanced the
capabilities of data analysis. It supports the five significant steps required for
processing and analysing data, irrespective of the origin of the data: load,
manipulate, prepare, model, and analyze.
Key Features of Pandas
• It has a fast and efficient DataFrame object with default and customized
indexing.
• Used for reshaping and pivoting of data sets.
• Groups data for aggregations and transformations.
• It is used for data alignment and integration of missing data.
• Provides time-series functionality.
• Processes a variety of data sets in different formats, such as matrix data,
heterogeneous tabular data, and time series.
• Handles multiple operations on data sets such as subsetting, slicing,
filtering, groupBy, re-ordering, and re-shaping.
• Provides fast performance; if you want to speed it up even more, you can use
Cython.

Benefits of Pandas
The benefits of Pandas over other tools are as follows:
• Data Representation: It represents the data in a form that is suited for data
analysis through its DataFrame and Series.
• Clear code: The clear API of Pandas allows you to focus on the core part
of the code, so it provides clear and concise code for the user.
Python Pandas Data Structure
Pandas provides two data structures for processing data,
i.e., Series and DataFrame, which are discussed below:
1) Series
It is defined as a one-dimensional array that is capable of storing various data types.
The row labels of a Series are called the index. We can easily convert a list, tuple, or
dictionary into a Series using the Series() constructor. A Series cannot contain multiple
columns. Its main parameter is:
data: It can be any list, dictionary, or scalar value.
Creating Series from Array:
Before creating a Series, we first have to import the numpy module and then use the
array() function in the program.

import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)

Output

Explanation: In this code, we first import the pandas and numpy libraries
under the pd and np aliases. Then we create a variable named "info" that holds
an array of some values. We pass the info variable to the Series constructor
and store the result in the variable "a". The Series is printed by calling
print(a).
Python Pandas DataFrame
It is a widely used data structure of pandas and works with a two-dimensional array
with labeled axes (rows and columns). DataFrame is defined as a standard way to
store data and has two different indexes, i.e., row index and column index. It has
the following properties:
• The columns can be of heterogeneous types like int, bool, and so on.
• It can be seen as a dictionary of Series structures where both the rows and
columns are indexed. The labels are denoted as "columns" in the case of columns and
"index" in the case of rows.
Create a DataFrame using List:
We can easily create a DataFrame in Pandas using list.

import pandas as pd
x = ['Python', 'Pandas']
df = pd.DataFrame(x)
print(df)

Output
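Since a DataFrame can be seen as a dictionary of Series, it can also be built from a dictionary whose keys become the column labels. The following is a minimal sketch; the column names, values, and row labels are made up for illustration.

```python
import pandas as pd

# Each key becomes a column; the columns may have different types.
df = pd.DataFrame(
    {"name": ["Python", "Pandas"], "version": [3, 2], "stable": [True, True]},
    index=["r1", "r2"],   # hypothetical row labels
)

print(df)
print(df.columns.tolist(), df.index.tolist())

# Selecting one column returns a Series that shares the row index.
names = df["name"]
print(type(names).__name__, names["r2"])
```

Passing index is optional; without it the rows are labelled 0, 1, and so on, as in the list example.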

The Pandas Series can be defined as a one-dimensional array that is capable of storing
various data types. We can easily convert a list, tuple, or dictionary into a Series
using the Series() constructor. The row labels of a Series are called the index. A Series cannot
contain multiple columns. It has the following parameters:
data: It can be any list, dictionary, or scalar value.
index: The index values must be hashable and the index must be of the same
length as data; duplicate labels are allowed. If we do not pass any index, the default np.arange(n) will be used.
dtype: It refers to the data type of the Series.
copy: It is used for copying the data.
Creating a Series:
We can create a Series in two ways:
Create an empty Series
Create a Series using inputs.
Create an Empty Series:
We can easily create an empty series in Pandas which means it will not have any value.
The syntax that is used for creating an Empty Series:
<series object> = pandas.Series()
The example below creates an empty Series object that has no values. The dtype is passed
explicitly because the default for an empty Series was float64 in older pandas versions
and is object from pandas 2.0 onwards.
Example

import pandas as pd
x = pd.Series(dtype="float64")
print(x)

Output

Creating a Series using inputs:


We can create Series by using various inputs:
Array
Dict
Scalar value
Creating Series from Array:
Before creating a Series, we first have to import the numpy module and then use the
array() function in the program. If the data is an ndarray, then the passed index must be
of the same length.
If we do not pass an index, then by default an index of range(n) is used, where n
is the length of the array, i.e., [0, 1, 2, ..., len(array) - 1].
Example

import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)

Output

Create a Series from dict


We can also create a Series from a dict. If a dictionary object is passed as
input and the index is not specified, then the dictionary keys are used to construct
the index, in insertion order (older pandas versions sorted the keys).
If an index is passed, then the values corresponding to each label in the index are
extracted from the dictionary.

# import the pandas library
import pandas as pd
import numpy as np
info = {'x' : 0., 'y' : 1., 'z' : 2.}
a = pd.Series(info)
print(a)

Output
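The case where an index is passed together with a dictionary can be sketched as follows. The label 'w' is an assumption added here to show what happens for a label that is missing from the dictionary.

```python
import pandas as pd

info = {'x': 0., 'y': 1., 'z': 2.}

# Only the labels listed in index are kept, in the order given.
# A label with no matching key ('w') gets the missing value NaN.
a = pd.Series(info, index=['y', 'z', 'w'])
print(a)
```

The key 'x' is dropped because it does not appear in the index that was passed.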

Create a Series using Scalar:


If the data is a scalar value, an index should be provided. The scalar value is
repeated to match the length of the index.

# import the pandas library
import pandas as pd
import numpy as np
x = pd.Series(4, index=[0, 1, 2, 3])
print(x)

Output

Accessing data from series with Position:


Once you create a Series object, you can access its index, data, and even
individual elements.
The data in the Series can be accessed by position, similar to an ndarray, using iloc.
Plain x[0] on a Series with string labels is deprecated in recent pandas versions, so iloc is used here.

import pandas as pd
x = pd.Series([1,2,3],index = ['a','b','c'])
print(x.iloc[0])
Output

Retrieving Index array and data array of a series object
We can retrieve the index array and data array of an existing Series object by using the
attributes index and values.

import numpy as np
import pandas as pd
x=pd.Series(data=[2,4,6,8])
y=pd.Series(data=[11.2,18.6,22.5], index=['a','b','c'])
print(x.index)
print(x.values)
print(y.index)
print(y.values)

Output

What is Matplotlib?

Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility. Matplotlib was created by John D. Hunter. Matplotlib is open
source and we can use it freely. Matplotlib is mostly written in Python; a few
segments are written in C, Objective-C and JavaScript for platform compatibility.

Installation of Matplotlib

If you have Python and PIP already installed on a system, then installation of
Matplotlib is very easy.

Install it using this command:

C:\Users\Your Name>pip install matplotlib

If this command fails, then use a python distribution that already has Matplotlib
installed, like Anaconda, Spyder etc.

Import Matplotlib

Once Matplotlib is installed, import it in your applications by adding
the import module statement:

import matplotlib

Now Matplotlib is imported and ready to use.

Checking Matplotlib Version

The version string is stored under the __version__ attribute.

import matplotlib

print(matplotlib.__version__)
Pyplot

Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:

import matplotlib.pyplot as plt

Now the Pyplot package can be referred to as plt.

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([0, 6])

ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)

plt.show()

Result:-

Plotting x and y points

The plot() function is used to draw points (markers) in a diagram.

By default, the plot() function draws a line from point to point.

The function takes parameters for specifying points in the diagram.

Parameter 1 is an array containing the points on the x-axis.

Parameter 2 is an array containing the points on the y-axis.

If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and
[3, 10] to the plot function.

Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 8])

ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)

plt.show()

Result:

Plotting Without Line

To plot only the markers, you can use shortcut string notation parameter 'o', which
means 'rings'.

Example

Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 8])

ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, 'o')

plt.show()

Result:

Multiple Points

You can plot as many points as you like; just make sure you have the same number of
points on both axes.

Example

Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to
position (8, 10):

import matplotlib.pyplot as plt

import numpy as np

xpoints = np.array([1, 2, 6, 8])

ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)

plt.show()

Result:

Default X-Points

If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3,
etc., depending on the length of the y-points.

So, if we take the same example as above, and leave out the x-points, the diagram will
look like this:

Example

Plotting without x-points:

import matplotlib.pyplot as plt

import numpy as np

ypoints = np.array([3, 8, 1, 10, 5, 7])

plt.plot(ypoints)

plt.show()

Result:

Markers

You can use the keyword argument marker to emphasize each point with a specified
marker:

Mark each point with a circle:

import matplotlib.pyplot as plt

import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')

plt.show()

Result:

Experiment 3

Aim:- Write a program to create samples from population.

Code:-
sample() is a built-in function of the random module in Python that returns a
list of a given length containing items chosen from a sequence, i.e. a list, tuple, or string.
It is used for random sampling without replacement.
Syntax : random.sample(sequence, k)
Parameters:
sequence: Can be a list, tuple, or string. A set must first be converted to a sequence, for example with sorted(); passing a set directly raises TypeError from Python 3.11 onwards.
k: An integer value; it specifies the length of the sample.
Returns: A new list of k elements chosen from the sequence.

Code #1: Simple implementation of sample() function.


# By the use of sample() function .

from random import sample

# Prints list of random items of given length


list1 = [1, 2, 3, 4, 5]

print(sample(list1,3))
Output:
[2, 3, 5]

Code #2: Basic use of sample() function.

# the use of sample() function.

import random

# Prints list of random items of
# length 3 from the given list.
list1 = [1, 2, 3, 4, 5, 6]
print("With list:", random.sample(list1, 3))

# Prints list of random items of
# length 4 from the given string.
string = "This is a python program"
print("With string:", random.sample(string, 4))

# Prints list of random items of
# length 4 from the given tuple.
tuple1 = ("artificial", "machine", "computer", "science", "portal", "scientist", "btech")
print("With tuple:", random.sample(tuple1, 4))

# Prints list of random items of
# length 3 from the given set.
# The set is converted with sorted() because random.sample()
# requires a sequence from Python 3.11 onwards.
set1 = {"a", "b", "c", "d", "e"}
print("With set:", random.sample(sorted(set1), 3))

Output:
With list: [3, 1, 2]
With string: ['i', 's', 'T', 'p']
With tuple: ['artificial', 'portal', 'machine', 'computer']
With set: ['b', 'd', 'c']
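Two related points are worth a short sketch: making a sample repeatable with a seed, and sampling with replacement. The population of 100 numbered units and the seed value are assumptions chosen for illustration; random.choices is a different standard-library function, shown here only for contrast with sample().

```python
import random

population = list(range(1, 101))   # hypothetical population of 100 numbered units

# Sampling without replacement: no unit can appear twice.
random.seed(42)                    # fixed seed so the run can be repeated
without_replacement = random.sample(population, 10)

# Re-seeding with the same value reproduces the same sample.
random.seed(42)
repeat = random.sample(population, 10)

# Sampling with replacement: the same unit may be drawn more than once.
with_replacement = random.choices(population, k=10)

print(without_replacement)
print(without_replacement == repeat)
print(with_replacement)
```

The printed samples depend on the Python version and seed, so no output is listed here.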

Experiment 4

Aim:- Write a program to evaluate Mean, Median, Mode of dataset.

Code:-
Mean, median, and mode are fundamental topics of statistics. You can easily calculate
them in Python, with and without the use of external libraries.
These three are the main measures of central tendency. The central tendency lets us
know the “normal” or “average” values of a dataset.

Calculating the Mean in Python


The mean or arithmetic average is the most used measure of central tendency.
Remember that central tendency is a typical value of a set of data.
A dataset is a collection of data; therefore, a dataset in Python can be any of the
following built-in data structures:
• Lists, tuples, and sets: a collection of objects
• Strings: a collection of characters
• Dictionary: a collection of key-value pairs
Note: Although there are other data structures in Python
like queues or stacks, we’ll be using only the built-in ones.
We can calculate the mean by adding all the values of a dataset and dividing the result
by the number of values. For example, if we have the following list of numbers:
[1, 2, 3, 4, 5, 6]
The mean or average would be 3.5 because the sum of the list is 21 and its length
is 6. Twenty-one divided by six is 3.5. You can perform this calculation as follows:
(1 + 2 + 3 + 4 + 5 + 6) / 6 = 21 / 6 = 3.5
In this tutorial, we’ll be using the players of a basketball team as our sample data.

Creating a Custom Mean Function


Let’s start by calculating the average (mean) age of the players in a basketball team.
Program:

# calculating the mean in python
# without using the library

def cal_mean(dataset):
    return sum(dataset) / len(dataset)

dataset = [12,2,34,34,54,54,3,23,42,33,4,3,2,3]
print("The Mean of the given dataset is: ", cal_mean(dataset))

Output:

The Mean of the given dataset is: 21.642857142857142

Using mean() from the Python Statistic Module
Calculating measures of central tendency is a common operation for most developers.
That’s why Python’s statistics module provides diverse functions to calculate
them, along with other basic statistics topics.

Program:

from statistics import mean

print("Mean using the python library or built in function: ", mean(dataset))

Output:

Mean using the python library or built in function: 21.642857142857142

Finding the Median in Python


The median is the middle value of a sorted dataset. Like the mean, it is used to provide a
“typical” value of a given population.
In programming, we can define the median as the value that separates a sequence into
two parts: the lower half and the higher half.
To calculate the median, we first need to sort the dataset. We could do this
with sorting algorithms or using the built-in function sorted(). The second step is to
determine whether the dataset length is odd or even. Depending on this, one of the
following rules applies:
• Odd: The median is the middle value of the dataset
• Even: The median is the sum of the two middle values divided by two

Creating a Custom Median Function


Let’s implement the above concept into a Python function.
Remember the three steps we need to follow to get the median of a dataset:
• Sort the dataset: We can do this with the sorted() function
• Determine if it’s odd or even: We can do this by getting the length of
the dataset and using the modulo operator (%)
• Return the median based on each case:
    • Odd: Return the middle value
    • Even: Return the average of the two middle values
That would result in the following function:

dataset_median = [12,2,33,4,45,34,32,56,45,56]

def cal_median(dataset):
    # Work on a sorted copy so the caller's list is not modified
    ordered = sorted(dataset)
    n = len(ordered)
    if n % 2 == 0:
        # Even length: average of the two middle values
        mid = n // 2
        return (ordered[mid - 1] + ordered[mid]) / 2
    else:
        # Odd length: the single middle value
        return ordered[n // 2]

print("The median of the given dataset is: ", cal_median(dataset_median))

dataset_median.sort()
print("The sorted dataset is: ", dataset_median)

Output:

The median of the given dataset is: 33.5


The sorted dataset is: [2, 4, 12, 32, 33, 34, 45, 45, 56, 56]

Using median() from the Python Statistic Module


This way is much simpler because we’re using an already existent function from the
statistics module.

from statistics import median

print("The median of the given dataset using python library is: ", median(dataset_median))

Output:
The median of the given dataset using python library is: 33.5
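The aim of this experiment also lists the mode, the value that occurs most often. A minimal sketch in the same style as the mean and median programs follows; the function name cal_mode is an assumption made here, while multimode is part of the statistics module (Python 3.8 and later). Both return a list because a dataset can have more than one mode.

```python
from collections import Counter
from statistics import multimode

dataset = [12, 2, 34, 34, 54, 54, 3, 23, 42, 33, 4, 3, 2, 3]
dataset_median = [12, 2, 33, 4, 45, 34, 32, 56, 45, 56]

def cal_mode(data):
    # Count each value, then keep every value that reaches the highest count
    counts = Counter(data)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print("The mode(s) of the given dataset:", cal_mode(dataset))
print("The mode(s) using the python library:", multimode(dataset))
print("The mode(s) of the second dataset:", cal_mode(dataset_median))
```

In the first dataset the value 3 occurs three times, more than any other value. In the second dataset 45 and 56 both occur twice, so both are reported.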

Experiment 5

Aim:- Write a program to implement Central Limit Theorem in dataset.

Code:-
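The Central Limit Theorem states that, for large n, approximately
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
where $\bar{X}$ represents the sampling distribution of the sample mean of samples of size n each,
and $\mu$ and $\sigma$ are the mean and standard deviation of the population respectively, so that $\sigma/\sqrt{n}$ is the standard deviation of the sample mean.

The distribution of the sample mean tends towards the normal distribution as the sample
size increases.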
It is evident from the graphs that as we keep on increasing the sample size from 1 to
100 the histogram tends to take the shape of a normal distribution.

Rule of thumb:
Of course, the term “large” is relative. Roughly, the more “abnormal” the basic
distribution, the larger n must be for normal approximations to work well. The rule
of thumb is that a sample size n of at least 30 will suffice.

Why is this important?


The answer to this question is very simple: we can often use well-developed
statistical inference procedures that are based on a normal distribution, such as the
68-95-99.7 rule and many others, even if we are sampling from a population that is not
normal, provided we have a large sample size.
Program for Python implementation of the Central Limit Theorem

import numpy
import matplotlib.pyplot as plt

# number of samples
num = [1, 10, 50, 100]
# list of sample means
means = []

# Generating 1, 10, 50, 100 random numbers from -40 to 40,
# taking their mean and appending it to list means.
for j in num:
    # Setting the seed so that we get the same result
    # every time the loop is run
    numpy.random.seed(1)
    x = [numpy.mean(
        numpy.random.randint(
            -40, 40, j)) for _i in range(1000)]
    means.append(x)

k = 0

# plotting all the means in one figure
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
for i in range(0, 2):
    for j in range(0, 2):
        # Histogram for each x stored in means
        ax[i, j].hist(means[k], 10, density=True)
        ax[i, j].set_title(label=num[k])
        k = k + 1
plt.show()
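The histograms show the shape of the distribution; the spread can also be checked numerically. The sketch below compares the standard deviation of 1000 sample means with the population standard deviation divided by the square root of n. It is an illustration under stated assumptions: it uses NumPy's default_rng generator with an arbitrary seed instead of numpy.random.seed, and the dictionary observed is a name introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)   # seeded generator (assumed seed, for repeatability)
low, high = -40, 40              # integers -40..39, the range drawn by randint(-40, 40)

# Standard deviation of a discrete uniform distribution on 80 consecutive integers
population_std = np.sqrt(((high - low) ** 2 - 1) / 12)

observed = {}
for n in [1, 10, 50, 100]:
    # 1000 samples of size n each, reduced to one mean per sample
    sample_means = rng.integers(low, high, size=(1000, n)).mean(axis=1)
    observed[n] = sample_means.std()
    print(n, round(observed[n], 2), round(population_std / np.sqrt(n), 2))
```

The two printed columns should be close to each other for every n, which is the shrinking spread that the histograms show visually.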

Experiment 6
Aim:- Write a program to implement Measure of Spread in dataset.

Code:-

How To Get Measures Of Spread With Python

Measures of spread tell how spread out the data points are. Some examples of measures
of spread are quantiles, variance, standard deviation and mean absolute deviation.

In this exercise we are going to get the measures of spread using Python.

Quantiles are values that split sorted data or a probability distribution into equal parts.
There are several different types of quantiles; here are some examples:

1. Quartiles - Divide the data into 4 equal parts.

2. Quintiles - Divide the data into 5 equal parts.

3. Deciles - Divide the data into 10 equal parts.

4. Percentiles - Divide the data into 100 equal parts.

Code:-

import numpy as np

# Define a function to calculate quartiles
def calculate_quartiles(data):
    quartiles = np.percentile(data, [25, 50, 75])
    return quartiles

# Define a function to calculate quantiles
def calculate_quantiles(data, q):
    quantiles = np.percentile(data, q)
    return quantiles

# Define a function to calculate percentiles
def calculate_percentiles(data, p):
    percentiles = np.percentile(data, p)
    return percentiles

# Example data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate quartiles
quartiles = calculate_quartiles(data)
print("Quartiles:", quartiles)

# Calculate quantiles (e.g., deciles)
quantiles = calculate_quantiles(data, [10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Quantiles:", quantiles)

# Calculate percentiles
percentiles = calculate_percentiles(data, [10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Percentiles:", percentiles)

Output:-

Quartiles: [3.25 5.5 7.75]


Quantiles: [1.9 2.8 3.7 4.6 5.5 6.4 7.3 8.2 9.1]
Percentiles: [1.9 2.8 3.7 4.6 5.5 6.4 7.3 8.2 9.1]

1. Mean Absolute Deviation

This is the average of the distances between each data point and the mean of the data.

Let's find the mean absolute deviation of the example data.

Code:-

import numpy as np

# Define a function to calculate Mean Absolute Deviation (MAD)
def mean_absolute_deviation(data):
    mean = np.mean(data)
    mad = np.mean(np.abs(data - mean))
    return mad

# Example data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate Mean Absolute Deviation
mad_value = mean_absolute_deviation(data)
print("Mean Absolute Deviation:", mad_value)

Output:-

Mean Absolute Deviation: 2.5
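Variance and standard deviation are named above as measures of spread but are not computed in the programs. A minimal sketch on the same example data follows; the range and interquartile range are included as two further common measures, and the variable names are chosen here for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Range: distance between the largest and smallest value
data_range = data.max() - data.min()

# Interquartile range: distance between the third and first quartile
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Variance and standard deviation.
# ddof=0 (the default) divides by n; ddof=1 divides by n - 1 (sample variance).
population_variance = np.var(data)
sample_variance = np.var(data, ddof=1)
population_std = np.std(data)

print("Range:", data_range)
print("Interquartile range:", iqr)
print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
print("Population standard deviation:", population_std)
```

Whether to divide by n or n - 1 depends on whether the data is the whole population or a sample from it.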

Experiment 7

Aim:- Write a program to implement pmf, pdf and cdf.

Code:-
This code demonstrates how to compute and visualize the Probability
Mass Function (PMF) and Cumulative Distribution Function (CDF) of a
discrete distribution, together with a helper that approximates a
Probability Density Function (PDF) numerically. You can change the
values and probabilities lists to explore different distributions.
1. Defining the Functions:
• pmf(values, probabilities, x): This function returns the Probability Mass Function
(PMF) at x, i.e. the probability listed for x in probabilities, or 0 if x is not
one of the possible values.
• pdf(func, x, dx): This function approximates the Probability Density Function
(PDF) at x as the forward-difference derivative of func, where func is expected
to be a cumulative distribution function and dx is the step size. It is defined
but not used in the plots.
• cdf(values, probabilities, x): This function returns the Cumulative Distribution
Function (CDF) at x by adding up the probabilities of all values less than or
equal to x. It assumes that values is sorted in ascending order.
The PMF is drawn with plt.bar() and the CDF with plt.plot(), side by side in one figure.

Code:-

import numpy as np
import matplotlib.pyplot as plt

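def pmf(values, probabilities, x):
    # Probability of the discrete value x (0 if x is not a possible value)
    if x in values:
        index = values.index(x)
        return probabilities[index]
    else:
        return 0

def pdf(func, x, dx=1e-6):
    # Numerical derivative of the CDF function func at the point x
    return (func(x + dx) - func(x)) / dx

def cdf(values, probabilities, x):
    # Sum of the probabilities of all values <= x (values must be sorted)
    cdf_value = 0
    for i in range(len(values)):
        if values[i] <= x:
            cdf_value += probabilities[i]
        else:
            break
    return cdf_value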
# Example usage
values = [1, 2, 3, 4, 5]
probabilities = [0.1, 0.2, 0.3, 0.2, 0.2]

# Plot PMF
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.bar(values, probabilities)
plt.title('Probability Mass Function (PMF)')
plt.xlabel('Values')
plt.ylabel('Probability')

# Plot CDF
plt.subplot(1, 2, 2)
cdf_values = [cdf(values, probabilities, val) for val in values]
plt.plot(values, cdf_values, marker='o')
plt.title('Cumulative Distribution Function (CDF)')
plt.xlabel('Values')
plt.ylabel('CDF')

plt.tight_layout()
plt.show()

Output:-
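
The pdf() function defined above is not called in the example, so here is a small sketch of how it could be used. It assumes a standard normal distribution (mu = 0, sigma = 1), which is our choice for illustration and not part of the code above. The names normal_cdf, normal_pdf and numerical_pdf are hypothetical helpers; the CDF is written with math.erf from the standard library.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Closed-form PDF of a normal distribution, used as the reference."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def numerical_pdf(cdf_func, x, dx=1e-6):
    """Forward-difference derivative of a CDF, the same idea as pdf() above."""
    return (cdf_func(x + dx) - cdf_func(x)) / dx

# Compare the numerical derivative of the CDF with the exact PDF
for x in [-1.0, 0.0, 1.0]:
    approx = numerical_pdf(normal_cdf, x)
    exact = normal_pdf(x)
    print(f"x={x:+.1f}  numerical={approx:.6f}  exact={exact:.6f}")
```

The two columns should agree closely, which illustrates that the PDF is the slope of the CDF.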

Experiment : 8

Aim:- Write a program to implement different visualization techniques on a sample
dataset.

Code:-

Packages of Data Visualization in Python


A large amount of data is generated every day, and spotting trends and patterns in
it can be difficult while the data is in its raw format. Data visualization addresses
this by giving an organized, pictorial representation of the data that is easier to
understand, observe, and analyze. In this tutorial, we will discuss how to visualize
data using Python.
Python provides various libraries for visualizing data, each with different features
and support for various types of graphs. In this tutorial, we will be discussing four
such libraries.
• Matplotlib
• Seaborn
• Bokeh
• Plotly
We will discuss these libraries one by one and plot some of the most commonly used
graphs.

Tips Database

The Tips database is a record of the tips given by customers in a restaurant over two
and a half months in the early 1990s. It contains 7 columns: total_bill, tip, sex,
smoker, day, time, and size.
Data Visualization
Data visualization is a branch of data analysis that focuses on visualizing data. It plots
data graphically and is a good way to communicate data inferences.
The human brain processes and understands data quickly and easily when it is
presented in pictures, maps, and graphs. We can always get a visual description of our
data by using data visualization. Data visualization is important in representing both
big and small data sets. However, it is especially useful when dealing with large data
sets, where it is impossible to see, process, and understand all our records.
Data Visualization in Python
Data visualization is one of the most widely used functionalities for data science
with Python. Python libraries include various features that allow users to create
highly customized and interactive plots. Several plotting libraries are available for
Python, such as Matplotlib, Seaborn, and other data visualization packages. Each has
its own set of features for constructing informative, customized plots that present
information as simply and effectively as possible.
Python has several libraries for visualizing data, each with its features. Four of these
libraries will be discussed in this tutorial. Each of these libraries has its own set of
functionalities and can support a variety of graph types.
• Matplotlib

• Seaborn

• Bokeh

• Plotly
We'll go over each of these libraries one after another.
Useful Python Packages for Visualizations
Matplotlib
Matplotlib is a Python visualization library for 2D plots of arrays. It is built on the
NumPy library and works with Python and IPython shells, as well as Jupyter
notebooks and web application servers. Matplotlib includes a wide range of plots,
such as scatter, line, bar, and histogram, that can help us explore trends, behavioral
patterns, and correlations. John Hunter first launched it in 2002.
Example:
import pandas as pd

# reading the database


data = pd.read_csv("tips.csv")

# printing the top 10 rows


print(data.head(10))

Output:

Matplotlib

Matplotlib is an easy-to-use, low-level data visualization library that is built on
NumPy arrays. It consists of various plots like scatter plot, line plot, histogram, etc.
Matplotlib provides a lot of flexibility.
To install it, type the command below in the terminal.
pip install matplotlib

Scatter Plot

Scatter plots are used to observe relationships between variables, using dots to
represent each pair of values. The scatter() method in the matplotlib
library is used to draw a scatter plot.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database


data = pd.read_csv("tips.csv")

# Scatter plot with day against tip


plt.scatter(data['day'], data['tip'])

# Adding Title to the Plot


plt.title("Scatter Plot")

# Setting the X and Y labels


plt.xlabel('Day')
plt.ylabel('Tip')

plt.show()

Output:

This graph can be more meaningful if we add colors and also change the size of
the points. We can do this by using the c and s parameters, respectively, of the scatter
function. We can also show the color bar using the colorbar() method.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database


data = pd.read_csv("tips.csv")

# Scatter plot with day against tip


plt.scatter(data['day'], data['tip'], c=data['size'],
s=data['total_bill'])

# Adding Title to the Plot


plt.title("Scatter Plot")

# Setting the X and Y labels


plt.xlabel('Day')
plt.ylabel('Tip')

plt.colorbar()

plt.show()

Output:

Line Chart

A line chart is used to represent the relationship between two variables, X and Y,
plotted on different axes. It is drawn using the plot() function. When plot() receives
a single column, as in the example below, the row index is used for the X-axis.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database


data = pd.read_csv("tips.csv")

# Line chart of tip and size against the row index
plt.plot(data['tip'])
plt.plot(data['size'])

# Adding Title to the Plot
plt.title("Line Chart")

# Setting the X and Y labels
plt.xlabel('Row index')
plt.ylabel('Value')

plt.show()
Output:
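
When two lines share one axis, a legend makes it easier to tell them apart. The sketch below shows one way to add it with the label argument and legend(). It uses a small hand-made DataFrame named sample in place of tips.csv; the numbers are made up for illustration and are not rows from the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for tips.csv (illustration only)
sample = pd.DataFrame({
    "tip":  [1.0, 1.7, 3.5, 3.3, 3.6, 4.7],
    "size": [2, 3, 3, 2, 4, 4],
})

fig, ax = plt.subplots()

# Each column is plotted against the row index; label feeds the legend
ax.plot(sample.index, sample["tip"], label="tip")
ax.plot(sample.index, sample["size"], label="size")

ax.set_title("Line Chart")
ax.set_xlabel("Row index")
ax.set_ylabel("Value")
ax.legend()

# In an interactive session, call plt.show() to display the figure.
```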

Bar Chart

A bar plot or bar chart is a graph that represents categories of data with rectangular
bars whose lengths or heights are proportional to the values they represent.
It can be created using the bar() method.
Example:
import pandas as pd
import matplotlib.pyplot as plt

# reading the database


data = pd.read_csv("tips.csv")

# Bar chart with day against tip


plt.bar(data['day'], data['tip'])

plt.title("Bar Chart")
# Setting the X and Y labels
plt.xlabel('Day')
plt.ylabel('Tip')
# Displaying the plot
plt.show()
Output:
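
Note that plt.bar(data['day'], data['tip']) draws one bar for every row, so the bars for the same day are drawn on top of each other and only the tallest one is visible. If one summary value per day is wanted, the data can be aggregated first. The following is a sketch of that idea using groupby() and the mean; the sample DataFrame is made up for illustration and choosing the mean is our assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for tips.csv (illustration only)
sample = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sun"],
    "tip": [2.0, 4.0, 3.0, 1.0, 5.0, 3.5],
})

# One value per category: the mean tip for each day,
# keeping the days in order of first appearance
mean_tip = sample.groupby("day", sort=False)["tip"].mean()
print(mean_tip)

fig, ax = plt.subplots()
ax.bar(mean_tip.index, mean_tip.values)
ax.set_title("Mean tip per day")
ax.set_xlabel("Day")
ax.set_ylabel("Mean tip")

# In an interactive session, call plt.show() to display the figure.
```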

Histogram

A histogram is used to represent data grouped into ranges. It is a
type of bar plot where the X-axis represents the bin ranges while the Y-axis gives
information about frequency. The hist() function is used to compute and create a
histogram. If we pass categorical data to hist(), it will automatically
compute the frequency of that data, i.e. how often each value occurred.
Example:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")

# histogram of total_bills
plt.hist(data['total_bill'])

plt.title("Histogram")
# Displaying the plot
plt.show()

Output:
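
The grouping step that hist() performs can be looked at on its own with np.histogram, which returns the count in each bin and the bin edges. This is a sketch only: the bills array is made up for illustration, and bins=4 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up bill amounts (illustration only, not taken from tips.csv)
bills = np.array([8.5, 10.3, 12.0, 15.4, 16.9, 18.2, 21.0, 24.6, 30.1, 48.3])

# np.histogram returns the count per bin and the bin edges
counts, edges = np.histogram(bills, bins=4)
print("Bin edges:", edges)
print("Counts:   ", counts)

# hist() performs the same binning and also draws the bars
fig, ax = plt.subplots()
plot_counts, plot_edges, _ = ax.hist(bills, bins=4)
ax.set_title("Histogram")
ax.set_xlabel("Total bill")
ax.set_ylabel("Frequency")

# In an interactive session, call plt.show() to display the figure.
```

With 4 bins there are 5 edges, and the counts always add up to the number of values passed in.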

Seaborn
Seaborn is a high-level interface built on top of Matplotlib. It provides attractive
design styles and color palettes for making graphs.
To install seaborn, type the command below in the terminal.
pip install seaborn

Because Seaborn is built on top of Matplotlib, the two can be used together.
Doing so is simple: we invoke the Seaborn plotting function as normal, and then
use Matplotlib's customization functions.
Note: Seaborn can load sample datasets such as tips and iris through
sns.load_dataset(), but for the sake of this tutorial we will use Pandas to load the data.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("tips.csv")

# draw lineplot
sns.lineplot(x="sex", y="total_bill", data=data)
# setting the title using Matplotlib
plt.title('Title using Matplotlib Function')

plt.show()
Output:

Scatter Plot

A scatter plot is drawn using the scatterplot() method. This is similar to Matplotlib,
but when the columns are referred to by name, the additional argument data is required.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("tips.csv")
sns.scatterplot(x='day', y='tip', data=data)
plt.show()
Output:

With Matplotlib, coloring each point of this plot according to sex takes extra
work. With scatterplot(), it can be done with the help of the hue argument.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# reading the database


data = pd.read_csv("tips.csv")

sns.scatterplot(x='day', y='tip', data=data,
                hue='sex')
plt.show()
Output:
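
To see what the hue argument saves us, here is a sketch of roughly the same grouping done in plain Matplotlib: the rows are split by category and scatter() is called once per group. The sample DataFrame is made up for illustration, and this is only one possible way of writing it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for tips.csv (illustration only)
sample = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "tip": [2.0, 3.0, 1.5, 5.0, 3.5, 4.0],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Female"],
})

fig, ax = plt.subplots()

# One scatter call per category; Matplotlib picks a new color for each call
for sex, group in sample.groupby("sex"):
    ax.scatter(group["day"], group["tip"], label=sex)

ax.set_xlabel("day")
ax.set_ylabel("tip")
ax.legend(title="sex")

# In an interactive session, call plt.show() to display the figure.
```

Here the days appear on the X-axis in the order in which Matplotlib first meets them, so they may not be in calendar order.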
