Python File Semester-4
Sania Dixit(8522244)
Experiment 1
Code:-
NumPy:- It provides a high-performance multidimensional array object, and tools for
working with these arrays. It is the fundamental package for scientific computing with
Python. It contains various features including these important ones:
A powerful N-dimensional array object
Sophisticated (broadcasting) functions
Tools for integrating C/C++ and Fortran code
Useful linear algebra, Fourier transform, and random number capabilities
3. Array Indexing: Knowing the basics of array indexing is important for analysing
and manipulating the array object. NumPy offers many ways to do array indexing.
Slicing: Just like lists in Python, NumPy arrays can be sliced. As arrays can be
multidimensional, you need to specify a slice for each dimension of the array.
Integer array indexing: In this method, a list of indices is passed for each
dimension. The lists are paired element by element, and each pair of indices
selects one element of the new array.
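Integer array indexing as described above can be sketched as follows. This is a minimal illustration, not part of the original listing; the index lists `rows` and `cols` are assumed values chosen only to show the pairing.

```python
import numpy as np

arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -7, 4, 2.0]])

# Hypothetical index lists: element i of the result is arr[rows[i], cols[i]]
rows = [0, 1, 2, 3]
cols = [3, 2, 1, 0]

picked = arr[rows, cols]
print("Elements at (0, 3), (1, 2), (2, 1), (3, 0):", picked)
```

Unlike slicing, the result here is a new one-dimensional array whose length equals the length of the index lists.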
import numpy as np
# An exemplar array
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 8], [3, -7, 4, 2.0]])
# Slicing array: rows 0 and 1, every second column
temp = arr[:2, ::2]
print("Array with first 2 rows and alternate columns (0 and 2):\n", temp)
arr = np.array([[1, 5, 6],[4, 7, 2],[3, 1, 9]])
Output-
SciPy:-
SciPy is a collection of mathematical algorithms and convenience functions built on
the NumPy extension of Python. It adds significant power to the interactive Python
session by providing the user with high-level commands and classes for manipulating
and visualizing data. With SciPy, an interactive Python session becomes a data-
processing and system-prototyping environment rivaling systems, such as MATLAB,
IDL, Octave, R-Lab, and SciLab.
The additional benefit of basing SciPy on Python is that this also makes a powerful
programming language available for use in developing sophisticated programs and
specialized applications. Scientific applications using SciPy benefit from the
development of additional modules in numerous niches of the software landscape by
developers across the world. Everything from parallel programming to web and database
subroutines and classes has been made available to the Python programmer. All
of this power is available in addition to the mathematical libraries in SciPy.
Spatial Data:-
Triangulation:-
A triangulation of a polygon divides the polygon into multiple triangles, from which
we can compute the area of the polygon.
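The idea of computing a polygon's area from its triangles can be sketched as follows. This is a minimal illustration that assumes a convex polygon whose vertices are listed in order; the helper names `triangle_area` and `convex_polygon_area` and the sample square are hypothetical.

```python
def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (each an (x, y) pair)."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return abs(cross) / 2


def convex_polygon_area(vertices):
    """Split a convex polygon into a fan of triangles and add their areas."""
    anchor = vertices[0]
    return sum(
        triangle_area(anchor, vertices[i], vertices[i + 1])
        for i in range(1, len(vertices) - 1)
    )


# Hypothetical example: a square with side 2, split into two triangles
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print("Area of the square:", convex_polygon_area(square))
```

The fan split only works for convex shapes; general polygons need a proper triangulation routine.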
A triangulation of a set of points creates a surface composed of triangles in which
every given point is a vertex of at least one triangle.
One method to generate such a triangulation from points is
Delaunay triangulation, available as Delaunay() in scipy.spatial.
plt.scatter: This function is used to create a scatter plot. It takes two arguments: the x-
coordinates of the points (stored in points[:, 0]) and the y-coordinates of the points
(stored in points[:, 1]).
- points[:, 0]: This extracts the x-coordinates of all points in the points array.
- points[:, 1]: This extracts the y-coordinates of all points in the points array.
- color='r': This parameter specifies the color of the points. In this case, 'r' indicates red.
Code:-
import numpy as np
from scipy.spatial import Delaunay
import matplotlib.pyplot as plt
points = np.array([
[2, 4],
[3, 4],
[3, 0],
[2, 2],
[4, 1]
])
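# Each row of simplices holds the indices of the three points of one triangle
simplices = Delaunay(points).simplices
# Draw the triangle edges, then mark the input points in red
plt.triplot(points[:, 0], points[:, 1], simplices)
plt.scatter(points[:, 0], points[:, 1], color='r')
plt.show()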
Output:-
Experiment 2
Code:-
Pandas is an open-source Python library that provides high-performance data
manipulation and analysis. It was developed by Wes McKinney in 2008 and is built
on top of NumPy. It supports data operations and time series handling, and is
suitable for both beginners and professionals.
Pandas is fast, and performance-critical parts can be sped up even more with
Cython.
Benefits of Pandas
The benefits of Pandas are as follows:
Data Representation: It represents data in a form that is suited to data
analysis through its DataFrame and Series structures.
Clear code: The clear API of Pandas lets you focus on the core part
of the code, so the resulting code is clear and concise.
Python Pandas Data Structure
Pandas provides two data structures for processing data,
i.e., Series and DataFrame, which are discussed below:
1) Series
It is defined as a one-dimensional array that is capable of storing various data types.
The row labels of a Series are called the index. We can easily convert a list, tuple, or
dictionary into a Series using the Series() constructor. A Series cannot contain multiple
columns. Its main parameter is:
data: It can be any list, dictionary, or scalar value.
Creating Series from Array:
Before creating a Series, we first have to import the numpy module and then use the
array() function in the program.
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
Output
Explanation: In this code, we first import the pandas and numpy libraries
with the aliases pd and np. Then we create a variable named "info" that holds
an array of values. We pass info to the Series() constructor and store the
result in the variable "a". Finally, the Series is printed by calling
print(a).
Python Pandas DataFrame
It is a widely used data structure of pandas and works with a two-dimensional array
with labeled axes (rows and columns). DataFrame is defined as a standard way to
store data and has two different indexes, i.e., row index and column index. It consists
of the following properties:
The columns can be heterogeneous types like int, bool, and so on.
It can be seen as a dictionary of Series in which both the rows and
columns are indexed. The column labels are held in "columns" and the
row labels in "index".
Create a DataFrame using List:
We can easily create a DataFrame in Pandas using list.
import pandas as pd
x = ['Python', 'Pandas']
df = pd.DataFrame(x)
print(df)
Output
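A DataFrame can also be built from a dictionary of Series, which matches the "dictionary of Series" view described earlier. The sketch below is only an illustration; the column names, row labels, and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two Series that share the same row labels
names = pd.Series(["Python", "Pandas"], index=["r1", "r2"])
ranks = pd.Series([1, 2], index=["r1", "r2"])

# Each dictionary key becomes a column label
df = pd.DataFrame({"library": names, "rank": ranks})

print(df)
print("Column labels:", df.columns.tolist())
print("Row labels:", df.index.tolist())
```

Because each column is its own Series, the columns may have different data types.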
As described above, a Pandas Series is a one-dimensional labeled array that is capable
of storing various data types and cannot contain multiple columns. It is created with
the Series() constructor, which has the following parameters:
data: It can be any list, dictionary, or scalar value.
index: The index labels must be hashable and of the same length as data; they do not
have to be unique. If we do not pass any index, the default np.arange(n) will be used.
dtype: It refers to the data type of the Series.
copy: It is used for copying the data.
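The parameters listed above can be tried out with a short sketch. The subject names and marks are hypothetical values used only for illustration.

```python
import pandas as pd

# Hypothetical marks; data, index and dtype are passed explicitly
marks = pd.Series(
    data=[82, 91, 77],
    index=["maths", "physics", "chemistry"],
    dtype="float64",
)

print(marks)
print("Index labels:", marks.index.tolist())
print("Data type:", marks.dtype)
print("Value for 'physics':", marks["physics"])
```

Passing dtype="float64" converts the integer inputs to floating-point values.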
Creating a Series:
We can create a Series in two ways:
Create an empty Series
Create a Series using inputs.
Create an Empty Series:
We can easily create an empty series in Pandas which means it will not have any value.
The syntax that is used for creating an Empty Series:
<series object> = pandas.Series()
The example below creates an empty Series object that has no values. Its default
dtype depends on the pandas version: float64 in older releases, object from pandas 2.0 onward.
Example
import pandas as pd
x = pd.Series()
print (x)
Output
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
Output
# import the pandas library
import pandas as pd
import numpy as np
info = {'x' : 0., 'y' : 1., 'z' : 2.}
a = pd.Series(info)
print (a)
Output
Output
import pandas as pd
x = pd.Series([1,2,3],index = ['a','b','c'])
# Access by position; x[0] on a labelled index is deprecated in recent pandas versions
print(x.iloc[0])
Output
Retrieving Index array and data array of a series object
We can retrieve the index array and data array of an existing Series object by using the
attributes index and values.
import numpy as np
import pandas as pd
x=pd.Series(data=[2,4,6,8])
y=pd.Series(data=[11.2,18.6,22.5], index=['a','b','c'])
print(x.index)
print(x.values)
print(y.index)
print(y.values)
Output
What is Matplotlib?
Installation of Matplotlib
If you have Python and PIP already installed on a system, then installation of
Matplotlib is very easy. Install it using this command:
pip install matplotlib
If this command fails, then use a Python distribution that already has Matplotlib
installed, such as Anaconda or Spyder.
Import Matplotlib
import matplotlib
print(matplotlib.__version__)
Pyplot
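Most of the Matplotlib utilities lie under the pyplot submodule, and are usually
imported under the plt alias:
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])    # illustrative values; the original points are not shown
ypoints = np.array([0, 250])  # illustrative values; the original points are not shown
plt.plot(xpoints, ypoints)
plt.show()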
Result:-
Plotting x and y points
If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and
[3, 10] to the plot function.
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()
Result:
Plotting Without Line
To plot only the markers, you can use shortcut string notation parameter 'o', which
means 'rings'.
Example
Draw two points in the diagram, one at position (1, 3) and one in position (8, 10):
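import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.array([1, 8]), np.array([3, 10]), 'o')  # 'o' draws markers only, no line
plt.show()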
Result:
Multiple Points
You can plot as many points as you like, just make sure you have the same number of
points on both axes.
Example
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and finally to
position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
Result:
Default X-Points
If we do not specify the points on the x-axis, they will get the default values 0, 1, 2, 3,
etc., depending on the length of the y-points.
So, if we take the same example as above, and leave out the x-points, the diagram will
look like this:
Example
import matplotlib.pyplot as plt
import numpy as np
ypoints = np.array([3, 8, 1, 10])
plt.plot(ypoints)
plt.show()
Result:
Markers
You can use the keyword argument marker to emphasize each point with a specified
marker:
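import matplotlib.pyplot as plt
import numpy as np
plt.plot(np.array([3, 8, 1, 10]), marker='o')  # y-values reused from the earlier example
plt.show()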
Result:
Mark each point with a circle:
import matplotlib.pyplot as plt
plt.plot([3, 8, 1, 10], marker='o')  # y-values reused from the earlier example
plt.show()
Result:
Experiment -3
Code:-
sample() is a built-in function of the random module in Python that returns a
list of a given length whose items are chosen from a sequence such as a list, tuple, or string.
It is used for random sampling without replacement.
Syntax : random.sample(sequence, k)
Parameters:
sequence: Can be a list, tuple, or string. A set was also accepted in older Python
versions, but from Python 3.11 onward it must first be converted to a list or tuple.
k: An integer value that specifies the length of the sample.
Returns: A new list of k elements chosen from the sequence.
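The behaviour described above can be checked with a small sketch. The seed value and the set of letters are assumptions made only for illustration; the seed just makes repeated runs give the same sample.

```python
import random

# Seeded generator so that repeated runs pick the same items (assumed seed)
rng = random.Random(42)

# Hypothetical set; it is converted to a sorted list because
# sample() does not accept sets on Python 3.11 and later
letters = {"a", "b", "c", "d", "e"}
population = sorted(letters)

picked = rng.sample(population, 3)
print("Sample of 3 letters:", picked)

# Sampling is without replacement, so no item appears twice
print("All items distinct:", len(set(picked)) == len(picked))
```

Asking for more items than the population holds raises a ValueError, which follows from sampling without replacement.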
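from random import sample
print(sample([1, 2, 3, 4, 5], 3))  # list contents are illustrative
Output (one possible result, since the sample is random):
[2, 3, 5]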
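# import random
import random
# Prints list of random items of length 3 from the given list
list1 = [1, 2, 3, 4, 5, 6]
print("With list:", random.sample(list1, 3))
# tuple1 is not defined in the original listing; these items are illustrative
tuple1 = ("artificial", "portal", "machine", "computer", "science")
print("With tuple:", random.sample(tuple1, 4))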
Output:
With list: [3, 1, 2]
With string: ['i', 's', 'T', 'p']
With tuple: ['artificial', 'portal', 'machine', 'computer']
With set: ['b', 'd', 'c']
Experiment: 4
dataset. Code:-
Mean, median, and mode are fundamental topics of statistics. You can easily calculate
them in Python, with and without the use of external libraries.
These three are the main measures of central tendency. The central tendency lets us
know the “normal” or “average” values of a dataset.
def cal_mean(dataset):
    # Mean = sum of all values divided by the number of values
    return sum(dataset) / len(dataset)

dataset = [12, 2, 34, 34, 54, 54, 3, 23, 42, 33, 4, 3, 2, 3]
print("The Mean of the given dataset is: ", cal_mean(dataset))
Output:
Using mean() from the Python Statistics Module
Calculating measures of central tendency is a common operation for most developers.
For this reason, Python's statistics module provides functions to calculate
them, along with other basic statistics.
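A minimal sketch of the statistics module in use is given here. It reuses the two datasets that appear in this experiment; the print labels are illustrative.

```python
import statistics

# Datasets taken from the listings in this experiment
dataset = [12, 2, 34, 34, 54, 54, 3, 23, 42, 33, 4, 3, 2, 3]
dataset_median = [12, 2, 33, 4, 45, 34, 32, 56, 45, 56]

mean_value = statistics.mean(dataset)
median_value = statistics.median(dataset_median)
mode_value = statistics.mode(dataset)

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```

The library functions do not modify the input lists, so the same dataset can be reused for several measures.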
Program:
dataset_median = [12,2,33,4,45,34,32,56,45,56]
Output:
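dataset_median = [12, 2, 33, 4, 45, 34, 32, 56, 45, 56]

def cal_median(dataset):
    # Work on a sorted copy so the caller's list is not modified
    data = sorted(dataset)
    n = len(data)
    mid = n // 2
    if n % 2 == 0:
        # Even count: average the two middle values
        return (data[mid - 1] + data[mid]) / 2
    # Odd count: the single middle value
    return data[mid]

print("The median of the given dataset is:", cal_median(dataset_median))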
Output:
Output:
The median of the given dataset using python library is: 33.5
Experiment -5
Code:-
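Where $\bar{X}$ represents the sampling distribution of the sample mean for samples of size n each,
and $\mu$ and $\sigma$ are the mean and standard deviation of the population respectively.
For large n, $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
The distribution of the sample mean tends towards the normal distribution as the sample
size increases.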
It is evident from the graphs that as we keep on increasing the sample size from 1 to
100 the histogram tends to take the shape of a normal distribution.
Rule of thumb:
Of course, the term “large” is relative. Roughly, the more “abnormal” the basic
distribution, the larger n must be for normal approximations to work well. The rule
of thumb is that a sample size n of at least 30 will suffice.
import numpy
import matplotlib.pyplot as plt
# number of sample
num = [1, 10, 50, 100]
# list of sample means
means = []
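The listing above stops after the sample sizes and the empty list are set up. A minimal sketch of how such a simulation might proceed is given here; it assumes a population of uniformly distributed integers from -40 to 40, 1000 repetitions per sample size, and a fixed seed, none of which are confirmed by the original. It uses only the standard library and reports the spread of the sample means instead of drawing histograms.

```python
import random
import statistics

rng = random.Random(1)          # assumed seed, for repeatable runs
sample_sizes = [1, 10, 50, 100]
repetitions = 1000              # assumed number of samples per size

spread = {}
for n in sample_sizes:
    # Draw `repetitions` samples of size n and record the mean of each
    sample_means = [
        statistics.fmean([rng.randint(-40, 40) for _ in range(n)])
        for _ in range(repetitions)
    ]
    # Standard deviation of the sample means for this sample size
    spread[n] = statistics.pstdev(sample_means)
    print("n =", n, "spread of sample means =", round(spread[n], 2))
```

The spread should shrink roughly in proportion to 1/sqrt(n), which is what the narrowing histograms show.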
Experiment : 6
Aim:- Write a program to implement Measures of Spread in a dataset.
Code:-
Measures of spread tell how spread out the data points are. Some examples of measures
of spread are quantiles, variance, standard deviation and mean absolute deviation.
In this exercise we are going to compute the measures of spread using Python.
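Variance and standard deviation, two of the measures named above, can be sketched with the standard library. The data values are an assumption made for illustration, and the population versions of both measures are used.

```python
import statistics

# Illustrative data (assumed values)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Population variance: mean of the squared distances from the mean
variance = statistics.pvariance(data)

# Population standard deviation: square root of the variance
std_dev = statistics.pstdev(data)

print("Variance:", variance)
print("Standard deviation:", std_dev)
```

For a sample drawn from a larger population, statistics.variance() and statistics.stdev() divide by n - 1 instead of n.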
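Quantiles are values that split sorted data or a probability distribution into equal parts.
There are several different types of quantiles. For example, quartiles split the data into
four equal parts, deciles into ten equal parts, and percentiles into one hundred equal parts.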
Code:-
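import numpy as np

# Define a function to calculate quartiles (25th, 50th and 75th percentiles)
def calculate_quartiles(data):
    quartiles = np.percentile(data, [25, 50, 75])
    return quartiles

# Define a function to calculate quantiles at the percentage points in q
def calculate_quantiles(data, q):
    quantiles = np.percentile(data, q)
    return quantiles

# Define a function to calculate percentiles
def calculate_percentiles(data, p):
    percentiles = np.percentile(data, p)
    return percentiles

# Example data (illustrative values; the original data line is missing)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Calculate quartiles
quartiles = calculate_quartiles(data)
print("Quartiles:", quartiles)
# Calculate quantiles
quantiles = calculate_quantiles(data, [10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Quantiles:", quantiles)
# Calculate percentiles
percentiles = calculate_percentiles(data, [10, 20, 30, 40, 50, 60, 70, 80, 90])
print("Percentiles:", percentiles)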
Output:-
1. Mean Absolute Deviation
This is the average of the distance between each data point and the mean of the data.
Code:-
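import numpy as np
# Example data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Mean absolute deviation: average absolute distance from the mean
mad = np.mean(np.abs(data - np.mean(data)))
print("Mean Absolute Deviation:", mad)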
Output:-
Experiment : 7
Code:-
This code demonstrates how to generate and visualize the Probability
Mass Function (PMF) of a discrete distribution and its Cumulative
Distribution Function (CDF). The Probability Density Function (PDF) plays
the role of the PMF for continuous distributions such as the normal
distribution, where you can adjust parameters like mu and sigma to explore
different distributions, and modify the range and number of points in x to
change the granularity of the plots.
1. Defining Plotting Functions:
plot_pmf(x, pmf, title): This function plots the Probability Mass Function
(PMF). It uses plt.stem() to plot the stem plot, where x represents the data
points, pmf represents the probability mass function values, and title is the
title of the plot.
plot_pdf(x, pdf, title): This function plots the Probability Density Function
(PDF). It uses plt.plot() to plot the PDF curve, where x represents the data
points, pdf represents the probability density function values, and title is the
title of the plot.
plot_cdf(x, cdf, title): This function plots the Cumulative Distribution
Function (CDF). It uses plt.plot() to plot the CDF curve, where x represents
the data points, cdf represents the cumulative distribution function values,
and title is the title of the plot.
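The values passed to plot_pdf() and plot_cdf() for a normal distribution could be computed as in the sketch below. The function names `normal_pdf` and `normal_cdf`, the defaults for mu and sigma, and the grid of x values are assumptions for illustration; plotting is left out so that the block depends only on the standard library.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Assumed grid: 9 evenly spaced points from -4 to 4
xs = [-4 + i for i in range(9)]
pdf_values = [normal_pdf(x) for x in xs]
cdf_values = [normal_cdf(x) for x in xs]

for x, p, c in zip(xs, pdf_values, cdf_values):
    print(x, round(p, 4), round(c, 4))
```

The density peaks at x = mu, and the cumulative curve rises from 0 towards 1, passing 0.5 at x = mu.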
Code:-
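import numpy as np
import matplotlib.pyplot as plt

# Illustrative discrete distribution (a fair six-sided die); the original data is not shown
values = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

# CDF at val: total probability of all values less than or equal to val
def cdf(values, probabilities, val):
    return probabilities[values <= val].sum()

# Plot PMF
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.bar(values, probabilities)
plt.title('Probability Mass Function (PMF)')
plt.xlabel('Values')
plt.ylabel('Probability')
# Plot CDF
plt.subplot(1, 2, 2)
cdf_values = [cdf(values, probabilities, val) for val in values]
plt.plot(values, cdf_values, marker='o')
plt.title('Cumulative Distribution Function (CDF)')
plt.xlabel('Values')
plt.ylabel('CDF')
plt.tight_layout()
plt.show()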
Output:-
Experiment : 8
Code:-
Tips Database
The tips database is a record of the tips given by customers in a restaurant over two
and a half months in the early 1990s. It contains 7 columns: total_bill, tip,
sex, smoker, day, time, and size.
Data Visualization
Data visualization is a branch of data analysis that focuses on visualizing data. It plots
data graphically and is a good way to communicate data inferences.
The human brain processes and understands data faster when it is
presented in pictures, maps, and graphs. Data visualization gives us a visual description of our
data and is important in representing both
big and small data sets. However, it is especially useful when dealing with large data
sets, where it is impossible to inspect, process, and understand every record.
Data Visualization in Python
Data visualization is one of the most widely used
functionalities for data science with Python. Python libraries include various features
that allow users to create highly customized and interactive plots. Python
offers several plotting libraries, such as Matplotlib, Seaborn, and other data
visualization packages. Each has its own set of features for constructing informative,
customized plots that present information as simply and effectively as
possible.
Python has several libraries for visualizing data, each with its features. Four of these
libraries will be discussed in this tutorial. Each of these libraries has its own set of
functionalities and can support a variety of graph types.
• Matplotlib
• Seaborn
• Bokeh
• Plotly
We'll go over each of these libraries one after another.
Useful Python Packages for Visualizations
Matplotlib
Matplotlib is a Python visualization library for 2D plots of arrays, built on
the NumPy library. It works with Python and IPython shells, as well
as Jupyter notebooks and web application server software. Matplotlib includes a wide
range of plots, such as scatter, line, bar, histogram, and others, that can assist us in
delving deeper into trends, behavioral patterns, and correlations. John Hunter first
launched it in 2002.
Example:
import pandas as pd
print(pd.read_csv("tips.csv").head())  # preview the first rows; same file as in the later examples
Output:
Matplotlib
Scatter Plot
Scatter plots are used to observe relationships between variables and use dots to
represent the relationship between them. The scatter() method in the matplotlib
library is used to draw a scatter plot.
Example:
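import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")
# Scatter plot of tip against day
plt.scatter(data['day'], data['tip'])
plt.show()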
Output:
This graph can be more meaningful if we can add colors and also change the size of
the points. We can do this by using the c and s parameter respectively of the scatter
function. We can also show the color bar using the colorbar() method.
Example:
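import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")
plt.scatter(data['day'], data['tip'], c=data['size'], s=data['total_bill'])  # c and s columns are assumed
plt.colorbar()
plt.show()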
Output:
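Line Chart
plt.plot(data['tip'])  # assumes data holds the tips table loaded earlier; the column is illustrative
plt.show()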
Output:
Bar Chart
A bar plot or bar chart is a graph that represents categories of data with rectangular
bars whose lengths or heights are proportional to the values they represent.
It can be created using the bar() method.
Example:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("tips.csv")
# Bar chart with day against tip
plt.bar(data['day'], data['tip'])
plt.title("Bar Chart")
# Setting the X and Y labels
plt.xlabel('Day')
plt.ylabel('Tip')
plt.show()
Output:
Histogram
# histogram of total_bills
plt.hist(data['total_bill'])
plt.title("Histogram")
# Adding the legends
plt.show()
Output:
Seaborn
Seaborn is a high-level interface built on top of Matplotlib. It provides
design styles and color palettes for making more attractive graphs.
To install seaborn type the below command in the terminal.
pip install seaborn
Seaborn is built on top of Matplotlib, so the two can be used together.
Using both Matplotlib and Seaborn together is a very simple process. We
just have to invoke the Seaborn plotting function as normal, and then we can use
Matplotlib's customization functions.
Note: Seaborn comes loaded with datasets such as tips, iris, etc., but for the sake of
this tutorial we will use Pandas for loading these datasets.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("tips.csv")
# draw lineplot
sns.lineplot(x="sex", y="total_bill", data=data)
# setting the title using Matplotlib
plt.title('Title using Matplotlib Function')
plt.show()
Output:
Scatter Plot
A scatter plot is drawn using the scatterplot() method. This is similar to Matplotlib,
but the column names are passed together with an additional data argument.
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("tips.csv")
sns.scatterplot(x='day', y='tip', data=data)
plt.show()
Output:
With Matplotlib it is quite difficult to color
each point of this plot according to sex. With Seaborn's scatterplot() it can be done with
the help of the hue argument.
Example:
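import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("tips.csv")
sns.scatterplot(x='day', y='tip', hue='sex', data=data)  # hue colors each point by sex
plt.show()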