FDS Record
Academic Year
2021-22
CHAITANYA BHARATHI INSTITUTE OF
TECHNOLOGY
Gandipet, Hyderabad-500075
Certificate
Certified that this is the bonafide record of the practical work done during the academic year
2020-2021 by Sreya Reddy Addula
Roll Number _ 160120748017 Section CSE-4
in the Laboratory of Fundamentals of Data Science of the Department of Computer
Science.
Date : 02-02-2022
INDEX
1
CSE-4 FDS RECORD 160120748017
WEEK-1:
PROGRAM 1:
AIM: To inspect the type and shape of a NumPy array and to access and modify its elements
PROCEDURE: This program uses type() and the shape attribute. A NumPy array
is a grid of values, all of the same type, and is indexed by a tuple of
nonnegative integers. The number of dimensions is the rank of the array;
the shape of an array is a tuple of integers giving the size of the array along
each dimension.
CODE:
import numpy as np
a = np.array([5, 12, 23, 40])   # rank-1 array
print(type(a))
print(a.shape)
print(a[3], a[1], a[0])
a[0] = 6   # change one element in place
print(a)
b = np.array([[1,2,43],[14,5,6]])   # rank-2 array
print(b.shape)
print(b[0, 0], b[0, 1], b[1, 0])
OUTPUT:
PROGRAM 2:
AIM: To create arrays using various functions
PROCEDURE: This program uses several NumPy functions that create arrays:
zeros((m,n)) creates an array of all zeros with m rows and n columns.
ones((m,n)) creates an array of all ones with m rows and n columns.
full((m,n), c) creates a constant array with m rows and n columns in which
every element equals c.
eye(n) creates an n x n identity matrix.
random.random((m,n)) creates an array of random values in [0, 1) with m rows
and n columns.
CODE:
import numpy as np
a = np.zeros((2,5))
print(a)
b = np.ones((2,3))
print(b)
c = np.full((2,2), 12)
print(c)
d = np.eye(2)
print(d)
e = np.random.random((3,2))
print(e)
OUTPUT:
PROGRAM 3:
AIM: To implement slicing.
PROCEDURE: Similar to Python lists, numpy arrays can be sliced. Since arrays
may be multidimensional, you must specify a slice for each dimension of the
array. A slice is a view into the same data, so modifying it also modifies the
original array.
CODE:
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = a[:3, 1:2]
print(a[0, 1])
b[0, 0] = 23
print(a[0, 1])
OUTPUT:
PROGRAM 4:
AIM: To create an array with different dimensions
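A minimal sketch of this aim, assuming it refers to building arrays of one, two and three dimensions and inspecting them with ndim and shape. The array names are illustrative and do not come from the original record.

```python
import numpy as np

# Illustrative arrays (assumed, not from the original record)
one_d = np.array([1, 2, 3])                   # 1-D: shape (3,)
two_d = np.array([[1, 2, 3], [4, 5, 6]])      # 2-D: shape (2, 3)
three_d = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])        # 3-D: shape (2, 2, 2)

for arr in (one_d, two_d, three_d):
    # ndim is the number of dimensions; shape gives the size along each one
    print(arr.ndim, arr.shape)
```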
PROGRAM 5:
AIM: To implement integer array indexing
PROCEDURE: When you index into numpy arrays using slicing, the resulting
array view will always be a subarray of the original array. In contrast, integer
array indexing allows you to construct arbitrary arrays using the data from
another array.
CODE:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
print(a[[0, 1, 2], [0, 1, 0]])
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))
print(a[[0, 0], [1, 1]])
print(np.array([a[0, 1], a[0, 1]]))
OUTPUT:
PROGRAM 6:
AIM: To implement Boolean array indexing
PROCEDURE: Boolean array indexing lets you pick out arbitrary elements of an
array. Frequently this type of indexing is used to select the elements of an
array that satisfy some condition.
CODE:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
bool_idx = (a > 2)
print(bool_idx)
print(a[bool_idx])
print(a[a > 2])
OUTPUT:
PROGRAM 7:
AIM: To implement data types
PROCEDURE: Numpy provides a large set of numeric datatypes that you can
use to construct arrays. Numpy tries to guess a datatype when you create an
array, but functions that construct arrays usually also include an optional
argument to explicitly specify the datatype
CODE:
import numpy as np
x = np.array([1, 2])
print(x.dtype)
x = np.array([1.0, 2.0])
print(x.dtype)
x = np.array([1, 2], dtype=np.int64)
print(x.dtype)
OUTPUT:
PROGRAM 8
AIM: To implement math in arrays
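A minimal sketch of this aim, assuming it refers to NumPy's elementwise arithmetic on arrays. The arrays x and y are illustrative and do not come from the original record.

```python
import numpy as np

# Illustrative operands (assumed, not from the original record)
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Arithmetic operators act element by element
print(x + y)        # same as np.add(x, y)
print(x - y)        # same as np.subtract(x, y)
print(x * y)        # elementwise product, not matrix multiplication
print(x / y)        # same as np.divide(x, y)
print(np.sqrt(x))   # elementwise square root
```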
OUTPUT:
PROGRAM 9:
AIM: To implement the inner product of vectors and the matrix-vector product
PROCEDURE: We use the dot function to compute inner products of vectors, to
multiply a vector by a matrix, and to multiply matrices. dot is available both as
a function in the numpy module and as an instance method of array objects
CODE:
import numpy as np
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])
print(v.dot(w))
print(np.dot(v, w))
print(x.dot(v))
print(np.dot(x, v))
OUTPUT:
PROGRAM 10
AIM: To implement computation functions.
PROCEDURE: The sum function computes the sum of all elements in the array.
With axis=0 it sums each column, and with axis=1 it sums each row.
CODE:
import numpy as np
x = np.array([[1,2],[3,4]])
print(np.sum(x))
print(np.sum(x, axis=0))
print(np.sum(x, axis=1))
OUTPUT:
PROGRAM 11
AIM: To display transpose of a matrix
PROCEDURE: The transpose of a matrix is obtained with arrayname.T.
Transposing a rank-1 array leaves it unchanged.
CODE:
import numpy as np
x = np.array([[1,2], [3,4]])
print(x)
print(x.T)
v = np.array([1,2,3])
print(v)
print(v.T)
OUTPUT:
Output:
Code:
import numpy as np
b=np.arange(1,9,2)
print(list(b))
Output:
Code:
# arange([start,] stop[, step], dtype=None)
x = np.arange(19.8)
print(x)
x = np.arange(0.8, 19.8,1.0 )
print(x)
Output:
Code:
Output:
Output:
Code:
Output:
Code:
print(M.shape)
print(M)
Output:
Procedure:
Indexing: Array indexing refers to the accessing of elements in the given array.
Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may
be multidimensional, you must specify a slice for each dimension of the array.
Code:
Q = np.array([1,5,14,6,87,24,84])
# print the first element of Q
print(Q[0])
# print the last but one element of Q
print(Q[-2])
Output:
Code:
#slicing ( Single Dimensional Array)
S = np.array([ 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(S[2:4])
print(S[:2])
print(S[3:])
print(S[:]) #prints entire array
Output:
Code:
L = np.array([ [[-12, 100, -903,901], [-156,-34,123,392]],
[[39,278,890,456], [-12,-279,125,580]],
[[190,-19,-78,90], [-292,70,109,-18]]])
L[1:3, 0:1, 1:4] # blocks 1 and 2, row 0 only, columns 1 to 3; the result has shape (2, 1, 3)
Output:
Dt:21.10.2021
PROGRAM 1:
AIM : To write a program using numpy in python to create an array using dtype .
PROCEDURE: In this program, dtype is used to set the data type, and hence the
byte size, of the elements in the array. i4 is declared as np.dtype(np.int32),
and the array arr is then created from the list with dtype=i4, so that all the
elements of arr are of the int32 data type.
CODE:
import numpy as np
i4 = np.dtype(np.int32)
print(i4)
list_a = [ [1.2,2.3,4.5,9.0],[2.4,7.8,4.7,5],[7.9,-5.3,7, 5.9],[4.6,7,9,-6.8]]
arr= np.array(list_a, dtype=i4)   # float values are truncated to 32-bit integers
print(arr)
OUTPUT:
PROGRAM 2:
AIM : To write a program to create an array using dtype and to show repr()
function.
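PROCEDURE: In this program, dtype is used to define the layout of the array. A
dtype can assign different data types (with different byte sizes) to different
columns of a multidimensional array; here a single field named 'area' is defined.
repr() shows the internal representation of the array, including its dtype.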
CODE:
import numpy as np
dt = np.dtype([('area', np.int32)])
arr = np.array([(2357,), (1456,), (6789,)], dtype=dt)   # one-element tuples, one per record
print(arr)
print("Internal representation:")
print(repr(arr))
OUTPUT:
PROGRAM 3:
AIM : To write a program to create an array which shows different datatypes in
different columns of the array.
PROCEDURE: In this program, dtype is used to create the layout of the array.
A dtype can assign different data types (with different byte sizes) to different
columns of a multidimensional array. Some indexing and slicing operations are
then performed on the array arr1.
CODE:
d=np.dtype([('product','S20'),('productId','i4'),('Price',np.float64)])
arr1= np.array([('Pen',245,20.4),
('Pencil',304,35.8),
('Book',498,57),
('Mask',268,10),
('Sanitiser',468,59.9)],dtype=d)
print(arr1)
print(repr(arr1))
print(arr1[1])
print(arr1[1][2])
print(arr1[1:])
OUTPUT:
PROGRAM 4:
AIM : To write a program to save the array to a file using savetxt and print data
from the file.
PROCEDURE: savetxt() saves an array to a text file in the required format.
genfromtxt() is one of the functions of the NumPy library that reads tabular
data from a file and builds an array from it, which is then displayed as output.
CODE:
np.savetxt("products.csv",
arr1,
fmt="%s;%d;%d",
delimiter=";")
d=np.dtype([('product','S20'),('productId','i4'),('Price','i4')])
a7 = np.genfromtxt("products.csv",
dtype=d,
delimiter=";")
print(a7)
OUTPUT:
Dt: 28.10.21
PROGRAM 1
AIM: To demonstrate pandas series
PROCEDURE: A pandas Series is a one-dimensional labeled array capable of
holding data of any type (integer, string, float, Python objects, etc.). The axis
labels are collectively called the index. A Series is comparable to a single
column in an Excel sheet. Labels need not be unique but must be of a hashable type.
CODE:
import pandas as pd
A=pd.Series([12,40,23,17])
A
OUTPUT:
PROGRAM 2:
AIM: To access single values from pandas series
PROCEDURE: A Series can be given an explicit index of labels in place of the
default integer positions. A single value can then be accessed by its label,
for example I['red'].
PROGRAM CODE:
colors=['blue','red','black','white']
codes=[12,40,23,17]
I=pd.Series(codes,index=colors)
I
OUTPUT:
PROGRAM 3
AIM: To demonstrate addition on pandas series
PROCEDURE: Adding two Series aligns their values by index label. Labels
present in both Series are summed; labels present in only one of them produce
NaN. The built-in sum() adds up the values of a single Series.
CODE:
colors=['blue','red','black','white']
colors1=['blue','orange','black','green']
T=pd.Series([12,23,40,17],index=colors)
Y=pd.Series([5,12,16,39],index=colors1)
print(T+Y)
print(sum(T))
OUTPUT:
PROGRAM 4
AIM: To demonstrate how to handle missing values in pandas series
PROCEDURE: When two Series share no index labels, no pair of values can be
aligned, so their sum contains NaN for every label. These NaN entries are the
missing values.
CODE:
colors=['blue','red','black','white']
colors1=['pink','orange','yellow','green']
T=pd.Series([12,23,40,17],index=colors)
Y=pd.Series([5,12,16,39],index=colors1)
print(T+Y)
print(sum(T))
OUTPUT:
PROGRAM 5
AIM: To demonstrate pandas isnull() and notnull() function
PROCEDURE: isnull() returns a boolean object of the same size indicating whether
each value is NA. NA values, such as None or numpy.nan, are mapped to True;
everything else is mapped to False. Empty strings '' and numpy.inf are not
considered NA values. notnull() returns the inverse.
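The mapping described above can be checked on a small Series. This is a minimal sketch with illustrative values (not taken from the record) that mixes None, NaN, an empty string and infinity.

```python
import numpy as np
import pandas as pd

# Illustrative values: a number, two NA entries, an empty string and infinity
s = pd.Series([1.5, None, np.nan, "", np.inf], dtype=object)

mask = s.isnull()     # True only for None and NaN
print(mask)
print(s.notnull())    # elementwise inverse of isnull()
```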
CODE:
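# same data as the cities dictionary used in Program 7
cities={"Australia":123456,"China":9324,"Russia":683506,
        "USA":56897,"Cambodia":896764}
my_cities=["USA","Poland","Berlin","China"]
# labels missing from cities ("Poland", "Berlin") get NaN
my_city_series=pd.Series(cities,index=my_cities)
print(my_city_series.isnull())
print(my_city_series.notnull())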
OUTPUT:
PROGRAM 7
AIM: To demonstrate pandas dropna () function
PROCEDURE: The dropna() function returns a new Series with missing values
removed; a Series has only one axis to drop values from. If the inplace
parameter is True, the operation is done in place and None is returned. The
fillna() function instead replaces each missing value with a given value, such as 0.
CODE:
cities={"Australia":123456,
"China":9324,
"Russia":683506,
"USA":56897,
"Cambodia":896764}
city_series=pd.Series(cities)
print(city_series)
print(my_city_series.dropna())
print(my_city_series.fillna(0))
OUTPUT:
Dt: 11.11.21
Program 1:
AIM : To plot a 2-d graph using matplotlib.
PROCEDURE: A line plot joins successive (x, y) data points with straight line
segments. matplotlib.pyplot is a collection of functions that make matplotlib work
like MATLAB and help to visualise data. In this program, the plot() function is used
to plot the 2-D graph, and xlabel() and ylabel() are used to label the axes.
CODE:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x=[100,200,300]
y=[400,500,600]
plt.plot(x,y)
OUTPUT:
CODE:
plt.title("Graph")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.plot(x,y)
OUTPUT :
PROGRAM 2:
AIM : To plot a 2-D graph and demonstrate the use of title(), fontdict, xticks() and yticks().
PROCEDURE : In this program, title() sets the title of the graph. The fontdict argument
styles the title, here setting its fontname and fontsize. xticks() and yticks() set the
tick locations on the x and y axes.
CODE:
x=[100,200,300]
y=[400,500,600]
plt.plot(x,y)
plt.title("Graph",fontdict={'fontname':'FreeSerif','fontsize':20})
plt.xlabel("X")
plt.ylabel("Y")
plt.xticks([60,100,140,180,220,260,300])
plt.yticks([400,500,600,700,800])
plt.show()
OUTPUT:
PROGRAM 3:
PROGRAM 4:
AIM : To draw multiple plots using plot() function and save the figure.
PROCEDURE : In this program, the curves x+1, x^2 and x^3 are plotted using plot(). The
figure is saved as linegraph.png at 300 dpi using savefig('linegraph.png',dpi=300).
CODE:
x=[1,1.2,1.4,1.6]
y=[2,2.2,2.4,2.6]
plt.plot(x,y,'b*--',label='x+1')
plt.title("Graph",fontdict={'fontname':'FreeSerif','fontsize':20})
x2=np.arange(0,2.5,0.5)
plt.plot(x2,x2**2,'g^--',label='x^2')
plt.plot(x2,x2**3,'r',label='x^3')
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()   # draw the legend before saving so that it appears in the file
plt.savefig('linegraph.png',dpi=300)
plt.show()
OUTPUT:
PROGRAM 5:
labels=['a','b','c']
values=[10,20,30]
b=plt.bar(labels,values)
OUTPUT:
CODE:
labels=['a','b','c']
values=[10,20,30]
b=plt.bar(labels,values)
b[0].set_hatch('/')
b[1].set_hatch('*')
b[2].set_hatch('.')
OUTPUT:
CODE:
labels=['a','b','c']
values=[10,20,30]
b=plt.bar(labels,values)
patterns=['.','/',"*"]
# give each bar the next hatch pattern in the list
for i in b:
    i.set_hatch(patterns.pop(0))
OUTPUT:
Dt:18.11.2021
Program 1:
Aim: To demonstrate plots for gas prices datasets
Procedure:
A line plot joins successive data points with straight line segments, here
showing how the gas price in each country changes from year to year.
A legend is an area describing the elements of the graph. In the matplotlib
library, the legend() function places a legend on the axes.
A format string of the form '[marker][line][color]', such as 'b.-', sets the
colour, marker and line style of a plot in a single argument.
Program Code:
Output:
Code:
# gas is the DataFrame of yearly gas prices: a 'Year' column plus one column per country
for country in gas:
    print(country)
for country in gas:
    if country!='Year':
        plt.plot(gas.Year,gas[country],marker='.',label=country)
print(gas.Year[::3])
plt.xticks(gas.Year[::3])
plt.xlabel('Year')
plt.ylabel('US Dollars')
plt.legend()
plt.show()
Output:
Program 2:
Aim: To read data from fifa dataset
Procedure:
pd.read_csv() reads a comma-separated values (CSV) file into a pandas
DataFrame. head(n) returns the first n rows, which gives a quick look at the
columns and the kind of values they hold.
Program Code:
fifa=pd.read_csv('fifa_data.csv')
fifa.head(5)
Output:
Program 3:
Aim: To represent data from fifa dataset using histograms
Procedure:
A histogram is a bar-graph representation of the distribution of data. The
range of values is divided into intervals (bins) laid out as columns along the
x-axis, and the y-axis shows the number of data values that fall in each bin.
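The binning that a histogram performs can be seen without drawing anything. This is a minimal sketch using np.histogram on hypothetical ratings that stand in for the FIFA 'Overall' column; the values are illustrative only.

```python
import numpy as np

# Hypothetical ratings standing in for the FIFA 'Overall' column
ratings = [45, 52, 58, 61, 67, 68, 72, 75, 79, 83, 88, 94]
bins = [40, 50, 60, 70, 80, 90, 100]

# counts[i] is the number of values in [bins[i], bins[i+1]);
# the last bin also includes its right edge
counts, edges = np.histogram(ratings, bins=bins)
print(counts)
```

These counts are the bar heights that plt.hist() would draw for the same data and bins.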
Program Code:
plt.hist(fifa.Overall)
plt.show()
Output:
Program Code:
bins=[40,50,60,70,80,90,100]
plt.figure(figsize=(6,5))
plt.hist(fifa.Overall,bins=bins,color='blue')
plt.xticks(bins)
plt.ylabel('Number of Players')
plt.xlabel('Skill Level')
plt.title('Distribution of Player Skills in FIFA 2018')
plt.savefig('histogram.png',dpi=300)
plt.show()
Output:
Program 4:
Aim: To represent data from fifa dataset using a pie chart of preferred foot
Procedure:
A pie chart (or a circle chart) is a circular statistical graphic, which is divided
into slices to illustrate numerical proportion. In a pie chart, the arc length of
each slice (and consequently its central angle and area), is proportional to the
quantity it represents.
Program Code:
# count() gives one non-null count per column; iloc[0] takes the first by position
l=fifa.loc[fifa['Preferred Foot']=='Left'].count().iloc[0]
r=fifa.loc[fifa['Preferred Foot']=='Right'].count().iloc[0]
labels=['Left','Right']
colors=['y','g']
plt.pie([l,r],labels=labels,colors=colors,autopct='%.1f%%')
plt.title('Foot Preference of FIFA Players')
plt.show()
Output:
Program 5:
Aim: To represent data from fifa dataset using a pie chart of weights
Procedure:
A pie chart (or a circle chart) is a circular statistical graphic, which is divided
into slices to illustrate numerical proportion. In a pie chart, the arc length of
each slice (and consequently its central angle and area), is proportional to the
quantity it represents.
Program Code:
light=fifa.loc[fifa.Weight<125].count().iloc[0]
light_medium=fifa[(fifa.Weight>=125)&(fifa.Weight<150)].count().iloc[0]
medium=fifa[(fifa.Weight>=150)&(fifa.Weight<175)].count().iloc[0]
# the 175-200 class needs a lower bound of 175, not 200
medium_heavy=fifa[(fifa.Weight>=175)&(fifa.Weight<200)].count().iloc[0]
heavy=fifa[(fifa.Weight>=200)].count().iloc[0]
labels=['Under 125','125-150','150-175','175-200','Over 200']
weights=[light,light_medium,medium,medium_heavy,heavy]
plt.pie(weights,labels=labels)
plt.title('Weight of professional Soccer Players(lbs)')
plt.show()
Output:
Program 6:
Aim: To demonstrate box plots for fifa dataset
Procedure:
Boxplots are a standardized way of displaying the distribution of data based on
a five number summary (“minimum”, first quartile (Q1), median, third quartile
(Q3), and “maximum”).
Program Code:
barcelona=fifa.loc[fifa.Club=='FC Barcelona']['Overall']
madrid=fifa.loc[fifa.Club=='Real Madrid']['Overall']
bp=plt.boxplot([barcelona,madrid])
plt.title('Professional Soccer Team Comparison')
plt.ylabel('FIFA Overall Rating')
plt.show()
Output:
Program 7:
Aim: To demonstrate box plots for fifa dataset
Procedure:
Boxplots are a standardized way of displaying the distribution of data based on
a five number summary (“minimum”, first quartile (Q1), median, third quartile
(Q3), and “maximum”).
Program Code:
barcelona=fifa.loc[fifa.Club=='FC Barcelona']['Overall']
madrid=fifa.loc[fifa.Club=='Real Madrid']['Overall']
rev=fifa.loc[fifa.Club=='New England Revolution']['Overall']
labels=['FC Barcelona','Real Madrid','New England Revolution']
bp=plt.boxplot([barcelona,madrid,rev],labels=labels,patch_artist=True)
# style every box: blue outline, yellow fill
for box in bp['boxes']:
    box.set(color='b',linewidth=2)
    box.set(facecolor='y')
plt.title('Professional Soccer Team Comparison')
plt.ylabel('FIFA Overall Rating')
plt.show()
Output:
Dt:25.11.21
PROGRAM 1
AIM: To implement scatter plot
PROCEDURE: With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same
length, one for the values of the x-axis, and one for values on the y-axis
CODE:
import matplotlib.pyplot as plt
import numpy as np
price=np.asarray([23.3,23.20,20.3,10.3,3.2])
sales_per_day=np.asarray([10,20,30,40,50])
profit_margin=np.asarray([5,10,15,20,25])
low=(0,1,0)
medium=(0,0,1)
high=(1,0,0)
sugar_cont=[low,high,high,medium,high]
plt.scatter(x=price,y=sales_per_day,s=profit_margin*10,c=sugar_cont)
plt.show()
OUTPUT:
CODE:
import matplotlib.pyplot as plt
import numpy as np
low=(0,1,0)
medium=(0,0,1)
high=(1,0,0)
price_orange=np.asarray([23.3,23.20,20.3,10.3,3.2])
sales_per_day_orange=np.asarray([10,20,30,40,50])
profit_margin_orange=np.asarray([5,10,15,20,25])
sugar_cont_orange=[low,high,high,medium,high]
price_cereal = np.asarray([1.50, 2.50, 1.15, 1.95])
sales_per_day_cereal = np.asarray([67, 34, 36, 12])
profit_margin_cereal = np.asarray([20,12,7,9])
sugar_cont_cereal = [low, high, medium, low]
plt.scatter(x=price_orange,y=sales_per_day_orange,s=profit_margin_orange*10,
            c=sugar_cont_orange,marker="X")
plt.scatter(x=price_cereal,y=sales_per_day_cereal,s=profit_margin_cereal*10,
            c=sugar_cont_cereal,marker="D")
plt.show()
OUTPUT:
PROGRAM 2
AIM: To demonstrate plot function
DESCRIPTION: In this program, the kind parameter of plot() selects the type of plot
(here a bar chart), and figsize sets the width and height of the figure in inches.
CODE:
import pandas as pd
plotdata = pd.DataFrame({
"2018":[57,67,77,83],
"2019":[68,73,80,79],
"2020":[73,78,80,85]},
index=["Django", "Gafur", "Tommy", "Ronnie"])
plotdata.plot(kind="bar",figsize=(15, 8))
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
OUTPUT:
PROGRAM 3
AIM: To implement a stacked bar chart
DESCRIPTION: Setting stacked=True in plot() draws each row's values one above the other in a single bar, like a pile.
CODE:
import pandas as pd
plotdata = pd.DataFrame({
"2018":[57,67,77,83],
"2019":[68,73,80,79],
"2020":[73,78,80,85]},
index=["Django", "Gafur", "Tommy", "Ronnie"])
plotdata.plot(kind="bar",figsize=(15, 8),stacked=True)
plt.title("FIFA ratings")
plt.xlabel("Footballer")
plt.ylabel("Ratings")
OUTPUT:
PROGRAM 4:
AIM: To plot the n most frequent values of a column from the csv file
PROCEDURE: value_counts() counts how often each distinct value occurs in a column and
returns the counts sorted in descending order. Slicing the result with [:20] keeps the
20 most frequent values of the given dataset.
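A minimal sketch of the counting step on its own, using a small hypothetical DataFrame in place of the lab's CSV file. The name medals and its rows are illustrative assumptions.

```python
import pandas as pd

# Hypothetical medal records standing in for the lab's CSV file
medals = pd.DataFrame(
    {"Country": ["USA", "India", "USA", "China", "USA", "China"]}
)

counts = medals["Country"].value_counts()   # sorted by count, descending
top_2 = counts[:2]                          # positional slice keeps the two largest
print(top_2)
```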
CODE:
top_20 = df['Country'].value_counts()[:20]
top_20.plot(kind='bar',figsize=(10,8))
plt.title('All Time Medals of top 20 countries')
plt.show()
OUTPUT:
Dt:16.12.21
Program 1:
AIM: To implement box plot without using inbuilt function
PROCEDURE:
Boxplots are a standardized way of displaying the distribution of data based on a five
number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and
“maximum”).
CODE:
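#box plot
import matplotlib.pyplot as plt
import numpy as np
data=[199,201,236,269,271,278,283,291,301,303,341]   # already sorted
n=len(data)
# the k-th value in 1-based order sits at list index k-1
q2=data[(n+1)//2-1]      # median
q1=data[(n+1)//4-1]      # first quartile
q3=data[(n+1)*3//4-1]    # third quartile
iqr= q3-q1
# whisker limits by the usual 1.5*IQR rule (names avoid shadowing min() and max())
lower= q1-1.5*iqr
upper= q3+1.5*iqr
x= ['min','q1','q2','q3','max']
y= [lower,q1,q2,q3,upper]
plt.boxplot(y)
print(y)
plt.show()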
OUTPUT:
Program 2:
AIM: To plot frequency polygons using input frequency.
PROCEDURE:
A frequency polygon is a line graph of class frequency plotted against class midpoint. It can
be obtained by joining the midpoints of the tops of the rectangles in the histogram
CODE:
#q2 frequency polygons using frequency
import matplotlib.pyplot as plt
import numpy as np
range_bin=[5.5,10.5,15.5,20.5,25.5,30.5,35.5,40.5]   # class boundaries
freq=[1,3,2,4,5,3,2]
l= len(range_bin)
r=[]   # class midpoints
for i in range(l-1):
    x= (range_bin[i]+range_bin[i+1])/2
    r.append(x)
plt.plot(r,freq,marker="*")
plt.xticks([0,5.5,10.5,15.5,20.5,25.5,30.5,35.5,40.5,45.5])
plt.xlabel("Bin range")
plt.ylabel("Frequency")
plt.show()
OUTPUT:
Program 3:
AIM: To plot relative frequency polygon with given input frequencies.
PROCEDURE:
A relative frequency polygon has peaks that represent the percentage of total data points
falling within the interval.
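The relative frequency of a class is its frequency divided by the total number of observations. A compact sketch using the frequencies from this program; the names total and rel_freq are illustrative.

```python
freq = [1, 3, 2, 4, 5, 3, 2]            # class frequencies used in this program
total = sum(freq)                        # total number of observations
rel_freq = [f / total for f in freq]     # each class as a fraction of the total
print(rel_freq)
```

The relative frequencies always add up to 1, which is a quick check on the calculation.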
CODE:
#q3 frequency polygons using relative frequency
import matplotlib.pyplot as plt
import numpy as np
range_bin=[5.5,10.5,15.5,20.5,25.5,30.5,35.5,40.5]
freq=[1,3,2,4,5,3,2]
p=len(freq)
c_freq=0   # total number of observations
for i in range(p):
    c_freq=c_freq+freq[i]
print(c_freq)
c_f=[]   # relative frequency of each class
for i in range(p):
    c_f.append(freq[i]/c_freq)
l= len(range_bin)
r=[]   # class midpoints
for i in range(l-1):
    x= (range_bin[i]+range_bin[i+1])/2
    r.append(x)
plt.plot(r,c_f,marker="*")
plt.xticks([0,5.5,10.5,15.5,20.5,25.5,30.5,35.5,40.5,45.5])
plt.xlabel("Bin range")
plt.ylabel("Relative frequency")
plt.show()
OUTPUT:
Program 4:
AIM: To demonstrate stem and leaf plot
PROCEDURE:
Stem and leaf plot is a way of organizing data into a form that makes it easy to observe the
frequency of different types of values.
CODE:
#stem and leaf plot
x=[143,163,154,159,172,165,162,171,146,165,176,145,165,182,175,186,160,158,167,172]
x=sorted(x)
dict_a={}
for i in x:
    s= str(i)
    y= int(s[0:2])   # stem: first two digits
    z= int(s[2])     # leaf: last digit
    dict_a.setdefault(y,[]).append(z)
# print one row per stem
for stem,leaves in dict_a.items():
    print(stem,"|",*leaves)
OUTPUT:
Dt: 06-01-2022
Program-1
AIM: To implement one sample t test
PROCEDURE:
1. Identify Null hypothesis for the given problem.
2. Calculate mean of the given data set.
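The test can be sketched end to end as follows. This is a minimal sketch that assumes the sample used in the SciPy version of this program and a hypothesised mean of 90; the names mu0 and t_stat are illustrative.

```python
import math

# Assumption: the same sample as the SciPy version of this program
data = [90, 98, 110, 150, 200, 91, 82, 80, 110, 96]
mu0 = 90                                   # hypothesised population mean

n = len(data)
mean = sum(data) / n
# Sample standard deviation (divides by n - 1)
s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
# t statistic: distance of the sample mean from mu0 in standard errors
t_stat = (mean - mu0) / (s / math.sqrt(n))
print("t statistic:", t_stat)
```

The resulting t statistic is then compared with the critical value for n - 1 degrees of freedom.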
su=su+j
n=len(x)
avg=su/n
b=90
sd=0
for i in x:
    sd=sd+(i-avg)**2
s=(sd/(n-1))**(0.5)
tstat=(avg-b)/(s/(n)**(0.5))
print(tstat)
tcritic=1.83
if tstat<tcritic:
    print("accept NH")
else:
    print("reject NH")
OUTPUT:
USING SCIPY
#one sample t-test
from scipy import stats
data=[90,98,110,150,200,91,82,80,110,96]
t,p=stats.ttest_1samp(data,90)
print("tstat: ",t)
tcr=1.83
print("tcritical: ",tcr)
if(t<tcr):
    print("Null hypothesis is accepted")
else:
    print("Null hypothesis is rejected")
OUTPUT:
Program 2
AIM: To implement Unpaired unequal Variance T-Test Theory
PROCEDURE:
1.Identify Null hypothesis for the given problem.
2.Calculate the first sample mean and the second sample mean for the given data set.
3.Calculate s1 and s2 values by using the formula:
N1 and N2 are the sizes of the first and second samples.
6. If t calculated is less than t critical then Null hypothesis is accepted or else rejected
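The formulas referred to in these steps can be written out as follows. This is a reconstruction in the standard notation for Welch's unequal-variance t-test, not the record's original typesetting. The symbols are assumed names: \bar{x}_i, s_i and N_i are the mean, standard deviation and size of sample i, and \nu is the number of degrees of freedom.

```latex
s_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} \left( x_{ij} - \bar{x}_i \right)^2,
\qquad i = 1, 2

t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}

\nu = \frac{\left( \dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2} \right)^2}
           {\dfrac{\left( s_1^2 / N_1 \right)^2}{N_1 - 1}
          + \dfrac{\left( s_2^2 / N_2 \right)^2}{N_2 - 1}}
```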
CODE:
import numpy as np
x=[13.5,23,13.2,12.7,22.1,17.5,20.1,22.5,19.0,21.9,13.2,11,12.8,13.1,11.6,23.0,13.2,22.9,13.1]
y=[10.1,27.6,13.8,13.1,25.6,26.7,28.9,30.1,25.4,21.9,12.1,13.4,12.3,11.9,22.2,12.3,22.2]
t_critic=2.052
nx=len(x)
ny=len(y)
meanx=sum(x)/nx
meany=sum(y)/ny
def standard_deviation(a,mean):
    # sample standard deviation (divides by n-1)
    x=0
    n=len(a)
    for i in a:
        c= (i-mean)**2
        x= x+ c
    variance= x/(n-1)
    sd=(variance)**(0.5)
    return sd
sdx= standard_deviation(x,meanx)
sdy= standard_deviation(y,meany)
# variance of each sample mean
vx=(sdx**2)/nx
vy=(sdy**2)/ny
# Welch-Satterthwaite degrees of freedom: the numerator is squared as a whole
df=((vx+vy)**2)/(((vx**2)/(nx-1)) + ((vy**2)/(ny-1)))
t_stat= (meanx-meany)/(np.sqrt(vx+vy))
print("Degree of freedom :",df)
print("t-statistic value :",t_stat)
f_stat= (sdy**2)/(sdx**2)
f_critic=2.23
if f_stat>f_critic :
    print("Unequal variances")
else:
    print("Equal variances")
# two-tailed test: compare the magnitude of t with the critical value
if abs(t_stat)<t_critic:
    print("Null hypothesis is accepted")
else:
    print("Null hypothesis is rejected")
OUTPUT:
USING SCIPY
from scipy import stats
x=[13.5,23,13.2,12.7,22.1,17.5,20.1,22.5,19.0,21.9,13.2,11,12.8,13.1,11.6,23.0,13.2,22.9,13.1]
y=[10.1,27.6,13.8,13.1,25.6,26.7,28.9,30.1,25.4,21.9,12.1,13.4,12.3,11.9,22.2,12.3,22.2]
res=stats.ttest_ind(x,y,equal_var=False)
print(res)
t_crit=2.052
# |t| above the critical value means the null hypothesis is rejected
if abs(res[0]) > t_crit:
    print("Null hypothesis is rejected.")
else:
    print("Null hypothesis is accepted.")
OUTPUT:
Program 3
AIM: To implement unpaired equal variance t test
PROCEDURE:
1.Identify Null hypothesis for the given problem.
2.Calculate first sample mean and second sample mean as x1 bar and x2 bar.
3.Calculate s1 and s2 values by using the formula:
5. If the calculated F is less than the critical F, the variances of the samples are
taken to be equal.
6. Find the degrees of freedom.
9.If t calculated is less than t critical then Null hypothesis is accepted or else rejected.
CODE:
import numpy as np
a=[23,15,16,25,20,17,18,14,12,19,21,22]
b= [16,21,16,11,24,21,18,15,19,22,13,24]
f_critic= 2.82
na=len(a)
sa=sum(a)
mean_a=sa/na
nb= len(b)
sb= sum(b)
mean_b= sb/nb
def standard_deviation(a,mean):
    # sample standard deviation (divides by n-1)
    x=0
    n=len(a)
    for i in a:
        c= (i-mean)**2
        x= x+ c
    variance= x/(n-1)
    sd=(variance)**(0.5)
    return sd
sda= standard_deviation(a,mean_a)
sdb= standard_deviation(b,mean_b)
f_stat=(sdb/sda)**2
print("F-stat:",f_stat)
if f_stat>f_critic:
    print("Variances are unequal")
else:
    print("Equal variances")
def pooledSV(X,Y):
    # pooled sample variance of the two samples
    n1, n2 = len(X), len(Y)
    xbar, ybar = np.mean(X), np.mean(Y)
    sum1 = sum([(x - xbar)**2 for x in X])
    sum2 = sum([(y - ybar)**2 for y in Y])
    return (sum1+sum2)/ (n1+n2-2)
S2 = pooledSV(a,b)
print("Pooled Sample Variance:{}".format(S2))
t_value = (mean_a-mean_b)/ (np.sqrt(S2 * (1/na + 1/nb)))
print("t-value :",t_value)
tcritic=1.796
# compare the computed t_value (not an undefined t) with the critical value
if t_value<tcritic:
    print("Null hypothesis is accepted")
else:
    print("Null hypothesis is rejected")
OUTPUT:
USING SCIPY
from scipy import stats
x= [23, 15, 16, 25, 20, 17, 18, 14, 12, 19, 21, 22]
y= [16, 21, 16, 11, 24, 21, 18, 15, 19, 22, 30, 24]
res = stats.ttest_ind(x,y, equal_var=True)
print(res)
t_crit = 1.717
# |t| above the critical value means the null hypothesis is rejected
if abs(res[0]) > t_crit:
    print("Null hypothesis is rejected.")
else:
    print("Null hypothesis is accepted.")
OUTPUT:
Program 4:
AIM: To implement paired t -test
PROCEDURE:
1. Identify Null hypothesis for the given problem.
5. Calculate t value by using the formula:
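The usual form of the paired t statistic is given below. This is the standard textbook formula, offered as a reconstruction rather than the record's original typesetting. The symbols are assumed names: d_i is the difference between the two measurements of subject i, \bar{d} is the mean difference, s_d is the standard deviation of the differences and n is the number of pairs. The sign of t depends on the order of subtraction.

```latex
d_i = x_i^{\text{pre}} - x_i^{\text{post}}, \qquad
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2}

t = \frac{\bar{d}}{s_d / \sqrt{n}}
```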
if tstat<t_critic:
    print("Null hypothesis is accepted")
else:
    print("Null hypothesis is rejected")
OUTPUT:
USING SCIPY
import pandas as pd
from scipy import stats
#paired t test
pretest=[23,25,28,25,25,26,25,22,30,35,40,35,30,30]
posttest=[35,40,30,40,45,30,30,55,40,40,35,38,41,35]
ttest,pvalue=stats.ttest_rel(pretest,posttest)
print("ttest:",ttest)
print("pvalue:",pvalue)
if pvalue<0.05:
    print("Reject Null hypothesis")
else:
    print("Accept Null hypothesis")
OUTPUT:
Program 5:
fstat=mssb/mssw
In the program without scipy, ssw, mssw, ssb, mssb and fstat are calculated in plain
Python with the help of the above formulae. If the F statistic is less than the critical F
value, the null hypothesis is accepted; otherwise it is rejected.
In the program with scipy, stats is imported and f_oneway() returns the F statistic and
the p value.
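The quantities used in the program without scipy can be written as follows. This is a reconstruction from the code in standard one-way ANOVA notation. The symbols are assumed names: k is the number of groups, n_i and \bar{x}_i are the size and mean of group i, n is the total number of observations and G is the grand mean.

```latex
MSS_W = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x}_i \right)^2}{n - k},
\qquad
MSS_B = \frac{\sum_{i=1}^{k} n_i \left( \bar{x}_i - G \right)^2}{k - 1},
\qquad
F = \frac{MSS_B}{MSS_W}
```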
CODE:
1) Without using scipy
#one way anova
g1=[7,7,6,9,7,7,6,7,8,9]
g2=[5,6,3,5,4,6,5,4,5,5,6,7,6]
g3=[1,3,4,3,1,1,2,6,5,4,3,4,5]
mg1=sum(g1)/len(g1)
mg2=sum(g2)/len(g2)
mg3= sum(g3)/len(g3)
x=0   # running within-group sum of squares
n1=len(g1)
n2=len(g2)
n3=len(g3)
n= len(g1) +len(g2) +len(g3)
k=3   # number of groups
for i in g1:
    y= (i-mg1)**2
    x = x+ y
for i in g2:
    y= (i-mg2)**2
    x = x+ y
for i in g3:
    y= (i-mg3)**2
    x = x+ y
mssw= x/(n-k)
G=(sum(g1)+sum(g2)+sum(g3))/n   # grand mean
# between-group mean square, divided by k-1
mssb= ((n1*(mg1-G)**2) + (n2*(mg2-G)**2) +(n3*(mg3-G)**2))/(k-1)
fstat=mssb/mssw
print("Mssb and Mssw are :",mssb,mssw)
fcritic=3.32
if fstat>fcritic :
    print("Reject Null Hypothesis")
else:
    print("Accept null hypothesis")
print("f-stat:",fstat)
OUTPUT:
2) using scipy
from scipy import stats
import numpy as np
g1=[7,7,6,9,7,7,6,7,8,9]
g2=[5,6,3,5,4,6,5,4,5,5,6,7,6]
g3=[1,3,4,3,1,1,2,6,5,4,3,4,5]
print(stats.f_oneway(g1,g2,g3))   # F statistic and p value
OUTPUT: