Fdsa Lab Manual

lab manual

Uploaded by

saveetha.cse
Copyright
© © All Rights Reserved
ADHI COLLEGE OF ENGINEERING AND TECHNOLOGY
Sankarapuram, Near Walajabad, Kancheepuram Dist, Pin: 631605

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

STUDENT NAME :

REGISTER NO :

SUBJECT CODE : AD3411

SUBJECT NAME : DATA SCIENCE AND ANALYTICS LABORATORY

YEAR / SEMESTER : II / IV (2024-2025)

DEPARTMENT OF

ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

LABORATORY RECORD NOTEBOOK

Academic Year 2024–2025

This is to certify that this is a bonafide record of the work done

by Mr./Ms. _______________ of the year _______________ B.E./B.Tech.,

Department of _____________________________________________________

In the Laboratory in the_________________________Semester.

Staff In-Charge Head of the Department

Submitted to the University Examination held on .

INTERNAL EXAMINER EXTERNAL EXAMINER

TABLE OF CONTENTS

EX. NO   DATE   LIST OF EXPERIMENTS                                                   PAGE NO   SIGN
1               Download and Install the NumPy, SciPy, Jupyter Packages and Pandas
2               Working with NumPy Arrays
3               Working with Pandas DataFrames
4               Basic Plots Using Matplotlib
5               Frequency Distributions, Averages, Variability
6               Normal Curves, Correlation and Scatter Plots, Correlation Coefficient
7               Regression
8               Z-Test
9               T-Test
10              ANOVA
11              Building and Validating Linear Models
12              Building and Validating Logistic Models
13              Time Series Analysis

Ex. No: 1
DOWNLOAD AND INSTALL THE NUMPY, SCIPY, JUPYTER PACKAGES AND PANDAS
Date:

How to install the NumPy package:

Download, install, and explore the features of the NumPy, SciPy, Jupyter, statsmodels, and pandas
packages.

NUMPY:

Install:

1. On macOS, press Command + Space to open Spotlight search. Type "Terminal" and press Enter.

2. In the terminal, use the pip command to install the NumPy package.

3. Once the package is installed successfully, type python to enter the Python prompt. Notice that the
Python version is displayed too.

Description:

NumPy is a Python library used for working with arrays. It also has functions for working in the domains
of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is
an open-source project that you can use freely. NumPy stands for Numerical Python.

Command:

pip install numpy
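
Once NumPy is installed, the features named in the description can be tried out directly. The following is a minimal sketch, not part of the original record; the small linear system and the four-sample signal are values invented for illustration.

```python
import numpy as np

# Linear algebra: solve the system  2x + y = 5  and  x + 3y = 10.
# (The coefficients are invented for this illustration.)
coeffs = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([5.0, 10.0])
solution = np.linalg.solve(coeffs, rhs)
print("Solution of the linear system:", solution)

# Fourier transform: discrete Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print("FFT magnitudes:", np.abs(spectrum))

# Matrices: the @ operator performs matrix multiplication.
identity = np.eye(2)
print("coeffs @ identity:\n", coeffs @ identity)
```

Here np.linalg.solve, np.fft.fft, and the @ operator stand in for the linear algebra, Fourier transform, and matrix features respectively.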

SCIPY

Description: SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for
Scientific Python. It provides additional utility functions for optimization, statistics, and signal processing.
Like NumPy, SciPy is open source, so we can use it freely.

Command:

pip install scipy

JUPYTER

Description:

The Jupyter Notebook is an open-source web application that you can use to create and share
documents that contain live code, equations, and visualizations.

Command:

pip install jupyter

PANDAS PACKAGE

Description:

pandas is a Python package that provides fast, flexible, and expressive data structures designed to
make working with "relational" or "labeled" data easy and intuitive.

Command:

pip install pandas

STATSMODELS

Description:

statsmodels is a Python module that provides classes and functions for the estimation of many
different statistical models, as well as for conducting statistical tests.

Command:

pip install statsmodels
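
After running the pip commands, it can be useful to confirm what was actually installed. The sketch below is one possible check using only the Python standard library (Python 3.8 or later); the helper name installed_versions is hypothetical, and the package list simply repeats the names used in the pip commands.

```python
from importlib import metadata

# Distribution names as used in the pip commands of this experiment.
PACKAGES = ["numpy", "scipy", "jupyter", "pandas", "statsmodels"]

def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

A package reported as NOT INSTALLED can be installed by rerunning the corresponding pip command.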

RESULT
Thus the NumPy, SciPy, Jupyter, statsmodels, and pandas packages were downloaded, installed, and
explored.

Ex. No: 2
WORKING WITH NUMPY ARRAY, CREATION OF NUMPY ARRAY
Date:

AIM

PROCEDURE

PROGRAM

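import numpy as np

# Rank-1 array created from a list
arr = np.array([1, 2, 3])
print("Array with Rank 1: \n", arr)

# Rank-2 array created from a list of lists
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Array with Rank 2:\n", arr)

# Array created from a tuple
arr = np.array((1, 3, 2))
print("\n Array created using passed tuple;\n", arr)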

OUTPUT:

Array with Rank 1:

[1 2 3]

Array with Rank 2:

[[1 2 3]

[4 5 6]]

Array created using passed tuple;

[1 3 2]

ACCESSING THE ARRAY INDEX
PROGRAM:
import numpy as np

arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 6], [3, -7, 4, 2.0]])
print("Initial Array: ")
print(arr)

# Slice: first two rows, every second column (columns 0 and 2)
sliced_arr = arr[:2, ::2]
print("Array with first 2 rows and alternate columns (0 and 2):\n", sliced_arr)

# Integer-array indexing: row and column indices are paired element by element
index_arr = arr[[1, 1, 0, 3], [3, 2, 1, 0]]
print("\n Elements at indices (1,3),(1,2),(0,1),(3,0):\n", index_arr)

OUTPUT:
Initial Array:

[[-1. 2. 0. 4. ]

[ 4. -0.5 6. 0. ]

[ 2.6 0. 7. 6. ]

[ 3. -7. 4. 2. ]]

Array with first 2 rows and alternate columns (0 and 2):

[[-1. 0.]

[ 4. 6.]]

Elements at indices (1,3),(1,2),(0,1),(3,0):

[0. 6. 2. 3.]

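BASIC ARRAY OPERATION
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

# Scalar operations are applied to every element
print("Adding 1 to every element: ", a + 1)
print("\n Subtracting 2 from each element :", b - 2)

# Sum over all elements, then the element-wise sum of the two arrays
print("\n Sum of all array elements:", a.sum())
print("\n Array sum:\n", a + b)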

OUTPUT:
Adding 1 to every element: [[2 3]

[4 5]]

Subtracting 2 from each element : [[ 2 1]

[ 0 -1]]

Sum of all array elements: 10

Array sum:

[[5 5]

[5 5]]

RESULT

Thus the program was created and executed successfully.

Ex. No: 3
WORKING WITH PANDAS
Date:

AIM

PROCEDURE

PROGRAM

import pandas as pd

# Build a DataFrame from a dictionary of equal-length lists
raw_data = {'F_name': ['A', 'B', 'C'],
            'L_name': ['X', 'W', 'Y'],
            'R.No': [10, 11, 15],
            'mark': [50, 55, 60]}
df = pd.DataFrame(raw_data, columns=['F_name', 'L_name', 'R.No', 'mark'])
print(df)

CREATING A DATAFRAME USING LIST

import pandas as pd

# Named "words" rather than "list" so the built-in list() is not shadowed
words = ['DataFrame', 'is', 'created', 'successfully']
df = pd.DataFrame(words)
print(df)

DataFrame as a dictionary
import pandas as pd

dict1 = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}
dict2 = {'one': 'excellent', 'two': 'very good', 'three': 'good',
         'four': 'moderable', 'five': 'satisfactory'}

# Each dictionary becomes a Series; the shared keys become the row index
numbers1 = pd.Series(dict1)
numbers2 = pd.Series(dict2)
df = pd.DataFrame({'numbers': numbers1, 'grade': numbers2})
print(df)
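Data selection in Series
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data)
# Label-based access and membership tests work like a dictionary
print(data['b'])
print('a' in data)
print('f' in data)
print(data.keys())
# Assigning to a new label extends the Series
data['e'] = 1.25
print(data)
# Positional slice: the first two elements
print(data[0:2])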

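.iloc()

import pandas as pd
import numpy as np

# 5 x 3 DataFrame of random floats in [0, 1)
df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])
print('DataFrame:\n', df)
# .iloc is zero-based, so the last row is 4 and the last column is 2
print('Selected:\n', df.iloc[4, 2])
print('Selected:\n', df.iloc[4])

import pandas as pd
import numpy as np

# 3 x 4 array of random integers in [0, 10)
A = np.random.randint(10, size=(3, 4))
print('A:\n', A)
# Subtract the first row from every row
print('A - A[0]:\n', A - A[0])
df = pd.DataFrame(A, columns=list('ABCD'))
# The same row-wise subtraction on a DataFrame
print('df - df.iloc[0]:\n', df - df.iloc[0])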

Concatenation
import pandas as pd

# Concatenate two Series end to end
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['X', 'Y', 'Z'], index=[4, 5, 6])
print(pd.concat([ser1, ser2]))

# Concatenate two DataFrames that share the same columns
df1 = pd.DataFrame({'course': ['Numpy', 'Pandas'], 'Fee': [2000, 5000]})
df2 = pd.DataFrame({'course': ['C', 'Java'], 'Fee': [3000, 3000]})
data = [df1, df2]
df = pd.concat(data)
print(df1, '\n', df2)
print('\n concatenated:\n', df)

OUTPUT:
  F_name L_name  R.No  mark

0 A X 10 50

1 B W 11 55

2 C Y 15 60

0 DataFrame

1 is

2 created

3 successfully

numbers grade

one 1 excellent

two 2 very good

three 3 good
four 4 moderable

five 5 satisfactory

a 0.25

b 0.50

c 0.75

d 1.00

dtype: float64

0.5

True

False

Index(['a', 'b', 'c', 'd'], dtype='object')

a 0.25

b 0.50

c 0.75

d 1.00

e 1.25

dtype: float64

a 0.25

b 0.50

dtype: float64

DataFrame:

A B C

0 0.046331 0.888653 0.575410

1 0.482769 0.515486 0.472929

2 0.900508 0.550911 0.881499

3 0.049788 0.623545 0.995443


4 0.979166 0.072843 0.644333

RESULT
Thus the program was created and executed successfully.

Ex. No: 4
BASIC PLOTS USING MATPLOTLIB
Date:

AIM

PROCEDURE

PROGRAM:
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is
# for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval by which
# the x-axis set the marks
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis
# set the marks
plt.yticks(list(range(-3, 20, 3)))
# legend denotes that what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate command helps to write
# ON THE GRAPH any text xy denotes
# the position on the graph
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))
# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()

OUTPUT:

RESULT
Thus the program was created and executed successfully.

Ex. No: 5
FREQUENCY DISTRIBUTION, AVERAGES, VARIABILITY
Date:

AIM

PROCEDURE

PROGRAM:
PYTHON PROGRAM TO GET AVERAGE OF A LIST
import numpy as np

li_st1=[2,40,2,502,177,7,9]

print('Average of List',np.average(li_st1))

OUTPUT:
Average of List 105.57142857142857

PYTHON PROGRAM TO GET VARIANCE OF A LIST


import numpy as np

li_st1=[2,40,2,502,177,7,9]

print('variance of list',np.var(li_st1))

OUTPUT:
variance of list 29579.10204081633

PYTHON PROGRAM TO GET STANDARD DEVIATION OF A LIST
import numpy as np
li_st1=[2,40,2,502,177,7,9]
print('standard deviation of list',np.std(li_st1))

OUTPUT:
standard deviation of list 171.98576115718512
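
The programs above cover averages and variability but not the frequency distribution named in the experiment title. The following is a minimal sketch of one way to tabulate it using only the standard library; the scores list is invented for illustration. The second half contrasts the population variance (division by n, which is what np.var computes by default) with the sample variance (division by n - 1) for the list used above.

```python
import statistics
from collections import Counter

# Hypothetical scores, invented for this illustration only.
scores = [2, 3, 3, 5, 2, 3, 5, 5, 5, 1]

# Frequency distribution: how many times each distinct value occurs.
frequency = Counter(scores)
print("Value  Frequency  Relative frequency")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:5d}  {count:9d}  {count / len(scores):18.2f}")

# Population vs. sample variance for the list used in the programs above.
li_st1 = [2, 40, 2, 502, 177, 7, 9]
population_variance = statistics.pvariance(li_st1)  # divides by n
sample_variance = statistics.variance(li_st1)       # divides by n - 1
print("population variance:", population_variance)
print("sample variance:", sample_variance)
```

With NumPy, the sample variance can be requested by passing ddof=1 to np.var or np.std.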

RESULT:
Thus the program was created and executed successfully.

Ex. No: 6
NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
Date:

AIM

PROCEDURE

PROGRAM

NORMAL CURVES:

import numpy as np

from scipy.stats import norm

import matplotlib.pyplot as plt

# Generate some data for this

# demonstration.

data = np.random.normal(170, 10, 250)

# Fit a normal distribution to

# the data:

# mean and standard deviation

mu, std = norm.fit(data)

# Plot the histogram.

plt.hist(data, bins=25, density=True, alpha=0.6, color='b')

# Plot the PDF.

xmin, xmax = plt.xlim()

x = np.linspace(xmin, xmax, 100)

p = norm.pdf(x, mu, std)

plt.plot(x, p, 'k', linewidth=2)

title = "Fit Values: {:.2f} and {:.2f}".format(mu, std)

plt.title(title)

plt.show()

OUTPUT:

PROGRAM
CORRELATION COEFFICIENT
# Python program to find the correlation coefficient.
import math

# Function that returns the correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # Sum of elements of array X.
        sum_X = sum_X + X[i]
        # Sum of elements of array Y.
        sum_Y = sum_Y + Y[i]
        # Sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]
        # Sum of squares of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1

    # Use the formula for calculating the correlation coefficient.
    corr = float(n * sum_XY - sum_X * sum_Y) / float(
        math.sqrt((n * squareSum_X - sum_X * sum_X)
                  * (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of the array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

# This code is contributed by Nikita Tiwari.

OUTPUT:
0.953463
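
As a cross-check on the hand-written formula, the same coefficient can be obtained from NumPy. This is a minimal sketch using the same X and Y values as the program above; np.corrcoef is NumPy's built-in routine, not the method used in the original program.

```python
import numpy as np

# Same data as the correlation coefficient program.
X = np.array([15, 18, 21, 24, 27])
Y = np.array([25, 25, 27, 31, 32])

# np.corrcoef returns the 2 x 2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between X and Y.
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))
```

The printed value should agree with the output of the manual computation to six decimal places.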

RESULT

Thus the program was created and executed successfully.

Ex. No: 7
REGRESSION
Date:

AIM

PROCEDURE

PROGRAM

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
    \nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
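
The coefficients estimated by the program above can be cross-checked against a library fit. The following minimal sketch uses np.polyfit with degree 1 on the same observations; it is an alternative route to the same least-squares line, not the method of the original program.

```python
import numpy as np

# Same observations as in main() above.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# A degree-1 polynomial fit returns the coefficients [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print("b_0 = {:.4f}\nb_1 = {:.4f}".format(intercept, slope))
```

Here intercept corresponds to b_0 and slope to b_1 in estimate_coef.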

OUTPUT:

RESULT:

Thus the program was created and executed successfully.

Ex. No: 8
Z-TEST
Date:

AIM

PROCEDURE

Program:-

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers with mean 110 and
# standard deviation 15/sqrt(50) (about 2.12),
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50)+mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data, the mean under the null
# hypothesis in the value parameter, and alternative='larger' to check
# whether the mean is larger.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# The function returns the z-score and the corresponding p-value. We compare
# the p-value with alpha: if it is greater than alpha we do not reject the
# null hypothesis, else we reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
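
To see what the library call computes, the one-sided z statistic can also be worked out by hand. The following is a minimal sketch using only the standard library; the function name one_sample_z_test and the five scores are hypothetical, and it assumes the sample standard deviation (n - 1 in the denominator) is used to estimate the standard error.

```python
import math
import statistics

def one_sample_z_test(data, null_mean):
    """Return (z, p) for the alternative that the population mean > null_mean."""
    n = len(data)
    sample_mean = statistics.mean(data)
    # Sample standard deviation (n - 1 in the denominator).
    sample_sd = statistics.stdev(data)
    z = (sample_mean - null_mean) / (sample_sd / math.sqrt(n))
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Hypothetical IQ scores, invented for this illustration only.
scores = [108, 112, 110, 109, 111]
z, p = one_sample_z_test(scores, null_mean=100)
print("z = %.4f, p = %.3g" % (z, p))
```

A large positive z with a p-value below alpha leads to rejecting the null hypothesis, as in the program above.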

OUTPUT:
mean=110.27 stdv=2.08

Reject Null Hypothesis

RESULT:

Thus the program was created and executed successfully.

Ex.No:09 T-TEST
Date:

AIM

PROCEDURE

Program:-
import numpy as np
from scipy import stats
N=10
x=np.random.randn(N)+2
y=np.random.randn(N)
var_x=x.var(ddof=1)
var_y=y.var(ddof=1)
sd=np.sqrt((var_x+var_y)/2)
print('standard deviation =',sd)
tval =(x.mean()-y.mean())/(sd*np.sqrt(2/N))
dof=2*N-2
pval=1-stats.t.cdf(tval,df=dof)
print('t=' +str(tval))
print('p='+str(2*pval))
tval2,pval2=stats.ttest_ind(x,y)
print('t=' +str(tval2))
print('p='+str(pval2))

OUTPUT:
standard deviation = 0.839417088469087
t=3.9906805884970167
p=0.0008574438057769029
t=3.9906805884970162
p=0.0008574438057768628

RESULT:
Thus the program was created and executed successfully.

Ex.No:10 ANOVA
Date:

AIM

PROCEDURE

PROGRAM:
# Importing libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a dataframe
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15]})

# Performing two-way ANOVA
model = ols('height ~ C(Fertilizer) + C(Watering) +\
C(Fertilizer):C(Watering)',
            data=dataframe).fit()
# anova_lm takes the keyword typ (not type); typ=1 is its default and
# produces the df / sum_sq / mean_sq / F / PR(>F) table
result = sm.stats.anova_lm(model, typ=1)

# Print the result
print(result)


OUTPUT:
                            df     sum_sq   mean_sq         F    PR(>F)
C(Fertilizer)              1.0   0.033333  0.033333  0.012069  0.913305
C(Watering)                1.0   0.000369  0.000369  0.000133  0.990865
C(Fertilizer):C(Watering)  1.0   0.040866  0.040866  0.014796  0.904053
Residual                  28.0  77.333333  2.761905       NaN       NaN

RESULT:
Thus the program was created and executed successfully.

Ex No:11 BUILDING AND VALIDATING LINEAR MODELS
Date:

AIM

PROCEDURE

PROGRAM:
import matplotlib.pyplot as plt
from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

# Fit a least-squares line; r, p, and std_err describe the quality of the fit
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predicted y value for every x
mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
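
The program above builds the model; validating it means checking how well the line describes the data. The following is a minimal sketch of one possible check, assuming the same x and y lists. It reports the coefficient of determination (r squared), the p-value for the slope, and the root-mean-square error of the residuals; the variable names are chosen for illustration.

```python
from scipy import stats

x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

result = stats.linregress(x, y)

# r squared: fraction of the variance in y explained by the line (0 to 1).
r_squared = result.rvalue ** 2

# Residuals: observed value minus the value predicted by the fitted line.
predicted = [result.slope * xi + result.intercept for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
rmse = (sum(e * e for e in residuals) / len(residuals)) ** 0.5

print("r squared        =", r_squared)
print("p-value of slope =", result.pvalue)
print("RMSE             =", rmse)
```

An r squared value close to 0 indicates that a straight line is a poor description of the data, while a value close to 1 indicates a good fit.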

OUTPUT:

RESULT:
Thus the program was created and executed successfully.

Ex No:12 BUILDING AND VALIDATING LOGISTIC MODELS
Date:

AIM

PROCEDURE

PROGRAM:
import numpy
from sklearn import linear_model

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69,
                 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

def logit2prob(logr, X):
    # log-odds -> odds -> probability
    log_odds = logr.coef_ * X + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability

print(logit2prob(logr, X))
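
As a validation step, the hand-computed probabilities can be compared with scikit-learn's own predict_proba, and the training accuracy can be read from score. The following is a minimal sketch on the same data; it checks the model against the data it was trained on only, which is an optimistic measure.

```python
import numpy as np
from sklearn import linear_model

X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69,
              5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Manual conversion from log-odds to probability, as in logit2prob.
manual = 1.0 / (1.0 + np.exp(-(logr.coef_ * X + logr.intercept_)))

# Column 1 of predict_proba holds the probability of class 1.
builtin = logr.predict_proba(X)[:, 1]
print("Largest difference:", np.max(np.abs(manual.ravel() - builtin)))

# Fraction of training samples classified correctly.
accuracy = logr.score(X, y)
print("Training accuracy:", accuracy)
```

For an estimate that generalizes better, the data would normally be split into training and test sets before fitting.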

OUTPUT:
[[0.60749955]
[0.19268876]
[0.12775886]
[0.00955221]
[0.08038616]
[0.07345637]
[0.88362743]
[0.77901378]
[0.88924409]
[0.81293497]
[0.57719129]
[0.96664243]]

RESULT:
Thus the program was created and executed successfully.

Ex No:13 TIME SERIES ANALYSIS
Date:

AIM:

PROCEDURE:

PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# reading the dataset using read_csv
df = pd.read_csv("stock_data.csv",
                 parse_dates=True, index_col="Date")

# displaying the first five rows of the dataset
print(df.head())

# deleting the unnamed column (drop returns a new DataFrame, so reassign it)
df = df.drop(columns='Unnamed: 0')

# plotting the Volume column against the Date index
df['Volume'].plot()
plt.show()

OUTPUT:

RESULT:
Thus the program was created and executed successfully.
