CS Lab Programs

The document contains source code for performing various statistical analyses in Python including linear regression, polynomial regression, correlation analysis using Pearson's and Spearman's correlation coefficients, and one-way and two-way ANOVA. For one-way and two-way ANOVA, the code takes input for number of treatments and blocks, calculates sums of squares, mean squares, and performs F-test to classify data by accepting or rejecting the null hypothesis.

REGNO :

1. Aim: Write a Python program to find the best-fit straight line and draw the
scatter plot.

Source Code:

#importing necessary libraries


import numpy as np
import matplotlib.pyplot as plt
from statistics import mean

#reading input
x=[float(x) for x in input().split(" ")]
y=[float(x) for x in input().split(" ")]

x=np.array(x)
y=np.array(y)

#calculating slope and intercept


m=sum((x-mean(x))*(y-mean(y)))/sum((x-mean(x))**2)
m=round(m,4)

c=mean(y)-m*mean(x)
c=round(c,4)

#Best fit equation is


print("equation of straight line is y={}x+{} ".format(m,c))

y_hat=m*x+c

sse=sum((y-y_hat)**2)

sst=sum((y-mean(y))**2)

ssr=sst-sse

r2=ssr/sst
print('r^2 value is : ',r2)

#goodness of fit
if(r2>0.90):
    print('Good Fit')
else:
    print('Not Good Fit')

#plotting Graphs
plt.scatter(x,y)
plt.show()

plt.plot(x,y,'bo-')
plt.plot(x,y_hat,'ro-')
plt.show()
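
As a cross-check on the slope and intercept formulas above, the same fit can be compared with NumPy's `np.polyfit`. This is only a minimal sketch: the helper name `fit_line` and the sample data are illustrative assumptions and are not part of the lab record.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope, intercept and r^2 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    m = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    c = y.mean() - m * x.mean()
    y_hat = m * x + c
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return m, c, 1.0 - sse / sst

# Illustrative data only (assumed, not taken from the lab record)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c, r2 = fit_line(x, y)
m_np, c_np = np.polyfit(x, y, 1)   # degree 1 returns [slope, intercept]
print(m, c, r2)
print(m_np, c_np)
```

Unlike the listing, this sketch does not round the slope before computing the intercept, so the last digits can differ slightly.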

OUTPUT:

1. [output image not recoverable]

2. [output image not recoverable]

2. Aim: Write a Python program to fit a second-degree parabola of the form
y = a + bx + cx^2 and draw the scatter plot.

Source Code:

#importing libraries
import numpy as np
import matplotlib.pyplot as plt
x=np.array([float(x) for x in input().split(" ")])
y=np.array([float(x) for x in input().split(" ")])
n=len(x)
sumx=np.sum(x)
sumy=np.sum(y)
sumxy=np.sum(x*y)
sumx2=np.sum(x*x)
sumx3=np.sum(x*x*x)
sumx4=np.sum(x*x*x*x)
sumx2y=np.sum(x*x*y)
#calculating determinant
def getMinor(m,i,j):
    return [row[:j] + row[j+1:] for row in (m[:i]+m[i+1:])]
def getDeternminant(m):
    if len(m) == 2:
        return m[0][0]*m[1][1]-m[0][1]*m[1][0]
    determinant = 0
    for c in range(len(m)):
        determinant += ((-1)**c)*m[0][c]*getDeternminant(getMinor(m,0,c))
    return determinant
#by using cramer's rule (without built-in)
p=getDeternminant([[n,sumx,sumx2],[sumx,sumx2,sumx3],[sumx2,sumx3,sumx4]])
q=getDeternminant([[sumy,sumxy,sumx2y],[sumx,sumx2,sumx3],[sumx2,sumx3,sumx4]])
r=getDeternminant([[n,sumx,sumx2],[sumy,sumxy,sumx2y],[sumx2,sumx3,sumx4]])
s=getDeternminant([[n,sumx,sumx2],[sumx,sumx2,sumx3],[sumy,sumxy,sumx2y]])
a=round(q/p,3)
b=round(r/p,3)

c=round(s/p,3)
print("The equation of parabola is y={}+{}x+{}x2".format(a,b,c))
plt.scatter(x,y)
plt.show()

OUTPUT:
1.
2 4 6 8 10
3.07 12.85 31.47 57.38 91.29
The equation of parabola is y=0.696+-0.855x+0.992x2

2.
0 1 2 3 4
1 1.8 1.3 2.5 6.3
The equation of parabola is y=1.42+-1.07x+0.55x2
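
The Cramer's-rule solution above can be checked by handing the same normal equations to NumPy. The sketch below assumes the data of the first sample run; `np.linalg.solve` and `np.polyfit` are swapped in for the hand-written determinant code and are not what the listing itself uses.

```python
import numpy as np

# Data from the first sample run above
x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([3.07, 12.85, 31.47, 57.38, 91.29])

# Normal equations for y = a + b*x + c*x^2, written as M @ [a, b, c] = v
M = np.array([
    [len(x),       x.sum(),      (x**2).sum()],
    [x.sum(),      (x**2).sum(), (x**3).sum()],
    [(x**2).sum(), (x**3).sum(), (x**4).sum()],
])
v = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a, b, c = np.linalg.solve(M, v)

# np.polyfit returns the highest power first: [c, b, a]
c_np, b_np, a_np = np.polyfit(x, y, 2)
print(round(a, 3), round(b, 3), round(c, 3))
```

The rounded coefficients should agree with the first sample run (a ≈ 0.696, b ≈ -0.855, c ≈ 0.992).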

3. Aim: Write a Python program to find Karl Pearson’s correlation coefficient.

Source Code:
import numpy as np
import pandas as pd
from math import sqrt
#reading inputs
a=[int(x) for x in input("Enter x values : ").split(" ")]
b=[int(x) for x in input("Enter y values : ").split(" ")]
x=np.array(a)
y=np.array(b)
xy=x*y
x2=x*x
y2=y*y
n=len(x)

#Data Frame
df=pd.DataFrame()
df['x']=x
df['y']=y
df['xy']=xy
df['x2']=x2
df['y2']=y2
print(df)

#calculating correlation coefficient


num=n*sum(xy)- (sum(x)*sum(y))
den=sqrt(((n*(sum(x**2)))-sum(x)**2) * ((n*(sum(y**2)))-sum(y)**2))
r=num/den
r=round(r,5)
print("correlation coefficient r is : ",r)

OUTPUT:

1.
Enter x values : 3 7 4 2 0 4 1 2
Enter y values : 11 18 9 4 7 6 3 8
x y xy x2 y2
0 3 11 33 9 121
1 7 18 126 49 324
2 4 9 36 16 81
3 2 4 8 4 16
4 0 7 0 0 49
5 4 6 24 16 36
6 1 3 3 1 9
7 2 8 16 4 64
correlation coefficient r is : 0.78673

2.

Enter x values : 65 68 67 67 68 69 70 72
Enter y values : 67 68 65 68 72 72 69 71
x y xy x2 y2
0 65 67 4355 4225 4489
1 68 68 4624 4624 4624
2 67 65 4355 4489 4225
3 67 68 4556 4489 4624
4 68 72 4896 4624 5184
5 69 72 4968 4761 5184
6 70 69 4830 4900 4761
7 72 71 5112 5184 5041
correlation coefficient r is : 0.59094
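
The raw-sums formula used in the listing can be compared against NumPy's built-in correlation matrix. This is a small sketch using the data of the first sample run; the variable names `r_formula` and `r_numpy` are illustrative.

```python
import numpy as np

# Data from the first sample run above
x = np.array([3, 7, 4, 2, 0, 4, 1, 2], dtype=float)
y = np.array([11, 18, 9, 4, 7, 6, 3, 8], dtype=float)
n = len(x)

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
num = n * np.sum(x * y) - x.sum() * y.sum()
den = np.sqrt((n * np.sum(x**2) - x.sum()**2) * (n * np.sum(y**2) - y.sum()**2))
r_formula = num / den

# Off-diagonal entry of the 2x2 correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]
print(round(r_formula, 5), round(r_numpy, 5))
```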

4. Aim: Write a Python program to find Spearman’s correlation coefficient
between x and y variables.

Source Code:

import numpy as np
import pandas as pd
a=[int(x) for x in input().split(" ")]
b=[int(x) for x in input().split(" ")]
n=len(a)
df=pd.DataFrame({'A':a,'B':b})

def rank(a):
    s=sorted(a)
    n=len(a)
    s=s[::-1]
    i=0
    d=[]
    count=[]
    while i<n:
        k=s.count(s[i])
        if k==1:
            d.append(i+1)
            i=i+1
        else:
            #tied values share the average of the ranks they occupy
            m=0
            for j in range(i+1,i+k+1):
                m=m+j
            m=m/k
            for j in range(k):
                d.append(m)
            i=i+k
            count.append(k)
    r=[]
    for i in range(n):
        j=s.index(a[i])
        r.append(d[j])

    return r,count

r_x,c_x=rank(a)
r_y,c_y=rank(b)
print(r_x)
print(r_y)

df['Rank of x']=r_x
df['Rank of y']=r_y

di=[]
di2=[]
for i in range(len(a)):
    k=r_x[i]-r_y[i]
    di.append(k)
    di2.append(k**2)

df['di']=di
df['di2']=di2

def correction_factor(c):
    #add m(m^2-1)/12 for every group of tied values, not only the first one
    cf=0
    for m in c:
        cf=cf+(m*(m**2-1))/12
    return cf

cf_x=correction_factor(c_x)
cf_y=correction_factor(c_y)
sum_di2=sum(di2)+cf_x+cf_y
print("Correction factor of a",cf_x)
print("Correction factor of b",cf_y)
print("Modified di2 sum is : ",sum_di2)

r=1-((6*sum_di2)/(n*(n**2-1)))
print("Rank Corelation coefficint : ",round(r,4))

OUTPUT:
1.
110 100 140 120 80 90
70 60 80 90 10 20
[3, 4, 1, 2, 6, 5]
[3, 4, 2, 1, 6, 5]
Correction factor of a 0
Correction factor of b 0
Modified di2 sum is : 2
Rank Corelation coefficint : 0.9429
2.
115 109 112 87 98 120 98 100 98 118
75 73 85 70 76 82 65 73 68 80
[3, 5, 4, 10, 8.0, 1, 8.0, 6, 8.0, 2]
[5, 6.5, 1, 8, 4, 2, 10, 6.5, 9, 3]
Correction factor of a 2.0
Correction factor of b 0.5
Modified di2 sum is : 45.0
Rank Corelation coefficint : 0.7273
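
For data without ties, the rank-difference formula above can be cross-checked with SciPy. The sketch below assumes the data of the first sample run and uses `scipy.stats.rankdata` and `scipy.stats.spearmanr` in place of the hand-written `rank` function.

```python
from scipy.stats import rankdata, spearmanr

# Data from the first sample run above (no tied values)
x = [110, 100, 140, 120, 80, 90]
y = [70, 60, 80, 90, 10, 20]
n = len(x)

# Rank 1 goes to the largest value, as in the listing; ties get the average rank
rank_x = rankdata([-v for v in x], method="average")
rank_y = rankdata([-v for v in y], method="average")

sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rho_formula = 1 - 6 * sum_d2 / (n * (n**2 - 1))

# SciPy computes the Pearson correlation of the ranks
rho_scipy, p_value = spearmanr(x, y)
print(round(rho_formula, 4), round(rho_scipy, 4))
```

When tied values are present, the correction-factor formula of the listing and `spearmanr` may differ slightly, because SciPy correlates the averaged ranks directly instead of adding a correction term.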

5. Aim: Write a Python program to classify the data based on one-way ANOVA.

Source Code:

import scipy.stats

co=int(input("Enter no of treatements : "))


A=[]
for i in range(co):
    l=[int(x) for x in input("enter treatements : ").split(" ")]
    A.append(l)
print("Treatments are :")
print(A)

#treatment totals
m=[]
for i in range(co):
    total=0
    for j in range(len(A[i])):
        total=total+A[i][j]
    m.append(total)

#grand total
g=0
for i in range(len(m)):
    g=g+m[i]
print("The value of G is : ",g)

#(treatment total)^2 / (treatment size)
r=[]
for i in range(co):
    sum1=0
    k=len(A[i])
    for j in range(len(A[i])):
        sum1=sum1+(A[i][j])
    t=sum1*sum1/k
    r.append(t)
print(r)

ti=0
for i in range(len(r)):
    ti=ti+r[i]
print("ti^2/n value is : ",ti)

#total number of observations
c=0
for i in range(co):
    for j in range(len(A[i])):
        c=c+1

#raw sum of squares
rss=0
for i in range(co):
    for j in range(len(A[i])):
        rss=rss+(A[i][j]**2)
print("RSS is : ",rss)

cf=g*g/c
print("CF is : ",cf)

sst=rss-cf
print("SST is : ",sst)

sstr=ti-cf
print("SSTR is : ",sstr)

sse=sst-sstr
print("SSE is : ",sse)

msstr=sstr/(co-1)
print("MSSTR is : ",msstr)

msse=sse/(c-co)
print("MSSE is : ",msse)

alpha=float(input("enter alpha value : "))


f=msstr/msse

#the larger mean square goes in the numerator; f==1 is handled by the first branch
if(f>=1):
    f_tab=scipy.stats.f.ppf(1-alpha, co-1, c-co)
else:
    f=msse/msstr
    f_tab=scipy.stats.f.ppf(1-alpha, c-co, co-1)
print("F calcualted value is : ",f)
print("F table value is : ",f_tab)

if(f>f_tab):
    print("Reject Null Hypothesis")
else:
    print("Accept Null Hypothesis")

OUTPUT:

1.

Enter no of treatements : 3

enter treatements : 13 10 8 11 8
enter treatements : 13 11 14 14
enter treatements : 4 1 3 4 2 4
Treatments are :
[[13, 10, 8, 11, 8], [13, 11, 14, 14], [4, 1, 3, 4, 2, 4]]
The value of G is : 120
[500.0, 676.0, 54.0]
ti^2/n value is : 1230.0
RSS is : 1262
CF is : 960.0
SST is : 302.0
SSTR is : 270.0
SSE is : 32.0
MSSTR is : 135.0
MSSE is : 2.6666666666666665
enter alpha value : 0.05
F calcualted value is : 50.625
F table value is : 3.8852938346523933
Reject Null Hypothesis
2.
Enter no of treatements : 3
enter treatements : 90 82 79 83 91
enter treatements : 105 89 93 104 89 95 86
enter treatements : 83 89 80 94
Treatments are :
[[90, 82, 79, 83, 91], [105, 89, 93, 104, 89, 95, 86], [83, 89, 80, 94]]
The value of G is : 1432
[36125.0, 62417.28571428572, 29929.0]
ti^2/n value is : 128471.28571428571
RSS is : 129034
CF is : 128164.0
SST is : 870.0
SSTR is : 307.2857142857101
SSE is : 562.7142857142899
MSSTR is : 153.64285714285506
MSSE is : 43.285714285714604
enter alpha value : 0.05
F calcualted value is : 3.5495049504949754
F table value is : 3.805565252978057
Accept Null Hypothesis
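
The hand-computed F ratio can be compared with SciPy's one-way ANOVA routine. This sketch assumes the data of the first sample run; `scipy.stats.f_oneway` always places the treatment mean square in the numerator, so it matches the listing whenever the calculated F is at least 1.

```python
from scipy.stats import f as f_dist, f_oneway

# Treatments from the first sample run above
groups = [
    [13, 10, 8, 11, 8],
    [13, 11, 14, 14],
    [4, 1, 3, 4, 2, 4],
]
alpha = 0.05

f_stat, p_value = f_oneway(*groups)

# Critical value with (k - 1, N - k) degrees of freedom
k = len(groups)
n_total = sum(len(g) for g in groups)
f_crit = f_dist.ppf(1 - alpha, k - 1, n_total - k)

print(f_stat, f_crit, p_value < alpha)
```

Comparing `p_value` with `alpha` leads to the same decision as comparing the calculated F with the table value.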

6. Aim: Write a Python program to classify the data based on two-way ANOVA.

Source Code:

import scipy.stats
co=int(input("Enter no of treatements : "))
mo=int(input("Enter no of blocks : "))
A=[]
for i in range(co):
    l=[int(x) for x in input("enter treatements : ").split(" ")]
    A.append(l)
print("Treatments are :")
print(A)

#treatment (row) totals
m=[]
for i in range(co):
    total=0
    for j in range(len(A[i])):
        total=total+A[i][j]
    m.append(total)
print(m)

g=0
for i in range(len(m)):
    g=g+m[i]
print("The value of G is : ",g)

#squared treatment totals
r=[]
for i in range(co):
    sum1=0
    for j in range(len(A[i])):
        sum1=sum1+(A[i][j])
    t=sum1*sum1
    r.append(t)
print(r)
ti=0
for i in range(len(r)):
    ti=ti+r[i]
print("ti^2 value is : ",ti)

#squared block (column) totals
r=[]
for i in range(mo):
    sum1=0
    for j in range(co):
        sum1=sum1+(A[j][i])
    b=sum1*sum1
    r.append(b)
print(r)

bj=0
for i in range(len(r)):
    bj=bj+r[i]
print("bj^2 value is : ",bj)

#total number of observations
c=0
for i in range(co):
    for j in range(len(A[i])):
        c=c+1

#raw sum of squares
rss=0
for i in range(co):
    for j in range(len(A[i])):
        rss=rss+(A[i][j]**2)
print("RSS is : ",rss)

cf=g*g/c

print("CF is : ",cf)

sst=rss-cf
print("SST is : ",sst)

sstr=(1/mo)*(ti)-cf
print("SSTR is : ",sstr)

ssb=(1/co)*(bj)-cf
print("SSB is : ",ssb)

sse=sst-sstr-ssb
print("SSE is : ",sse)

msstr=sstr/(co-1)
print("MSSTR is : ",msstr)

mssb=ssb/(mo-1)
print("MSSB is : ",mssb)

msse=sse/((mo-1)*(co-1))
print("MSSE is : ",msse)

alpha=float(input("enter alpha value : "))

f=msstr/msse
f=round(f,5)

if(f>=1):
    f=msstr/msse
    f_tab=scipy.stats.f.ppf(1-alpha, co-1, (mo-1)*(co-1))
else:
    f=msse/msstr
    f_tab=scipy.stats.f.ppf(1-alpha, (mo-1)*(co-1), co-1)
print("F calcualted value of treatments is : ",f)
print("F table value of treatments is : ",f_tab)

f1=mssb/msse
if(f1>=1):
    f1_tab=scipy.stats.f.ppf(1-alpha, mo-1, (mo-1)*(co-1))
else:
    #inverted ratio for blocks: MSSE/MSSB
    f1=msse/mssb
    f1_tab=scipy.stats.f.ppf(1-alpha, (mo-1)*(co-1), mo-1)
print("F calcualted value of blocks is : ",f1)
print("F table value of blocks is : ",f1_tab)

def check(f,f_tab):
    if(f>f_tab):
        print("Reject Null Hypothesis")
    else:
        print("Accept Null Hypothesis")
print("For treatments : ")
check(f,f_tab)
print("For Blocks : ")
check(f1,f1_tab)

OUTPUT:
1.
Enter no of treatements : 3
Enter no of blocks : 4
enter treatements : 13 7 9 3
enter treatements : 6 6 3 1
enter treatements : 11 5 15 5
Treatments are :
[[13, 7, 9, 3], [6, 6, 3, 1], [11, 5, 15, 5]]
[32, 16, 36]
The value of G is : 84
[1024, 256, 1296]
ti^2 value is : 2576
[900, 324, 729, 81]
bj^2 value is : 2034
RSS is : 786
CF is : 588.0
SST is : 198.0
SSTR is : 56.0
SSB is : 90.0
SSE is : 52.0
MSSTR is : 28.0
MSSB is : 30.0
MSSE is : 8.666666666666666
enter alpha value : 0.05
F calcualted value of treatments is : 3.230769230769231
F table value of treatments is : 5.143252849784718
F calcualted value of blocks is : 3.4615384615384617
F table value of blocks is : 4.757062663089414
For treatments :
Accept Null Hypothesis
For Blocks :
Accept Null Hypothesis

2.
Enter no of treatements : 4
Enter no of blocks : 3
enter treatements : 45 43 51
enter treatements : 47 46 52
enter treatements : 48 50 55
enter treatements : 42 37 49
Treatments are :
[[45, 43, 51], [47, 46, 52], [48, 50, 55], [42, 37, 49]]
[139, 145, 153, 128]
The value of G is : 565
[19321, 21025, 23409, 16384]
ti^2 value is : 80139
[33124, 30976, 42849]
bj^2 value is : 106949
RSS is : 26867
CF is : 26602.083333333332
SST is : 264.9166666666679
SSTR is : 110.91666666666788
SSB is : 135.16666666666788
SSE is : 18.83333333333212
MSSTR is : 36.972222222222626
MSSB is : 67.58333333333394
MSSE is : 3.138888888888687
enter alpha value : 0.01
F calcualted value of treatments is : 11.77876106194779
F table value of treatments is : 9.779538240923273
F calcualted value of blocks is : 21.530973451329015
F table value of blocks is : 10.92476650083833
For treatments :
Reject Null Hypothesis
For Blocks :
Reject Null Hypothesis
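
The loops of the two-way ANOVA listing reduce to a few row and column sums, which the following NumPy sketch makes explicit. It assumes one observation per treatment/block cell and uses the data of the first sample run; the helper name `two_way_anova` and the dictionary keys are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def two_way_anova(table):
    """Sums of squares and F ratios for rows = treatments, columns = blocks,
    one observation per cell (hypothetical helper)."""
    A = np.asarray(table, dtype=float)
    t, b = A.shape
    cf = A.sum() ** 2 / A.size                  # correction factor G^2 / N
    sst = np.sum(A ** 2) - cf                   # total sum of squares
    sstr = np.sum(A.sum(axis=1) ** 2) / b - cf  # treatments (rows)
    ssb = np.sum(A.sum(axis=0) ** 2) / t - cf   # blocks (columns)
    sse = sst - sstr - ssb
    mse = sse / ((t - 1) * (b - 1))
    return {
        "sstr": sstr, "ssb": ssb, "sse": sse,
        "f_treatments": (sstr / (t - 1)) / mse,
        "f_blocks": (ssb / (b - 1)) / mse,
    }

# Data from the first sample run above: 3 treatments, 4 blocks
res = two_way_anova([[13, 7, 9, 3], [6, 6, 3, 1], [11, 5, 15, 5]])
f_crit_treat = f_dist.ppf(0.95, 2, 6)
print(res, f_crit_treat)
```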

Matrix Transpose, Multiplication, Inverse

The helper functions below are called by programs 7, 8 and 10.

#matrix transpose
def transpose(A):
    r=len(A)
    c=len(A[0])
    s=[]
    for i in range(0,c):
        n=[]
        for j in range(0,r):
            n.append(A[j][i])
        s.append(n)
    return s

#matrix multiplication
def multiplication(A,B):
    res=[]
    r1=len(A)
    c2=len(B[0])
    for i in range(r1):
        k=[]
        for j in range(c2):
            k.append(0)
        res.append(k)

    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                res[i][j]=res[i][j]+A[i][k]*B[k][j]
    return res

#matrix inverse
def getMinor(m,i,j):
    return [row[:j] + row[j+1:] for row in (m[:i]+m[i+1:])]

def getDeternminant(m):
    #base case for 2x2 matrix
    if len(m) == 2:
        return m[0][0]*m[1][1]-m[0][1]*m[1][0]

    determinant = 0
    for c in range(len(m)):
        determinant += ((-1)**c)*m[0][c]*getDeternminant(getMinor(m,0,c))
    return determinant

def inverse(m):
    determinant = getDeternminant(m)
    #special case for 2x2 matrix:
    if len(m) == 2:
        return [[m[1][1]/determinant, -1*m[0][1]/determinant],
                [-1*m[1][0]/determinant, m[0][0]/determinant]]

    #find matrix of cofactors
    cofactors = []
    for r in range(len(m)):
        cofactorRow = []
        for c in range(len(m)):
            minor = getMinor(m,r,c)
            cofactorRow.append(((-1)**(r+c)) * getDeternminant(minor))
        cofactors.append(cofactorRow)
    cofactors = transpose(cofactors)
    for r in range(len(cofactors)):
        for c in range(len(cofactors)):
            cofactors[r][c] = cofactors[r][c]/determinant
    return cofactors
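
For checking the hand-written helpers, NumPy offers direct counterparts. The sketch below is only a comparison aid; the matrices `A` and `B` are illustrative assumptions, and the NumPy calls are not used by the lab programs themselves.

```python
import numpy as np

# Illustrative matrices (assumed, not from the lab record)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])

At = A.T                    # counterpart of transpose(A)
AB = A @ B                  # counterpart of multiplication(A, B)
A_inv = np.linalg.inv(A)    # counterpart of inverse(A)
det_A = np.linalg.det(A)    # counterpart of getDeternminant(A)

# A times its inverse should be the identity matrix up to rounding
print(np.allclose(A @ A_inv, np.eye(3)), det_A)
```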

7. Aim: Write a Python program to fit a multiple regression model for any given
data.

Source code:

#importing libraries
import numpy as np
import pandas as pd
from statistics import mean
import scipy.stats as stats

#reading input
n=int(input("enter no of independent variables"))
a=[]
print("enter x values : ")
for i in range(n):
    l=[int(x) for x in input().split(" ")]
    a.append(l)
#column of ones for the intercept term
p=[]
for i in range(len(l)):
    p.append(1)
a.insert(0,p)
X=transpose(a)

b=[]
print("enter y values : ")
q=[int(x) for x in input().split(" ")]
b.append(q)
Y=transpose(b)

#step1
Xt=transpose(X)
s1=multiplication(Xt,X)
#step2
s2=inverse(s1)

#step3
s3=multiplication(Xt,Y)
#step4
s4=multiplication(s2,s3)

Xt=np.array(Xt)

#print regression model is:


b0=s4[0][0]
b1=s4[1][0]
b2=s4[2][0]
y_hat=b0+Xt[1]*b1+Xt[2]*b2
print("regression model is :")
print("y={}+{}x1+{}x2".format(b0,b1,b2))

y=transpose(Y)
df=pd.DataFrame()
df['y']=y[0]
df['y_hat']=y_hat

#sse
sse=sum((df['y']-df['y_hat'])**2)
#sst
ybar=mean(b[0])
sst=sum((df['y']-ybar)**2)
#ssr
ssr=sst-sse

#coefficient of determination
r2=ssr/sst
print("r2 value is :",r2)
if(r2>0.90):
    print("Model is Good fit")
else:
    print("Model is not good fit")

n1=n+1
n2=len(df['y'])

#anova
mssr=ssr/(n1-1)
msse=sse/(n2-n1)
fcal=mssr/msse
print('calculated value of f :',fcal)

alpha=float(input("enter alpha:"))
ftab=stats.f.ppf(1-alpha,n1-1,n2-n1)
print('table value of f :',ftab)

if(fcal>ftab):
    print("Accept model")
else:
    print("Reject model")

#test of individual parameters


from math import sqrt
import scipy
#diagonal elements of the inverse of X'X
p=[]
for i in range(len(s2)):
    for j in range(len(s2[i])):
        if(i==j):
            p.append(s2[i][j])

b=[b0,b1,b2]
t=[]
for i in range(len(p)):
    t.append(b[i]/sqrt(msse*p[i]))
print("calculated values of t are : ")
print(t)

t_tab=scipy.stats.t.ppf(1-alpha/2,n2-n1)
print("table value is of t is ",t_tab)
for i in range(len(t)):
    if(abs(t[i])>t_tab):
        print("b{} is contibuting to the model".format(i))
    else:
        print("weak variable is b{}".format(i))

OUTPUT:
1.
enter no of independent variables2
enter x values :
9 8 7 14 12 10 7 4 6 5 7 6
62 58 64 60 63 57 55 56 59 61 57 60
enter y values :
100 110 105 94 95 99 104 108 105 98 105 110
regression model is :
y=133.46048242804682+-1.2485034569591846x1+-0.3510083718055057x2
r2 value is : 0.5415279145486218
Model is not good fit
calculated value of f : 5.315210440935849
enter alpha:0.05
table value of f : 4.25649472909375
Accept model
calculated values of t are :
[5.088199707759992, -2.807937063390396, -0.7711670682671642]
table value is of t is 2.2621571627409915
b0 is contibuting to the model
b1 is contibuting to the model
weak variable is b2
2.

enter no of independent variables2


enter x values :
-5 -4 -1 2 2 3 3
5 4 1 -3 -2 -2 -3
enter y values :
11 11 8 2 5 5 4
regression model is :
y=6.571428571428571+1.000000000000007x1+2.0x2
r2 value is : 0.9767441860465117
Model is Good fit
calculated value of f : 84.0
enter alpha:0.05
table value of f : 6.944271909999155
Accept model
calculated values of t are :
[26.558112382722783, 2.1522901619383332, 4.304580323876635]
table value is of t is 2.7764451051977987
b0 is contibuting to the model
weak variable is b1
b2 is contibuting to the model
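
The four matrix steps of the listing compute the least-squares coefficients, so they can be checked against `np.linalg.lstsq`. The sketch below assumes the data of the second sample run; using `lstsq` rather than an explicit inverse is a substitution made here, not the method of the listing.

```python
import numpy as np

# Data from the second sample run above
x1 = np.array([-5, -4, -1, 2, 2, 3, 3], dtype=float)
x2 = np.array([5, 4, 1, -3, -2, -2, -3], dtype=float)
y = np.array([11, 11, 8, 2, 5, 5, 4], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst

# Overall F test: (SSR / (p - 1)) / (SSE / (n - p))
n_obs, n_par = X.shape
f_cal = ((sst - sse) / (n_par - 1)) / (sse / (n_obs - n_par))
print(beta, r2, f_cal)
```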

8. Aim: Write a Python program to fit a multivariate regression model for any
given data.

Source code:

#importing libraries

import numpy as np
import pandas as pd
from statistics import mean
import scipy.stats as stats

#reading input

n1=int(input("enter no of independent variables"))


n2=int(input("enter no of dependent variables"))

a=[]
print("enter x values : ")
for i in range(n1):
    l=[float(x) for x in input().split(" ")]
    a.append(l)
#column of ones for the intercept term
p=[]
for i in range(len(l)):
    p.append(1)
a.insert(0,p)
X=transpose(a)

b=[]
print("enter y values : ")
for i in range(n2):
    q=[float(x) for x in input().split(" ")]
    b.append(q)
Y=transpose(b)

#step1
Xt=transpose(X)
s1=multiplication(Xt,X)
#step2
s2=inverse(s1)
#step3
s3=multiplication(Xt,Y)
#step4
s4=multiplication(s2,s3)

Xt=np.array(Xt)
k=transpose(s4)
b=[]
for i in range(len(k)):
b.append(k[i])

#regression equations are :


y_hat=[]
#the format string and the fitted values below assume three independent variables
for i in range(len(k)):
    p=0
    print("y{}={}+{}x1+{}x2+{}x3".format(i+1,b[i][p],b[i][p+1],b[i][p+2],b[i][p+3]))
    y_hat.append(b[i][p]+Xt[1]*b[i][p+1]+Xt[2]*b[i][p+2]+Xt[3]*b[i][p+3])

y=transpose(Y)

df=pd.DataFrame()
df['y1']=y[0]
df['y_hat1']=y_hat[0]
df['y2']=y[1]
df['y_hat2']=y_hat[1]

#sse
sse1=sum((df['y1']-df['y_hat1'])**2)
sse2=sum((df['y2']-df['y_hat2'])**2)
#sst
ybar1=mean(y[0])
sst1=sum((df['y1']-ybar1)**2)
ybar2=mean(y[1])
sst2=sum((df['y2']-ybar2)**2)
#ssr
ssr1=sst1-sse1
ssr2=sst2-sse2
sse=[sse1,sse2]
ssr=[ssr1,ssr2]
sst=[sst1,sst2]
r2=[]

#coefficient of determination
for i in range(len(sse)):
    r2.append(ssr[i]/sst[i])
    print("r2 value of y{} is {}:".format(i+1,r2[i]))
    if(r2[i]>0.90):
        print("for y{} Model is Good fit".format(i+1))
    else:
        print("for y{} Model is not good fit".format(i+1))

p1=n1+1
n11=len(df['y1'])
p2=n1+1
n12=len(df['y2'])

p=[p1,p2]
nd=[n11,n12]

mssr=[]

msse=[]
fcal=[]
ftab=[]

#anova
for i in range(len(ssr)):
    mssr.append(ssr[i]/(p[i]-1))
    msse.append(sse[i]/(nd[i]-p[i]))
    fcal.append(mssr[i]/msse[i])
    print('calculated value of f for y{} is : {}'.format(i+1,fcal[i]))
    alpha=float(input("enter alpha:"))
    ftab.append(stats.f.ppf(1-alpha,p[i]-1,nd[i]-p[i]))
    print('table value of f for y{} is : {}'.format(i+1,ftab[i]))
    if(fcal[i]>ftab[i]):
        print("Accept model")
    else:
        print("Reject model")

#test of individual parameters


from math import sqrt
import scipy
#diagonal elements of the inverse of X'X
ph=[]
for i in range(len(s2)):
    for j in range(len(s2[i])):
        if(i==j):
            ph.append(s2[i][j])

b1=b[0]
b2=b[1]
bh=[b1,b2]
t=[]
t_tab=[]
#t statistics for the coefficients of each dependent variable
for i in range(len(bh)):
    th=[]
    for j in range(len(ph)):
        th.append(bh[i][j]/sqrt(msse[i]*ph[j]))
    t.append(th)

for i in range(len(t)):
    print("calculated values of t for y{} are : ".format(i+1))
    print(t[i])
    t_tab.append(scipy.stats.t.ppf(1-alpha/2,nd[i]-p[i]))
    print("table value is of t for y{} is : {}".format(i+1,t_tab[i]))
    for j in range(len(t[i])):
        if(abs(t[i][j])>t_tab[i]):
            print("b{} of y{} is contibuting to the model".format(j,i+1))
        else:
            print("weak variable in y{} is b{}".format(i+1,j))

OUTPUT:
1.
enter no of independent variables3
enter no of dependent variables2
enter x values :
9 8 7 14 12 10 7 4 6 5 7 6
62 58 64 60 63 57 55 56 59 61 57 60
1.0 1.3 1.2 0.8 0.8 0.9 1.0 1.2 1.1 1.0 1.2 1.2
enter y values :
10 12 11 9 9 10 11 12 11 10 11 12
100 110 105 94 95 99 104 108 105 98 103 110
y1=10.896995241634386+-0.04494028834971431x1+-0.08770358706372366x2+5.035459723009581x3
y2=91.09719894388036+-0.06400723236924932x1+-0.29437367205051923x2+27.83530348356726x3
r2 value of y1 is 0.9237965551047813:
for y1 Model is Good fit
r2 value of y2 is 0.8655098079552557:
for y2 Model is not good fit
calculated value of f for y1 is : 32.327376848470806
enter alpha:0.05
table value of f for y1 is : 4.06618055135116
Accept model

calculated value of f for y2 is : 17.161297187972476


enter alpha:0.05
table value of f for y2 is : 4.06618055135116
Accept model
calculated values of t for y1 are :
[4.237351625030057, -0.828345586389068, -2.2752363651035346, 5.461805993908311]
table value is of t for y1 is : 2.3060041350333704
b0 of y1 is contibuting to the model
weak variable in y1 is b1
weak variable in y1 is b2
b3 of y1 is contibuting to the model
calculated values of t for y2 are :
[5.264776488952902, -0.17534441149957578, -1.1349985359754569, 4.487250098529196]
table value is of t for y2 is : 2.3060041350333704
b0 of y2 is contibuting to the model
weak variable in y2 is b1
weak variable in y2 is b2
b3 of y2 is contibuting to the model
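
In the multivariate case the same coefficient formula is applied to every dependent variable at once, one column of Y per response. The sketch below illustrates this on made-up data generated from known coefficients (`B_true` is an assumption for checking purposes, not taken from the lab record).

```python
import numpy as np

# Illustrative data only (assumed): 6 observations, 2 predictors, 2 responses
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
X = np.column_stack([np.ones_like(x1), x1, x2])

# Responses generated from known coefficients so the fit can be checked
B_true = np.array([[1.0, 10.0],
                   [2.0, -1.0],
                   [0.5,  3.0]])
Y = X @ B_true

# One call fits every response column at once
B_hat, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

# Same result via the normal equations used in the listing: inverse(X'X) X'Y
B_normal = np.linalg.inv(X.T @ X) @ X.T @ Y
print(B_hat)
```

Each column of `B_hat` holds the intercept and slopes of one dependent variable, which corresponds to one row of `k` in the listing.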

9. Aim: Write a Python program to classify the treatments based on the MANOVA
test.

Source Code:

import numpy as np
n1=int(input("enter no of treatements"))
p=[]
for i in range(n1):
    p.append(int(input("enter no of subgroups in treatements")))

a=[]
for i in range(len(p)):
    b=[]
    for j in range(p[i]):
        q=[int(x) for x in input().split(" ")]
        b.append(q)
    a.append(b)
print(a)

#treatment means (yii) and grand means (ybar) of the two responses
yii=[]
ybar=[]
sum2=0
sum3=0
k=0
for i in range(n1):
    sum=0
    sum1=0
    yi=[]
    for j in range(p[i]):
        sum=sum+a[i][j][0]
        sum1=sum1+a[i][j][1]
        sum2=sum2+a[i][j][0]
        sum3=sum3+a[i][j][1]
    k=k+p[i]
    sum=sum/p[i]
    sum1=sum1/p[i]
    yi.append(sum)
    yi.append(sum1)
    yii.append(yi)
sum2=sum2/k
sum3=sum3/k
ybar.append(sum2)
ybar.append(sum3)
print(yii)
print(ybar)

#y1 and y2 and cross product of y1,y2


sse1=sst1=sse2=sst2=sse12=sst12=0

for i in range(n1):
    for j in range(p[i]):
        sse1=sse1+((a[i][j][0]-yii[i][0])**2)
        sst1=sst1+((a[i][j][0]-ybar[0])**2)
        sse2=sse2+((a[i][j][1]-yii[i][1])**2)
        sst2=sst2+((a[i][j][1]-ybar[1])**2)
        sse12=sse12+((a[i][j][0]*a[i][j][1])-(yii[i][0]*yii[i][1]))
        sst12=sst12+((a[i][j][0]*a[i][j][1])-(ybar[0]*ybar[1]))

print('for y1 ')
print('sse : ',sse1)
print('sst : ',sst1)
ssr1=sst1-sse1
print('ssr : ',ssr1)

print('for y2 ')
print('sse : ',sse2)

print('sst : ',sst2)
ssr2=sst2-sse2
print('ssr : ',ssr2)

print('cross product values of y1 and y2 ')


print('sse : ',sse12)
print('sst : ',sst12)
ssr12=sst12-sse12
print('ssr : ',ssr12)

w1=[sse1,sse12,sse12,sse2]
t1=[sst1,sst12,sst12,sst2]
w=(w1[0]*w1[3])-(w1[1]*w1[2])
t=(t1[0]*t1[3])-(t1[1]*t1[2])
delta=w/t

#f-test
from math import sqrt
f=((k-n1-1)/(n1-1))*((1-sqrt(delta))/sqrt(delta))
print('f calculated value is :',f)

import scipy.stats as stats


p=len(a[0][0])
ftab=stats.f.ppf(1-0.05,p*(n1-1),p*(k-n1-1))
print('f table value is :',ftab)

if(f>ftab):
    print("Reject Null Hypothesis")
else:
    print("Accept Null Hypothesis")

OUTPUT:
1.
enter no of treatements3
enter no of subgroups in treatements3
enter no of subgroups in treatements2
enter no of subgroups in treatements3
9 3
6 2
9 7
0 4
2 0
3 8
1 9
2 7
[[[9, 3], [6, 2], [9, 7]], [[0, 4], [2, 0]], [[3, 8], [1, 9], [2, 7]]]
[[8.0, 4.0], [1.0, 2.0], [2.0, 8.0]]
[4.0, 5.0]
for y1
sse : 10.0
sst : 88.0
ssr : 78.0
for y2
sse : 24.0
sst : 72.0
ssr : 48.0
cross product values of y1 and y2
sse : 1.0
sst : -11.0
ssr : -12.0
f calculated value is : 8.198859563778374
f table value is : 3.837853354555897
Reject Null Hypothesis

2.
enter no of treatements3
enter no of subgroups in treatements4
enter no of subgroups in treatements3
enter no of subgroups in treatements5
2 3
3 4
5 4
2 5
4 8
5 6
6 7
7 6
8 7
10 8
9 5
7 6
[[[2, 3], [3, 4], [5, 4], [2, 5]], [[4, 8], [5, 6], [6, 7]], [[7, 6], [8, 7], [10, 8], [9, 5], [7, 6]]]
[[3.0, 4.0], [5.0, 7.0], [8.2, 6.4]]
[5.666666666666667, 5.75]
for y1
sse : 14.799999999999997
sst : 76.66666666666667
ssr : 61.866666666666674
for y2
sse : 9.2
sst : 28.25
ssr : 19.05
cross product values of y1 and y2
sse : 1.6000000000000156
sst : 25.999999999999943
ssr : 24.399999999999928
f calculated value is : 9.357513005519227
f table value is : 3.0069172799243438
Reject Null Hypothesis
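
The sums of squares and cross products in the listing form the within-group matrix W and the total matrix T, and their determinant ratio is Wilks' lambda. The sketch below rebuilds both matrices with NumPy for the data of the first sample run. The conversion to F with the square root of lambda is the one used in the listing and assumes two response variables; the variable names are illustrative.

```python
import numpy as np
from math import sqrt

# Treatments from the first sample run above, one row per observation (y1, y2)
groups = [
    np.array([[9, 3], [6, 2], [9, 7]], dtype=float),
    np.array([[0, 4], [2, 0]], dtype=float),
    np.array([[3, 8], [1, 9], [2, 7]], dtype=float),
]

all_obs = np.vstack(groups)
grand_mean = all_obs.mean(axis=0)

# Within-group (W) and total (T) sums of squares and cross products
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
T = (all_obs - grand_mean).T @ (all_obs - grand_mean)

wilks = np.linalg.det(W) / np.linalg.det(T)

n_groups, n_total = len(groups), len(all_obs)
f_cal = ((n_total - n_groups - 1) / (n_groups - 1)) * (1 - sqrt(wilks)) / sqrt(wilks)
print(W, T, wilks, f_cal)
```

The diagonal of `W` reproduces the two `sse` values and the diagonal of `T` the two `sst` values printed in the sample run.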

10. Aim: Write a Python program to classify the given observations using Linear
Discriminant Analysis.

Source Code:

#reading inputs
import numpy as np
n=int(input("enter no of independent variables : "))
a=[]
print("enter independent variables")
for i in range(n):
    l=[float(x) for x in input().split(" ")]
    a.append(l)

print("enter dependent variables")


Y=[x for x in input().split(" ")]
X=transpose(a)
print(X)

x1=[]
x2=[]
for i in range(len(X)):
    if(Y[i]=='yes'):
        x1.append(X[i])
    else:
        x2.append(X[i])

#calculating means
mu=np.mean(X,axis=0)
mu1=np.mean(x1,axis=0)
mu2=np.mean(x2,axis=0)

#calculation of Inverse
mul=(multiplication((transpose(X-mu)),X-mu))
ni=len(X)

c=[]
for i in range(len(mul)):
    o=[]
    for j in range(len(mul[0])):
        o.append(mul[i][j]/ni)
    c.append(o)
print('pooled covariance(c) matrix is : ')
print(c)

cinv=inverse(c)
print('c inverse is : ')
print(cinv)

#calculation of Fishers Linear Discriminant Function


mui=[[mu1],[mu2]]
n1=len(x1)
n2=len(x2)
nii=[n1,n2]
xk=[float(x) for x in input("enter observation to be classified :").split(" ")]
f=[]
#one discriminant score per group (loop over the groups, not over the variables)
for i in range(len(mui)):
    a=multiplication((multiplication(mui[i],cinv)),transpose(mui[i]))
    for j in range(len(a)):
        a=a[j][0]/2
    b=multiplication((multiplication(mui[i],cinv)),transpose([xk]))
    c=np.log(nii[i]/ni)
    f.append(b-a+c)
print(f)

#classification into groups
for i in range(1,len(f)):
    t=f[0]
    if(f[i]>t):
        print('classify new observation into',i+1,'population')
    else:
        print('classify new observation into 1st population')

OUTPUT:
1.
enter no of independent variables : 2
enter independent variables
2.95 2.53 3.57 3.16 2.58 2.16 3.27
6.63 7.79 5.65 5.47 4.46 6.22 3.52
enter dependent variables
yes yes yes yes no no no
[[2.95, 6.63], [2.53, 7.79], [3.57, 5.65], [3.16, 5.47], [2.58, 4.46], [2.16, 6.22], [3.27, 3.52]]
pooled covariance(c) matrix is :
[[0.2059836734693877, -0.23093265306122449], [-0.23093265306122449, 1.6921632653061225]]
c inverse is :
[[5.731714487577605, 0.7822176856949431], [0.7822176856949431, 0.6977102207778748]]
enter observation to be classified :2.81 5.46
[array([[43.82818099]]), array([[43.86302018]])]
classify new observation into 2 population

2.
enter no of independent variables : 2
enter independent variables
4 2 2 3 4 9 6 9 8 10
2 4 3 6 4 10 8 5 7 8
enter dependent variables
yes yes yes yes yes no no no no no
[[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0], [9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]]
pooled covariance(c) matrix is :
[[8.610000000000001, 5.01], [5.01, 5.8100000000000005]]
c inverse is :
[[0.23310865029690248, -0.20101107366393825], [-0.20101107366393825, 0.34545016851227717]]
enter observation to be classified :5 6
[array([[1.99072379]]), array([[1.7124378]])]
classify new observation into 1st population
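
The discriminant score of each group has the form f_i = mu_i C^-1 x' - (1/2) mu_i C^-1 mu_i' + ln(p_i). The sketch below evaluates it with NumPy for the data of the second sample run. It follows the listing in computing the covariance about the overall mean with divisor n; the names `scores` and `predicted` are illustrative.

```python
import numpy as np

# Data from the second sample run above
X = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4],
              [9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)
labels = np.array(["yes"] * 5 + ["no"] * 5)
x_new = np.array([5.0, 6.0])

# Covariance about the overall mean, divided by n, as in the listing
centered = X - X.mean(axis=0)
C = centered.T @ centered / len(X)
C_inv = np.linalg.inv(C)

scores = {}
for group in ("yes", "no"):
    members = X[labels == group]
    mu = members.mean(axis=0)
    prior = len(members) / len(X)
    # f_i = mu_i C^-1 x' - (1/2) mu_i C^-1 mu_i' + ln(p_i)
    scores[group] = mu @ C_inv @ x_new - 0.5 * mu @ C_inv @ mu + np.log(prior)

# The observation is assigned to the group with the largest score
predicted = max(scores, key=scores.get)
print(scores, predicted)
```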

11. Aim: Write a Python program to find principal components for the given
variables.

Source Code:

import numpy as np
import numpy.linalg as linalg
import math

#reading inputs

a=[]
n=int(input("enter no of variables"))
for i in range(n):
    q=[float(x) for x in input().split(" ")]
    a.append(q)
X=np.transpose(a)

#calculating mean
print('\nGven Matrix is :\n{}'.format(X))
print('\nMean Values {}'.format(np.mean(X,axis=0)))

A = (X - np.mean(X, axis=0))
print('\nStandardized Matrix is :\n {}'.format(A))

#variance covariance matrix


VarCov=np.dot(np.transpose(A),A)/len(X)
print('\nVariance Covariance Matrix is :\n {}'.format(VarCov))

#calculating eigen values and vectors


eigenValues, eigenVectors = linalg.eig(VarCov)

idx = eigenValues.argsort()[::-1]
eigenValues = eigenValues[idx]

eigenVectors = eigenVectors[:,idx]

print('\nEigen Values are {}'.format(eigenValues))


print('\nEigen Vectors are {}'.format(eigenVectors))

#no of principal components to be retained


k=int(input("enter threshold limit :"))
s=sum(eigenValues)
t=[]
stoppoint=1
#cumulative percentage of variance explained by the first i+1 components
for i in range(len(eigenValues)):
    numerator=sum(eigenValues[0:i+1])
    z=(numerator/s)*100
    if z<=k:
        stoppoint=stoppoint+1
    t.append(z)
print('\nThreshold Table :\n{}'.format(t))

eigenValues=eigenValues[0:stoppoint]
eigenVectors=eigenVectors[:,0:stoppoint]

#retaining eigen values and vectors


print('\nRetained Eigen Values :\n{}'.format(eigenValues))
print('\nRetained Eigen Vectors: \n{}'.format(eigenVectors))

#printing z1 and z2
print('\n PCA Matrix :\n{}'.format(np.dot(X,eigenVectors)))

OUTPUT:
1.
enter no of variables3
90 90 60 60 30
60 90 60 60 30
90 30 60 90 30

Gven Matrix is :
[[90. 60. 90.]
[90. 90. 30.]
[60. 60. 60.]
[60. 60. 90.]
[30. 30. 30.]]

Mean Values [66. 60. 60.]

Standardized Matrix is :
[[ 24. 0. 30.]
[ 24. 30. -30.]
[ -6. 0. 0.]
[ -6. 0. 30.]
[-36. -30. -30.]]

Variance Covariance Matrix is :


[[504. 360. 180.]
[360. 360. 0.]
[180. 0. 720.]]

Eigen Values are [910.06995304 629.11038668 44.81966028]

Eigen Vectors are [[-0.65580225 -0.3859988 0.6487899 ]


[-0.4291978 -0.51636642 -0.74104991]
[-0.62105769 0.7644414 -0.17296443]]
enter threshold limit :95

Threshold Table :
[57.453911176833095, 97.17047599225765, 100.0]

Retained Eigen Values :


[910.06995304 629.11038668]

Retained Eigen Vectors:


[[-0.65580225 -0.3859988 ]
[-0.4291978 -0.51636642]
[-0.62105769 0.7644414 ]]

PCA Matrix :
[[-140.6692628 3.07784927]
[-116.28173533 -58.27962721]
[-102.36346447 -8.27542884]
[-120.99519515 14.65781313]
[ -51.18173223 -4.13771442]]

2.

enter no of variables2
2 1 0 -1
4 3 1 0.5

Gven Matrix is :
[[ 2. 4. ]
[ 1. 3. ]
[ 0. 1. ]
[-1. 0.5]]

Mean Values [0.5 2.125]

Standardized Matrix is :
[[ 1.5 1.875]
[ 0.5 0.875]
[-0.5 -1.125]
[-1.5 -1.625]]

Variance Covariance Matrix is :


[[1.25 1.5625 ]
[1.5625 2.046875]]

Eigen Values are [3.26093826 0.03593674]

Eigen Vectors are [[-0.6135581 -0.78964958]


[-0.78964958 0.6135581 ]]
enter threshold limit :99

Threshold Table :
[98.90997556853128, 100.0]

Retained Eigen Values :


[3.26093826 0.03593674]

Retained Eigen Vectors:


[[-0.6135581 -0.78964958]
[-0.78964958 0.6135581 ]]

PCA Matrix :
[[-4.38571451 0.87493326]
[-2.98250683 1.05102473]
[-0.78964958 0.6135581 ]
[ 0.21873332 1.09642863]]
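
The component-selection rule above can be written compactly with a cumulative sum. The sketch below assumes the data of the second sample run and a 99 percent threshold. It uses `np.linalg.eigh`, which is intended for symmetric matrices, instead of `linalg.eig`, and it projects the mean-centred data, whereas the listing projects the raw matrix; both choices are substitutions made here.

```python
import numpy as np

# Data from the second sample run above, one column per variable
X = np.array([[2, 4], [1, 3], [0, 1], [-1, 0.5]], dtype=float)
threshold = 99.0   # cumulative percentage of variance to retain

A = X - X.mean(axis=0)            # mean-centred data
cov = A.T @ A / len(X)            # divisor n, as in the listing

# eigh returns ascending eigenvalues, so reorder to descending
eig_vals, eig_vecs = np.linalg.eigh(cov)
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

cum_pct = 100 * np.cumsum(eig_vals) / eig_vals.sum()
# Keep components until the cumulative percentage first exceeds the threshold
n_keep = min(int(np.searchsorted(cum_pct, threshold, side="right")) + 1, len(eig_vals))

scores = A @ eig_vecs[:, :n_keep]  # projection of the centred data
print(eig_vals, cum_pct, n_keep)
```

The signs of the eigenvector columns are arbitrary, so the projected values may differ in sign from one library routine to another.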

12. Aim: Write a Python program to group the given variables using Factor
Analysis.

Source Code:

import numpy as np
n=int(input("enter no of variables"))
x=[]
for i in range(n):
    p=[int(x) for x in input().split(" ")]
    x.append(p)
print(x)

mu=[]
for i in range(n):
    mu.append(np.mean(x[i],axis=0))
print("mean is : ")
print(mu)

from math import sqrt
from statistics import mean
n1=len(x[0])
si=[]
x=np.array(x)
#si holds the sample standard deviation of each variable (printed under the label "variance")
for i in range(n):
    k=sum((x[i]-mu[i])**2)
    si.append(round(sqrt(k/(n1-1)),4))
print("variance is :" )
print(si)

#standardized variables
a=[]
for i in range(n):
    a.append((x[i]-mu[i])/si[i])
print(a)
A=np.transpose(a)

#variance covariance matrix


VarCov=np.dot(np.transpose(A),A)/n1
print('\nVariance Covariance Matrix is :\n {}'.format(VarCov))

#calculating eigen values and vectors


eigenValues, eigenVectors = np.linalg.eig(VarCov)

idx = eigenValues.argsort()[::-1]
eigenValues = eigenValues[idx]
eigenVectors = eigenVectors[:,idx]

print('\nEigen Values are {}'.format(eigenValues))


print('\nEigen Vectors are {}'.format(eigenVectors))

#no of principal components to be retained


k=int(input("enter threshold limit :"))
s=sum(eigenValues)
t=[]
stoppoint=1
for i in range(len(eigenValues)):
    numerator=sum(eigenValues[0:i+1])
    z=(numerator/s)*100
    if z<=k:
        stoppoint=stoppoint+1
    t.append(z)
print('\nThreshold Table :\n{}'.format(t))

eigenValues=eigenValues[0:stoppoint]
eigenVectors=eigenVectors[:,0:stoppoint]

#retaining eigen values and vectors


print('\nRetained Eigen Values :\n{}'.format(eigenValues))

print('\nRetained Eigen Vectors: \n{}'.format(eigenVectors))

egv=np.transpose(eigenVectors)
f=[]

#factor loadings: sqrt(eigenvalue) * eigenvector
for i in range(len(egv)):
    o=[]
    for j in range(n):
        k=sqrt(eigenValues[i])*egv[i][j]
        o.append(k)
    f.append(o)
print("F1 and f2 values are :")
print(f)

#communalities; this step assumes exactly two retained factors
h=[]
for i in range(len(f[0])):
    h.append(f[0][i]**2+f[1][i]**2)
print("h2 values are : ")
print(h)

sumh=sum(h)
print(sumh)

#percentage of the retained variance explained by each factor
pve=[]
for i in range(len(eigenValues)):
    pve.append((eigenValues[i]/sumh)*100)
print("percentages are : \n")
print(pve)

OUTPUT:

enter no of variables3
3 7 10 3 10
6 3 9 9 6
5 3 8 7 5
[[3, 7, 10, 3, 10], [6, 3, 9, 9, 6], [5, 3, 8, 7, 5]]

Mean is :
[6.6, 6.6, 5.6]

Variance is :
[3.5071, 2.51, 1.9494]

[array([-1.02648912, 0.11405435, 0.96946195, -1.02648912, 0.96946195]), array([-0.23904382, -1.43426295, 0.9561753 , 0.9561753 , -0.23904382]), array([-0.30778701, -1.33374372, 1.23114805, 0.71816969, -0.30778701])]

Variance Covariance Matrix is :


[[ 0.80001623 -0.04089598 0.06435815]
[-0.04089598 0.7999873 0.78479557]
[ 0.06435815 0.78479557 0.79996624]]

Eigen Values are [1.58512447 0.80666111 0.0081842 ]

Eigen Vectors are [[ 0.02122114 -0.99538368 -0.09360014]


[ 0.70624057 0.0811911 -0.70330098]
[ 0.70765382 -0.05117936 0.7047033 ]]
enter threshold limit :99

Threshold Table :
[66.04768471804098, 99.65898746868129, 100.0]

Retained Eigen Values :


[1.58512447 0.80666111]

Retained Eigen Vectors:


[[ 0.02122114 -0.99538368]
[ 0.70624057 0.0811911 ]
[ 0.70765382 -0.05117936]]

F1 and f2 values are :


[[0.026717780654088277, 0.8891690658976457, 0.8909483734282817], [-0.8939970315309378, 0.07292122996398519, -0.04596639553844041]]

h2 values are :
[0.7999445321892085, 0.7959391335287521, 0.7959019136332973]

2.3917855793512577

Percentages are :
[66.2736862932679, 33.72631370673212]
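
The loadings, communalities and percentages above can be reproduced from the correlation matrix of the variables. The sketch below assumes the data of the sample run and two retained factors. It uses `np.corrcoef`, whose matrix has a unit diagonal, so its eigenvalues and communalities are larger than those of the listing by the factor n/(n-1) = 1.25, while the percentages are unaffected by that scaling.

```python
import numpy as np

# Data from the sample run above, one row per variable
data = np.array([[3, 7, 10, 3, 10],
                 [6, 3, 9, 9, 6],
                 [5, 3, 8, 7, 5]], dtype=float)

R = np.corrcoef(data)                  # correlation matrix, unit diagonal

eig_vals, eig_vecs = np.linalg.eigh(R)
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

n_factors = 2                          # assumed number of retained factors
# Loading of variable j on factor i: sqrt(eigenvalue_i) * eigenvector_i[j]
loadings = eig_vecs[:, :n_factors] * np.sqrt(eig_vals[:n_factors])
communalities = np.sum(loadings ** 2, axis=1)
pct_variance = 100 * eig_vals[:n_factors] / communalities.sum()
print(loadings, communalities, pct_variance)
```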
