Fdsa Lab Manual
AND
TECHNOLOGY
Sankarapuram, Near Walajabad, Kancheepuram Dist, Pin: 631605
DEPARTMENT
OF
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
STUDENT NAME :
REGISTER NO :
Department of _____________________________________________________
TABLE OF CONTENTS
EX.NO   DATE   LIST OF EXPERIMENTS                                                  PAGE NO   SIGN
1              Download and Install the NumPy, SciPy, Jupyter Packages and Pandas
2              Working with NumPy Arrays
7              Regression
8              Z-Test
9              T-Test
10             ANOVA
Ex. No: 1    DOWNLOAD AND INSTALL THE NUMPY, SCIPY, JUPYTER PACKAGES AND PANDAS
Date:
Download, install and explore the features of the NumPy, SciPy, Jupyter, statsmodels and pandas packages.
NUMPY:
Install:
1. Press Command + Space to open Spotlight search. Type Terminal and press Enter.
3. Once the package is installed successfully, type python to get into the Python prompt. Notice that the Python version is displayed too.
Description:
NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely. NumPy stands for Numerical Python.
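As a brief illustration of the features named in the description, the sketch below solves a small linear system and takes a Fourier transform. The equations and the signal are made-up values chosen only for this demonstration.

```python
import numpy as np

# Linear algebra: solve  2x + y = 5  and  x + 3y = 10  (illustrative system)
coefficients = np.array([[2.0, 1.0],
                         [1.0, 3.0]])
constants = np.array([5.0, 10.0])
solution = np.linalg.solve(coefficients, constants)
print("Solution (x, y):", solution)

# Fourier transform: a constant signal has only a zero-frequency component
signal = np.ones(4)
spectrum = np.fft.fft(signal)
print("Spectrum:", spectrum)
```

Here the solution is x = 1, y = 3, and the whole weight of the spectrum sits in its first entry.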
Command:
SCIPY
Description:
SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for Scientific Python. It provides more utility functions for optimization, statistics and signal processing. Like NumPy, SciPy is open source, so we can use it freely.
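A minimal sketch of two of these utility areas, optimization and statistics, is given below. The function being minimized and the sample values are assumptions picked for illustration only.

```python
import numpy as np
from scipy import optimize, stats

# Optimization: find the x that minimizes (x - 2)^2 + 1 (illustrative function)
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print("Minimum found near x =", round(result.x, 4))

# Statistics: summary of a small made-up sample
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
summary = stats.describe(values)
print("Mean:", summary.mean)
print("Sample variance:", summary.variance)
```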
Command:
JUPYTER
Description:
The Jupyter Notebook is an open source web application that you use to create and share documents that contain live code, equations and visualizations.
Command:
PANDAS PACKAGE
Description:
Pandas is a Python package that provides fast, flexible and expressive data structures designed to make working with "relational" or "labelled" data easy and intuitive.
Command:
STATSMODELS
Description:
Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests.
Command:
RESULT
Thus the NumPy, SciPy, Jupyter, statsmodels and pandas packages were downloaded and installed, and their features were explored.
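One way to confirm the installation is to import each package and print its version. The sketch below is only a suggestion: the helper name installed_versions is hypothetical, and "notebook" is assumed to be the importable module name for Jupyter Notebook.

```python
import importlib


def installed_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


# Module names assumed for the five packages of this exercise
report = installed_versions(["numpy", "scipy", "pandas", "statsmodels", "notebook"])
for name, version in report.items():
    print(name, "->", version if version else "not installed")
```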
Ex. No: 2    WORKING WITH NUMPY ARRAYS, CREATION OF NUMPY ARRAY
Date:
AIM
PROCEDURE
PROGRAM
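import numpy as np
arr = np.array([1, 2, 3])
print(arr)
# the rows of a 2-D array are passed as one nested list
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
arr = np.array([1, 3, 2])
print(arr)
OUTPUT:
[1 2 3]
[[1 2 3]
 [4 5 6]]
[1 3 2]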
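ACCESSING THE ARRAY INDEX
PROGRAM:
import numpy as np
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 6], [3, -7, 4, 2.0]])
print("Initial Array: ")
print(arr)
# slicing: first two rows, every second column (columns 0 and 2)
sliced_arr = arr[:2, ::2]
print(sliced_arr)
# integer-array indexing: elements at (1, 3), (1, 2), (0, 1) and (3, 0)
index_arr = arr[[1, 1, 0, 3], [3, 2, 1, 0]]
print(index_arr)
OUTPUT:
Initial Array:
[[-1.   2.   0.   4. ]
 [ 4.  -0.5  6.   0. ]
 [ 2.6  0.   7.   6. ]
 [ 3.  -7.   4.   2. ]]
[[-1.  0.]
 [ 4.  6.]]
[0. 6. 2. 3.]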
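BASIC ARRAY OPERATION
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])
print("Adding 1 to every element:", a + 1)
print("Subtracting 2 from each element:", b - 2)
print("Array sum:\n", a + b)
OUTPUT:
Adding 1 to every element: [[2 3]
 [4 5]]
Subtracting 2 from each element: [[ 2  1]
 [ 0 -1]]
Array sum:
 [[5 5]
 [5 5]]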
RESULT
Ex. No: 3
WORKING WITH PANDAS
Date:
AIM
PROCEDURE
PROGRAM
import pandas as pd
raw_data = {'F_name': ['A', 'B', 'C'],
            'L_name': ['X', 'W', 'Y'],
            'R.No': [10, 11, 15],
            'mark': [50, 55, 60]}
df = pd.DataFrame(raw_data, columns=['F_name', 'L_name', 'R.No', 'mark'])
print(df)
# named "words" rather than "list" so that the built-in list() stays usable
words = ['DataFrame', 'is', 'created', 'successfully']
df = pd.DataFrame(words)
print(df)
DataFrame from a dictionary
import pandas as pd
dict1 = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}
dict2 = {'one': 'excellent', 'two': 'very good', 'three': 'good',
         'four': 'moderable', 'five': 'satisfactory'}
numbers1 = pd.Series(dict1)
numbers2 = pd.Series(dict2)
df = pd.DataFrame({'numbers': numbers1, 'grade': numbers2})
print(df)
Data selection in a Series
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data)
print(data['b'])
print('a' in data)
print('f' in data)
print(data.keys())
data['e'] = 1.25
print(data)
print(data[0:2])
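Selection with .iloc
import pandas as pd
import numpy as np
# random integers in [0, 10); the value range is an assumption, the shape is 5 rows x 3 columns
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])
print('DataFrame:\n', df)
# iloc is zero-based: valid rows are 0 to 4 and valid columns are 0 to 2
print('Selected:\n', df.iloc[4, 2])
print('Selected:\n', df.iloc[4])
import pandas as pd
import numpy as np
A = np.random.randint(10, size=(3, 4))
print('A:\n', A)
print('A - A[0]:\n', A - A[0])
df = pd.DataFrame(A, columns=list('ABCD'))
print(df - df.iloc[0])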
Concatenation
import pandas as pd
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['X', 'Y', 'Z'], index=[4, 5, 6])
print(pd.concat([ser1, ser2]))
df1 = pd.DataFrame({'course': ['Numpy', 'Pandas'], 'Fee': [2000, 5000]})
df2 = pd.DataFrame({'course': ['C', 'Java'], 'Fee': [3000, 3000]})
data = [df1, df2]
df = pd.concat(data)
print(df1, '\n', df2)
print('\n concatenated:\n', df)
OUTPUT:
  F_name L_name  R.No  mark
0      A      X    10    50
1      B      W    11    55
2      C      Y    15    60
0     DataFrame
1            is
2       created
3  successfully
       numbers         grade
one          1     excellent
two          2     very good
three        3          good
four         4     moderable
five         5  satisfactory
a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
0.5
True
False
a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64
a    0.25
b    0.50
dtype: float64
DataFrame:
A B C
RESULT
Thus the program was created and executed successfully.
Ex. No: 4
BASIC PLOTS USING MATPLOTLIB
Date:
AIM
PROCEDURE
PROGRAM:
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is
# for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval by which
# the x-axis set the marks
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis
# set the marks
plt.yticks(list(range(-3, 20, 3)))
# legend denotes that what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate command helps to write
# ON THE GRAPH any text xy denotes
# the position on the graph
plt.annotate('Temperature v/s Days', xy = (1.01, -2.15))
# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()
OUTPUT:
RESULT
Thus the program was created and executed successfully.
Ex. No: 5    FREQUENCY DISTRIBUTION, AVERAGES, VARIABILITY
Date:
AIM
PROCEDURE
PROGRAM:
PYTHON PROGRAM TO GET AVERAGE OF A LIST
import numpy as np
li_st1=[2,40,2,502,177,7,9]
print('Average of List',np.average(li_st1))
OUTPUT:
Average of List 105.57142857142857
PYTHON PROGRAM TO GET VARIANCE OF A LIST
import numpy as np
li_st1=[2,40,2,502,177,7,9]
print('variance of list',np.var(li_st1))
OUTPUT:
variance of list 29579.10204081633
PYTHON PROGRAM TO GET STANDARD DEVIATION OF A LIST
import numpy as np
li_st1=[2,40,2,502,177,7,9]
print('standard deviation of list',np.std(li_st1))
OUTPUT:
standard deviation of list 171.98576115718512
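The title of this exercise also names frequency distribution, for which no program appears in this section. A minimal sketch, assuming the same list as in the programs above and using np.unique to count how often each value occurs, could look like this:

```python
import numpy as np

li_st1 = [2, 40, 2, 502, 177, 7, 9]

# np.unique returns the sorted distinct values and, on request, their counts
values, counts = np.unique(li_st1, return_counts=True)

print("Value  Frequency")
for value, count in zip(values, counts):
    print(value, count)
```

Only the value 2 occurs twice in this list; every other value occurs once.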
RESULT:
Thus the program was created and executed successfully.
Ex. No: 6    NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
Date:
AIM
PROCEDURE
PROGRAM
NORMAL CURVES:
import numpy as np
import matplotlib.pyplot as plt
# demonstration.
# the data:
plt.title(title)
plt.show()
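Only fragments of the normal-curve listing survive here, so the following is a hedged sketch rather than the original program. It assumes demonstration data drawn from a normal distribution (mean 170, standard deviation 10, 250 points, fixed seed), fits a normal distribution with scipy.stats.norm.fit, and draws the fitted curve over a histogram. All of these choices are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Assumed demonstration data: 250 draws from N(170, 10^2), seeded for repeatability
rng = np.random.default_rng(0)
data = rng.normal(loc=170, scale=10, size=250)

# Fit a normal distribution to the data (maximum-likelihood mean and std)
mu, std = norm.fit(data)

# Histogram of the data, scaled so that the bar areas sum to 1
plt.hist(data, bins=25, density=True, alpha=0.6)

# Normal curve with the fitted parameters
x = np.linspace(data.min(), data.max(), 100)
plt.plot(x, norm.pdf(x, mu, std), 'k', linewidth=2)

title = "Fit values: mean = {:.2f}, std = {:.2f}".format(mu, std)
plt.title(title)
plt.show()
```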
OUTPUT:
PROGRAM
CORRELATION COEFFICIENT
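# Python Program to find correlation coefficient.
import math
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i < n:
        # running sums of X, Y, X*Y and of the squares
        sum_X = sum_X + X[i]
        sum_Y = sum_Y + Y[i]
        sum_XY = sum_XY + X[i] * Y[i]
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    # formula for the correlation coefficient.
    corr = (n * sum_XY - sum_X * sum_Y) / math.sqrt(
        (n * squareSum_X - sum_X * sum_X) *
        (n * squareSum_Y - sum_Y * sum_Y))
    return corr
# Driver function
n = len(X)
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))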
OUTPUT:
0.953463
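The coefficient can be cross-checked with NumPy's built-in np.corrcoef. The X and Y values below are an assumption: the driver data does not appear in the listing, and these sample values were chosen because they give the same coefficient as the output shown.

```python
import numpy as np

# Assumed sample data (the driver data is not given in the listing)
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(X, Y)
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))
```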
RESULT
Ex. No: 7    REGRESSION
Date:
AIM
PROCEDURE
PROGRAM
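import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)
def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred)
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)
if __name__ == "__main__":
    main()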
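The y observations are not shown in the listing, so the sketch below uses assumed responses that lie exactly on the line y = 1 + 2x. With that choice the expected estimates are known in advance (b_0 = 1, b_1 = 2), which makes the least-squares formulas easy to verify. np.polyfit is used only as an independent cross-check.

```python
import numpy as np


def estimate_coef(x, y):
    """Least-squares intercept b_0 and slope b_1 for y = b_0 + b_1 * x."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    ss_xy = np.sum(x * y) - n * m_x * m_y   # cross-deviation of x and y
    ss_xx = np.sum(x * x) - n * m_x * m_x   # deviation of x about its mean
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return b_0, b_1


x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Assumed responses: exactly y = 1 + 2x, so the true coefficients are known
y = 1.0 + 2.0 * x

b_0, b_1 = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b_0, b_1))

# Cross-check: a degree-1 polynomial fit returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print("polyfit slope = {}, intercept = {}".format(slope, intercept))
```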
OUTPUT:
RESULT:
Ex. No: 8
Z-TEST
Date:
AIM
PROCEDURE
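Program:
# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest
mean_iq = 110
sd_iq = 15/math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq*randn(50)+mean_iq
# print the mean and standard deviation of the sample
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))
# now we perform the test. In this function, we passed data, in the value parameter
# we passed the mean under the null hypothesis; the alternative hypothesis is that the
# mean is larger
ztest_score, p_value = ztest(data, value=null_mean, alternative='larger')
# the function outputs a p_value and z-score corresponding to that value, we compare the
# p-value with alpha, if it is greater than alpha then we do not reject the null hypothesis
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")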
OUTPUT:
mean=110.27 stdv=2.08
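The arithmetic behind a one-sample, upper-tailed z-test can also be written out by hand: z = (sample mean - null mean) / (sigma / sqrt(n)), and the p-value is the upper-tail area of the standard normal beyond z. The sketch below is illustrative only; the helper name one_sample_z_test and the inputs (sample mean 110, null mean 100, sigma 15, n = 50) are assumptions rather than values taken from a run of the program.

```python
import math
from scipy.stats import norm


def one_sample_z_test(sample_mean, null_mean, sigma, n):
    """Return (z, p) for H1: mean > null_mean, with known population sigma."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    p_value = norm.sf(z)  # upper-tail probability, i.e. 1 - CDF(z)
    return z, p_value


z_score, p_value = one_sample_z_test(sample_mean=110, null_mean=100, sigma=15, n=50)
alpha = 0.05
print("z = %.3f, p = %.2e" % (z_score, p_value))
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```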
RESULT:
Ex.No:09 T-TEST
Date:
AIM
PROCEDURE
Program:
import numpy as np
from scipy import stats
N=10
x=np.random.randn(N)+2
y=np.random.randn(N)
var_x=x.var(ddof=1)
var_y=y.var(ddof=1)
sd=np.sqrt((var_x+var_y)/2)
print('standard deviation =',sd)
tval =(x.mean()-y.mean())/(sd*np.sqrt(2/N))
dof=2*N-2
pval=1-stats.t.cdf(tval,df=dof)
print('t=' +str(tval))
print('p='+str(2*pval))
tval2,pval2=stats.ttest_ind(x,y)
print('t=' +str(tval2))
print('p='+str(pval2))
OUTPUT:
standard deviation = 0.839417088469087
t=3.9906805884970167
p=0.0008574438057769029
t=3.9906805884970162
p=0.0008574438057768628
RESULT:
Thus the program was created and executed successfully.
Ex.No:10 ANOVA
Date:
AIM
PROCEDURE
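PROGRAM:
# Importing libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Create a dataframe
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15,
                                     16, 16, 17, 18, 14, 13, 14, 14, 14, 15]})
# Fit the two-way model with interaction and print the ANOVA table
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)',
            data=dataframe).fit()
result = sm.stats.anova_lm(model, typ=1)
print(result)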
OUTPUT:
                             df     sum_sq   mean_sq         F    PR(>F)
C(Fertilizer)               1.0   0.033333  0.033333  0.012069  0.913305
C(Watering)                 1.0   0.000369  0.000369  0.000133  0.990865
C(Fertilizer):C(Watering)   1.0   0.040866  0.040866  0.014796  0.904053
Residual                   28.0  77.333333  2.761905       NaN       NaN
RESULT:
Thus the program was created and executed successfully.
Ex No:11 BUILDING AND VALIDATING LINEAR MODELS
Date:
AIM
PROCEDURE
PROGRAM:
import matplotlib.pyplot as plt
from scipy import stats
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
    return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
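The program builds the line, but the validation step is not shown. One common way to judge the fit, offered here as a sketch rather than as the intended method, is to inspect the correlation coefficient r, its square, and the p-value that linregress already returns. The prediction point x = 10 is an arbitrary choice for illustration.

```python
from scipy import stats

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

result = stats.linregress(x, y)

# r close to +1 or -1 suggests a strong linear relationship, r near 0 a weak one
r_squared = result.rvalue ** 2
print("r =", result.rvalue)
print("r squared =", r_squared)
print("p-value for a zero slope =", result.pvalue)

# Prediction for a new x value (10 is an arbitrary illustrative input)
predicted = result.slope * 10 + result.intercept
print("Predicted y at x = 10:", predicted)
```

A value of r near zero would indicate that a straight line is a poor model for this data, so predictions from it should not be trusted.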
OUTPUT:
RESULT:
Thus the program was created and executed successfully.
Ex No:12 BUILDING AND VALIDATING LOGISTIC MODELS
Date:
AIM
PROCEDURE
PROGRAM:
import numpy
from sklearn import linear_model
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69,
5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
def logit2prob(logr, X):
    # log-odds from the fitted coefficient and intercept
    log_odds = logr.coef_ * X + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability
print(logit2prob(logr, X))
OUTPUT:
[[0.60749955]
[0.19268876]
[0.12775886]
[0.00955221]
[0.08038616]
[0.07345637]
[0.88362743]
[0.77901378]
[0.88924409]
[0.81293497]
[0.57719129]
[0.96664243]]
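As a possible validation step, the hand-computed probabilities can be compared with scikit-learn's own predict_proba, and the accuracy on the training data can be read from score. This is a sketch under the assumption that checking against the training data is sufficient for the exercise; a held-out test set would be the stricter choice.

```python
import numpy
from sklearn import linear_model

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
                 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Manual conversion from log-odds to probability, as in the program above
log_odds = logr.coef_ * X + logr.intercept_
manual_prob = (numpy.exp(log_odds) / (1 + numpy.exp(log_odds))).ravel()

# Column 1 of predict_proba holds the probability of class 1
library_prob = logr.predict_proba(X)[:, 1]

print("Probabilities agree:", numpy.allclose(manual_prob, library_prob))
print("Training accuracy:", logr.score(X, y))
```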
RESULT:
Thus the program was created and executed successfully.
Ex No:13 TIME SERIES ANALYSIS
Date:
AIM:
PROCEDURE:
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# reading the dataset using read_csv
df = pd.read_csv("stock_data.csv",
parse_dates=True, index_col="Date")
# displaying the first five rows of dataset
print(df.head())
# deleting column (drop returns a new DataFrame, so the result is assigned back)
df = df.drop(columns='Unnamed: 0')
df['Volume'].plot()
plt.show()
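Since stock_data.csv is not included with this manual, the sketch below builds a small synthetic, date-indexed DataFrame in its place and applies two common time series operations: a rolling mean and a day-over-day difference. The dates, the values and the new column names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for stock_data.csv (assumed: ten consecutive days)
dates = pd.date_range("2023-01-01", periods=10, freq="D")
df = pd.DataFrame({"Volume": np.arange(10, dtype=float)}, index=dates)
df.index.name = "Date"

# 3-day rolling mean smooths short-term fluctuations;
# the first two rows are NaN because the window is not yet full
df["Volume_MA3"] = df["Volume"].rolling(window=3).mean()

# Day-over-day change in volume; the first row is NaN
df["Volume_change"] = df["Volume"].diff()

print(df)
```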
OUTPUT:
RESULT:
Thus the program was created and executed successfully.