EXP NO:1a MATRIX MANIPULATION USING NUMPY
DATE:
AIM:
To write a Python program to perform matrix operations using NumPy.
ALGORITHM:
1. Start the program.
2. Import Numpy library.
3. Get the input matrices x and y.
4. Perform the matrix operations add, subtract, multiply, divide and dot.
5. Display the output.
6. Stop.
PROGRAM:
import numpy
# initializing matrices
x = numpy.array([[1, 2], [4, 5]])
y = numpy.array([[7, 8], [9, 10]])
# using add() to add matrices
print ("The element wise addition of matrix is : ")
print (numpy.add(x,y))
# using subtract() to subtract matrices
print ("The element wise subtraction of matrix is : ")
print (numpy.subtract(x,y))
# using divide() to divide matrices
print ("The element wise division of matrix is : ")
print (numpy.divide(x,y))
# using multiply() to multiply matrices element wise
print ("The element wise multiplication of matrix is : ")
print (numpy.multiply(x,y))
# using dot() to multiply matrices
print ("The product of matrices is : ")
print (numpy.dot(x,y))
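As a side note, for 2-D arrays NumPy's @ operator computes the same matrix product as numpy.dot; continuing with the x and y defined above:
print(x @ y)  # same result as numpy.dot(x, y) for 2-D arrays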
OUTPUT:
The element wise addition of matrix is:
[[ 8 10]
[13 15]]
The element wise subtraction of matrix is:
[[-6 -6]
[-5 -5]]
The element wise division of matrix is:
[[0.14285714 0.25]
[0.44444444 0.5]]
The element wise multiplication of matrix is:
[[ 7 16]
[36 50]]
The product of matrices is:
[[25 28]
[73 82]]
RESULT:
EX.NO 1b AGGREGATE AND STATISTICAL FUNCTIONS USING NUMPY
DATE:
AIM:
To write a Python program for aggregate and statistical functions using NumPy.
ALGORITHM:
1. Start the program
2. Import the Numpy library.
3. Get the input array.
4. Using the appropriate methods, calculate the mean, standard deviation, variance, sum and product.
5. Display the output.
6. Stop the program.
PROGRAM:
import numpy as np
array1 = np.array([[10, 20, 30], [40, 50, 60]])
print("Mean: ", np.mean(array1))
print("Std: ", np.std(array1))
print("Var: ", np.var(array1))
print("Sum: ", np.sum(array1))
print("Prod: ", np.prod(array1))
OUTPUT:
Mean: 35.0
Std: 17.07825127659933
Var: 291.6666666666667
Sum: 210
Prod: 720000000
RESULT:
EX.NO1c: RESHAPE USING NUMPY
DATE:
AIM:
To write a Python program for reshaping an array using NumPy.
ALGORITHM:
1. Start the program
2. Import the numpy library.
3. Get the input array.
4. Call the reshape() function.
5. Display the output.
PROGRAM
import numpy as np
thearray = np.array([1, 2, 3, 4, 5, 6, 7, 8])
thearray = thearray.reshape(2, 4)
print(thearray)
print("-" * 10)
thearray = thearray.reshape(4, 2)
print(thearray)
print("-" * 10)
thearray = thearray.reshape(8, 1)
print(thearray)
OUTPUT:
[[1 2 3 4]
[5 6 7 8]]
----------
[[1 2]
[3 4]
[5 6]
[7 8]]
----------
[[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]]
RESULT:
EX.NO2a: CREATING DATAFRAMES USING LIST
DATE:
AIM:
To write a Python program for creating data frames using lists.
ALGORITHM:
1. Start the program.
2. Import the pandas package as pd.
3. Declare the input as a list.
4. Load the data into the data frame.
5. Display the result.
PROGRAM 1:
import pandas as pd
# string values in the list
lst = ['Java', 'Python', 'C', 'C++',
'JavaScript', 'Swift', 'Go']
# Calling DataFrame constructor on list
dframe = pd.DataFrame(lst)
print(dframe)
PROGRAM 2:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
OUTPUT 1:
            0
0        Java
1      Python
2           C
3         C++
4  JavaScript
5       Swift
6          Go
OUTPUT 2:
   calories  duration
0       420        50
1       380        40
2       390        45
RESULT:
EX.NO2b: HIERARCHICAL INDEXING USING PANDAS
DATE:
AIM:
To write a Python program to create hierarchical indexing using pandas DataFrames.
ALGORITHM:
1. Start the program.
2. Import the library pandas as pd.
3. Create the data frames.
4. Use the set_index function to create the hierarchical indexing.
5. Display the index using the index attribute.
6. Create the hierarchical indexing without dropping the columns, using the set_index function
with drop=False.
7. Display the result.
8. Stop the program.
PROGRAM:
import pandas as pd
import numpy as np
#Create a DataFrame
d={
'Name':['Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine',
'Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine'],
'Exam':['Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1',
'Semester 2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],
'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science',
'Mathematics','Mathematics','Mathematics','Science','Science','Science'],
'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}
df = pd.DataFrame(d,columns=['Name','Exam','Subject','Score'])
df
# multiple indexing or hierarchical indexing
df1=df.set_index(['Exam', 'Subject'])
df1
# View index
df1.index
# Swap the column in multiple index
df1.swaplevel('Subject','Exam')
# multiple indexing or hierarchical indexing with drop=False
df1=df.set_index(['Exam', 'Subject'],drop=False)
df1
OUTPUT:
Hierarchical Indexing:
View Index:
MultiIndex([('Semester 1', 'Mathematics'),
('Semester 1', 'Mathematics'),
('Semester 1', 'Mathematics'),
('Semester 1', 'Science'),
('Semester 1', 'Science'),
('Semester 1', 'Science'),
('Semester 2', 'Mathematics'),
('Semester 2', 'Mathematics'),
('Semester 2', 'Mathematics'),
('Semester 2', 'Science'),
('Semester 2', 'Science'),
('Semester 2', 'Science')],
names=['Exam', 'Subject'])
SWAP LEVEL:
Hierarchical indexing or multiple indexing without dropping:
RESULT:
EX.NO 3a LINE GRAPH
DATE:
AIM:
To write a Python program to plot a line graph using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.Assign values for an array x and y
4.Assign the label for x-axis
5.Assign the label for y-axis
6.Assign the title for the graph
7.Show the plotted graph.
8.Stop
PROGRAM:
import matplotlib.pyplot as plt
# x axis values
x = [1,2,3,4]
# corresponding y axis values
y = [2,4,1,5]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('Plot graph!')
# function to show the plot
plt.show()
OUTPUT:
RESULT:
EX.NO:3b SINE WAVE GRAPH
DATE:
AIM:
To write a Python program to plot a sine wave graph using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.Import numpy library
4.Import math
5.Assign the values of x and y
6.Plot the graph
7.Assign the label for X-axis and Y-axis respectively
8.Assign the title for the graph
9.Show the plotted graph
10.stop
PROGRAM:
from matplotlib import pyplot as plt
import numpy as np
import math #needed for definition of pi
x = np.arange(0, math.pi*2, 0.05)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel('angle')
plt.ylabel('sine')
plt.title('sine wave')
plt.show()
OUTPUT:
RESULT:
EX.NO3c: MULTIPLOT GRAPH
DATE:
AIM:
To write a Python program to plot a multiplot graph using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.Create lists a and b with values
4.Plot a, b and list(range(0,22,3))
5.Assign the names for x and y axis
6.Create an array c with values
7.Plot c and label c
8.Get the current axes using the gca command
9.Using the current axes, hide the right and top boundary lines and set the bounds of the left one
10.Set the interval for x and y axis
11.Assign the names for legend
12.Assign the names for title
13.Show the plotted graph
14.Stop
PROGRAM:
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is for red
plt.plot(b, 'or')
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label = '4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval by which
# the x-axis set the marks
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis set the marks
plt.yticks(list(range(-3, 20, 3)))
# legend denotes what each color signifies
ax.legend(['1st Rep','2nd Rep','3rd Rep','4th Rep'])
# annotate command helps to write
# ON THE GRAPH any text xy denotes
# the position on the graph
plt.annotate('Temperature V / s Days', xy = (1.01, -2.15))
# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()
OUTPUT:
RESULT:
EX.NO3d: PIE CHART
DATE:
AIM:
To write a Python program to plot a pie chart using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.Import numpy
4.Declare the values for an array ‘y’
5.Declare a list ‘mylabels’
6.Assign the values for the pie function.
7.Assign the title for the legend function
8.Show the plotted graph
9.Stop
PROGRAM:
import matplotlib.pyplot as plt
import numpy as np
y = np.array([35, 25, 25, 15])
mylabels = ['Apples', 'Bananas', 'Cherries', 'Dates']
plt.pie(y, labels = mylabels)
plt.legend(title = 'Four Fruits:')
plt.show()
OUTPUT:
RESULT:
EX.NO:3e SUBPLOT
DATE:
AIM:
To write a Python program to plot a subplot using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.Import numpy
4.Assign the values for x and y array
5.Declare the subplot function
6.Plot the graph
7.Assign the title for plot as ‘sales’
8.Again, assign the values for x and y array for another graph
9.Declare the subplot function
10.Plot the graph
11.Assign the title for plot as ‘income’
12.Assign the suptitle as ‘my shop’
13.Show the plotted graph
14.Stop
PROGRAM:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title('SALES')
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title('INCOME')
plt.suptitle('MY SHOP')
plt.show()
OUTPUT:
RESULT:
EX.NO3f: HISTOGRAM
DATE:
AIM:
To write a Python program to plot a histogram using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib and numpy libraries
3.Using the subplots command, create the figure and axes
4.Create an array a with values and plot the histogram
5.Assign the name for title
6.Set the interval for x-axis
7.Assign the names for x and y axis
8.Show the plotted graph
9.Stop
PROGRAM:
from matplotlib import pyplot as plt
import numpy as np
fig,ax = plt.subplots(1,1)
a = np.array([22,87,5,43,56,73,55,54,11,20,51,5,79,31,27])
ax.hist(a, bins = [0,25,50,75,100])
ax.set_title('histogram of result')
ax.set_xticks([0,25,50,75,100])
ax.set_xlabel('marks')
ax.set_ylabel('no. of students')
plt.show()
OUTPUT:
RESULT:
EX.NO:3g BAR CHART
DATE:
AIM:
To write a Python program to plot a bar chart using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.With the current axes command add axes
4.Create the arrays langs and students and assign the values
5.Plot the bar chart using langs and students
6.Show the plotted graph
7.Stop
PROGRAM:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8,6))
ax = fig.add_axes([0.15,0.1,0.7,0.74])
langs = ['C','C++','Java','Python','PHP']
students = [23,17,35,29,12]
ax.bar(langs,students,color='green',width=0.4)
plt.xlabel("Languages available")
plt.ylabel("Number of students selected the languages")
plt.title("Bar graph for the languages opted by students")
plt.show()
OUTPUT:
RESULT:
EX.NO:3h SCATTER PLOT
DATE:
AIM:
To write a Python program to plot a scatter plot using the Matplotlib library.
ALGORITHM:
1.Start
2.Import matplotlib library
3.Create the arrays x and y with values
4.Plot the scatter plot with the color “blue”
5.Assign the names for x and y axis
6.Assign the name for legend functions
7.Show the plotted graph
8.Stop
PROGRAM:
import matplotlib.pyplot as plt
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78, 77, 85, 86]
plt.scatter(x, y, c ='blue')
plt.xlabel('X-values')
plt.ylabel('Y-values')
plt.legend(['plot values'])
# To show the plot
plt.show()
OUTPUT:
RESULT:
EX.NO:4a FREQUENCY DISTRIBUTIONS
DATE:
AIM:
To write a Python program to implement frequency tables using frequency distributions.
ALGORITHM:
1. Start the program.
2. Import the pandas library.
3. Create a .csv file with values and set the path.
4. Make a frequency table of the Pos (position) column from the dataset.
5. Find the frequency table of height column from the file.
6. Use the Series.sort_index() method to sort the table by its index.
7. To represent the data in descending order, set the ascending parameter to False.
8. Display the result.
9. Stop the program
PROGRAM:
import pandas as pd
wnba = pd.read_csv('wnba.csv')
freq_dis_pos = wnba['Pos'].value_counts()
freq_dis_pos
freq_dis_height = wnba["Height"].value_counts()
freq_dis_height
freq_dis_height =wnba["Height"].value_counts().sort_index(ascending= False)
freq_dis_height
freq_dis_height = wnba["Height"].value_counts().sort_index()
freq_dis_height
OUTPUT:
G 60
F 33
C 25
G/F 13
F/C 12
Name: Pos, dtype: int64
188 20
193 18
175 16
185 15
173 11
183 11
191 11
196 9
178 8
180 7
170 6
198 5
168 2
201 2
165 1
206 1
Name: Height, dtype: int64
206 1
201 2
198 5
196 9
193 18
191 11
188 20
185 15
183 11
180 7
178 8
175 16
173 11
170 6
168 2
165 1
Name: Height, dtype: int64
165 1
168 2
170 6
173 11
175 16
178 8
180 7
183 11
185 15
188 20
191 11
193 18
196 9
198 5
201 2
206 1
Name: Height, dtype: int64
RESULT:
EX.NO:4b RELATIVE FREQUENCY AND PERCENTAGE FREQUENCY
DATE:
AIM:
To write a Python program for relative frequencies and percentile ranks using pandas.
ALGORITHM:
1. Start the program.
2. Import the pandas library.
3. Create a .csv file with values and set the path.
4. Make a percentage (relative frequency) table of the Age column from the dataset.
5. From scipy.stats, import percentileofscore.
6. Find the percentile of a given age from the file.
7. Display the output.
8. Stop the program.
PROGRAM:
import pandas as pd
wnba = pd.read_csv("C:\\Users\\HARSHI\\Downloads\\wnba.csv")
wnba["Age"].value_counts() / len(wnba)
percentages_pos = wnba["Age"].value_counts(normalize=True).sort_index() * 100
percentages_pos
from scipy.stats import percentileofscore
percentile_of_25 = percentileofscore(wnba["Age"], 25, kind='weak')
percentile_of_25
percentiles = wnba["Age"].describe()
percentiles = wnba["Age"].describe(percentiles = [.1, .15, .33, .5, .592, .85, .9])
percentiles
OUTPUT:
24 0.111888
23 0.104895
25 0.104895
28 0.097902
27 0.090909
26 0.083916
22 0.069930
30 0.062937
29 0.055944
31 0.055944
32 0.055944
34 0.034965
35 0.027972
33 0.020979
21 0.013986
36 0.006993
Name: Age, dtype: float64
21 1.398601
22 6.993007
23 10.489510
24 11.188811
25 10.489510
26 8.391608
27 9.090909
28 9.790210
29 5.594406
30 6.293706
31 5.594406
32 5.594406
33 2.097902
34 3.496503
35 2.797203
36 0.699301
Name: Age, dtype: float64
40.55944055944056
count 143.000000
mean 27.076923
std 3.679170
min 21.000000
10% 23.000000
15% 23.000000
33% 25.000000
50% 27.000000
59.2% 28.000000
85% 31.000000
90% 32.000000
max 36.000000
Name: Age, dtype: float64
RESULT:
EX.NO: 4c AVERAGES
DATE:
AIM:
To write a Python program to compute the average of given values.
ALGORITHM:
1. Start the program.
2. Import the statistics package.
3. Use the mean method to calculate the average of the given data.
4. Display the result.
5. Stop the program.
PROGRAM:
import statistics
# list of positive integer numbers
data1 = [1, 3, 4, 5, 7, 9, 2]
x = statistics.mean(data1)
# Printing the mean
print("Mean is :", x)
OUTPUT:
Mean is : 4.428571428571429
RESULT:
Ex.No: 4d) VARIABILITY USING DATA VALUES
Date:
Aim:
To write a Python program to compute the variability of data values.
Algorithm:
Step 1: Start.
Step 2: Import statistics library.
Step 3: Create a sample data
Step 4: Print the variance of the sample data.
Step 5: Stop.
Program:
import statistics
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
print("Variance of sample set is % s"%(statistics.variance(sample)))
Output:
Variance of sample set is 0.40924
Result:
Ex.No: 4e) VARIABILITY USING LIST
Date:
Aim:
To write a Python program to compute the variability of values in a list.
Algorithm:
Step 1: Start.
Step 2: Import the statistics library.
Step 3: Create a list with values.
Step 4: Calculate the mean of the value.
Step 5: Calculate the variance of the value.
Step 6: Print.
Step 7: Stop.
Program:
import statistics
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
m = statistics.mean(sample)
print("Variance of Sample set is % s"%(statistics.variance(sample, xbar = m)))
Output:
Variance of Sample set is 0.3656666666666667
Result:
Ex.No: 4f VARIABILITY USING PANDAS
Date:
Aim:
To write a Python program to compute variability using pandas.
Algorithm:
Step 1: Start.
Step 2: Import pandas library.
Step 3: Create a list with values.
Step 4: Assign the values in series to sample.
Step 5: Print the type of the value.
Step 6: Print the mean of the value.
Step 7: Print the median of the value.
Step 8: Print the standard deviation of the value.
Step 9: Print the variance of the value.
Step 10: Stop.
Program:
import pandas as pd
lst = [33219, 36254, 38801, 46335, 46840, 47596, 55130, 56863, 78070, 88830]
sample = pd.Series(lst)
print(type(sample))
print(sample.mean())
print(sample.median())
print(sample.std(ddof=0))
print(sample.var(ddof=0))
print(sample.var(ddof=1))
print((sample - sample.mean()).abs().mean())  # mean absolute deviation; Series.mad() was removed in pandas 2.0
Output:
<class 'pandas.core.series.Series'>
52793.8
47218.0
17076.965197598784
291622740.36
324025267.06666666
13543.560000000001
Result:
Ex.No: 5a NORMAL CURVES
Date:
Aim:
To write a Python program to plot normal curves.
Procedure:
Step 1: Start the program.
Step 2: Import the Numpy Library.
Step 3: Import matplotlib.
Step 4: Import norm from scipy.stats.
Step 5: Assign the value of an array x.
Step 6: Plot the graph.
Step 7: show the plotted graph.
Step 8: Stop the program.
Program:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x = np.arange(-3, 3, 0.001)
plt.plot(x, norm.pdf(x, 0, 1))
plt.show()
Output:
Result:
Ex.No: 5b CORRELATION AND SCATTER PLOTS
Date:
Aim:
To write a Python program to compute correlations and draw scatter plots.
Procedure:
Step 1: Start.
Step 2: Import sklearn.
Step 3: Import Numpy Libraries.
Step 4: Import matplotlib.
Step 5: Import pandas Libraries.
Step 6: Assign the values in series to x and y.
Step 7: Assign the value of correlation of x and y to correlation.
Step 8: Assign the title to the plot and plot the scatter plot.
Step 9: Label the x and y axis.
Step 10: Show the plotted graph.
Step 11: Stop the program.
Program:
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])
correlation = y.corr(x)
plt.title('Correlation')
plt.scatter(x, y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)), color='red')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()
Output:
Result:
Ex.No: 5c CORRELATION COEFFICIENT USING NUMPY
Date:
Aim:
To write a Python program to compute the correlation coefficient using NumPy.
Procedure:
Step 1: Start the program.
Step 2: Import Numpy.
Step 3: Assign the value of x and y.
Step 4: Compute the r value using the correlation coefficient function.
Step 5: Print r.
Step 6: Stop the program.
Program:
import numpy as np
x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
r = np.corrcoef(x, y)
print(r)
Output:
[[1. 0.75864029]
[0.75864029 1. ]]
Result:
Ex.No: 5d CORRELATION COEFFICIENT USING SCIPY
Date:
Aim:
To write a Python program to compute the correlation coefficient using SciPy.
Procedure:
Step 1: Start the program.
Step 2: Import Numpy Libraries.
Step 3: Import scipy.
Step 4: Assign the values of x and y.
Step 5: Print pearsonr value.
Step 6: Print spearmanr value.
Step 7: Print kendalltau value.
Step 8: Stop the program.
Program:
import numpy as np
import scipy.stats
x = np.arange(10, 20)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
print(scipy.stats.pearsonr(x, y))
print(scipy.stats.spearmanr(x, y))
print(scipy.stats.kendalltau(x, y))
Output:
(0.758640289091187, 0.010964341301680813)
SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
KendalltauResult(correlation=0.911111111111111, pvalue=2.9761904761904762e-05)
Result:
Ex.No: 6 REGRESSION
Date:
Aim:
To write a Python program to perform linear regression.
Procedure:
Step 1: Start the program.
Step 2: Import Numpy library.
Step 3: Define the function to estimate the coefficients.
Step 4: Return b0 and b1.
Step 5: Define the plot regression line function.
Step 6: Display the plot.
Step 7: Define the main function.
Step 8: Assign the values of x and y.
Step 9: Assign the value of estimate coefficient to b.
Step 10: Print the estimated coefficient.
Step 11: Stop the program.
Program:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # means of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plot the actual points as a scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plot the regression line
    plt.plot(x, y_pred, color = "g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # observations
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimate the coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plot the regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
Result:
Ex. No: 7 Z TEST CASE STUDIES
Date:
AIM:
To write a Python program for the Z-test (both one-tailed and two-tailed hypothesis tests).
EXPLANATION:
The Z-test is a test for proportions. In other words, it is a statistical test that helps us
evaluate our beliefs about certain proportions in the population based on the sample at hand.
It can help us answer questions like:
Is the proportion of female students at SKEMA equal to 0.5?
Is the proportion of smokers in France equal to 0.15?
For conducting a Z-test you do not need many calculations on your sample data. The only thing
you need to know is the proportion of observations that qualify to belong to the sub-sample
you are interested in (e.g. a “female SKEMA student”, or a “French smoker” in the examples
above). We will use a dataset on cars in the US for learning purposes. It contains a list of
32 cars and their characteristics.
In the simplest example involving the data at hand, we can ask whether the share of cars
with variable “am” equal to 0 is equal to 50%.
The function used for z-testing is scipy.stats.binom_test. It requires three arguments: x, the
number of qualified observations in our data (19 in our case); n, the total number of
observations (32 in our case); and p, the null hypothesis on the share of qualified data (0.5 in
our case).
The output of the test gives rich information:
It specifies the alternative hypothesis (by default the test is two-sided, so the alternative
hypothesis is that the share is not equal to the proportion specified in the null hypothesis;
however, we will see how to adjust this in the next chapter).
It specifies the confidence level and interval.
However, by default, the function only returns the most important piece of information: the
p-value of the test.
This value can be understood as the probability that we are making a mistake if we reject the
null hypothesis in favor of the alternative one. In this case the probability is 38%, which is
very high (anything above 10% is high), which would prompt us to conclude that we do not
have enough statistical evidence to claim that the share of cars with am=0 is not 50% in the
population.
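As a concrete sketch of this proportion test (a minimal example; note that scipy.stats.binom_test is the older name, and recent SciPy versions provide the same test as scipy.stats.binomtest):
from scipy import stats
x, n, p0 = 19, 32, 0.5  # qualified observations, total observations, null share
p_value = stats.binom_test(x, n, p0)  # two-sided by default
print(p_value)  # roughly 0.38 here, so we cannot reject the 50% null hypothesis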
PROGRAM:
import seaborn as sns
import scipy.stats as stats
import numpy as np
import random
import warnings
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(rc={'figure.figsize':(13, 7.5)})
sns.set_context('talk')
warnings.filterwarnings('ignore')
# Visualization of a one-tail test
values = np.random.normal(loc=0, scale=10, size=6000)
two_std_from_mean = np.mean(values) + np.std(values)*1.645
kde = stats.gaussian_kde(values)
pos = np.linspace(np.min(values), np.max(values), 10000)
plt.plot(pos, kde(pos), color='teal')
shade = np.linspace(two_std_from_mean, 40, 300)
plt.fill_between(shade, kde(shade), alpha=0.45, color='teal')
plt.title("Sampling Distribution for One-Tail Hypothesis Test", y=1.015, fontsize=20)
plt.xlabel("sample mean value", labelpad=14)
plt.ylabel("frequency of occurrence", labelpad=14);
round(1-stats.norm.cdf(1.645), 2)
round(1-stats.norm.cdf(2.33), 2)
round(1-stats.norm.cdf(3.1), 3)
# Two-tailed hypothesis tests
values = np.random.normal(loc=0, scale=10, size=6000)
alpha_05_positive = np.mean(values) + np.std(values)*1.96
alpha_05_negative = np.mean(values) - np.std(values)*1.96
kde = stats.gaussian_kde(values)
pos = np.linspace(np.min(values), np.max(values), 10000)
plt.plot(pos, kde(pos), color='dodgerblue')
shade = np.linspace(alpha_05_positive, 40, 300)
plt.fill_between(shade, kde(shade), alpha=0.45, color='dodgerblue')
shade2 = np.linspace(alpha_05_negative, -40, 300)
plt.fill_between(shade2, kde(shade2), alpha=0.45, color='dodgerblue')
plt.title("Sampling Distribution for Two-Tail Hypothesis Test", y=1.015, fontsize=20)
plt.xlabel("sample mean value", labelpad=14)
plt.ylabel("frequency of occurrence", labelpad=14);
round(1-stats.norm.cdf(1.96), 3)
round(1-stats.norm.cdf(2.575), 3)
round(1-stats.norm.cdf(3.29), 3)
population_mean_pounds = 160
population_size = 5500
population_std_dev_pounds = 22
np.random.seed(50)
population_gym_goers_mass = np.random.normal(loc=population_mean_pounds, scale=population_std_dev_pounds, size=population_size)
n = 30
treatment_sample_mean_pounds = 169
np.random.seed(50)
sample_means = []
for sample in range(0, 500):
    sample_values = np.random.choice(a=population_gym_goers_mass, size=n)
    sample_mean = np.mean(sample_values)
    sample_means.append(sample_mean)
# sampling distribution
sns.distplot(sample_means, color='darkviolet')
plt.title("Sampling Distribution ($n=30$) of Gym Goers' Mass in Pounds", y=1.015, fontsize=20)
plt.xlabel("sample mean mass [pounds]", labelpad=14)
plt.ylabel("frequency of occurrence", labelpad=14);
standard_error_pounds = population_std_dev_pounds / np.sqrt(n)
standard_error_pounds
sample_mean_at_positive_z_critical = population_mean_pounds + 1.96*standard_error_pounds
sample_mean_at_positive_z_critical
# note the minus sign for the lower critical value
sample_mean_at_negative_z_critical = population_mean_pounds - 1.96*standard_error_pounds
sample_mean_at_negative_z_critical
kde = stats.gaussian_kde(sample_means)
pos = np.linspace(np.min(sample_means), np.max(sample_means), 10000)
plt.plot(pos, kde(pos), color='darkviolet')
shade = np.linspace(sample_mean_at_positive_z_critical, 175, 300)
plt.fill_between(shade, kde(shade), alpha=0.45, color='darkviolet')
shade2 = np.linspace(sample_mean_at_negative_z_critical, 145, 300)
plt.fill_between(shade2, kde(shade2), alpha=0.45, color='darkviolet')
plt.axvline(x=treatment_sample_mean_pounds, linestyle='--', linewidth=2.5, label="sample mean with Joe personal trainer", c='purple')
plt.title("Sampling Distribution ($n=30$) of Gym Goers' Mass in Pounds", y=1.015, fontsize=20)
plt.xlabel("sample mean mass [pounds]", labelpad=14)
plt.ylabel("probability of occurrence", labelpad=14)
plt.legend();
# z-score of the treatment sample mean under the null hypothesis
z_score = (treatment_sample_mean_pounds - population_mean_pounds) / standard_error_pounds
p_value = round(1-stats.norm.cdf(z_score), 3)
p_value
true_population_mean_pounds_with_joe_training = 162
z_true = (treatment_sample_mean_pounds - true_population_mean_pounds_with_joe_training)/standard_error_pounds
z_true
plt.plot(pos, kde(pos), color='darkviolet')
shade = np.linspace(sample_mean_at_positive_z_critical, 175, 300)
plt.fill_between(shade, kde(shade), alpha=0.45, color='darkviolet')
shade2 = np.linspace(sample_mean_at_negative_z_critical, 145, 300)
plt.fill_between(shade2, kde(shade2), alpha=0.45, color='darkviolet')
plt.axvline(x=treatment_sample_mean_pounds, linestyle='--', linewidth=2.5, label="sample mean with Joe personal trainer", c='purple')
plt.axvline(x=true_population_mean_pounds_with_joe_training, linestyle='--', linewidth=2.5, label="true population mean with Joe's training", c='c')
plt.xlabel("sample mean mass [pounds]", labelpad=14)
plt.ylabel("probability of occurrence", labelpad=14)
plt.legend();
OUTPUT:
RESULT:
Thus, the program for the Z-test (both one-tailed and two-tailed) has been studied and
executed, and the output has been verified successfully.
Ex. No: 8 T-TEST CASE STUDIES
Date:
AIM:
To write a Python program for the T-test (both one-tailed and two-tailed hypothesis tests).
EXPLANATION:
A T-test is among the most frequently utilized procedures in statistics. However, many people
who use the T-test frequently do not know precisely what happens to their data in the
background when it is processed by applications such as R and Python. The T-test compares
two averages, also known as means, and tells us whether they differ from each other or not.
The T-test is also known as Student's T-test, and it also tells us how significant the
differences are. In other terms, it tells us whether those differences could have occurred by
chance.
The ratio of the difference between two groups to the difference within the groups is known
as the T-score. A larger T-score means that there is more difference between the groups,
while a smaller T-score signifies similarity between the groups. A T-score of three (3)
indicates that the groups are three times as different from each other as they are within each
other. A bigger T-value from a T-test makes it more likely that the outcomes are repeatable.
Thus, we can conclude the following:
A large T-score implies that the groups are different from each other.
A small T-score implies that the groups are similar.
Now, let us understand the T-values and P-values.
Understanding T-values and P-values
Every T-value has a P-value to go with it. A P-value is the probability that the outcomes
from the sample data happened coincidentally. P-values range from 0% to 100% and are
generally written as decimals; for instance, a P-value of 10% is 0.1. It is good to have low
P-values: they indicate that the data did not happen coincidentally. For instance, a P-value of
0.01 indicates that there is only a 1% probability that the experiment's outcomes occurred
coincidentally. Generally, in many cases, a P-value of 5%, that is 0.05, is accepted as the
threshold for the data to be considered valid.
There are three significant types of T-test:
Independent Samples T-test: This test is used to compare the averages or means for two
groups.
Paired Sample T-test: This test is used to compare means from the same group at different
times (For example, one year apart).
One Sample T-test: This test is used to test the mean of a single group against an
acknowledged mean.
Performing a Sample T-test
Suppose that we need to test if the men's height in the population differs from the women's
height in general. Thus, we will take a sample from the population and utilize the T-test to
check whether the result is significant or not.
Step 1: Determining a Null and Alternate Hypothesis
Step 2: Collecting Sample data
Step 3: Determining a Confidence Interval and Degrees of Freedom
Step 4: Calculating the T-Statistics
Step 5: Calculating the critical T-value from the T-Distribution
Step 6: Comparing the critical T-values with the calculated T-Statistics
Determining a null and alternate hypothesis
Starting by defining a null and an alternate hypothesis is necessary. In general, the null
hypothesis will state that the two populations being tested have no statistically significant
difference. On the other hand, the alternate hypothesis will state that there is one present.
For this example, we can conclude the following statements:
1. Null Hypothesis: The height of men & women is the same.
2. Alternate Hypothesis: The height of men differs from the height of women.
Collecting sample data
Once we have determined the hypothesis, we start collecting the data from each population
group. For this example, we will collect two sets of data: one containing the heights of men
and the other containing the heights of women. The sample sizes ideally need to be identical;
however, they can differ. Suppose that the sizes of the sample data are nx and ny.
Determining a Confidence interval and degrees of freedom
The confidence level is tied to alpha (α), the significance level. The typical value of alpha (α)
is 0.05, which implies that there is 95% confidence in the validity of the test's conclusion. We
can define the degrees of freedom using the formula:
df = nx + ny − 2
Calculating the T-Statistic
For two groups of equal size n, we can calculate the T-statistic using the following formula:
t = (Mx − My) / (S · √(2/n)), where S = √((Sx² + Sy²)/2) is the pooled standard deviation.
Here:
n = number of scores per group
x = individual scores
M = mean of a group
Moreover, Mx and My are the mean values of the two samples (female and male), nx and ny
are the sample sizes of the two samples, and S is the standard deviation.
Calculating the critical T-value from the T-Distribution
We require two quantities in order to calculate the critical t-value: the chosen value of alpha,
and the degrees of freedom. The formula for the critical t-value is complex; however, it is
fixed for a given pair of degrees of freedom and value of alpha, so a table is traditionally used
to look up the critical t-value. However, Python provides a function in the SciPy library that
serves the same purpose (see the sketch after this paragraph).
Comparing the critical T-value with the calculated T-statistic: once the critical T-value is
calculated, we compare it with the T-statistic computed earlier. If the critical t-value is less
than the calculated T-statistic, the test deduces that a statistically significant difference is
present between the two populations; hence, we reject the null hypothesis that no significant
difference is present between the two samples, and we accept the alternate hypothesis,
implying that the heights of men and women are statistically different. In the other case,
where the critical t-value is greater than the calculated T-statistic, the test fails to reject the
null hypothesis, and we conclude that there is no statistically significant difference between
the two populations.
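As a hedged sketch of that SciPy lookup (assuming a two-tailed test at α = 0.05 with two groups of 10 observations each):
from scipy import stats
alpha = 0.05
dof = 18  # nx + ny - 2 for two groups of 10
t_critical = stats.t.ppf(1 - alpha / 2, df=dof)  # upper critical t-value, two-tailed
print(t_critical)  # approximately 2.101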
PROGRAM:
# Importing the required libraries and packages
import numpy as np
from scipy import stats
# Defining two random distributions
# Sample Size
N = 10
# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)
# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof = 1)
var_y = y.var(ddof = 1)
# Standard Deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)
# Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical T-Value
# Degrees of freedom
dof = 2 * N - 2
# p-value after comparison with the T-Statistics
pval = 1 - stats.t.cdf( tval, df = dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))
## Cross Checking using the internal function from SciPy Package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))
OUTPUT:
Standard Deviation = 1.0840799841818152
t = 4.4083686523600845
p = 0.00033912894968146645
t = 4.408368652360084
p = 0.0003391289496815314
RESULT:
Thus, the program for the T-test (both one-tailed and two-tailed) has been studied and
executed, and the output has been verified successfully.
Ex. No: 9 ANOVA CASE STUDIES
Date:
AIM:
To write a Python program for the ANOVA test.
EXPLANATION:
ANOVA (ANalysis Of VAriance):
The ANOVA test is used to compare the means of more than two groups (a t-test can be
used to compare two groups).
Group mean differences are inferred by analyzing variances.
ANOVA uses a variance-based F test to check the equality of group means.
Sometimes the ANOVA F test is also called an omnibus test, as it tests a non-specific null
hypothesis, i.e. that all group means are equal.
Main types: one-way (one factor) and two-way (two factor) ANOVA (a factor is an
independent variable).
It is also called univariate ANOVA, as there is only one dependent variable in the model.
MANOVA is used when there are multiple dependent variables in the dataset. If there is an
additional continuous independent variable in the model, then ANCOVA is used.
If you have repeated measurements for treatments or time on the same subjects, you should
use repeated-measures ANOVA.
ANOVA Hypotheses:
Null hypothesis: group means are equal (no variation in the means of the groups),
H0: μ1 = μ2 = … = μp
Alternative hypothesis: at least one group mean is different from the other groups,
H1: all μ are not equal
ANOVA Assumptions:
Residuals (experimental error) are approximately normally distributed (Shapiro-Wilk test or
histogram).
Homoscedasticity or homogeneity of variances (variances are equal between treatment
groups) (Levene's, Bartlett's, or Brown-Forsythe test).
Observations are sampled independently from each other (no relation in observations
between the groups or within the groups), i.e., each subject should have only one response.
The dependent variable should be continuous. If the dependent variable is ordinal or
rank-based (e.g. Likert item data), it is more likely to violate the assumptions of normality
and homogeneity of variances. If these assumptions are violated, you should consider
non-parametric tests.
How ANOVA works:
Check sample sizes: an equal number of observations in each group.
Calculate the Mean Square (MS) for each group (SS of group / (levels − 1)); levels − 1 is the
degrees of freedom (df) for a group.
Calculate the Mean Square error (MSE) (SS error / df of residuals).
Calculate the F value (MS of group / MSE).
Calculate the p value based on the F value and the degrees of freedom (df).
One-way (one factor) ANOVA:
The ANOVA table represents the between- and within-group sources of variation, their
associated degrees of freedom, the sums of squares (SS), and the mean squares (MS). The
total variation is the sum of the between- and within-group variances. The F value is the
ratio of the between- and within-group mean squares (MS), and the p value is estimated
from the F value and the degrees of freedom, as sketched below.
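To make the between/within-group computation concrete, a minimal sketch with made-up sample values (the groups and numbers here are illustrative only):
import numpy as np
groups = [np.array([25.0, 30.0, 28.0]),
          np.array([31.0, 39.0, 38.0]),
          np.array([24.0, 30.0, 28.0])]
grand_mean = np.mean(np.concatenate(groups))
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # SS of groups
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)             # SS error
df_between = len(groups) - 1                                             # k - 1
df_within = sum(len(g) for g in groups) - len(groups)                    # N - k
f_value = (ss_between / df_between) / (ss_within / df_within)            # MS of group / MSE
print(f_value)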
Two-way (two factor) ANOVA (factorial design):
PROGRAM:
import pandas as pd
# load data file
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/onewayanova.txt", sep="\t")
# reshape the dataframe into a form suitable for the statsmodels package
df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
# replace column names
df_melt.columns = ['index', 'treatments', 'value']
# generate a boxplot to see the data distribution by treatments; using a boxplot,
# we can easily detect the differences between different treatments
import matplotlib.pyplot as plt
import seaborn as sns
ax = sns.boxplot(x='treatments', y='value', data=df_melt, color='#99c2a2')
ax = sns.swarmplot(x="treatments", y="value", data=df_melt, color='#7d0013')
plt.show()
import scipy.stats as stats
# the stats f_oneway function takes the groups as input and returns the ANOVA F and p values
fvalue, pvalue = stats.f_oneway(df['A'], df['B'], df['C'], df['D'])
print(fvalue, pvalue)
# 17.492810457516338 2.639241146210922e-05
# get an ANOVA table as R-like output
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Ordinary Least Squares (OLS) model
model = ols('value ~ C(treatments)', data=df_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table
# ANOVA table using bioinfokit v1.0.3 or later (it uses a wrapper script for anova_lm)
from bioinfokit.analys import stat
res = stat()
res.anova_stat(df=df_melt, res_var='value', anova_model='value ~ C(treatments)')
res.anova_summary
from bioinfokit.analys import stat
# perform multiple pairwise comparisons (Tukey's HSD)
# for unequal sample size data, tukey_hsd uses the Tukey-Kramer test
res = stat()
res.tukey_hsd(df=df_melt, res_var='value', xfac_var='treatments', anova_model='value ~ C(treatments)')
res.tukey_summary
# QQ PLOT
import statsmodels.api as sm
import matplotlib.pyplot as plt
# res.anova_std_residuals are standardized residuals obtained from ANOVA (check above)
sm.qqplot(res.anova_std_residuals, line='45')
plt.xlabel("Theoretical Quantiles")
plt.ylabel("Standardized Residuals")
plt.show()
# histogram
plt.hist(res.anova_model_out.resid, bins='auto', histtype='bar', ec='k')
plt.xlabel("Residuals")
plt.ylabel('Frequency')
plt.show()
Two-way (two factor) ANOVA
import pandas as pd
import seaborn as sns
# load data file
d = pd.read_csv("https://reneshbedre.github.io/assets/posts/anova/twowayanova.txt", sep="\t")
# reshape the d dataframe into a form suitable for the statsmodels package
# you do not need to reshape if your data is already in stacked format; compare the
# d and d_melt tables for a detailed understanding
d_melt = pd.melt(d, id_vars=['Genotype'], value_vars=['1_year', '2_year', '3_year'])
# replace column names
d_melt.columns = ['Genotype', 'years', 'value']
d_melt.head()
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = ols('value ~ C(Genotype) + C(years) + C(Genotype):C(years)', data=d_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
anova_table
from bioinfokit.analys import stat
res = stat()
res.anova_stat(df=d_melt, res_var='value', anova_model='value ~ C(Genotype) + C(years) + C(Genotype):C(years)')
res.anova_summary
from statsmodels.graphics.factorplots import interaction_plot
import matplotlib.pyplot as plt
fig = interaction_plot(x=d_melt['Genotype'], trace=d_melt['years'], response=d_melt['value'], colors=['#4c061d', '#d17a22', '#b4c292'])
plt.show()
Multiple pairwise comparisons
from bioinfokit.analys import stat
# perform multiple pairwise comparisons (Tukey HSD)
# for unequal sample size data, tukey_hsd uses the Tukey-Kramer test
res = stat()
# for the main effect Genotype
res.tukey_hsd(df=d_melt, res_var='value', xfac_var='Genotype', anova_model='value ~ C(Genotype) + C(years) + C(Genotype):C(years)')
res.tukey_summary
# for the main effect years
res.tukey_hsd(df=d_melt, res_var='value', xfac_var='years', anova_model='value ~ C(Genotype) + C(years) + C(Genotype):C(years)')
res.tukey_summary
# for the interaction effect between genotype and years
res.tukey_hsd(df=d_melt, res_var='value', xfac_var=['Genotype', 'years'], anova_model='value ~ C(Genotype) + C(years) + C(Genotype):C(years)')
res.tukey_summary.head()
# QQ-plot
import statsmodels.api as sm
import matplotlib.pyplot as plt
# res.anova_std_residuals are standardized residuals obtained from the two-way ANOVA (check above)
sm.qqplot(res.anova_std_residuals, line='45')
plt.xlabel("Theoretical Quantiles")
plt.ylabel("Standardized Residuals")
plt.show()
# histogram
plt.hist(res.anova_model_out.resid, bins='auto', histtype='bar', ec='k')
plt.xlabel("Residuals")
plt.ylabel('Frequency')
plt.show()
# if you have a stacked table, you can use bioinfokit v1.0.3 or later for Levene's test
from bioinfokit.analys import stat
res = stat()
res.levene(df=d_melt, res_var='value', xfac_var=['Genotype', 'years'])
res.levene_summary
OUTPUT:
17.492810457516338 2.639241146210922e-05
group1 group2 Diff Lower Upper q-value p-value
0 A B 15.4 1.692871 29.107129 4.546156 0.025070
1 A C 1.6 -12.107129 15.307129 0.472328 0.900000
2 A D 30.4 16.692871 44.107129 8.974231 0.001000
3 B C 13.8 0.092871 27.507129 4.073828 0.048178
4 B D 15.0 1.292871 28.707129 4.428074 0.029578
5 C D 28.8 15.092871 42.507129 8.501903 0.001000
TWO WAY ANOVA TEST
  Genotype   years  value
0        A  1_year   1.53
1        A  1_year   1.83
2        A  1_year   1.38
3        B  1_year   3.60
4        B  1_year   2.94
                  Parameter    Value
0       Test statistics (W)   1.6849
1  Degrees of freedom (Df)  17.0000
2                   p value   0.0927
RESULT:
Thus, the program for the ANOVA test has been studied and executed, and the output has
been verified successfully.
Ex. No: 10 BUILDING AND VALIDATING LINEAR MODELS
Date:
AIM:
To write a Python program for building and validating linear models.
EXPLANATION:
A linear regression is one of the simplest statistical models in machine learning, and
understanding its algorithm is a crucial part of a data science course curriculum. It is used to
show the linear relationship between a dependent variable and one or more independent
variables.
Importing the dataset
Import the dataset using pandas, and also import other libraries such as NumPy and
Matplotlib. dataset.head() shows the first few rows of our dataset.
Data Preprocessing
X is the independent variable array and y is the dependent variable vector. Note the
difference between an array and a vector: the dependent variable must be a vector and the
independent variable must be an array.
Splitting the dataset
We need to split our dataset into the test and train sets. Generally, we follow the 20-80 or
the 30-70 policy (test to train) respectively. This is because we wish to train our model on
the years of experience and the salary, and then test it on the test set: we check whether the
predictions made by the model on the test-set data match what was given in the dataset. If
they match, it implies that our model is accurate and is making the right predictions.
Fitting the linear regression model to the training set
From sklearn's linear_model library, import the LinearRegression class and create an object
of it called regressor. To fit the regressor to the training set, we call the fit method, which
fits the regressor to the training data. We fit X_train (the training data of the matrix of
features) to the target values y_train. Thus the model learns the correlation and learns how
to predict the dependent variable based on the independent variable.
Predicting the test set results
We create a vector containing all the predictions of the test-set salaries. The predicted
salaries are put into a vector called y_pred (which contains the predictions for all
observations in the test set). The predict method makes the predictions for the test set, so
its input is the test set; the parameter for predict must be an array or sparse matrix, hence
the input is X_test.
Visualizing the results
To visualize the data, we plot graphs using Matplotlib.
1. Plotting the real observation points, i.e. the real given values: the X-axis has the years of
experience and the Y-axis has the salaries. plt.scatter plots a scatter plot of the data.
Parameters include:
X coordinate (X_train: number of years)
Y coordinate (y_train: real salaries of the employees)
Color (observation points in red; the regression line is drawn in blue)
2. Plotting the regression line
plt.plot has the following parameters:
X coordinates (X_train) – number of years
Y coordinates (predict on X_train) – prediction of X_train (based on the number of years)
Steps to build a Linear Regression model
Step 1: Importing the dataset
Step 2: Data pre-processing
Step 3: Splitting the test and train sets
Step 4: Fitting the linear regression model to the training set
Step 5: Predicting test results
Step 6: Visualizing the test results
PROGRAM:
# importing the dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('Salary_Data.csv')
dataset.head()
# data preprocessing
X = dataset.iloc[:, :-1].values #independent variable array
y = dataset.iloc[:,1].values #dependent variable vector
# splitting the dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=1/3,random_state=0)
# fitting the regression model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train) #actually produces the linear eqn for the data
# predicting the test set results
y_pred = regressor.predict(X_test)
y_pred
y_test
# visualizing the results
#plot for the TRAIN
plt.scatter(X_train, y_train, color='red') # plotting the observation line
plt.plot(X_train, regressor.predict(X_train), color='blue') # plotting the regression line
plt.title("Salary vs Experience (Training set)") # stating the title of the graph
plt.xlabel("Years of experience") # adding the name of x-axis
plt.ylabel("Salaries") # adding the name of y-axis
plt.show() # specifies end of graph
#plot for the TEST
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue') # plotting the regression line
plt.title("Salary vs Experience (Testing set)")
plt.xlabel("Years of experience")
plt.ylabel("Salaries")
plt.show()
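The plots above validate the model visually. As an additional hedged sketch (reusing the regressor, X_test, y_test and y_pred defined above), the fit can also be scored numerically with the coefficient of determination:
from sklearn.metrics import r2_score
print(r2_score(y_test, y_pred))  # R^2 on unseen test data; closer to 1 is better
print(regressor.score(X_test, y_test))  # equivalent convenience method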
OUTPUT:
RESULT:
Thus, the program for building and validating linear models has been studied and executed,
and the output has been verified successfully.
Ex. No: 11 LOGISTIC REGRESSIONS: HANDWRITING RECOGNITION
Date:
AIM:
To write a Python program for logistic regression: handwriting recognition.
EXPLANATION:
Logistic Regression is a Machine Learning algorithm used to make predictions to find the
value of a dependent variable such as the condition of a tumour (malignant or benign),
classification of email (spam or not spam), or admission into a university (admitted or not
admitted) by learning from independent variables. Logistic Regression is a supervised
Machine Learning algorithm, which means the data provided for training is labelled i.e.,
answers are already provided in the training set. The algorithm learns from those examples
and their corresponding answers (labels) and then uses that to classify new examples. In
mathematical terms, suppose the dependent variable is Y and the set of independent
variables is X, then logistic regression will predict the dependent variable P(Y=1) as a
function of X, the set of independent variables.
It is a technique to analyse a data-set which has a dependent variable and one or more
independent variables to predict the outcome in a binary variable, meaning it will have only
two outcomes. The dependent variable is categorical in nature. Dependent variable is also
referred as target variable and the independent variables are called the predictors. Logistic
regression is a special case of linear regression where we only predict the outcome in a
categorical variable. It predicts the probability of the event using the log function. We use the
Sigmoid function/curve to predict the categorical value. The threshold value decides the
outcome (win/lose). Linear regression equation: y = β0 + β1X1 + β2X2 + … + βnXn.
Y stands for the dependent variable that needs to be predicted.
β0 is the Y-intercept, which is basically the point on the line which touches the y-axis.
β1 is the slope of the line (the slope can be negative or positive depending on the
relationship between the dependent variable and the independent variable.)
X here represents the independent variable that is used to predict our resultant
dependent value.
Sigmoid function: p = 1 / 1 + e-y. Apply sigmoid function on the linear regression equation.
The goal is to find the logistic regression function 𝑝(𝐱) such that the predicted responses
𝑝(𝐱ᵢ) are as close as possible to the actual response 𝑦ᵢ for each observation 𝑖 = 1, …, 𝑛.
Remember that the actual response can be only 0 or 1 in binary classification problems! This
means that each 𝑝(𝐱ᵢ) should be close to either 0 or 1. That's why it's convenient to use the
sigmoid function. Once you have the logistic regression function 𝑝(𝐱), you can use it to
predict the outputs for new and unseen inputs, assuming that the underlying mathematical
dependence is unchanged.
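As a quick illustration of the sigmoid (a minimal sketch, separate from the prescribed
program; the sample logit values are made up):

import numpy as np

def sigmoid(z):
    # squashes any real-valued logit z into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5, exactly on the decision boundary
print(sigmoid(4))    # ~0.982, confidently class 1
print(sigmoid(-4))   # ~0.018, confidently class 0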
METHODOLOGY:
Logistic regression is a linear classifier, so you'll use a linear function
𝑓(𝐱) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ, also called the logit. The variables 𝑏₀, 𝑏₁, …, 𝑏ᵣ are the
estimators of the regression coefficients, which are also called the predicted weights or just
coefficients. The logistic regression function 𝑝(𝐱) is the sigmoid function of 𝑓(𝐱):
𝑝(𝐱) = 1 / (1 + exp(−𝑓(𝐱))). As such, it's often close to either 0 or 1. The function 𝑝(𝐱) is
often interpreted as the predicted probability that the output for a given 𝐱 is equal to 1.
Therefore, 1 − 𝑝(𝐱) is the probability that the output is 0. Logistic regression determines
the best predicted weights 𝑏₀, 𝑏₁, …, 𝑏ᵣ such that the function 𝑝(𝐱) is as close as possible
to all actual responses 𝑦ᵢ, 𝑖 = 1, …, 𝑛, where 𝑛 is the number of observations. The process of
calculating the best weights using available observations is called model training or fitting.
To get the best weights, you usually maximize the log-likelihood function (LLF) for all
observations 𝑖 = 1, …, 𝑛. This method is called maximum likelihood estimation and is
represented by the equation LLF = Σᵢ(𝑦ᵢ log(𝑝(𝐱ᵢ)) + (1 − 𝑦ᵢ) log(1 − 𝑝(𝐱ᵢ))).
When 𝑦ᵢ = 0, the LLF for the corresponding observation is equal to log(1 − 𝑝(𝐱ᵢ)). If 𝑝(𝐱ᵢ) is
close to 𝑦ᵢ = 0, then log(1 − 𝑝(𝐱ᵢ)) is close to 0. This is the result you want. If 𝑝(𝐱ᵢ) is far from
0, then log(1 − 𝑝(𝐱ᵢ)) drops significantly. You don't want that result because your goal is to
obtain the maximum LLF. Similarly, when 𝑦ᵢ = 1, the LLF for that observation is 𝑦ᵢ log(𝑝(𝐱ᵢ)).
If 𝑝(𝐱ᵢ) is close to 𝑦ᵢ = 1, then log(𝑝(𝐱ᵢ)) is close to 0. If 𝑝(𝐱ᵢ) is far from 1, then log(𝑝(𝐱ᵢ)) is a
large negative number. Once you determine the best weights that define the function 𝑝(𝐱),
you can get the predicted outputs 𝑝(𝐱ᵢ) for any given input 𝐱ᵢ. For each observation 𝑖 = 1,
…, 𝑛, the predicted output is 1 if 𝑝(𝐱ᵢ) > 0.5 and 0 otherwise. The threshold doesn't have to be
0.5, but it usually is; you might define a lower or higher value if that's more convenient for
your situation. There's one more important relationship between 𝑝(𝐱) and 𝑓(𝐱), which is that
log(𝑝(𝐱) / (1 − 𝑝(𝐱))) = 𝑓(𝐱). This equality explains why 𝑓(𝐱) is the logit. It implies that
𝑝(𝐱) = 0.5 when 𝑓(𝐱) = 0, and that the predicted output is 1 if 𝑓(𝐱) > 0 and 0 otherwise.
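A small numeric sketch of the LLF (illustrative only; the responses and probabilities below
are assumed values, not taken from the digits data):

import numpy as np

y_true = np.array([0, 0, 1, 1])       # assumed actual responses
p = np.array([0.1, 0.4, 0.8, 0.95])   # assumed predicted probabilities p(x_i)

# LLF = sum of y*log(p) + (1 - y)*log(1 - p); a better fit pushes it toward 0
llf = np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
print(llf)  # about -0.89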
CLASSIFICATION PERFORMANCE:
Binary classification has four possible types of results:
True negatives: correctly predicted negatives (zeros)
True positives: correctly predicted positives (ones)
False negatives: incorrectly predicted negatives (zeros)
False positives: incorrectly predicted positives (ones)
The most straightforward indicator of classification accuracy is the ratio of the number of
correct predictions to the total number of predictions (or observations). Other indicators of
binary classifiers include the following:
The positive predictive value is the ratio of the number of true positives to the sum of
the numbers of true and false positives.
The negative predictive value is the ratio of the number of true negatives to the sum of
the numbers of true and false negatives.
The sensitivity (also known as recall or true positive rate) is the ratio of the number of
true positives to the number of actual positives.
The specificity (or true negative rate) is the ratio of the number of true negatives to the
number of actual negatives.
The most suitable indicator depends on the problem of interest. In this experiment, we
use the most straightforward indicator: classification accuracy.
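A minimal sketch of these indicators computed from assumed counts (the numbers are made up
for illustration):

# assumed counts of the four result types
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.85, correct / all predictions
ppv = tp / (tp + fp)                        # ~0.889, positive predictive value
npv = tn / (tn + fn)                        # ~0.818, negative predictive value
sensitivity = tp / (tp + fn)                # 0.8, true positive rate (recall)
specificity = tn / (tn + fp)                # 0.9, true negative rate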
This example is about image recognition. To be more precise, you’ll work on the recognition
of handwritten digits. You’ll use a dataset with 1797 observations, each of which is an image
of one handwritten digit. Each image has 64 pixels, arranged as 8 px wide by 8 px high.
The inputs (𝐱 ) are vectors with 64 dimensions or values. Each input vector describes one
image. Each of the 64 values represents one pixel of the image. The input values are the
integers between 0 and 16, depending on the shade of gray for the corresponding pixel. The
output (𝑦) for each observation is an integer between 0 and 9, consistent with the digit on the
image. There are ten classes in total, each corresponding to one digit.
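To see one observation as an image (a short sketch using the same load_digits data the
program below works with), reshape its 64 values back into an 8 x 8 grid:

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

x, y = load_digits(return_X_y=True)
print(x.shape)  # (1797, 64): 1797 images, 64 pixel values each
plt.imshow(x[0].reshape(8, 8), cmap='gray_r')  # first image, drawn as 8 x 8
plt.title('Label: {}'.format(y[0]))            # its label is the digit 0
plt.show()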
PROGRAM:
Import Packages:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
Get Data:
x, y = load_digits(return_X_y=True)
x
y
Split Data:
x_train, x_test, y_train, y_test =\
train_test_split(x, y, test_size=0.2, random_state=0)
Scale Data:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
Create a Model and Train It:
model = LogisticRegression(C=0.05, class_weight=None, dual=False, fit_intercept=True,
                           intercept_scaling=1, l1_ratio=None, max_iter=100,
                           multi_class='ovr', n_jobs=None, penalty='l2', random_state=0,
                           solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
model.fit(x_train, y_train)
Evaluate the Model:
x_test = scaler.transform(x_test)
y_pred = model.predict(x_test)
model.score(x_train, y_train) # accuracy on the training set
model.score(x_test, y_test) # accuracy on the test set
confusion_matrix(y_test, y_pred)
Visualization:
cm = confusion_matrix(y_test, y_pred)
font_size = 15 # shared font size for the axis labels
fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(cm)
ax.grid(False)
ax.set_xlabel('Predicted outputs', fontsize=font_size, color='black')
ax.set_ylabel('Actual outputs', fontsize=font_size, color='black')
ax.xaxis.set(ticks=range(10))
ax.yaxis.set(ticks=range(10))
ax.set_ylim(9.5, -0.5)
for i in range(10):
    for j in range(10):
        ax.text(j, i, cm[i, j], ha='center', va='center', color='white')
plt.show()
Classification report:
print(classification_report(y_test, y_pred))
OUTPUT:
x
array([[ 0., 0., 5., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 10., 0., 0.],
[ 0., 0., 0., ..., 16., 9., 0.],
...,
[ 0., 0., 1., ..., 6., 0., 0.],
[ 0., 0., 2., ..., 12., 0., 0.],
[ 0., 0., 10., ..., 12., 1., 0.]])
y
array([0, 1, 2, ..., 8, 9, 8])
Confusion matrix:
array([[27, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 32, 0, 0, 0, 0, 1, 0, 1, 1],
[ 1, 1, 33, 1, 0, 0, 0, 0, 0, 0],
[ 0, 0, 1, 28, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 29, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 0, 39, 0, 0, 0, 1],
[ 0, 1, 0, 0, 0, 0, 43, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 39, 0, 0],
[ 0, 2, 1, 2, 0, 0, 0, 1, 33, 0],
[ 0, 0, 0, 1, 0, 1, 0, 2, 1, 36]])
Classification Report
RESULT:
Thus, the program for Logistic Regression: Handwriting Recognition has been studied and
executed, and the output has been verified successfully.
EX.NO:12 TIME SERIES ANALYSIS
DATE:
AIM:
To write the python program for Time Series Analysis.
EXPLANATION:
Time series is a sequence of observations recorded at regular time intervals. Depending on
the frequency of observations, a time series may typically be hourly, daily, weekly, monthly,
quarterly, or annual. Sometimes you might have second- or minute-wise time series as well,
such as the number of clicks or user visits per minute. Time series analysis comprises
methods for analyzing time series data in order to extract meaningful statistics and other
characteristics of the data. Time series forecasting is the use of a model to predict future
values based on previously observed values. Time series methods are widely applied to
non-stationary data such as economic, weather, stock price, and retail sales data; here we
work with retail sales.
DATASET: Superstore sales data
There are several product categories in the Superstore sales data; we start with time series
analysis and forecasting for furniture sales.
DATA PREPROCESSING:
This step includes removing columns we do not need, checking for missing values, aggregating
sales by date, and so on.
INDEXING WITH TIME SERIES DATA:
Our current datetime data can be tricky to work with. Therefore, we will use the average
daily sales value for each month instead, using the start of each month as the timestamp,
as the sketch below illustrates.
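A minimal sketch of this resampling idea (the daily series below is made up; the real
program applies the same 'MS' month-start rule to the Superstore data):

import pandas as pd

idx = pd.date_range('2017-01-01', periods=60, freq='D')
daily = pd.Series(range(60), index=idx, name='Sales')  # made-up daily sales

# 'MS' groups the observations by calendar month, stamps each group at the
# start of the month, and .mean() gives the average daily sales for the month
monthly = daily.resample('MS').mean()
print(monthly)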
VISUALIZING FURNITURE SALES TIME SERIES DATA:
Some distinguishable patterns appear when we plot the data. The time series has a seasonal
pattern: sales are always low at the beginning of the year and high at the end of the year,
and there is an upward trend within any single year with a couple of low months in the
middle of the year. We can also visualize our data using a method called time-series
decomposition, which decomposes the time series into three distinct components:
trend, seasonality, and noise.
TIME SERIES FORECASTING WITH ARIMA:
One of the most commonly used methods for time-series forecasting is ARIMA, which stands
for AutoRegressive Integrated Moving Average. ARIMA models are denoted ARIMA(p, d, q), where
p is the autoregressive order, d is the degree of differencing, and q is the moving-average
order; together these capture the trend and noise in the data, and the seasonal variant
fitted below (SARIMAX) adds a seasonal order (P, D, Q, s). This step is parameter selection
for our furniture sales ARIMA time series model. Our goal here is to use a "grid search" to
find the optimal set of parameters that yields the best performance for our model.
VALIDATING FORECASTS:
To help us understand the accuracy of our forecasts, we compare the predicted sales to the
real sales of the time series, setting the forecasts to start at 2017-01-01 and run to the
end of the data. The line plot shows the observed values compared to the one-step-ahead
forecast predictions. Overall, our forecasts align with the true values very well, capturing
both the upward trend from the beginning of the year and the seasonality toward the end of
the year.
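The error measures used below are the mean squared error and its square root; a toy sketch
with assumed values:

import numpy as np

y_truth = np.array([100.0, 120.0, 130.0])       # assumed observed sales
y_forecasted = np.array([110.0, 115.0, 128.0])  # assumed forecasted sales

mse = ((y_forecasted - y_truth) ** 2).mean()  # average squared error: 43.0
rmse = np.sqrt(mse)                           # back in sales units: ~6.56
print(round(mse, 2), round(rmse, 2))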
DATA EXPLORATION:
We are going to compare the two categories' sales over the same time period. This means
combining the two data frames into one and plotting both categories' time series on a
single plot.
TIME SERIES MODELING WITH PROPHET:
Released by Facebook in 2017, the forecasting tool Prophet is designed for analyzing time
series that display patterns on different time scales, such as yearly, weekly, and daily. It
also has advanced capabilities for modeling the effects of holidays on a time series and
implementing custom changepoints. Therefore, we use Prophet to get a model up and running.
PROGRAM:
import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
furniture['Order Date'].min(), furniture['Order Date'].max()
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
furniture = furniture.set_index('Order Date')
furniture.index
y = furniture['Sales'].resample('MS').mean()
y['2017':]
y.plot(figsize=(15, 6))
plt.show()
from pylab import rcParams
rcParams['figure.figsize'] = 18, 8
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except:
            continue
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
print(results.summary().tables[1])
results.plot_diagnostics(figsize=(16, 8))
plt.show()
pred = results.get_prediction(start=pd.to_datetime('2017-01-01'), dynamic=False)
pred_ci = pred.conf_int()
ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()
y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))
pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
ax = y.plot(label='observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()
furniture = df.loc[df['Category'] == 'Furniture']
office = df.loc[df['Category'] == 'Office Supplies']
furniture.shape, office.shape
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
office.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
office = office.sort_values('Order Date')
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
office = office.groupby('Order Date')['Sales'].sum().reset_index()
furniture = furniture.set_index('Order Date')
office = office.set_index('Order Date')
y_furniture = furniture['Sales'].resample('MS').mean()
y_office = office['Sales'].resample('MS').mean()
furniture = pd.DataFrame({'Order Date':y_furniture.index, 'Sales':y_furniture.values})
office = pd.DataFrame({'Order Date': y_office.index, 'Sales': y_office.values})
store = furniture.merge(office, how='inner', on='Order Date')
store.rename(columns={'Sales_x': 'furniture_sales', 'Sales_y': 'office_sales'}, inplace=True)
store.head()
plt.figure(figsize=(20, 8))
plt.plot(store['Order Date'], store['furniture_sales'], 'b-', label = 'furniture')
plt.plot(store['Order Date'], store['office_sales'], 'r-', label = 'office supplies')
plt.xlabel('Date'); plt.ylabel('Sales'); plt.title('Sales of Furniture and Office Supplies')
plt.legend();
first_date = store.loc[np.min(list(np.where(store['office_sales'] > store['furniture_sales'])[0])), 'Order Date']
print("The first time office supplies produced higher sales than furniture is {}.".format(first_date.date()))
from fbprophet import Prophet
furniture = furniture.rename(columns={'Order Date': 'ds', 'Sales': 'y'})
furniture_model = Prophet(interval_width=0.95)
furniture_model.fit(furniture)
office = office.rename(columns={'Order Date': 'ds', 'Sales': 'y'})
office_model = Prophet(interval_width=0.95)
office_model.fit(office)
furniture_forecast = furniture_model.make_future_dataframe(periods=36, freq='MS')
furniture_forecast = furniture_model.predict(furniture_forecast)
office_forecast = office_model.make_future_dataframe(periods=36, freq='MS')
office_forecast = office_model.predict(office_forecast)
plt.figure(figsize=(18, 6))
furniture_model.plot(furniture_forecast, xlabel = 'Date', ylabel = 'Sales')
plt.title('Furniture Sales');
plt.figure(figsize=(18, 6))
office_model.plot(office_forecast, xlabel = 'Date', ylabel = 'Sales')
plt.title('Office Supplies Sales');
furniture_names = ['furniture_%s' % column for column in furniture_forecast.columns]
office_names = ['office_%s' % column for column in office_forecast.columns]
merge_furniture_forecast = furniture_forecast.copy()
merge_office_forecast = office_forecast.copy()
merge_furniture_forecast.columns = furniture_names
merge_office_forecast.columns = office_names
forecast = pd.merge(merge_furniture_forecast, merge_office_forecast, how = 'inner', left_on = '
furniture_ds', right_on = 'office_ds')
forecast = forecast.rename(columns={'furniture_ds': 'Date'}).drop('office_ds', axis=1)
forecast.head()
plt.figure(figsize=(10, 7))
plt.plot(forecast['Date'], forecast['furniture_trend'], 'b-')
plt.plot(forecast['Date'], forecast['office_trend'], 'r-')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Sales Trend');
plt.figure(figsize=(10, 7))
plt.plot(forecast['Date'], forecast['furniture_yhat'], 'b-')
plt.plot(forecast['Date'], forecast['office_yhat'], 'r-')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Estimate');
furniture_model.plot_components(furniture_forecast);
office_model.plot_components(office_forecast);
OUTPUT:
Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00')
Data Preprocessing
Indexing with Time Series Data:
DatetimeIndex(['2014-01-06', '2014-01-07', '2014-01-10', '2014-01-11',
'2014-01-13', '2014-01-14', '2014-01-16', '2014-01-19',
'2014-01-20', '2014-01-21',
...
'2017-12-18', '2017-12-19', '2017-12-21', '2017-12-22',
'2017-12-23', '2017-12-24', '2017-12-25', '2017-12-28',
'2017-12-29', '2017-12-30'],
dtype='datetime64[ns]', name='Order Date', length=889,
freq=None)
2017 furniture sales data:
Order Date
2017-01-01 397.602133
2017-02-01 528.179800
2017-03-01 544.672240
2017-04-01 453.297905
2017-05-01 678.302328
2017-06-01 826.460291
2017-07-01 562.524857
2017-08-01 857.881889
2017-09-01 1209.508583
2017-10-01 875.362728
2017-11-01 1277.817759
2017-12-01 1256.298672
Freq: MS, Name: Sales, dtype: float64
Visualizing Furniture Sales Time Series Data
Time series forecasting with ARIMA:
Examples of parameter combinations for Seasonal ARIMA...
SARIMAX: (0, 0, 1) x (0, 0, 1, 12)
SARIMAX: (0, 0, 1) x (0, 1, 0, 12)
SARIMAX: (0, 1, 0) x (0, 1, 1, 12)
SARIMAX: (0, 1, 0) x (1, 0, 0, 12)
ARIMA(0, 0, 1)x(0, 0, 1, 12)12 - AIC:2931.4459685689417
ARIMA(0, 0, 1)x(0, 1, 0, 12)12 - AIC:466.5607429809145
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(0, 0, 1)x(1, 0, 0, 12)12 - AIC:499.588499811078
ARIMA(0, 0, 1)x(1, 0, 1, 12)12 - AIC:2578.407685878101
ARIMA(0, 0, 1)x(1, 1, 0, 12)12 - AIC:319.9884876946868
ARIMA(0, 1, 0)x(0, 0, 0, 12)12 - AIC:677.8947668259312
ARIMA(0, 1, 0)x(0, 0, 1, 12)12 - AIC:1363.5571341107245
ARIMA(0, 1, 0)x(0, 1, 0, 12)12 - AIC:486.6378567269187
ARIMA(0, 1, 0)x(1, 0, 0, 12)12 - AIC:497.78896630044073
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(0, 1, 0)x(1, 0, 1, 12)12 - AIC:1379.5770594611533
ARIMA(0, 1, 0)x(1, 1, 0, 12)12 - AIC:319.7714068109212
ARIMA(0, 1, 1)x(0, 0, 0, 12)12 - AIC:649.9056176817331
ARIMA(0, 1, 1)x(0, 0, 1, 12)12 - AIC:2704.9650459821123
ARIMA(0, 1, 1)x(0, 1, 0, 12)12 - AIC:458.87055484827687
ARIMA(0, 1, 1)x(1, 0, 0, 12)12 - AIC:486.18329774425456
ARIMA(0, 1, 1)x(1, 0, 1, 12)12 - AIC:2560.808670239328
ARIMA(0, 1, 1)x(1, 1, 0, 12)12 - AIC:310.75743684172687
ARIMA(1, 0, 0)x(0, 0, 0, 12)12 - AIC:692.1645522067713
ARIMA(1, 0, 0)x(0, 0, 1, 12)12 - AIC:1355.136316958002
ARIMA(1, 0, 0)x(0, 1, 0, 12)12 - AIC:479.4632147852136
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(1, 0, 0)x(1, 0, 0, 12)12 - AIC:480.92593679352154
ARIMA(1, 0, 0)x(1, 0, 1, 12)12 - AIC:1334.896860563096
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(1, 0, 0)x(1, 1, 0, 12)12 - AIC:304.4664675084565
ARIMA(1, 0, 1)x(0, 0, 0, 12)12 - AIC:665.7794442185481
ARIMA(1, 0, 1)x(0, 0, 1, 12)12 - AIC:82103.26964285906
ARIMA(1, 0, 1)x(0, 1, 0, 12)12 - AIC:468.36851958149913
ARIMA(1, 0, 1)x(1, 0, 0, 12)12 - AIC:482.5763323876879
ARIMA(1, 0, 1)x(1, 0, 1, 12)12 - AIC:2519.493065167048
ARIMA(1, 0, 1)x(1, 1, 0, 12)12 - AIC:306.0156002122771
ARIMA(1, 1, 0)x(0, 0, 0, 12)12 - AIC:671.2513547541902
ARIMA(1, 1, 0)x(0, 0, 1, 12)12 - AIC:1345.8589896655533
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(1, 1, 0)x(0, 1, 0, 12)12 - AIC:479.2003422281136
ARIMA(1, 1, 0)x(1, 0, 0, 12)12 - AIC:475.34036587860555
ARIMA(1, 1, 0)x(1, 0, 1, 12)12 - AIC:1912.1819232761209
ARIMA(1, 1, 0)x(1, 1, 0, 12)12 - AIC:300.6270901345412
ARIMA(1, 1, 1)x(0, 0, 0, 12)12 - AIC:649.0318019835189
ARIMA(1, 1, 1)x(0, 0, 1, 12)12 - AIC:2516.1759453415243
ARIMA(1, 1, 1)x(0, 1, 0, 12)12 - AIC:460.4762687609516
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(1, 1, 1)x(1, 0, 0, 12)12 - AIC:469.5250354660858
ARIMA(1, 1, 1)x(1, 0, 1, 12)12 - AIC:nan
/usr/local/lib/python3.7/dist-packages/statsmodels/base/model.py:512: ConvergenceWarning:
Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
ARIMA(1, 1, 1)x(1, 1, 0, 12)12 - AIC:297.78754395474454
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.0146      0.342      0.043      0.966      -0.655       0.684
ma.L1         -1.0000      0.360     -2.781      0.005      -1.705      -0.295
ar.S.L12      -0.0253      0.042     -0.609      0.543      -0.107       0.056
sigma2      2.958e+04   1.22e-05   2.43e+09      0.000    2.96e+04    2.96e+04
==============================================================================
Validating forecasts
Mean Squared Error of our forecasts:
The Mean Squared Error of our forecasts is 22993.58
Root Mean Squared Error of our forecasts:
The Root Mean Squared Error of our forecasts is 151.64
Producing and visualizing forecasts:
Data Exploration:
Order Date furniture_sales office_sales
0 2014-01-01 480.194231 285.357647
1 2014-02-01 367.931600 63.042588
2 2014-03-01 857.291529 391.176318
3 2014-04-01 567.488357 464.794750
4 2014-05-01 432.049188 324.346545
Time Series Modeling with Prophet:
Compare Forecasts:
Trend and Forecast Visualization:
RESULT:
Thus, the program for Time Series Analysis has been studied and executed, and the
output has been verified successfully.