Lab Manual


SYLLABUS

AD3411 DATA SCIENCE AND ANALYTICS LABORATORY                L T P C
                                                            0 0 4 2

COURSE OBJECTIVES


• To develop data analytic code in Python
• To be able to use Python libraries for handling data
• To develop analytical applications using Python
• To perform data visualization using plots

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh

SUGGESTED EXERCISES:

1. Working with Pandas data frames.


2. Basic plots using Matplotlib.
3. Frequency distributions, Averages, Variability.
4. Normal curves, Correlation and scatter plots, Correlation coefficient.
5. Regression.
6. Z-test.
7. T-test.
8. ANOVA.
9. Building and validating linear models.
10. Building and validating logistic models.
11. Time Series Analysis.

TOTAL: 60 PERIODS
HARDWARE
• Standalone desktops with Windows OS

SOFTWARE
• Python with statistical Packages
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh; working with NumPy arrays

S.No  Date  Name of the Experiment                                                   Page No  Marks (100)  Staff Signature
1           Working with Pandas data frames
2           Basic plots using Matplotlib
3           Frequency distributions, Averages, Variability
4           Normal curves, Correlation and scatter plots, Correlation coefficient
5           Regression
6           Z-test
7           T-test
8           ANOVA
9           Building and validating linear models
10          Building and validating logistic models
11          Time series analysis
Experiment No:1
WORKING WITH PANDAS DATA FRAMES
Date:

AIM:

To work with Pandas data frames

ALGORITHM:

Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM:
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df.loc[0])
OUTPUT:
calories    420
duration     50
Name: 0, dtype: int64
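As a small extension (not part of the recorded output above), the same DataFrame can be inspected further; the sketch below reuses the calories/duration data, and the derived column name cal_per_min is only illustrative.

import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# print the whole DataFrame
print(df)

# select a single column (returns a Series)
print(df["calories"])

# add a derived column: calories burned per minute (illustrative)
df["cal_per_min"] = df["calories"] / df["duration"]
print(df)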

RESULT:

Thus the working with Pandas data frames was successfully completed.
Experiment No: 2
BASIC PLOTS USING MATPLOTLIB
Date :

AIM:
To draw basic plots in Python using Matplotlib

ALGORITHM:
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
PROGRAM:
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]

plt.plot(a)

# "o" is for circles and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))

# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get the current axes
ax = plt.gca()

# get control over the individual boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval at which the x-axis places its marks
plt.xticks(list(range(-3, 10)))

# set the interval at which the y-axis places its marks
plt.yticks(list(range(-3, 20, 3)))

# the legend denotes what each colour signifies
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# annotate writes text on the graph; xy denotes the position
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))

# gives a title to the graph
plt.title('All Features Discussed')
plt.show()
OUTPUT:

RESULT:

Thus the basic plots using Matplotlib in Python were drawn successfully.
Experiment No: 3a
FREQUENCY DISTRIBUTIONS
Date :

AIM:

To count the frequency of occurrence of each word in a body of text, as is often needed during text
processing.

ALGORITHM:

Step 1: Start the Program


Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process
PROGRAM:

from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")

token = word_tokenize(sample)
wlist = []

# take the first 50 tokens
for i in range(50):
    wlist.append(token[i])

# count how often each of those tokens occurs in the list
wordfreq = [wlist.count(w) for w in wlist]

print("Pairs\n" + str(list(zip(token, wordfreq))))
OUTPUT:

Pairs
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2),
('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1),
('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1),
('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1),
('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1),
('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1),
(':', 1), ('``', 1)]
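NLTK also ships a FreqDist class for this task; the short sketch below is an optional alternative to the list-counting approach above (it assumes the same blake-poems.txt corpus and that the NLTK data packages are installed).

from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
from nltk import FreqDist

sample = gutenberg.raw("blake-poems.txt")
tokens = word_tokenize(sample)

# FreqDist counts every token in one pass
fdist = FreqDist(tokens)
print(fdist.most_common(10))   # the 10 most frequent tokens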

RESULT:
Thus the frequency of occurrence of each word in a body of text was counted successfully.
Experiment No: 3b AVERAGES

Date :

AIM:
To compute weighted averages in Python either defining your own functions or using Numpy

ALGORITHM :

Step 1: Start the Program


Step 2: Create the employees_salary table and save as .csv file
Step 3: Import packages (pandas and numpy) and the employees_salary table itself:
Step 4: Calculate weighted sum and average using Numpy Average() Function
Step 5 : Stop the process

PROGRAM:

# Method using the NumPy average() function
# (assumes numpy is imported as np and the employees_salary table
#  has already been loaded into the DataFrame df)

weighted_avg_m3 = round(np.average(df['salary_p_year'],
                                   weights=df['employees_number']), 2)
weighted_avg_m3
OUTPUT:

44225.35
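For completeness, a minimal end-to-end sketch of the algorithm's steps is given below; the file name employees_salary.csv and its column names are assumptions consistent with the fragment above.

import pandas as pd
import numpy as np

# Step 3: load the employees_salary table (file name assumed)
df = pd.read_csv('employees_salary.csv')

# Step 4: weighted average of yearly salary, weighted by employee head count
weighted_avg = round(np.average(df['salary_p_year'],
                                weights=df['employees_number']), 2)
print(weighted_avg)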

RESULT:
Thus the compute weighted averages in Python either defining your own functions
or using Numpy was successfully completed.
Experiment No: 3c VARIABILITY

Date :

AIM:

To write a python program to calculate the variance.

ALGORITHM :

Step 1: Start the Program


Step 2: Import statistics module from statistics import variance
Step 3: Import fractions as parameter values from fractions import Fraction as fr
Step 4: Create tuple of a set of positive and negative numbers
Step 5: Print the variance of each samples
Step 6: Stop the process
PROGRAM:

# Python code to demonstrate variance()
# on varying ranges of data types

# importing the statistics module
from statistics import variance

# importing Fraction for fractional parameter values
from fractions import Fraction as fr

# tuple of a set of positive integers
# (numbers are spread apart, but not very much)
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers
# (data points are spread apart considerably)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))

# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# print the variance of each sample
print("Variance of Sample1 is %s" % (variance(sample1)))
print("Variance of Sample2 is %s" % (variance(sample2)))
print("Variance of Sample3 is %s" % (variance(sample3)))
print("Variance of Sample4 is %s" % (variance(sample4)))
print("Variance of Sample5 is %s" % (variance(sample5)))
OUTPUT:

Variance of Sample 1 is 15.80952380952381


Variance of Sample 2 is 3.5
Variance of Sample 3 is 61.125
Variance of Sample 4 is 1/45
Variance of Sample 5 is 0.17613000000000006

RESULT:

Thus the computation for variance was successfully completed.


Experiment No: 4a
NORMAL CURVES
Date :

AIM:

To create a normal curve using a Python program.

ALGORITHM:

Step 1: Start the Program


Step 2: Import packages scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process
PROGRAM:

# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

# visualizing the distribution
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
OUTPUT:

RESULT:

Thus the normal curve using python program was successfully completed.
Experiment No: 4b
CORRELATION AND SCATTER PLOTS
Date :

AIM:

To write a Python program for correlation with a scatter plot.

ALGORITHM :

Step 1: Start the Program


Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process
PROGRAM:

# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')

# Title, legend and display
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
OUTPUT:

RESULT:

Thus the Correlation and scatter plots using python program was successfully completed.
Experiment No: 4c
CORRELATION COEFFICIENT
Date :

AIM:
To write a python program to compute correlation coefficient.

ALGORITHM :

Step 1: Start the Program


Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5:Print the result
Step 6 : Stop the process
PROGRAM:

# Python program to find the correlation coefficient
import math

# function that returns the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X
        sum_X = sum_X + X[i]

        # sum of elements of array Y
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i]
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of squares of array elements
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use the formula for calculating the correlation coefficient
    corr = (float)(n * sum_XY - sum_X * sum_Y) / \
           (float)(math.sqrt((n * squareSum_X - sum_X * sum_X) *
                             (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# find the size of the arrays
n = len(X)

# function call to correlationCoefficient
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))
OUTPUT:

0.953463
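As a quick cross-check (not part of the original listing), NumPy's built-in corrcoef gives the same value for these arrays:

import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# the off-diagonal entry of the 2x2 correlation matrix is r(X, Y)
print(round(np.corrcoef(X, Y)[0, 1], 6))   # expected: 0.953463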

RESULT:
Thus the computation for the correlation coefficient was successfully completed.
Experiment No: 5
REGRESSION
Date :

AIM:

To write a Python program for Simple Linear Regression.

ALGORITHM :

Step 1: Start the Program


Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1] * x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show the plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting the regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:

Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437

Graph:
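Optionally, the coefficients can be cross-checked with SciPy's built-in linregress (a sketch, assuming SciPy is available); its intercept and slope should agree with the b_0 and b_1 returned by estimate_coef for the same data.

import numpy as np
from scipy import stats

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

res = stats.linregress(x, y)
print("intercept (b_0) =", res.intercept)
print("slope     (b_1) =", res.slope)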

RESULT:
Thus the computation for Simple Linear Regression was successfully completed.
Experiment No: 6 Z-TEST
Date :

AIM:

To Perform Z-test

ALGORITHM:

Step 1: Start
Step 2: Import math, numpy and ztest from statsmodels
Step 3: Generate the data, run the z-test and print the result
Step 4: Stop
PROGRAM:

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq * randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# Now we perform the test. We pass the data; in the value parameter we pass
# the mean under the null hypothesis, and with alternative='larger' we check
# whether the true mean is larger than that value.
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# The function outputs a z-score and the corresponding p-value. We compare the
# p-value with alpha: if it is less than alpha we reject the null hypothesis,
# otherwise we fail to reject it.
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
OUTPUT:

Reject Null Hypothesis

RESULT:
Thus the program for Z-Test case studies has been executed and verified successfully.
Experiment No: 7
T-TEST
Date :

AIM:
To Perform T-test for sampling distribution.

ALGORITHM:

Step 1: Start
Step 2: Import numpy and scipy.stats
Step 3: Calculate the standard deviation and the t-statistic
Step 4: Stop
PROGRAM:

# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample size
N = 10

# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)

# Calculating the standard deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)

# Pooled standard deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)

# Calculating the t-statistic
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))

# Comparing with the critical t-value
# Degrees of freedom
dof = 2 * N - 2

# p-value after comparison with the t-statistic
pval = 1 - stats.t.cdf(tval, df=dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))

# Cross-checking using the built-in function from the SciPy package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))
OUTPUT:
Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205

RESULT:
Thus the program for T-test case studies has been executed and verified successfully.
Experiment No: 8
ANOVA
Date :

AIM:
To Perform ANOVA test.

ALGORITHM:
Step 1: Start
Step 2: Import scipy
Step 3: Import statsmodels
Step 4: Calculate the ANOVA F and p values
Step 5: Stop
PROGRAM:

# Installing the package
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within groups and between groups
boxplot(mtcars$disp ~ factor(mtcars$gear),
        xlab = "gear", ylab = "disp")

# Step 1: Set up the null hypothesis and the alternate hypothesis
# H0: mu1 = mu2 = mu3 (there is no difference between the average
#     displacement for different gears)
# H1: not all means are equal

# Step 2: Calculate the test statistic using the aov function
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)

# Step 3: Determine the F-critical value
# For a 0.05 significance level, alpha = 0.05

# Step 4: Compare the test statistic with the F-critical value and
# conclude: if p < alpha, reject the null hypothesis
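Note that the listing above is written in R. Since the algorithm calls for scipy and statsmodels, a minimal Python sketch of the same one-way ANOVA idea is added here using scipy.stats.f_oneway; the three sample groups are illustrative values, not data from the manual.

from scipy import stats

# three hypothetical groups (illustrative values only)
group1 = [25, 30, 28, 36, 29]
group2 = [45, 55, 29, 56, 40]
group3 = [30, 29, 33, 37, 27]

# one-way ANOVA: F statistic and p-value
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print("F =", f_stat, " p =", p_value)

# if p_value < 0.05, reject the null hypothesis that all group means are equal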
OUTPUT:

RESULT:
Thus the program for ANOVA case studies has been executed and verified successfully.
Experiment No: 9
BUILDING AND VALIDATING LINEAR MODELS
Date :

AIM:
To Perform Linear Regression

ALGORITHM

Step 1: Start
Step 2: Import numpy, pandas, seaborn, matplotlib and sklearn
Step 3: Calculate the linear regression using the appropriate functions
Step 4: Display the result
Step 5: Stop
PROGRAM:

# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston

sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150

# loading the data
boston = load_boston()

# You can check the available keys with the following code:
print(boston.keys())

The output will be as follows:

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)

You will find these details in the output:

Attribute Information (in order):
- CRIM     per capita crime rate by town
- ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS    proportion of non-retail business acres per town
- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX      nitric oxides concentration (parts per 10 million)
- RM       average number of rooms per dwelling
- AGE      proportion of owner-occupied units built prior to 1940
- DIS      weighted distances to five Boston employment centres
- RAD      index of accessibility to radial highways
- TAX      full-value property-tax rate per $10,000
- PTRATIO  pupil-teacher ratio by town
- B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT    % lower status of the population
- MEDV     Median value of owner-occupied homes in $1000's
Missing Attribute Values: None

df = pd.DataFrame(boston.data, columns=boston.feature_names)
df.head()

# print the columns present in the dataset
print(df.columns)

# print the top 5 rows in the dataset
print(df.head())
OUTPUT:
First five records from data set

# plotting a heatmap for the overall data set
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')

Heat map of overall data set

So let's plot a regression plot to see the correlation between RM and MEDV.

sns.lmplot(x='RM', y='MEDV', data=df)

Regression plot with RM and MEDV
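The listing above stops at the exploratory plots; a minimal sketch of actually fitting and validating a linear model (the subject of this experiment) is added below. It reuses the boston data and df built above, chooses RM as the single predictor shown in the regression plot, and uses a scikit-learn train/test split; none of this appears in the original listing.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# target column: median home value from the dataset
df['MEDV'] = boston.target

X = df[['RM']]      # single predictor chosen from the heatmap / lmplot
y = df['MEDV']

# hold out 20% of the rows for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# validate on the unseen test split
y_pred = model.predict(X_test)
print("R^2 on test data:", r2_score(y_test, y_pred))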

RESULT:
Thus the program for Linear Regression has been executed and verified successfully.
Experiment No: 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date :

AIM:
To Perform Logistic Regression

ALGORITHM:

Step 1: Start
Step 2: Import numpy, pandas, seaborn, matplotlib and sklearn
Step 3: Calculate the logistic regression using the appropriate functions
Step 4: Display the result
Step 5: Stop

PROGRAM:
Building the Logistic Regression model:

# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()
OUTPUT :
Optimization terminated successfully.
         Current function value: 0.352707
         Iterations 8

# printing the summary table
print(log_reg.summary())

                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                  30
Model:                          Logit   Df Residuals:                      27
Method:                           MLE   Df Model:                           2
Date:                Wed, 15 Jul 2020   Pseudo R-squ.:                 0.4912
Time:                        16:09:17   Log-Likelihood:               -10.581
converged:                       True   LL-Null:                      -20.794
Covariance Type:            nonrobust   LLR p-value:                3.668e-05
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
gmat               -0.0262      0.011     -2.383      0.017      -0.048      -0.005
gpa                 3.9422      1.964      2.007      0.045       0.092       7.792
work_experience     1.1983      0.482      2.487      0.013       0.254       2.143
===================================================================================

Predicting on New Data :

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col=0)

# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']

# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

# comparing original and predicted values of y
print('Actual values :', list(ytest.values))
print('Predictions :', prediction)

OUTPUT:
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
Testing the accuracy of the model :

from sklearn.metrics import (confusion_matrix, accuracy_score)

# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)

# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

OUTPUT:

Confusion Matrix :
[[6 0]
[2 2]]
Test accuracy = 0.8

RESULT:
Thus the program for Logistic Regression has been executed and verified successfully.
Experiment No: 11
TIME SERIES ANALYSIS

Date:

AIM:

To Perform Time series analysis.

ALGORITHM:

Step 1: Start

Step 2: Import numpy, pandas, matplotlib and statsmodels

Step 3: Draw the plot

Step 4: Display the plot

Step 5: Stop
PROGRAM:

We are using the Superstore sales data.

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib

matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

We start from time series analysis and forecasting for furniture sales.

df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']

A good four years of furniture sales data:

furniture['Order Date'].min(), furniture['Order Date'].max()
(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))

Data Preprocessing
This step includes removing columns we do not need, checking for missing values, aggregating sales by date, and so on.

cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code',
        'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']

furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Order Date    0
Sales         0
dtype: int64

Figure 1

Indexing with Time Series Data

furniture = furniture.set_index('Order Date')
furniture.index

Figure 2

We will use the average daily sales value for each month instead, and we are using the start of each month as the timestamp.

y = furniture['Sales'].resample('MS').mean()

Have a quick peek at the 2017 furniture sales data:

y['2017':]

Figure 3
OUTPUT:

Visualizing Furniture Sales Time Series Data


y.plot(figsize=(15, 6))
plt.show()
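The listing ends with the plot above; as an optional further step of the analysis, the monthly series y can be decomposed into trend, seasonal, and residual components (a sketch reusing the y built above):

import statsmodels.api as sm
import matplotlib.pyplot as plt

# additive decomposition of the resampled monthly series y
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
decomposition.plot()
plt.show()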

RESULT:

Thus the program for Time series analysis has been executed and verified successfully.
