Mini Project - Cold Storage Case Study
Project Report
Table of Contents
1 Project Objective
2 Assumptions
3 Exploratory Data Analysis – Step by step approach
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
3.1.2 Set up working Directory
3.1.3 Import and Read the Dataset
3.2 Variable Identification
3.2.1 Variable Identification – Inferences
3.3 Univariate Analysis
3.4 Bi-Variate Analysis
3.5 Missing Value Identification
3.6 Outlier Identification
3.7 Variable Transformation / Feature Creation
4 Conclusion
5 Appendix A – Source Code
1 Project Objective
The objective of the report is to explore the cold storage data sets (“Cold_Storage_Mar2018” and
“Cold_Storage_Temp_Data (1)” ) in R and generate insights about the data set. This exploration
report will consist of the following:
2 Assumptions
Assumptions are as below:
1. We assume that the temperatures recorded are normally distributed.
2. The cold storage plant is careful about recording the temperatures every day, so no null
values are expected.
3. The temperatures are always maintained at optimum level at the cold storage to avoid
contamination of products.
Although Steps 5 and 6 are not in scope for this project, a brief about these steps (and other
steps as well) is given, as these are important steps for Data Exploration journey.
3.1.2 Set up working Directory
Setting a working directory at the start of the R session makes importing and exporting data
files and code files easier. The working directory is the location/folder on the PC that holds
the data, code and other files related to the project.
Please refer to Appendix A for the source code.
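A minimal sketch of the working-directory set-up described above. The folder used here is a temporary directory chosen purely for illustration (an assumption, not the project folder); the path actually used in the project appears in Appendix A.

```r
# Illustrative only: a temporary folder stands in for the real project folder.
project_dir <- tempdir()

old_dir <- getwd()       # remember the current working directory
setwd(project_dir)       # make the project folder the working directory
current_dir <- getwd()   # confirm where R now reads and writes files by default

list.files()             # files in the working directory can be referenced by name alone

setwd(old_dir)           # restore the previous working directory
```

Once the working directory is set, calls such as read.csv() can take a bare file name instead of a full path.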
1. The temperature of cold storage is given for each day through the entire year along with
month and season details.
2. “Temperature” is a Quantitative and continuous variable.
3. From the structure we see that 2 nominal categorical variables are present, as below:
a. “Season” is a factor variable with 3 levels, namely Summer, Winter and Rainy.
b. “Month” is a factor variable with 12 levels, one for each month from January to
December.
4. The minimum temperature maintained at Cold Storage is 1.7 and maximum temperature is
5.0
5. The number of days in each season is approximately the same.
6. There are no missing values in any of the columns in the dataset.
7. The Season variable covers all the seasons of the entire year.
8. The Month variable has data from Jan to Dec. The Date variable holds the dates of the
respective month.
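The str() output in Appendix A lists the Month levels as "Apr","Aug","Dec",..., which is alphabetical rather than calendar order. The sketch below shows one way the levels could be put in calendar order before plotting; the values are illustrative, not the project data, and the variable names are hypothetical.

```r
# Illustrative values only, not the project data.
month_raw <- c("Jan", "Feb", "Mar", "Apr", "Sep", "Dec")

# Default factor(): levels are sorted alphabetically.
month_default <- factor(month_raw)
levels(month_default)

# Calendar order: month.abb is R's built-in vector "Jan", "Feb", ..., "Dec".
month_ordered <- factor(month_raw, levels = month.abb)
levels(month_ordered)
nlevels(month_ordered)   # 12 levels, whether or not every month occurs in the data
```

With calendar-ordered levels, a Month vs Temperature plot runs from January to December instead of from April to September.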
2. February has 28 days. Hence the year in which this data was recorded is a non-leap year.
4. As the total number of observations is 365, the temperature was recorded on every day
of the year.
1. The mean temperature maintained at cold storage is higher in Summer and lower in Winter.
2. A few abnormal temperatures were recorded in the Rainy and Winter seasons.
3. Temperature in Summer is maintained at the optimum value.
4. Both the lowest and the highest temperatures were recorded in September, during the Rainy season.
5. A temperature below 2 C was recorded twice, in the months of September (Rainy season) and
November (Winter season).
8. Month vs Temperature
9. Season vs Temperature
Temperature readings were more uneven in Winter.
One outlier is found from the entire dataset in the temperature column.
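A minimal sketch of how a box plot flags outliers, using the rule that boxplot() applies by default (points more than 1.5 times the inter-hinge range beyond the hinges). The temperatures below are illustrative, not the project data.

```r
# Illustrative temperatures only, not the project data.
temps <- c(2.4, 2.8, 3.0, 3.1, 2.9, 3.2, 2.7, 3.0, 3.3, 5.0)

# boxplot.stats() returns the statistics that boxplot() draws.
bs <- boxplot.stats(temps)

bs$stats                  # lower whisker, lower hinge, median, upper hinge, upper whisker
outliers <- bs$out        # points beyond 1.5 * (upper hinge - lower hinge) from the hinges
outliers

# Position of the flagged readings, useful for looking up their Month and Date.
which(temps %in% outliers)
```

Run against the Temperature column, the same call would list the value of each flagged reading rather than only showing it as a point on the plot.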
4 Conclusion
Temperature control at the cold storage started to deteriorate from September. September, October
and January are the months in which negligence was noticed in maintaining the temperature, because
of which dairy products started getting contaminated. In the Summer and Rainy seasons no complaints
would have been received, as due diligence was exercised except on one day in the Rainy season.
The average temperature maintained at the cold storage is low in Winter and high in Summer.
Immediate attention is needed to put corrective measures in place at the cold storage plant and
avoid complaints from customers.
5 Appendix A – Source Code
library(dplyr)
##
## Attaching package: 'dplyr'
library(ggplot2)
library(lattice)
Descriptive Statistics:
View(ColdStorage1)
dim(ColdStorage1)
## [1] 365 4
summary(ColdStorage1)
str(ColdStorage1)
## $ Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Date : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...
head(ColdStorage1)
tail(ColdStorage1)
colnames(ColdStorage1)
anyNA(ColdStorage1)
## [1] FALSE
Exploratory Analysis:
a.Univariate Analysis:
boxplot(Temperature,xlab="Temperature",col="green",horizontal = TRUE,main="Temperature")
There is only one outlier recorded considering the data in the entire year.
plot(Season,col=c("green","blue","red"),main="Season Chart")
table(Month)
## Month
## Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
##  30  31  31  28  31  31  30  31  31  30  31  30
table(Season)
## Season
## Rainy Summer Winter
## 122 120 123
b. Bivariate Analysis
boxplot(Temperature~Season,col=c("green","blue","red"))
The average temperature is relatively low in Winter and high in Summer. A few outliers are noticed in
the Winter and Rainy seasons.
High negligence is noticed in maintaining the temperature at the cold storage during the Winter season.
In January the cold storage would have started receiving complaints, as there are more outliers in
that month.
qplot(Month,Temperature,data=ColdStorage1,col=Season,main="Temperature across months")
qplot(Season,Temperature,data=ColdStorage1,col=Season)
Assignment Questions:
1. Problem 1:
1.1 Find mean cold storage temperature for Summer, Winter and Rainy Season
WinterTemp=ColdStorage1 %>% select(Season,Temperature) %>%
  filter(Season=="Winter") %>% summarise(mean=mean(Temperature))
SummerTemp=ColdStorage1 %>% select(Season,Temperature) %>%
  filter(Season=="Summer") %>% summarise(mean=mean(Temperature))
RainyTemp=ColdStorage1 %>% select(Season,Temperature) %>%
  filter(Season=="Rainy") %>% summarise(mean=mean(Temperature))
WinterTemp
## mean
## 1 2.700813
SummerTemp
## mean
## 1 3.153333
RainyTemp
## mean
## 1 3.039344
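The three seasonal means above come from three separate filter pipelines. As a hedged alternative, the sketch below computes all group means in one grouped call; the data frame `cold` and its rows are illustrative stand-ins, not the project data.

```r
# Illustrative rows only, not the project data.
cold <- data.frame(
  Season = c("Summer", "Summer", "Winter", "Winter", "Rainy", "Rainy"),
  Temperature = c(3.0, 3.2, 2.6, 2.8, 2.9, 3.1)
)

# Base R: one row per season with the mean temperature of that season.
season_means <- aggregate(Temperature ~ Season, data = cold, FUN = mean)
season_means

# Equivalent with dplyr, which this appendix already loads:
# cold %>% group_by(Season) %>% summarise(mean = mean(Temperature))
```

A grouped summary keeps the results in a single table and avoids repeating the same pipeline for each season.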
MeanforYear=mean(Temperature)
MeanforYear
## [1] 2.96274
SdforYear=sd(Temperature)
SdforYear
## [1] 0.508589
Ans: The standard deviation of the temperature throughout the year is 0.508589
1.4 Assume Normal distribution, what is the probability of temperature having fallen
below 2 deg C?
lessThan2C=2
normMean=mean(ColdStorage1$Temperature)
normSD=sd(ColdStorage1$Temperature)
below2Cprob=pnorm(lessThan2C,normMean,normSD)
below2Cprob
## [1] 0.02918146
1.5 Assume Normal distribution, what is the probability of temperature having gone
above 4 C?
moreThan4C=4
above4Cprob=1-pnorm(moreThan4C,normMean,normSD)
above4Cprob
## [1] 0.02070077
totalProb=below2Cprob+above4Cprob
Penalty=totalProb*100
Penalty
## [1] 4.988223
Ans: It was given that “probability of temperature going outside the 2 - 4 C during the one-year
contract was above 2.5% and less than 5% then the penalty would be 10% of AMC (annual
maintenance case).”
Since the calculated probability of the temperature going outside 2 - 4 C is 4.988223%, approximately
4.99%, which is above 2.5% and less than 5%, the penalty would be 10% of AMC (Annual Maintenance Case).
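The penalty reasoning above can be reproduced from the reported yearly mean and standard deviation alone. The sketch below is a self-contained check under the stated normality assumption; the variable names are hypothetical, and only the one penalty band quoted in the answer is tested.

```r
# Values copied from the MeanforYear and SdforYear outputs above.
norm_mean <- 2.96274
norm_sd   <- 0.508589

# P(T < 2) and P(T > 4) under a normal distribution with these parameters.
p_below_2 <- pnorm(2, mean = norm_mean, sd = norm_sd)
p_above_4 <- pnorm(4, mean = norm_mean, sd = norm_sd, lower.tail = FALSE)
p_outside <- p_below_2 + p_above_4

# Band quoted in the answer: above 2.5% and less than 5%.
in_penalty_band <- p_outside > 0.025 && p_outside < 0.05

round(100 * p_outside, 2)   # probability of leaving the 2 - 4 C range, in percent
in_penalty_band
```

Using lower.tail = FALSE gives the upper-tail probability directly and is equivalent to 1 - pnorm(...) as used in the code above.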
Problem 2:
1. Which hypothesis test shall be performed to check if corrective action is needed at the
cold storage plant? Justify your answer.
Ans: We have to perform a single-sample, right-tailed t-test to check whether corrective action
is needed at the cold storage plant.
Single sample: since we have only one variable, temperature. Right tail: since we are testing the
hypothesis with a greater-than sign. t-test: since we have the sample mean, n>30 and the population
standard deviation is not known. Hence we cannot perform a Z-test, only a t-test.
Alternate Hypothesis: This is what we are trying to prove, or what deviates from normal.
Here the customers are complaining that dairy products are going sour and often smelling. This will
happen only if the temperature is maintained above 3.9.
Hence Alternate Hypothesis: meanTemp>3.9
p-value Calculation:
setwd("C:/Users/ammu/Desktop/Great Lakes/2. Statistical Method Decision Making/Project")
ColdStorage2=read.csv("Cold_Storage_Mar2018.csv")
attach(ColdStorage2)
MeanTemp2=mean(Temperature)
MeanTemp2
## [1] 3.974286
SdTemp2=sd(Temperature)
SdTemp2
## [1] 0.159674
n=35
alpha=0.1
confLevel=1-0.1
##
## One Sample t-test
##
## data: Temperature
## t = 2.7524, df = 34, p-value = 0.004711
## alternative hypothesis: true mean is greater than 3.9
## 90 percent confidence interval:
## 3.939011 Inf
## sample estimates:
## mean of x
## 3.974286
Ans: The calculated p-value is 0.004711 and the given alpha is 0.1.
Since the p-value is less than alpha, the null hypothesis is rejected.
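The call that produced the test output is not shown in this appendix. As a cross-check, the sketch below recomputes the t statistic, the right-tail p-value and the lower confidence bound from the printed summary statistics; the variable names are hypothetical, and the t.test() call in the final comment is an assumption inferred from the printed output, not the author's confirmed code.

```r
# Summary statistics copied from the outputs above.
sample_mean <- 3.974286
sample_sd   <- 0.159674
n           <- 35
mu0         <- 3.9    # mean under the null hypothesis
alpha       <- 0.1

std_error <- sample_sd / sqrt(n)                 # standard error of the mean
t_stat    <- (sample_mean - mu0) / std_error     # one-sample t statistic
p_value   <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # right-tail probability

# Lower bound of the one-sided 90 percent confidence interval.
lower_ci <- sample_mean - qt(1 - alpha, df = n - 1) * std_error

reject_null <- p_value < alpha

# Assumed form of the call on the raw data (not shown in the source):
# t.test(Temperature, mu = 3.9, alternative = "greater", conf.level = 0.9)
```

These values agree with the printed t = 2.7524, df = 34, p-value = 0.004711 and lower bound 3.939011 up to rounding of the inputs.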