Project 1 - Assignment: Cold Storage Case Study
List of Figures
Figure 1 - Box plot of entire year Temp
Figure 2 - Histogram of entire year Temp
Figure 3 - Seasonal Temp Variation Box plot
Figure 4 - Monthly Avg Temp
Figure 5 - Histogram of Temp - 35 days Summer
Figure 6 - Box plot - 35 days Temp
1. Project Objective & Background
The cold storage company, which started business operations in 2016, has a strict policy of
keeping the cold room, in which products such as Pasteurized Fresh Whole or Skimmed Milk, Sweet
Cream and Flavored Milk Drinks are stored, at a temperature between 2 degrees Celsius (⁰C) and 4⁰C.
In the first year of business the company outsourced the plant maintenance work to a third-party
professional company with stiff penalty clauses. It was agreed that if it was statistically proven
that the probability of the temperature going outside the 2⁰C - 4⁰C range during the one-year
contract was above 2.5% (0.025) and less than 5% (0.05), then the penalty would be 10% of the
Annual Maintenance Contract (AMC) fee. If the probability exceeded 5% (0.05), the penalty would
be 25% of the AMC fee. More recently, there have also been customer complaints regarding the
quality of the products supplied from the cold storage.
Thus, the primary aim of this report is to
1. Identify the penalty slab to be imposed on the maintenance company, based on the statistically
estimated probability of the temperature going outside the 2⁰C - 4⁰C range:
a. 10 % of the AMC fee
b. 25 % of the AMC fee
In Mar 2018, Cold Storage started receiving complaints from its clients that end consumers were
reporting dairy products going sour and often smelling. On receiving these complaints, the
temperature data of the last 35 days was fetched to identify whether the commodities were spoilt
by temperature variations in the cold storage facility or by quality issues on the procurement
side. The secondary project objective is therefore to
2. Conclude, by conducting a hypothesis test and interpreting its result, whether corrective
action is required at the cold storage or whether the quality issue lies on the procurement side.
*For research area 1, “Cold_Storage_Temp_Data.csv” is used and for research area 2,
“Cold_Storage.csv” is used. Both data files are available for the entire duration of
the project.
2. Solutions
2.1 Problem 1
2.1.1 Assumptions
i. The data is distributed normally
ii. Data points are independent
iii. Temperature has no relation to other external factors
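The normality assumption in point i can be checked before it is relied on. The sketch below is only an illustration on simulated readings: the vector sim_temp is a hypothetical stand-in for temp_data$Temperature, and its mean and standard deviation are assumed values chosen to resemble a cold room.

```r
# Sketch: checking a normality assumption (simulated data, not the project data).
set.seed(1)                                     # reproducible illustration
sim_temp <- rnorm(365, mean = 2.96, sd = 0.51)  # hypothetical stand-in readings

# Shapiro-Wilk test: a small p value suggests a departure from normality
sw <- shapiro.test(sim_temp)
sw$p.value

# Visual check: points close to the reference line support normality
qqnorm(sim_temp)
qqline(sim_temp, col = "red")
```

On the real data the same two calls would be applied to temp_data$Temperature; a Q-Q plot is usually more informative than the test alone for a sample of this size.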
2.1.2 EDA
Setting up the working directory, loading the libraries and importing the csv files.
#setting working directory
setwd("C:/Users/bibin/OneDrive/Great Lakes/Statistical Methods for Decision Making/A.PROJECT MAIN")
#loading libraries
library(tibble)
library(rpivotTable)
library(dplyr)
library(ggplot2)
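#reading the first data set (file name as stated in section 1)
temp_data=read.csv("Cold_Storage_Temp_Data.csv",header=TRUE)
#dimension of the data frame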
dim(temp_data)
[1] 365 4
summary(temp_data)
    Season       Month          Date        Temperature
 Rainy :122   Aug    : 31   Min.   : 1.00   Min.   :1.700
 Summer:120   Dec    : 31   1st Qu.: 8.00   1st Qu.:2.500
 Winter:123   Jan    : 31   Median :16.00   Median :2.900
              Jul    : 31   Mean   :15.72   Mean   :2.963
              Mar    : 31   3rd Qu.:23.00   3rd Qu.:3.300
              May    : 31   Max.   :31.00   Max.   :5.000
              (Other):179
#histogram of temperature - entire year
hist(temp_data$Temperature,col="Pink",xlab="Temperature",
main="Histogram of entire year temperature")
# box plot of temperature - Season wise
plot(temp_data$Temperature~temp_data$Season, col=c(2:4))
Figure 3 - Seasonal Temp Variation Box plot
month_comp=temp_data %>%
group_by(Month) %>%
summarise(No.of.days=n(),
Average.temp=round(mean(Temperature),2))
ggplot(month_comp)+
geom_bar(aes(x=Month,y=Average.temp),
stat="identity", fill="tan1", colour="sienna3")+
geom_text(aes(label=Average.temp, x=Month, y=0.2+Average.temp),
colour="red",vjust=1.5,size=3)+
theme(axis.text.x = element_text(angle = 0))+
ggtitle("Plot of Months vs Avg Temperature")
Figure 4 - Monthly Avg Temp.
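A factor axis in ggplot2 follows the order of the factor levels, which default to alphabetical order, so a month-wise bar plot may not appear in calendar order. A minimal sketch of one way to fix this is given below; it assumes the Month column holds three-letter English abbreviations as in the summary output, and months_seen is a hypothetical example vector rather than the project data.

```r
# Sketch: put month labels in calendar order instead of alphabetical order.
months_seen <- c("Aug", "Dec", "Jan", "Jul", "Mar", "May")  # illustrative subset

# month.abb is the built-in vector "Jan", "Feb", ..., "Dec"
month_factor <- factor(months_seen, levels = month.abb)

# Levels actually present, now in calendar order
levels(droplevels(month_factor))

# Sorting a factor follows the level order
sort(month_factor)
```

Applied to the data frame, the equivalent step would be temp_data$Month=factor(temp_data$Month,levels=month.abb) before the group_by call.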
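#mean in summer, rainy and winter seasons
Summer=temp_data[which(temp_data$Season=="Summer"),]
x=mean(Summer$Temperature)
cat("Mean Temperature in Summer =",round(x,digits=5),"deg Celsius")
Mean Temperature in Summer = 3.15333 deg Celsius
Rainy=temp_data[which(temp_data$Season=="Rainy"),]
y=mean(Rainy$Temperature)
cat("Mean Temperature in Rains =",round(y,digits=5),"deg Celsius")
Mean Temperature in Rains = 3.03934 deg Celsius
Winter=temp_data[which(temp_data$Season=="Winter"),]
z=mean(Winter$Temperature)
cat("Mean Temperature in Winter =",round(z,digits=5),"deg Celsius")
Mean Temperature in Winter = 2.70081 deg Celsius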
b. Find overall mean for the full year (4 marks)
#getting mean from the complete temperature list
year.mean=mean(temp_data$Temperature)
cat("Mean Temperature of the entire year =",round(year.mean,digits=5),"deg Celsius")
Mean Temperature of the entire year = 2.96274 deg Celsius
c. Find Standard Deviation for the full year (4 marks)
#getting sd from the complete temperature list
year.stddev=sd(temp_data$Temperature)
cat("Std. Dev. of Temperature during the entire year =",round(year.stddev,digits=2),"deg Celsius")
Std. Dev. of Temperature during the entire year = 0.51 deg Celsius
d. Assume Normal distribution, what is the probability of temperature having fallen
below 2⁰C? (4 marks)
The mean and standard deviation are available from the previous calculations. Substituting
these values in the pnorm function and taking the lower tail of the normal distribution
curve gives the probability of the temperature falling below 2 deg Celsius.
#probability that temperature is below 2 deg Celsius
p1=pnorm(2,2.96274,0.508589,lower.tail=TRUE)
cat("Probability that temp is below 2 deg Celsius =",round(p1*100,digits=2),"%")
Probability that temp is below 2 deg Celsius = 2.92 %
e. Assume Normal distribution, what is the probability of temperature having gone
above 4⁰C? (4 marks)
The probability that the temperature has gone above 4 deg Celsius is calculated below. The only
difference from the previous calculation is that the upper tail is required instead of the
lower tail, so the lower-tail probability is subtracted from 1.
#probability that temperature is above 4 deg Celsius
p2=1-pnorm(4,2.96274,0.508589,lower.tail=TRUE)
cat("Probability that temp is above 4 deg Celsius =",round(p2*100,digits=2),"%")
Probability that temp is above 4 deg Celsius = 2.07 %
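The two pnorm calls can also be read as a standardisation step, z = (limit - mean) / sd, followed by a look-up in the standard normal distribution. The sketch below repeats the calculation in that form using the full-year mean and standard deviation reported in this section; the variable names are introduced here for illustration only.

```r
# Sketch: the same tail probabilities via explicit z scores.
mu_year <- 2.96274   # full-year mean from the calculation above
sd_year <- 0.508589  # full-year standard deviation (six decimals)

z_low  <- (2 - mu_year) / sd_year   # standardised lower limit
z_high <- (4 - mu_year) / sd_year   # standardised upper limit

p_below <- pnorm(z_low)                        # P(Temp < 2)
p_above <- pnorm(z_high, lower.tail = FALSE)   # P(Temp > 4), same as 1 - pnorm(z_high)

round(c(z_low = z_low, z_high = z_high), 3)
round(100 * c(below = p_below, above = p_above, total = p_below + p_above), 2)
```

Using lower.tail = FALSE is equivalent to subtracting from 1 and avoids a small loss of precision when the tail probability is very small.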
f. What will be the penalty for the AMC Company? (5 marks)
The terms and conditions of the AMC as mentioned in the question are
a. 10 % penalty if the probability of the temperature going below 2 deg C or above 4 deg C is
above 2.5 % and less than 5 %
b. 25 % penalty if the probability of the temperature going below 2 deg C or above 4 deg C is
greater than 5 %
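Event                                            Probability
Probability of temperature falling below 2⁰C     2.92%
Probability of temperature going above 4⁰C       2.07%
Total probability                                4.99%
Since the total probability lies between 2.5 % and 5 %, the penalty for the AMC company is
10 % of the AMC fee.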
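The penalty rule can be written as a small function. This is a sketch under stated assumptions: the function name penalty_slab is hypothetical, a probability of exactly 5 % is treated as falling in the 10 % slab, and a probability of 2.5 % or less is treated as attracting no penalty, since the contract text does not spell out either case.

```r
# Sketch: map the probability of leaving the 2-4 deg C range to a penalty slab.
penalty_slab <- function(prob) {
  # prob is a probability between 0 and 1
  if (prob > 0.05) {
    "25% of AMC fee"
  } else if (prob > 0.025) {
    "10% of AMC fee"   # assumption: exactly 5% also lands here
  } else {
    "No penalty"       # assumption: not stated explicitly in the contract
  }
}

# Total probability from the calculation in this section (4.98822 %)
penalty_slab(0.0498822)
```

The total of 4.99 % is very close to the 5 % boundary, so the slab is sensitive to the normality assumption and to rounding of the mean and standard deviation.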
2.2 Problem 2
2.2.1 Assumptions
i. The assumptions are the same as those of problem 1; in particular, the outside
temperature has no effect on the recordings.
ii. Data collection is unbiased.
2.2.2 EDA
#Reading the second data set and storing it into data frame
temp.feb.mar=read.csv("Cold_Storage.csv",header=TRUE)
#dimension check
dim(temp.feb.mar)
[1] 35 4
#structure check
str(temp.feb.mar)
'data.frame': 35 obs. of 4 variables:
$ Season : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
$ Month : Factor w/ 2 levels "Feb","Mar": 1 1 1 1 1 1 1 1 1 1 ...
$ Date : int 11 12 13 14 15 16 17 18 19 20 ...
$ Temperature: num 4 3.9 3.9 4 3.8 4 4.1 4 3.8 3.9 ...
#summary
summary(temp.feb.mar)
    Season   Month         Date       Temperature
 Summer:35   Feb:18   Min.   : 1.0   Min.   :3.800
             Mar:17   1st Qu.: 9.5   1st Qu.:3.900
                      Median :14.0   Median :3.900
                      Mean   :14.4   Mean   :3.974
                      3rd Qu.:19.5   3rd Qu.:4.100
                      Max.   :28.0   Max.   :4.600
2.2.3 Conclusion from EDA
a. There are 35 days of temperature recordings, all in the Summer season. Since the
number of samples is above 30, the central limit theorem rule of thumb can be
applied: the sampling distribution of the mean can be treated as approximately normal.
b. The data set is a subset of the entire year data (from problem 1)
c. The maximum temp recorded is 4.6 deg C and the minimum is 3.8 deg C, with a
mean of 3.97 deg C
d. The minimum (3.8 deg C) is very close to the mean, and the 1st quartile and the
median are both 3.9 deg C. This implies that at least 75% of the values are at or
above 3.9 deg Celsius.
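The central limit theorem rule of thumb in point a concerns the distribution of the sample mean, not of the individual readings. The simulation below is a sketch of that idea only: it draws repeated samples of size 35 from a deliberately skewed exponential population, which is an assumed stand-in and has nothing to do with the cold storage data.

```r
# Sketch: sample means of n = 35 from a skewed population look roughly normal.
set.seed(42)   # reproducible illustration
n <- 35        # sample size, as in the 35-day data set
reps <- 5000   # number of simulated samples

# Exponential(rate = 1) population: skewed, with mean 1 and sd 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean of 1
sd(sample_means)     # close to 1 / sqrt(35)
hist(sample_means, main = "Sample means, n = 35", xlab = "Mean")
```

The histogram is close to bell shaped even though the underlying population is not, which is why a test on the mean remains reasonable at this sample size.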
Figure 6 - Box plot - 35 days Temp
Two hypothesis tests, a Z test and a One Sample T test, are performed here to check whether
corrective actions are required at the cold storage plant. Corrective actions are to be
taken if the mean temperature in the given data set of 35 days is above 3.9 deg C.
The Z test is used because, with a sample size greater than 30, the sample mean can be
treated as approximately normally distributed (central limit theorem).
The t-test, being an inferential statistics tool, is normally used to check whether there is a
significant difference between the means of two groups. However, the one sample t test is
used to determine whether a sample of observations could have been generated by a
process with a specific mean [Fetched from
https://www.statisticssolutions.com/manova-analysis-one-sample-t-test/]
The Z test requires a population standard deviation. The current data set is treated as a
subset of the entire year data (concluded from the EDA); note, however, that the calculation
reported in this section uses the standard deviation of the 35-day sample (0.1597) as the
estimate.
The One sample T test is chosen since there is only one sample to conduct the test on. The
choice between a right-tailed, left-tailed or two-tailed test depends on the hypothesis
formulated. Since the question is whether the mean temperature is above a certain value,
the test is a right-tailed One Sample Student’s T test.
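For a right-tailed one sample t test the statistic is t = (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom. The sketch below recomputes it from the summary figures reported for the 35-day data set (mean 3.974286, standard deviation 0.159674); the variable names are chosen here for illustration.

```r
# Sketch: right-tailed one sample t test from summary statistics.
xbar <- 3.974286   # sample mean of the 35 readings
s    <- 0.159674   # sample standard deviation
n    <- 35         # number of readings
mu0  <- 3.9        # hypothesised mean under Ho

t_stat  <- (xbar - mu0) / (s / sqrt(n))                 # test statistic
p_right <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # P(T > t_stat)

round(t_stat, 4)
round(p_right, 6)
```

These values should agree with what t.test(..., alternative = "greater", mu = 3.9) reports on the raw readings, up to the rounding of the inputs.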
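#Ho : mean temp = 3.9 C
#Ha : mean temp > 3.9 C
mu=3.9                                 #hypothesised mean
xbar=mean(temp.feb.mar$Temperature)    #sample mean
xbar
[1] 3.974286
sigma=sd(temp.feb.mar$Temperature)     #sample sd used as the estimate
sigma
[1] 0.159674
n=35
se=sigma/sqrt(n)
z.value=(xbar-mu)/se
z.value
[1] 2.752359
#right-tailed p value for z value 2.752
pnorm(z.value,lower.tail=FALSE)
[1] 0.002958384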
The p value obtained from the one sample t test is 0.0047.
c. Give your inference (3 marks)
The p values obtained in both tests (0.0030 for the Z test and 0.0047 for the t test) are well
below alpha = 0.1, and hence the null hypothesis is rejected in favour of the alternative
hypothesis.
Both tests indicate that the mean temperature over the 35 days is greater than the threshold
value of 3.9 deg C, and hence corrective action needs to be taken at the Cold Storage rather
than at the supplier.
>
>
> #checking the first 10 and last 10 entries
> head(temp_data,10)
Season Month Date Temperature
1 Winter Jan 1 2.4
2 Winter Jan 2 2.3
3 Winter Jan 3 2.4
4 Winter Jan 4 2.8
5 Winter Jan 5 2.5
6 Winter Jan 6 2.4
7 Winter Jan 7 2.8
8 Winter Jan 8 2.3
9 Winter Jan 9 2.4
10 Winter Jan 10 2.8
> tail(temp_data,10)
Season Month Date Temperature
356 Winter Dec 22 3.3
357 Winter Dec 23 3.0
358 Winter Dec 24 3.7
359 Winter Dec 25 3.2
360 Winter Dec 26 2.7
361 Winter Dec 27 2.7
362 Winter Dec 28 2.3
363 Winter Dec 29 2.6
364 Winter Dec 30 2.3
365 Winter Dec 31 2.9
>
> #checking for any missing values
> anyNA(temp_data)
[1] FALSE
>
>
> #Exploratory Data Analysis Start
>
> #univariate analysis
> #temperature being a numeric data type
>
> boxplot(temp_data$Temperature,col="Green",main="Box Plot of DF 1 Temperature",ylab="Temperature in deg Celsius")
>
> #histogram of temperature - entire year
> hist(temp_data$Temperature,col="Pink",xlab="Temperature",
+ main="Histogram of entire year temperature")
> # box plot of temperature - Season wise
> plot(temp_data$Temperature~temp_data$Season, col=c(2:4),main="Box plot - Temp vs Seasons")
>
> month_comp=temp_data %>%
+ group_by(Month) %>%
+ summarise(No.of.days=n(),
+ Average.temp=round(mean(Temperature),2))
>
> ggplot(month_comp)+
+ geom_bar(aes(x=Month,y=Average.temp),
+ stat="identity", fill="tan1", colour="sienna3")+
+ geom_text(aes(label=Average.temp, x=Month, y=0.2+Average.temp),
+ colour="red",vjust=1.5,size=3)+
+ theme(axis.text.x = element_text(angle = 0))+
+ ggtitle("Plot of Months vs Avg Temperature")
>
>
> #mean in summer, rainy and winter
> Summer=temp_data[which(temp_data$Season=="Summer"),]
> x=mean(Summer$Temperature)
> cat("Mean Temperature in Summer =",round(x,digits=5),"deg Celsius")
Mean Temperature in Summer = 3.15333 deg Celsius
> Rainy=temp_data[which(temp_data$Season=="Rainy"),]
> y=mean(Rainy$Temperature)
> cat("Mean Temperature in Rains =",round(y,digits=5),"deg Celsius")
Mean Temperature in Rains = 3.03934 deg Celsius
> Winter=temp_data[which(temp_data$Season=="Winter"),]
> z=mean(Winter$Temperature)
> cat("Mean Temperature in Winter =",round(z,digits=5),"deg Celsius")
Mean Temperature in Winter = 2.70081 deg Celsius
> #getting mean from the complete temperature list
> year.mean=mean(temp_data$Temperature)
> cat("Mean Temperature of the entire year =",round(year.mean,digits=5),"deg Celsius")
Mean Temperature of the entire year = 2.96274 deg Celsius
>
> #getting sd from the complete temperature list
> year.stddev=sd(temp_data$Temperature)
> cat("Std. Dev. of Temperature during the entire year =",round(year.stddev,digits=2),"deg Celsius")
Std. Dev. of Temperature during the entire year = 0.51 deg Celsius
> #getting no of rows
> nrow(temp_data)
[1] 365
> #365 days data is present
> #probability that temperature is below 2 deg Celsius
> p1=pnorm(2,2.96274,0.508589,lower.tail=TRUE)
> cat("Probability that temp is below 2 deg Celsius =",round(p1*100,digits=2),"%")
Probability that temp is below 2 deg Celsius = 2.92 %
> #probability that temperature is above 4 deg Celsius
> p2=1-pnorm(4,2.96274,0.508589,lower.tail=TRUE)
> cat("Probability that temp is above 4 deg Celsius =",round(p2*100,digits=2),"%")
Probability that temp is above 4 deg Celsius = 2.07 %
> #probability that temp goes below 2 = 2.92 %
> #probability that temp goes above 4 = 2.07 %
>
> #calculating the total probability
> tot.prob=p1+p2
> tot.prob*100
[1] 4.98822
>
> cat("Since total prob. is between 2.5% and 5%,
+ Penalty imposed = 10 % AMC")
Since total prob. is between 2.5% and 5%,
Penalty imposed = 10 % AMC
> #penalty = 10% AMC
>
>
> #### 2
> #Reading the second data set and storing it into data frame
> temp.feb.mar=read.csv("Cold_Storage.csv",header=TRUE)
> #dimension check
> dim(temp.feb.mar)
[1] 35 4
> #structure check
> str(temp.feb.mar)
'data.frame': 35 obs. of 4 variables:
$ Season : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
$ Month : Factor w/ 2 levels "Feb","Mar": 1 1 1 1 1 1 1 1 1 1 ...
$ Date : int 11 12 13 14 15 16 17 18 19 20 ...
$ Temperature: num 4 3.9 3.9 4 3.8 4 4.1 4 3.8 3.9 ...
> #summary
> summary(temp.feb.mar)
    Season   Month         Date       Temperature
 Summer:35   Feb:18   Min.   : 1.0   Min.   :3.800
             Mar:17   1st Qu.: 9.5   1st Qu.:3.900
                      Median :14.0   Median :3.900
                      Mean   :14.4   Mean   :3.974
                      3rd Qu.:19.5   3rd Qu.:4.100
                      Max.   :28.0   Max.   :4.600
>
> hist(temp.feb.mar$Temperature, col="pink",main="histogram of 35 days temperature in Summer")
> boxplot(temp.feb.mar$Temperature,col="red", main="box plot of 35 days temp.")
>
> #hypothesis testing
> # Hypothesis
> #Ho : mean temp = 3.9 C
> #Ha : mean temp >3.9 C
> # Z TEST
> mu=3.9
> xbar=mean(temp.feb.mar$Temperature)
> xbar
[1] 3.974286
> sigma=sd(temp.feb.mar$Temperature)
> sigma
[1] 0.159674
> n=35
> se=sigma/sqrt(n)
> z.value=(xbar-mu)/se
> z.value
[1] 2.752359
> #right-tailed p value for z value 2.752
> pnorm(z.value,lower.tail=FALSE)
[1] 0.002958384
>
> # one sample t test
> #at 90 percent confidence level
> t.result=t.test(temp.feb.mar$Temperature,
+ alternative = "greater",
+ mu=3.9,
+ conf.level=0.90)
> t.result
One Sample t-test
data: temp.feb.mar$Temperature
t = 2.7524, df = 34, p-value = 0.004711
alternative hypothesis: true mean is greater than 3.9
90 percent confidence interval:
3.939011 Inf
sample estimates:
mean of x
3.974286
> #corresponding p value in the t test is 0.0047
> # As the p value is less than the acceptable alpha of 0.1
> # Ho is rejected in favour of Ha