Cold Storage Problem
Tahmid Bari
PGP - Data Science and Business Analytics
Problem 1
Cold Storage started its operations in Jan 2016. It is in the business of storing Pasteurized Fresh
Whole or Skimmed Milk, Sweet Cream and Flavoured Milk Drinks. To ensure that there is no change in
texture or body appearance and no separation of fats, the optimal temperature to be maintained is
between 2 °C and 4 °C.
In the first year of business, the company outsourced the plant maintenance work to a professional
company with stiff penalty clauses. It was agreed that if it was statistically proven that the probability
of the temperature going outside the 2 °C - 4 °C range during the one-year contract was above 2.5% and
less than 5%, then the penalty would be 10% of the AMC (annual maintenance contract) fee. If it
exceeded 5%, the penalty would be 25% of the AMC fee. The daily average temperature data is given
in the file “Cold_Storage_Temp_Data.csv”.
1. Find the mean cold storage temperature for Summer, Winter and Rainy seasons
The approach taken here is to filter the dataset by season, i.e. Summer, Rainy and Winter. The
'dplyr' package is used for data manipulation and filtering: it lets us extract the required columns
(Season and Temperature), filter them by season and summarise each season. The resulting mean
cold storage temperatures for Summer, Winter and Rainy seasons are 3.147 °C, 2.776 °C and 3.088 °C
respectively.
Boxplot: the box plot shows that the temperatures in the Summer and Rainy seasons are close to
each other.
R Code:
#=======================================================================#
#Data Analysis - Cold Storage Problem
#Developer - Tahmid Bari
#Date - April 11, 2020
#=======================================================================#
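library(dplyr)   # provides select() and filter()
# Read the daily temperature data
cold_storage_data_temp <- read.csv("Cold_Storage_Temp_Data.csv")
# Create a new data sub-set with the required columns i.e. Season and Temperature
seasons_temp <- select(cold_storage_data_temp, Season, Temperature)
View(seasons_temp)
# Filter and view seasons_temp dataset w.r.t winter, summer and rainy
winter_temp <- filter(seasons_temp, Season == "Winter")
summer_temp <- filter(seasons_temp, Season == "Summer")
rainy_temp <- filter(seasons_temp, Season == "Rainy")
View(winter_temp)
View(summer_temp)
View(rainy_temp)
# Summary (including the mean) of the temperature for each season
summary(summer_temp$Temperature)
summary(winter_temp$Temperature)
summary(rainy_temp$Temperature)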
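The per-season means can also be obtained in one step with base R's aggregate(), without filtering each season separately. A sketch on a small hypothetical data frame (made-up values with the same Season and Temperature columns as Cold_Storage_Temp_Data.csv; this is not the real data):

```r
# Hypothetical toy data standing in for Cold_Storage_Temp_Data.csv
toy_temp <- data.frame(
  Season = c("Summer", "Summer", "Winter", "Winter", "Rainy", "Rainy"),
  Temperature = c(3.2, 3.0, 2.8, 2.6, 3.1, 2.9)
)

# Mean temperature per season in a single call (formula interface)
season_means <- aggregate(Temperature ~ Season, data = toy_temp, FUN = mean)
print(season_means)
```

With the real data, aggregate(Temperature ~ Season, data = cold_storage_data_temp, FUN = mean) should reproduce the three seasonal means reported above.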
R Code:
# Overall mean for the full year
mean(cold_storage_data_temp$Temperature)
R Code:
# Standard deviation for the full year
sd(cold_storage_data_temp$Temperature)
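Note that R's sd() returns the sample standard deviation, i.e. it divides by n - 1 rather than n. The sketch below checks this against a hand computation on a hypothetical vector of temperatures (made-up values, not from the dataset):

```r
# Hypothetical temperatures (not from the real dataset)
temps <- c(2.5, 3.0, 3.5, 2.8, 3.2)

n <- length(temps)
# Sample standard deviation computed by hand with the n - 1 denominator
sd_manual <- sqrt(sum((temps - mean(temps))^2) / (n - 1))

print(sd(temps))
print(sd_manual)
```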
R Code:
# Probability of temperature having fallen below 2 deg C
mean_temp<-mean(cold_storage_data_temp$Temperature)
sd_temp<-sd(cold_storage_data_temp$Temperature)
X<-2
pnorm(X,mean_temp,sd_temp)
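Written out, these probability calculations assume the daily temperature X is normally distributed with mean \bar{x} (mean_temp) and standard deviation s (sd_temp). \Phi denotes the standard normal CDF, which is what pnorm() evaluates after standardising; the penalty calculation further below adds the two tails:

```latex
P(X < 2) = \Phi\left(\frac{2 - \bar{x}}{s}\right), \qquad
P(X > 4) = 1 - \Phi\left(\frac{4 - \bar{x}}{s}\right), \qquad
P_{\text{total}} = P(X < 2) + P(X > 4)
```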
As in question 4, since the mean and standard deviation of the temperature are known, the probability
of the temperature having gone above 4 °C is 0.01612075 ≈ 1.612%.
R Code:
# Probability of temperature having gone above 4 deg C
Y<-4
prob<-1-pnorm(Y,mean_temp,sd_temp)
prob
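pnorm() can also return the upper tail directly through its lower.tail argument. This is equivalent to 1 - pnorm() but avoids cancellation error when the tail probability is very small. The mean and standard deviation below are hypothetical placeholders, not the values estimated from the data:

```r
# Hypothetical placeholder parameters (substitute mean_temp and sd_temp)
mu <- 3.0
sigma <- 0.5

# Upper-tail probability P(X > 4), computed two equivalent ways
p_complement <- 1 - pnorm(4, mean = mu, sd = sigma)
p_upper <- pnorm(4, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(p_complement, p_upper))
```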
R Code:
# Penalty for AMC company
Xl<-2
Xu<-4
P_Xl<-pnorm(Xl,mean_temp,sd_temp)
P_Xu<-1-pnorm(Xu,mean_temp,sd_temp)
P_total<-P_Xl+P_Xu
P_total
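P_total is the estimated probability of the temperature falling outside the 2 °C - 4 °C range; the contract terms in the problem statement then map it to a penalty. A minimal sketch of that mapping as a hypothetical helper, amc_penalty() (not part of the original analysis). The contract wording does not say what happens at exactly 5%; this sketch assumes that case falls in the 10% tier:

```r
# Hypothetical helper: penalty as a fraction of the AMC fee, given the
# probability p of the temperature going outside the 2-4 deg C range.
# Assumption: p exactly equal to 5% is placed in the 10% tier, because
# the contract leaves that boundary unspecified.
amc_penalty <- function(p) {
  if (p > 0.05) {
    0.25        # exceeded 5%: 25% of the AMC fee
  } else if (p > 0.025) {
    0.10        # above 2.5% (up to 5%): 10% of the AMC fee
  } else {
    0           # at or below 2.5%: no penalty
  }
}

# Example probabilities, one per tier
print(sapply(c(0.01, 0.03, 0.06), amc_penalty))
```

In the analysis above, amc_penalty(P_total) would give the applicable penalty fraction.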
7. Perform a one-way ANOVA test to determine if there is a significant difference in Cold Storage
temperature between rainy, summer and winter seasons and comment on the findings.
Using the aov() function, the p-value is 5.08e-11, which is very small, so we reject the null
hypothesis that the mean temperatures of the 3 seasons are equal to each other.
Using the TukeyHSD() function, the adjusted p-value for the Summer-Rainy difference is 0.5376924
(about 53.77%), well above any conventional significance level, so we do not reject the null
hypothesis that these two means are equal. This implies that there is no significant difference
between the temperatures in the Summer and Rainy seasons.
The p-values for the Rainy-Winter and Winter-Summer comparisons are very small, which means there
is a significant difference in temperature between Winter and each of the other two seasons.
R Code:
# Perform a one-way ANOVA test to determine if there is a significant difference in Cold Storage
# temperature between rainy, summer and winter seasons and comment on the findings.
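seasons_temp$Season <- as.factor(seasons_temp$Season)  # TukeyHSD() needs a factor grouping variable
seasons_tempaov <- aov(Temperature ~ Season, data = seasons_temp)
summary(seasons_tempaov)   # ANOVA table, including the p-value
TukeyHSD(seasons_tempaov)  # pairwise comparisons between the seasons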
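To see how aov() and TukeyHSD() behave when two seasons share a mean and the third differs (the pattern found above), here is a sketch on hypothetical toy data (made-up values, not the real temperatures):

```r
# Hypothetical toy data: Rainy and Summer share the same mean, Winter is lower
toy <- data.frame(
  Season = factor(rep(c("Rainy", "Summer", "Winter"), each = 5)),
  Temperature = c(3.00, 3.10, 2.90, 3.05, 2.95,   # Rainy,  mean 3.0
                  3.10, 3.00, 2.95, 3.05, 2.90,   # Summer, mean 3.0
                  2.00, 2.10, 1.90, 2.05, 1.95)   # Winter, mean 2.0
)

fit <- aov(Temperature ~ Season, data = toy)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]   # overall one-way ANOVA p-value
tukey <- TukeyHSD(fit)                         # pairwise differences with adjusted p-values

print(anova_p)
print(tukey)
```

The ANOVA p-value is tiny because Winter differs, while the Summer-Rainy row of the Tukey table has a large adjusted p-value because those two means coincide.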
Problem 2
In Mar 2018, Cold Storage started receiving complaints from its clients, who in turn had been getting
complaints from end consumers that the dairy products were going sour and often smelled. On receiving
these complaints, the supervisor pulls out the temperature data for the last 35 days. As a safety
measure, the supervisor decides to be vigilant and maintain the temperature at 3.9 °C or below.
Assume 3.9 °C as the upper acceptable value for the mean temperature, with alpha = 0.1. Do you feel
that there is a need for some corrective action in the Cold Storage plant, or is the problem on the
procurement side, from where Cold Storage gets the dairy products? The data for the last 35 days is
in “Cold_Storage_Mar2018.csv”.
R Code:
# Which hypothesis test shall be performed to check if corrective action is needed at the cold storage plant
# z-test
cold_storage_data_prob2 = read.csv("Cold_Storage_Mar2018.csv")
summary(cold_storage_data_prob2)
m2 = mean(cold_storage_data_prob2$Temperature)
m2
s2 = sd(cold_storage_data_prob2$Temperature)
s2
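The hypotheses behind the one-sided test (alternative = "greater" in the t.test() call below), written with \bar{x} = m2, s = s2 and n = 35; this notation is introduced here for readability. For the t-test the statistic has the same form but is compared against a t distribution with n - 1 = 34 degrees of freedom:

```latex
H_0:\ \mu \le 3.9 \qquad H_1:\ \mu > 3.9
z = \frac{\bar{x} - 3.9}{s / \sqrt{n}}, \qquad n = 35
\text{Reject } H_0 \text{ at } \alpha = 0.1 \text{ if } p = P(Z > z) < 0.1
\ \text{(equivalently, if } z > z_{0.90} \approx 1.2816\text{)}
```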
R Code:
# State the Hypothesis, perform hypothesis test and determine p-value
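# t-test (one-sided, H1: mean temperature > 3.9)
t.test(cold_storage_data_prob2$Temperature, mu = 3.9, alternative = "greater", conf.level = 0.9)
# z-test: test statistic and right-tail p-value
n2 <- nrow(cold_storage_data_prob2)
z_cal <- (m2 - 3.9) / (s2 / sqrt(n2))
pnorm(z_cal, lower.tail = FALSE)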
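As a sanity check on what t.test() reports, the sketch below recomputes the one-sample t statistic and its right-tail p-value by hand and applies the alpha = 0.1 decision rule. The 35 temperatures are hypothetical made-up values, not the contents of Cold_Storage_Mar2018.csv:

```r
# Hypothetical sample of 35 daily temperatures (made-up values)
temps <- c(rep(3.8, 10), rep(4.0, 15), rep(4.1, 10))
mu0 <- 3.9      # upper acceptable mean temperature
alpha <- 0.1    # significance level

n <- length(temps)
t_stat <- (mean(temps) - mu0) / (sd(temps) / sqrt(n))
p_manual <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # right-tail p-value

res <- t.test(temps, mu = mu0, alternative = "greater", conf.level = 1 - alpha)
print(c(t_stat, p_manual, res$p.value))

# Decision rule: reject H0 (mean <= 3.9) when the p-value is below alpha
if (res$p.value < alpha) {
  cat("Reject H0: mean temperature exceeds 3.9 deg C\n")
} else {
  cat("Fail to reject H0 at alpha = 0.1\n")
}
```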