Project 1 - Assignment: Cold Storage Case Study


PROJECT 1 - ASSIGNMENT

Cold Storage Case Study


Submitted by: Bibin Vadakkekara Bhaskaran (G1 - PGP BABI)

Great Lakes Institute of Management


Statistical Methods for Decision Making
Table of Contents
List of Tables
List of Figures
1. Project Objective & Background
2. Solutions
2.1 Problem 1
2.1.1 Assumptions
2.1.2 EDA
2.1.3 Conclusion from EDA
2.1.4 EDA – with Graphics
2.1.5 Conclusion from EDA – with Graphics
2.1.6 Answering the Questions
2.2 Solution Problem 2
2.2.1 Assumptions
2.2.2 EDA
2.2.3 Conclusion from EDA
2.2.4 EDA – with graphical plots
2.2.5 Conclusion from EDA – Graphical Plots
2.2.6 Answering the questions
Appendix 1 – Source Code
List of Tables
Table 1 : Data Summary

List of Figures
Figure 1 - Box plot of Entire year Temp
Figure 2 - Histogram of entire year Temp
Figure 3 - Seasonal Temp Variation Box plot
Figure 4 - Monthly Avg Temp
Figure 5 - Histogram of Temp - 35 days Summer
Figure 6 - Box plot - 35 days Temp
1. Project Objective & Background
A cold storage company that started business operations in 2016 has a strict policy of maintaining
the temperature of the cold room, where products such as Pasteurized Fresh Whole or Skimmed Milk, Sweet
Cream and Flavored Milk Drinks are stored, between 2 degree Celsius (⁰C) and 4⁰C.
In the first year of business the company outsourced the plant maintenance work to a third-party
professional company with stiff penalty clauses. It was agreed that if it was statistically proven that
the probability of the temperature going outside the 2 ⁰C - 4 ⁰C range during the one-year contract was
above 2.5% (0.025) and less than 5% (0.05), then the penalty would be 10% of the Annual Maintenance
Contract (AMC) fee. If the probability exceeded 5% (0.05), then the penalty would be 25% of the AMC fee.
In addition, there have recently been customer complaints regarding the quality of products supplied
from the cold storage.
Thus, the primary aim of this report is to
1. Identify the penalty slab to be imposed on the maintenance company, based on the statistically
estimated probability of the temperature going outside the 2 ⁰C - 4 ⁰C range:
a. 10 % AMC
b. 25 % AMC
In Mar 2018, Cold Storage started getting complaints from its clients that end consumers were
reporting dairy products going sour and often smelling. On receiving these complaints, the
temperature data of the last 35 days was fetched. To determine whether the commodities were spoilt
by temperature variations in the cold storage facility or by quality issues on the procurement side,
the secondary project objective is to
2. Conclude, by conducting a hypothesis test and interpreting its result, whether corrective action
is required at the cold storage or whether the quality issue lies on the procurement side.
*For research area 1, “Cold_Storage_Temp_Data.csv” is used and for research area 2,
“Cold_Storage.csv” is used. Both data sheets are available for the entire duration of
the project.

2. Solutions
2.1 Problem 1
2.1.1 Assumptions
i. The data is distributed normally
ii. Data points are independent
iii. Temperature has no relation to other external factors
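
Assumption (i) can be checked rather than taken on trust. The following is a minimal sketch, assuming simulated readings in place of the real Temperature column; the name sim_temp, the seed and the simulation parameters are illustrative and are not part of the original analysis.

```r
# Illustrative sketch: simulated readings stand in for temp_data$Temperature.
# sim_temp, the seed and the rnorm() parameters are assumptions for this demo.
set.seed(42)
sim_temp <- rnorm(365, mean = 2.96, sd = 0.51)

# Shapiro-Wilk test: H0 = the readings come from a normal distribution.
sw <- shapiro.test(sim_temp)
cat("Shapiro-Wilk p-value =", round(sw$p.value, 4), "\n")

# A small p-value (for example below 0.05) would cast doubt on assumption (i).
normal_ok <- sw$p.value > 0.05
```

With the real data, sim_temp would be replaced by temp_data$Temperature; a Q-Q plot drawn with qqnorm() and qqline() gives a visual counterpart to the test.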

2.1.2 EDA
Setting the working directory, loading the libraries and importing the csv file.
#setting working directory
setwd("C:/Users/bibin/OneDrive/Great Lakes/Statistical Methods for Decision Making/A.PROJECT MAIN")
#loading libraries
library(tibble)
library(rpivotTable)
library(dplyr)
library(ggplot2)

# Reading the first file and storing it into a data frame


temp_data=read.csv("Cold_Storage_Temp_Data.csv",header=TRUE)

#dimension of dataframes
dim(temp_data)
[1] 365 4

Getting a structure of the data types present


#getting structure of the data types
str(temp_data)
'data.frame': 365 obs. of 4 variables:
$ Season : Factor w/ 3 levels "Rainy","Summer",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Date : int 1 2 3 4 5 6 7 8 9 10 ...
$ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...

summary(temp_data)
Season Month Date Temperature
Rainy :122 Aug : 31 Min. : 1.00 Min. :1.700
Summer:120 Dec : 31 1st Qu.: 8.00 1st Qu.:2.500
Winter:123 Jan : 31 Median :16.00 Median :2.900
Jul : 31 Mean :15.72 Mean :2.963
Mar : 31 3rd Qu.:23.00 3rd Qu.:3.300
May : 31 Max. :31.00 Max. :5.000
(Other):179

Missing values are checked


anyNA(temp_data)
[1] FALSE

2.1.3 Conclusion from EDA


a. The data file Cold_Storage_Temp_Data.csv contains recorded temperature data of 365 days or
12 months.
b. There are no missing values in the dataset.
c. 3 Seasons are present in the dataset – namely Rainy, Summer and Winter.
d. The structure of the dataset indicates that Season and Month are factor type, Date is
integer type and Temperature is numeric type – this implies that no change of data type is
required.
e. The summary indicates that the maximum temperature is 5 deg Celsius and the minimum
temperature is 1.7 deg Celsius, with a mean of 2.96 deg Celsius

2.1.4 EDA – with Graphics


Visual identification of patterns and peculiar data styles is possible through the use of R
functions such as box plots, histograms etc.
#box plot of temperature (a numeric variable) - entire year
boxplot(temp_data$Temperature,col="Green",main="Box Plot of DF 1 Temperature",
        ylab="Temperature in deg Celsius")
#histogram of temperature - entire year
hist(temp_data$Temperature,col="Pink",xlab="Temperature",
     main="Histogram of entire year temperature")
# box plot of temperature - Season wise
plot(temp_data$Temperature~temp_data$Season, col=c(2:4))

Figure 1 - Box plot of Entire year Temp

Figure 2 - Histogram of entire year Temp

Figure 3 - Seasonal Temp Variation Box plot

#monthly average temperature, one row per month
month_comp=temp_data %>%
  group_by(Month) %>%
  summarise(No.of.days=n(),
            Average.temp=round(mean(Temperature),2))
#bar chart of the monthly averages with value labels
ggplot(month_comp)+
  geom_bar(aes(x=Month,y=Average.temp),
           stat="identity", fill="tan1", colour="sienna3")+
  geom_text(aes(label=Average.temp, x=Month, y=0.2+Average.temp),
            colour="red",vjust=1.5,size=3)+
  theme(axis.text.x = element_text(angle = 0))+
  ggtitle("Plot of Months vs Avg Temperature")

Figure 4 - Monthly Avg Temp.

2.1.5 Conclusion from EDA – with Graphics


a. From the graphical analysis it can be observed that there is not much skewness in
the data.
b. Outliers are present in the rainy and winter season temperature recordings but not
in summer; the overall year temperature shows one outlier.
c. The average temperature drops during the winter season months; the two
highest monthly averages are seen in Feb and September.
d. The monthly avg. temp. never falls below 2 deg Celsius or rises above 5 deg Celsius,
so the monthly averages remain within this band, although individual daily readings
range from 1.7 to 5.0 deg Celsius.
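
The outliers drawn on the box plots follow the usual 1.5 × IQR rule, which can also be applied numerically. The sketch below is only illustrative: the vector temps combines the first ten readings printed by str() with the yearly maximum of 5.0 and is not the full data set.

```r
# Illustrative vector: first ten readings shown in this report plus the
# yearly maximum of 5.0. It is NOT the full 365-day data set.
temps <- c(2.4, 2.3, 2.4, 2.8, 2.5, 2.4, 2.8, 2.3, 2.4, 2.8, 5.0)

# boxplot.stats() applies the rule boxplot() uses to draw outlier points:
# values more than coef * (hinge spread) beyond the hinges are flagged.
bs <- boxplot.stats(temps, coef = 1.5)
outliers <- bs$out
print(outliers)
```

On the real data the same call on temp_data$Temperature, or on each season's subset, would list the readings that appear as points beyond the whiskers.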

2.1.6 Answering the Questions


a. Find mean cold storage temperature for Summer, Winter and Rainy Season (4 marks)
Summer=temp_data[which(temp_data$Season=="Summer"),]
x=mean(Summer$Temperature)
cat("Mean Temperature in Summer =",round(x,digits=5),"deg Celsius")

Mean Temperature in Summer = 3.15333 deg Celsius

Rainy=temp_data[which(temp_data$Season=="Rainy"),]
y=mean(Rainy$Temperature)
cat("Mean Temperature in Rains =",round(y,digits=5),"deg Celsius")

Mean Temperature in Rains = 3.03934 deg Celsius

Winter=temp_data[which(temp_data$Season=="Winter"),]
z=mean(Winter$Temperature)
cat("Mean Temperature in Winter =",round(z,digits=5),"deg Celsius")

Mean Temperature in Winter = 2.70081 deg Celsius
b. Find overall mean for the full year (4 marks)
#getting mean from the complete temperature list
year.mean=mean(temp_data$Temperature)
cat("Mean Temperature of the entire year =",round(year.mean,digits=5),"deg Celsius")
Mean Temperature of the entire year = 2.96274 deg Celsius
c. Find Standard Deviation for the full year (4 marks)
#getting sd from the complete temperature list
year.stddev=sd(temp_data$Temperature)
cat("Std. Dev. of Temperature during the entire year =",round(year.stddev,digits=2),"deg Celsius")
Std. Dev. of Temperature during the entire year = 0.51 deg Celsius
d. Assume Normal distribution, what is the probability of temperature having fallen
below 2 C? (4 marks)
We have the values for mean and standard deviation from the previous calculations.
Substituting these values in the pnorm function and taking the lower tail of the normal
distribution curve gives the probability of the temp. falling below 2 deg Celsius.
#probability that temperature is below 2 deg Celsius
p1=pnorm(2,2.96274,0.508589,lower.tail=TRUE)
cat("Probability that temp is below 2 deg Celsius =",round(p1*100,digits=2),"%")
Probability that temp is below 2 deg Celsius = 2.92 %
e. Assume Normal distribution, what is the probability of temperature having gone
above 4 C? (4 marks)
The probability that the temp. has gone above 4 deg Celsius is calculated as below. The
only difference from the previous calculation is that the upper tail is needed instead of
the lower tail, so the lower-tail probability is subtracted from 1.
#probability that temperature is above 4 deg Celsius
p2=1-pnorm(4,2.96274,0.508589,lower.tail=TRUE)
cat("Probability that temp is above 4 deg Celsius =",round(p2*100,digits=2),"%")
Probability that temp is above 4 deg Celsius = 2.07 %
f. What will be the penalty for the AMC Company? (5 marks)
The terms and conditions of the AMC as mentioned in the question are
a. 10 % penalty if the probability of temp. going below 2 deg C or above 4 deg C is
between 2.5 % and 5 %
b. 25 % penalty if the probability of temp. going below 2 deg C or above 4 deg C is
greater than 5 %

#calculating the total probability
tot.prob=p1+p2
tot.prob*100
[1] 4.98822

cat("Since total prob. is between 2.5% and 5%, Penalty imposed = 10 % AMC")

Since total prob. is between 2.5% and 5%, Penalty imposed = 10 % AMC
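
The slab logic can also be written as a small function, so the decision does not rest on reading the number by eye. This is a sketch under stated assumptions: penalty_slab is a hypothetical helper, the zero-penalty case at or below 2.5 % and the treatment of exactly 5 % as the 10 % slab are assumptions, and the mean and standard deviation are the values reported above.

```r
# Hypothetical helper: maps the probability of leaving the 2-4 deg C band
# to the penalty (as % of the AMC fee) described in the contract terms.
penalty_slab <- function(prob) {
  if (prob > 0.05) {
    25
  } else if (prob > 0.025) {
    10
  } else {
    0   # assumption: no penalty at or below 2.5 %
  }
}

# Mean and standard deviation as reported in this section.
mu <- 2.96274
sigma <- 0.508589

p_below <- pnorm(2, mean = mu, sd = sigma)                      # P(T < 2)
p_above <- pnorm(4, mean = mu, sd = sigma, lower.tail = FALSE)  # P(T > 4)
p_out <- p_below + p_above

cat("P(outside band) =", round(100 * p_out, 2), "% -> penalty =",
    penalty_slab(p_out), "% of AMC\n")
```

Because the total probability sits only just under the 5 % boundary, the slab is sensitive to small changes in the estimated mean or standard deviation.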


Table 1 : Data Summary

Item                                          Description                      Value
Mean Temperature in                           Summer                           3.15 ⁰C
                                              Rain                             3.04 ⁰C
                                              Winter                           2.70 ⁰C
Mean Temp. of the Entire Year                                                  2.96 ⁰C
Standard Deviation of the Entire Year Temp.                                    0.51 ⁰C
Probability of                                Temperature falling below 2⁰C    2.92%
                                              Temperature going above 4⁰C      2.07%
Total probability                                                              4.99%

Penalty slab to be imposed = 10 % AMC

2.2 Solution Problem 2
2.2.1 Assumptions
i. The assumptions are the same as those of problem 1; in particular, outside
temperature has no effect on the recordings.
ii. Data collection is unbiased.

2.2.2 EDA
#Reading the second data set and storing it into data frame
temp.feb.mar=read.csv("Cold_Storage.csv",header=TRUE)
#dimension check
dim(temp.feb.mar)
[1] 35 4
#structure check
str(temp.feb.mar)
'data.frame': 35 obs. of 4 variables:
$ Season : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
$ Month : Factor w/ 2 levels "Feb","Mar": 1 1 1 1 1 1 1 1 1 1 ...
$ Date : int 11 12 13 14 15 16 17 18 19 20 ...
$ Temperature: num 4 3.9 3.9 4 3.8 4 4.1 4 3.8 3.9 ...
#summary
summary(temp.feb.mar)
Season Month Date Temperature
Summer:35 Feb:18 Min. : 1.0 Min. :3.800
Mar:17 1st Qu.: 9.5 1st Qu.:3.900
Median :14.0 Median :3.900
Mean :14.4 Mean :3.974
3rd Qu.:19.5 3rd Qu.:4.100
Max. :28.0 Max. :4.600

2.2.3 Conclusion from EDA
a. There are 35 days of temperature recordings and the Season is Summer. Since the
number of samples is above 30, the central limit theorem rule of thumb can
be applied – the sample mean can be treated as approximately normally distributed.
b. The data set is a subset of the entire year data (from problem 1)
c. The maximum temp recorded is 4.6 deg C and the minimum is 3.8 deg C with
a mean of 3.97 deg C
d. There is a very minimal difference between the mean and the minimum value,
and the 1st quartile and the median are equal (both 3.9 deg C) - this implies that at
least 75% of the values lie at or above 3.9 deg Celsius.

2.2.4 EDA – with graphical plots


hist(temp.feb.mar$Temperature, col="pink",main="histogram of 35 days temperature in Summer")
boxplot(temp.feb.mar$Temperature,col="red", main="box plot of 35 days temp.")

Figure 5 - Histogram of Temp - 35 days Summer

Figure 6 - Box plot - 35 days Temp

2.2.5 Conclusion from EDA – Graphical Plots


From the histogram and box plot it is evident that an outlier is present and that
most of the temp values lie at or above 3.9 deg C. This is also supported by the summary of
the data. This is a preliminary indication that corrective action needs to be taken at the cold
storage plant, since the avg. temperature (3.97 deg C) is above 3.9 deg C. A hypothesis
test can confirm this conclusion.

2.2.6 Answering the questions


a. Which Hypothesis test shall be performed to check if corrective action is needed at
the cold storage plant? Justify your answer. (5 marks)

The hypothesis tests – Z test and One Sample T test – shall be performed here to check if
corrective action is required at the cold storage plant. Corrective action is to be
taken if the mean temp. over the given 35 days is statistically shown to be above 3.9 deg C.
The Z test is conducted because, with a sample size greater than 30, the central limit
theorem allows the sample mean to be treated as approximately normally distributed.
The t-test, an inferential statistics tool, is normally used to check if there is any
significant difference between the means of two groups. However, the one sample t test is
used to determine whether a sample of observations could have been generated by a
process with a specific mean [Fetched from
https://www.statisticssolutions.com/manova-analysis-one-sample-t-test/]

The Z test requires a population standard deviation. Since it is not known here, the
standard deviation of the 35-day sample (0.1597) is used as its estimate, which is
reasonable for a sample size above 30.

The One sample T test is chosen since there is only one sample to conduct the test on
and the test does not require the population standard deviation. The selection of
right/left or two tailed is based on the hypothesis formulated. Since we need to study
whether the temperature is above a certain value, the test is a right tailed One
Sample Student’s T test

b. State the Hypothesis, perform hypothesis test and determine p-value


Based on the problem statement, THE HYPOTHESIS STATEMENTS are as follows:
Null Hypothesis Ho : Mean temperature = 3.9 ⁰C
Alternative Hypothesis Ha : Mean temperature > 3.9 ⁰C
Two tests can be carried out for this data, namely the T test and the Z test.
A Z test is carried out first to identify the probability value, with
x̅, sample mean = 3.97 (calculated from the 35-day data),
µ, hypothesized mean under Ho = 3.90,
σ, standard deviation = 0.159,
n, number of observations = 35
Since the alternative hypothesis states that the mean is greater than 3.9 deg C, we are
performing a right-tailed test
#hypothesis testing
# Hypothesis
#Ho : mean temp = 3.9 C
#Ha : mean temp >3.9 C
# Z TEST
#sample mean of the 35 days
xbar=mean(temp.feb.mar$Temperature)
xbar
[1] 3.974286
#hypothesized mean under Ho
mu=3.9
sigma=sd(temp.feb.mar$Temperature)
sigma
[1] 0.159674
n=35
se=sigma/sqrt(n)
z.value=(xbar-mu)/se
z.value
[1] 2.752359
#right-tailed p value for z value 2.752
pnorm(z.value,lower.tail=FALSE)
[1] 0.002958384

P value obtained is 0.00296

Similarly when conducting a one sample T test


#right-tailed one sample t test; var.equal is dropped as it applies only to two-sample tests
p.value=t.test(temp.feb.mar$Temperature,
alternative = "greater",
mu=3.9,
conf.level=0.90)
p.value
One Sample t-test
data: temp.feb.mar$Temperature
t = 2.7524, df = 34, p-value = 0.004711
alternative hypothesis: true mean is greater than 3.9
90 percent confidence interval:
3.939011 Inf
sample estimates:
mean of x
3.974286

P value obtained is 0.0047
c. Give your inference (3 marks)
The p values obtained in both tests are well below alpha = 0.1, and hence the null
hypothesis is rejected in favour of the alternative hypothesis.
Both tests indicate that the mean temperature over the 35 days is greater than the
threshold value of 3.9 deg C, and hence corrective action needs to be taken at the
Cold Storage rather than on the procurement side.
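
The two p values can be cross-checked from the summary statistics alone, without the data file. The sketch below assumes only the sample mean, standard deviation and sample size printed in this section; the variable names are illustrative, and small differences in the last digits are expected because the inputs are rounded.

```r
# Summary statistics as printed in this report for the 35-day sample.
xbar <- 3.974286   # sample mean
s    <- 0.159674   # sample standard deviation
n    <- 35         # number of readings
mu0  <- 3.9        # hypothesised mean under Ho

# Test statistic: number of standard errors the sample mean lies above mu0.
stat <- (xbar - mu0) / (s / sqrt(n))

# Right-tailed p values under the normal and the t (df = n - 1) distributions.
p_z <- pnorm(stat, lower.tail = FALSE)
p_t <- pt(stat, df = n - 1, lower.tail = FALSE)

cat("statistic =", round(stat, 3), " p (Z) =", signif(p_z, 3),
    " p (t) =", signif(p_t, 3), "\n")
```

The same statistic feeds both tests; the t distribution has heavier tails, which is why its p value is the larger of the two.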

Appendix 1 – Source Code


EXPLORATORY DATA ANALYSIS – R Code
> #setting working directory
> setwd("C:/Users/bibin/OneDrive/Great Lakes/Statistical Methods for Decision Making/A.PROJECT MAIN")
> #loading libraries
>
> library(tibble)
> library(rpivotTable)
> library(dplyr)
> library(ggplot2)
>
> # Reading the first file and storing it into a data frame
> temp_data=read.csv("Cold_Storage_Temp_Data.csv",header=TRUE)
>
>
> #dimension of dataframes
> dim(temp_data)
[1] 365 4
>
> #to identify the header names
> names(temp_data)
[1] "Season" "Month" "Date" "Temperature"
>
> #getting structure of the data types
> str(temp_data)
'data.frame': 365 obs. of 4 variables:
$ Season : Factor w/ 3 levels "Rainy","Summer",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Date : int 1 2 3 4 5 6 7 8 9 10 ...
$ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...
>
>
> #getting a summary of the data
> summary(temp_data)
Season Month Date Temperature
Rainy :122 Aug : 31 Min. : 1.00 Min. :1.700
Summer:120 Dec : 31 1st Qu.: 8.00 1st Qu.:2.500
Winter:123 Jan : 31 Median :16.00 Median :2.900
Jul : 31 Mean :15.72 Mean :2.963
Mar : 31 3rd Qu.:23.00 3rd Qu.:3.300
May : 31 Max. :31.00 Max. :5.000
(Other):179
>

>
>
> #checking the first 10 and last 10 entries
> head(temp_data,10)
Season Month Date Temperature
1 Winter Jan 1 2.4
2 Winter Jan 2 2.3
3 Winter Jan 3 2.4
4 Winter Jan 4 2.8
5 Winter Jan 5 2.5
6 Winter Jan 6 2.4
7 Winter Jan 7 2.8
8 Winter Jan 8 2.3
9 Winter Jan 9 2.4
10 Winter Jan 10 2.8
> tail(temp_data,10)
Season Month Date Temperature
356 Winter Dec 22 3.3
357 Winter Dec 23 3.0
358 Winter Dec 24 3.7
359 Winter Dec 25 3.2
360 Winter Dec 26 2.7
361 Winter Dec 27 2.7
362 Winter Dec 28 2.3
363 Winter Dec 29 2.6
364 Winter Dec 30 2.3
365 Winter Dec 31 2.9
>
> #checking for any missing values
> anyNA(temp_data)
[1] FALSE
>
>
> #Exploratory Data Analysis Start
>
> #univariate analysis
> #temperature being a numeric data type
>
> boxplot(temp_data$Temperature,col="Green",main="Box Plot of DF 1 Temperature",ylab="Temperature in deg Celsius")
>
> #histogram of temperature - entire year
> hist(temp_data$Temperature,col="Pink",xlab="Temperature",
+ main="Histogram of entire year temperature")
> # box plot of temperature - Season wise
> plot(temp_data$Temperature~temp_data$Season, col=c(2:4),main="Box plot - Temp vs Seasons")
>
> month_comp=temp_data %>%
+ group_by(Month) %>%
+ summarise(No.of.days=n(),
+ Average.temp=round(mean(Temperature),2))
>
> ggplot(month_comp)+
+ geom_bar(aes(x=Month,y=Average.temp),
+ stat="identity", fill="tan1", colour="sienna3")+
+ geom_text(aes(label=Average.temp, x=Month, y=0.2+Average.temp),
+ colour="red",vjust=1.5,size=3)+
+ theme(axis.text.x = element_text(angle = 0))+
+ ggtitle("Plot of Months vs Avg Temperature")
>
>
> #mean in rainy summer and winter
> Summer=temp_data[which(temp_data$Season=="Summer"),]

> x=mean(Summer$Temperature)
> cat("Mean Temperature in Summer =",round(x,digits=5),"deg Celsius")
Mean Temperature in Summer = 3.15333 deg Celsius>
> Rainy=temp_data[which(temp_data$Season=="Rainy"),]
> y=mean(Rainy$Temperature)
> cat("Mean Temperature in Rains =",round(y,digits=5),"deg Celsius")
Mean Temperature in Rains = 3.03934 deg Celsius>
> Winter=temp_data[which(temp_data$Season=="Winter"),]
> z=mean(Winter$Temperature)
> cat("Mean Temperature in Winter =",round(z,digits=5),"deg Celsius")
Mean Temperature in Winter = 2.70081 deg Celsius>
> #getting mean from the complete temperature list
> year.mean=mean(temp_data$Temperature)
> cat("Mean Temperature of the entire year =",round(year.mean,digits=5),"deg Celsius")
Mean Temperature of the entire year = 2.96274 deg Celsius>
>
> #getting sd from the complete temperature list
> year.stddev=sd(temp_data$Temperature)
> cat("Std. Dev. of Temperature during the entire year =",round(year.stddev,digits=2),"deg Celsius")
Std. Dev. of Temperature during the entire year = 0.51 deg Celsius>
> #getting no of rows
> nrow(temp_data)
[1] 365
> #365 days data is present
> #probability that temperature is below 2 deg Celsius
> p1=pnorm(2,2.96274,0.508589,lower.tail=TRUE)
> cat("Probability that temp is below 2 deg Celsius =",round(p1*100,digits=2),"%")
Probability that temp is below 2 deg Celsius = 2.92 %>
> #probability that temperature is above 4 deg Celsius
> p2=1-pnorm(4,2.96274,0.508589,lower.tail=TRUE)
> cat("Probability that temp is above 4 deg Celsius =",round(p2*100,digits=2),"%")
Probability that temp is above 4 deg Celsius = 2.07 %>
> #probability that temp goes below 2 = 2.92 %
> #probability that temp goes above 4 = 2.07 %
>
> #calculating the total probability
> tot.prob=p1+p2
> tot.prob*100
[1] 4.98822
>
> cat("Since total prob. is between 2.5% and 5%, Penalty imposed = 10 % AMC")
Since total prob. is between 2.5% and 5%, Penalty imposed = 10 % AMC>
> #penalty = 10% AMC
>
>
> #### 2
> #Reading the second data set and storing it into data frame
> temp.feb.mar=read.csv("Cold_Storage.csv",header=TRUE)
> #dimension check
> dim(temp.feb.mar)
[1] 35 4
> #structure check
> str(temp.feb.mar)
'data.frame': 35 obs. of 4 variables:
$ Season : Factor w/ 1 level "Summer": 1 1 1 1 1 1 1 1 1 1 ...
$ Month : Factor w/ 2 levels "Feb","Mar": 1 1 1 1 1 1 1 1 1 1 ...
$ Date : int 11 12 13 14 15 16 17 18 19 20 ...
$ Temperature: num 4 3.9 3.9 4 3.8 4 4.1 4 3.8 3.9 ...

> #summary
> summary(temp.feb.mar)
Season Month Date Temperature
Summer:35 Feb:18 Min. : 1.0 Min. :3.800
Mar:17 1st Qu.: 9.5 1st Qu.:3.900
Median :14.0 Median :3.900
Mean :14.4 Mean :3.974
3rd Qu.:19.5 3rd Qu.:4.100
Max. :28.0 Max. :4.600
>
> hist(temp.feb.mar$Temperature, col="pink",main="histogram of 35 days temperature in Summer")
> boxplot(temp.feb.mar$Temperature,col="red", main="box plot of 35 days temp.")
>
> #hypothesis testing
> # Hypothesis
> #Ho : mean temp = 3.9 C
> #Ha : mean temp >3.9 C
> # Z TEST
> #sample mean of the 35 days
> xbar=mean(temp.feb.mar$Temperature)
> xbar
[1] 3.974286
> #hypothesized mean under Ho
> mu=3.9
> sigma=sd(temp.feb.mar$Temperature)
> sigma
[1] 0.159674
> n=35
> se=sigma/sqrt(n)
> z.value=(xbar-mu)/se
> z.value
[1] 2.752359
> #right-tailed p value for z value 2.752
> pnorm(z.value,lower.tail=FALSE)
[1] 0.002958384
>
> # one sample t test
> #at 90 percent confidence interval
> p.value=t.test(temp.feb.mar$Temperature,
+ alternative = "greater",
+ mu=3.9,
+ conf.level=0.90)
> p.value

One Sample t-test

data: temp.feb.mar$Temperature
t = 2.7524, df = 34, p-value = 0.004711
alternative hypothesis: true mean is greater than 3.9
90 percent confidence interval:
3.939011 Inf
sample estimates:
mean of x
3.974286
> #corresponding probability p.value in t test is 0.0047
> # As the p value is less than acceptable alpha of 0.1
> # Ho is rejected and Ha is accepted
>