Week-3 NK

Uploaded by

Nagaraj Kulkarni


Week-3 Homework submission

Question 5.1
Using crime data from http://www.statsci.org/data/general/uscrime.txt (description
at http://www.statsci.org/data/general/uscrime.html), test to see whether there is an
outlier in the last column (number of crimes per 100,000 people). Is the lowest-crime
city an outlier? Is the highest-crime city an outlier? Are there others? Use the
grubbs.test function in the outliers package in R.

Answer:

Sub-question: See whether there is an outlier in the last column (number of crimes
per 100,000 people)

Step-1. Observation of plots

Chart 1(a) does not suggest any outliers, but chart 1(b) suggests that there
are some. Because the two charts disagree, I do not draw a conclusion from
them alone and investigate the data further.

Figure-1: Charts to observe outliers
1(a) Simple dot plot of crime rate observations; 1(b) Box and whiskers plot

Step-2.
I run the Grubbs test to check for outliers in the data. The results suggest that
there are no outliers in the data at a 5% level of significance.

Figure-2: Grubbs test for outliers

No.  Test results                                       Conclusion

1    Test if highest value is outlier                   Because the p-value is > 0.05, we cannot reject
     G = 2.81290, U = 0.82426, p-value = 0.07887        the null hypothesis. So the highest value is not
     alternative hypothesis: highest value 1993 is      an outlier at a 5% level of significance.
     an outlier

2    Test if highest and lowest values are outliers     Because the p-value is > 0.05, we cannot reject
     G = 4.26880, U = 0.78103, p-value = 1              the null hypothesis. So there are no outliers in
     alternative hypothesis: 342 and 1993 are outliers  the data at a 5% level of significance.

Step-3: I also run a loop to test whether any of the values is an outlier (see the R code in
the ‘R-code’ section). The output below confirms that there are no outliers in the data.

Figure- 3: Output of looped test of outliers at a 5% level of significance

Data point  Value  Is Outlier?    Data point  Value  Is Outlier?    Data point  Value  Is Outlier?
1 791 FALSE 17 539 FALSE 33 1072 FALSE

2 1635 FALSE 18 929 FALSE 34 923 FALSE

3 578 FALSE 19 750 FALSE 35 653 FALSE

4 1969 FALSE 20 1225 FALSE 36 1272 FALSE

5 1234 FALSE 21 742 FALSE 37 831 FALSE

6 682 FALSE 22 439 FALSE 38 566 FALSE

7 963 FALSE 23 1216 FALSE 39 826 FALSE

8 1555 FALSE 24 968 FALSE 40 1151 FALSE

9 856 FALSE 25 523 FALSE 41 880 FALSE

10 705 FALSE 26 1993 FALSE 42 542 FALSE

11 1674 FALSE 27 342 FALSE 43 823 FALSE

12 849 FALSE 28 1216 FALSE 44 1030 FALSE

13 511 FALSE 29 1043 FALSE 45 455 FALSE

14 664 FALSE 30 696 FALSE 46 508 FALSE

15 798 FALSE 31 373 FALSE 47 849 FALSE

16 946 FALSE 32 754 FALSE

Sub-question: Is the lowest-crime city an outlier?

The answer is no. To answer this, I conducted the Grubbs test for two opposite
outliers. The test does not reject the null hypothesis (p-value > 0.05), so at a
5% level of significance the lowest-crime city is not an outlier.

Grubbs test for two opposite outliers

data: cr
G = 4.26880, U = 0.78103, p-value = 1
alternative hypothesis: 342 and 1993 are outliers
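
The test above checks the lowest and highest values jointly. As a complementary check, the lowest value can also be tested on its own. The following is a minimal sketch and not part of the original submission: it re-enters the 47 crime values listed in Figure-3, computes the one-sided Grubbs statistic by hand in base R, and, if the outliers package is installed, calls grubbs.test with opposite = TRUE. The helper name grubbs_one_sided is hypothetical, and its p-value uses the usual t-distribution bound for the one-sided test, which is assumed to agree with what grubbs.test reports.

```r
# Crime rates for the 47 observations, copied from Figure-3
cr <- c(791, 1635, 578, 1969, 1234, 682, 963, 1555, 856, 705, 1674, 849,
        511, 664, 798, 946, 539, 929, 750, 1225, 742, 439, 1216, 968,
        523, 1993, 342, 1216, 1043, 696, 373, 754, 1072, 923, 653, 1272,
        831, 566, 826, 1151, 880, 542, 823, 1030, 455, 508, 849)

# Hypothetical helper: one-sided Grubbs statistic and p-value for the
# lowest (lowest = TRUE) or highest (lowest = FALSE) value of x
grubbs_one_sided <- function(x, lowest = FALSE) {
  n <- length(x)
  suspect <- if (lowest) min(x) else max(x)
  g <- abs(suspect - mean(x)) / sd(x)
  # Convert G to a t statistic with n - 2 degrees of freedom
  denom <- (n - 1)^2 - n * g^2
  t_stat <- sqrt(n * (n - 2) * g^2 / denom)
  # Bonferroni-style bound on the p-value, capped at 1
  p <- min(1, n * pt(t_stat, df = n - 2, lower.tail = FALSE))
  list(value = suspect, G = g, p.value = p)
}

low  <- grubbs_one_sided(cr, lowest = TRUE)
high <- grubbs_one_sided(cr, lowest = FALSE)
print(low)
print(high)

# Same check with the package used in this homework, if it is available
if (requireNamespace("outliers", quietly = TRUE)) {
  print(outliers::grubbs.test(cr, type = 10, opposite = TRUE))
}
```

Testing each tail separately keeps the question "is the lowest value an outlier?" apart from the joint question answered by the type = 11 test.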

Sub-question: Is the highest-crime city an outlier?

To answer this, I conducted the Grubbs test. The results confirm that the highest-crime
city is not an outlier at a 5% level of significance. However, at a 10% level of
significance, we can conclude that the highest-crime city is an outlier.

> grubbs.test(cr, type = 10, opposite = FALSE, two.sided = FALSE)

Grubbs test for one outlier

data: cr
G = 2.81290, U = 0.82426, p-value = 0.07887
alternative hypothesis: highest value 1993 is an outlier

Sub-question: Are there others?

Based on the output presented in Figure-3, there are no other outliers.

Question 6.1
Describe a situation or problem from your job, everyday life, current events, etc., for
which a Change Detection model would be appropriate. Applying the CUSUM
technique, how would you choose the critical value and the threshold?

Answer:

A change detection model can be applied in macroeconomics, specifically to
detect a change in a country's inflation. This is important because a
significant change in inflation is usually followed by changes to monetary policy
(i.e. interest rates). Detecting a change in inflation early means one can anticipate
the central bank’s monetary policy changes, and hence movements in interest rate
markets (such as bond prices).

The critical value would differ from country to country, because it depends on the
volatility of inflation in that country. I would estimate the standard deviation of
inflation during a period of relatively stable inflation, and then use one
standard deviation as the critical value.

Determining the threshold will be an iterative process. I would use historical
observations of the level of my CUSUM (i.e. $S_t$) over the last 10 years at the times
when the central bank recognized that there was a change in the inflation of the
economy.

Question 6.2
1. Using July through October daily-high-temperature data for Atlanta for 1996
through 2015, use a CUSUM approach to identify when unofficial summer ends (i.e.,
when the weather starts cooling off) each year.
You can use R if you’d like, but it’s straightforward enough that an Excel
spreadsheet can easily do the job too.

Answer:
For formulae, I follow the notation used in the lecture.
I use an Excel spreadsheet to conduct this analysis.

I build a CUSUM model for each year. The procedure used to build each model is
as follows.

Before calculating $S_t$, we need to choose a value for $C$. Intuitively, $C$ is the value
below which a change in temperature is attributed to random factors, so $S_t$
should not be affected by changes smaller than $C$.
I estimate the value of $C$ as follows. For calculations, see the attached Excel
workbook (sheet-Data).
1. Calculate the daily change in temperature, $\Delta T = X_t - X_{t-1}$. This is because we
want to know which changes in temperature are caused by random
factors and which are not.
2. Calculate the standard deviation of $\Delta T$.
3. Because we are interested only in decreases (i.e. changes in one
direction), $C = 0.5 \times$ average standard deviation.

I then compute $S_t = \max\{0,\ S_{t-1} + (\mu - X_t - C)\}$. Please see the Excel workbook
(sheet-Analysis-1).

The threshold $T$ is set to $5 \times C$. This choice is based on the observation that
any smaller value of $T$ would detect false changes during that year.
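
The procedure above can be sketched in R for a single year. This is a minimal sketch under assumptions, not the Excel workbook itself: the function name cusum_decrease and the temperatures are made up for illustration, µ is taken as the mean of an early baseline window (the text does not say how µ was chosen), C is computed from this one series rather than as an average across years, and a change is declared on the first day where $S_t$ reaches the threshold.

```r
# Hypothetical helper: one-sided CUSUM for detecting a decrease,
# S_t = max(0, S_{t-1} + (mu - x_t - C)); flags the first t with S_t >= threshold
cusum_decrease <- function(x, mu, C, threshold) {
  S <- numeric(length(x))
  prev <- 0
  for (t in seq_along(x)) {
    S[t] <- max(0, prev + (mu - x[t] - C))
    prev <- S[t]
  }
  # change_index is NA if the threshold is never reached
  list(S = S, change_index = which(S >= threshold)[1])
}

# Made-up daily highs (not the Atlanta data): stable, then cooling
temps <- c(90, 91, 89, 90, 90, 91, 89, 90, 85, 84, 83, 82)

mu        <- mean(temps[1:8])        # assumed baseline: the first 8 days
C         <- 0.5 * sd(diff(temps))   # half the standard deviation of daily changes
threshold <- 5 * C                   # threshold as a multiple of C

result <- cusum_decrease(temps, mu, C, threshold)
print(round(result$S, 2))
print(result$change_index)
```

Subtracting C on every step is what keeps small day-to-day fluctuations from accumulating, so only a sustained drop below µ pushes $S_t$ toward the threshold.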

Based on this approach, the dates on which unofficial summer ends each year are as follows:

29 Sep 1996 7 Sep 2003 27 Sep 2010

25 Sep 1997 16 Sep 2004 6 Sep 2011
30 Sep 1998 6 Oct 2005 1 Oct 2012
20 Sep 1999 21 Sep 2006 16 Aug 2013
6 Sep 2000 16 Sep 2007 25 Sep 2014
25 Sep 2001 17 Sep 2008 14 Sep 2015
24 Sep 2002 1 Oct 2009

The control chart is presented below.

2. Use a CUSUM approach to make a judgment of whether Atlanta’s summer climate
has gotten warmer in that time (and if so, when).

Answer:

To answer this question we need to look at the time series of yearly mean (or median)
temperature and conduct change-point analysis on it. This is done in the attached
Excel workbook (sheet- Analysis-2).

For each year, I compared the mean and the median and found a substantial
difference between them, so I preferred to use the median as the measure of
central tendency.

By observation, there seems to be an upward shift in the median temperature
after 2009. Nevertheless, I conduct the change-point analysis as below.

I calculate $C$ in the same way as before. However, due to the significant
increase in median temperature during 2010, the standard deviation estimate
would be inflated. So I prefer to estimate the standard deviation after
excluding the 2010 point of $\Delta T$.

Standard deviation using all data=2.5489

Standard deviation excluding 2010=1.9704 (I use this)

So value of C = 1.9704/2 = 0.9852

$S_t$ is calculated using the formula: $S_t = \max\{0,\ S_{t-1} + (X_t - \mu - C)\}$

I determine the value of $T$ by observing $S_t$ over time. Before 2010, $S_t$ changed
little: the maximum $S_t$ prior to 2010 was 0.71. If $T$ were less than 0.71, I might
get a false detection, so I set $T$ to 1.
This is a subjective assessment based on the data.
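
As a rough illustration of the steps above, the sketch below leaves out one year's change when estimating the standard deviation and then runs the upward CUSUM. The yearly medians are invented for illustration (they are not the workbook values), the variable names are hypothetical, and µ is assumed to be the mean of the years before the suspected shift, since the text does not state how µ was chosen.

```r
# Invented yearly median temperatures (not the Atlanta data)
years <- 2005:2012
med   <- c(84, 85, 84, 85, 84, 89, 88, 89)

# Year-over-year changes; change i belongs to years[i + 1]
d       <- diff(med)
d_years <- years[-1]

# Estimate C as half the standard deviation of the changes, leaving out 2010
C_val <- 0.5 * sd(d[d_years != 2010])

# Assumed baseline: mean of the medians before 2010
mu <- mean(med[years < 2010])

# Upward CUSUM: S_t = max(0, S_{t-1} + (x_t - mu - C))
S <- numeric(length(med))
prev <- 0
for (i in seq_along(med)) {
  S[i] <- max(0, prev + (med[i] - mu - C_val))
  prev <- S[i]
}

# First year in which S_t reaches the threshold (NA if it never does)
threshold  <- 1
first_year <- years[which(S >= threshold)[1]]

print(data.frame(year = years, median = med, S = round(S, 2)))
print(first_year)
```

Leaving the jump out of the standard deviation keeps C small enough that the shift itself is not absorbed as noise.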

The results of the calculations are presented in the chart below. Based on this chart,
the answer to the question is ‘Yes’: Atlanta’s summer climate has
gotten warmer, specifically after 2009.

R-code
# Load the package that provides grubbs.test
library(outliers)

# Import data and create a new vector storing the values of crime rate (i.e. last column)
uscrime <- read.delim("N:/ISyE 6501/W-3/uscrime.txt")
View(uscrime)
cr <- as.numeric(uscrime$Crime)

# Simple dot plot of crime rate observations
plot(cr)

# Box and whiskers plot of crime rate
boxplot(cr, main = "Box and whiskers plot of crime rate")

# Grubbs test for one outlier
grubbs.test(cr, type = 10, opposite = FALSE, two.sided = FALSE)

# Grubbs test for two opposite outliers
grubbs.test(cr, type = 11, opposite = FALSE, two.sided = FALSE)

# Testing if each data point is an outlier: repeat the Grubbs test, setting aside
# the flagged value each time, until the p-value is no longer below 0.05
grubbs.flag <- function(x) {
  outliers <- NULL
  test <- x
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  while (pv < 0.05) {
    # The flagged value is the third word of the alternative-hypothesis text
    outliers <- c(outliers, as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3]))
    test <- x[!x %in% outliers]
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
  }
  return(data.frame(X = x, Outlier = (x %in% outliers)))
}

grubbs.flag(cr)
