07exercise Solution
(a) Explore the data and describe it with plots, tables and words. Do you think the collected data of the
different countries are comparable?
Solution:
rm(list = ls())
ec <- read.csv("data/ec.txt")
rownames(ec) <- ec[, 1]
ec <- ec[, -1]
head(ec)
str(ec)
summary(ec)
##       Work           Price            Salary             SalaryCat
##  Min.   :1583   Min.   : 30.30   Min.   :  2.70   (2.6,35.1] :21
##  1st Qu.:1745   1st Qu.: 49.65   1st Qu.: 14.38   (35.1,67.6]:21
##  Median :1849   Median : 70.50   Median : 43.65   (67.6,100] : 4
##  Mean   :1880   Mean   : 68.86   Mean   : 39.55   NA's       : 2
##  3rd Qu.:1976   3rd Qu.: 81.70   3rd Qu.: 59.70
##  Max.   :2375   Max.   :115.50   Max.   :100.00
##  NA's   :2                       NA's   :  2
head(ec[order(ec$Price), ])
tail(ec[order(ec$Price), ])
ec[is.na(ec$Work), ]
pairs(ec)
[Figure: pairs(ec) scatterplot matrix of Work, Price, Salary and SalaryCat, followed by histograms of Work (1600-2400), Price (40-120) and Salary (0-100) with Frequency on the y-axis]
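The histograms of the three numeric variables have no corresponding code in the excerpt; a plausible sketch (panel layout and break settings are assumptions):

```r
# Histograms of the three numeric variables side by side,
# as shown in the figure above; hist() skips NA values automatically.
par(mfrow = c(1, 3))
hist(ec$Work, main = "Work")
hist(ec$Price, main = "Price")
hist(ec$Salary, main = "Salary")
par(mfrow = c(1, 1))
```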
## SalaryCat
## (2.6,35.1] (35.1,67.6] (67.6,100]
## (1.58e+03,1.85e+03] 6 16 1
## (1.85e+03,2.11e+03] 10 5 3
## (2.11e+03,2.38e+03] 5 0 0
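The cross-table above relates binned working hours to the salary categories; a sketch of how it can be produced (the split of Work into three equal-width intervals is inferred from the interval labels in the output):

```r
# Cross-tabulate Work, cut into three equal-width bins, against SalaryCat;
# the bin labels match the row labels of the table above.
table(cut(ec$Work, breaks = 3), ec$SalaryCat)
```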
(b) Which cities are similar? Conduct a cluster analysis. Decide on a “reasonable” size of the grouping. Plot
and interpret the results.
Solution:
[Figure: dendrograms of the hierarchical clusterings of the cities (Height on the y-axis), with city names as leaf labels; Hong Kong, Tokyo, Taipei, Oslo and Stockholm split off high up in the tree]
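The objects hc1n (used below) and dat (used in the k-means call) are not defined in the excerpt; the following is a plausible construction, where the variable selection, scaling and default complete linkage are assumptions:

```r
# Complete cases of the three numeric variables, standardized so that
# no single variable dominates the Euclidean distances.
dat <- scale(na.omit(ec[, c("Work", "Price", "Salary")]))

# Hierarchical clustering on the Euclidean distance matrix.
hc1n <- hclust(dist(dat))
plot(hc1n)  # dendrogram as in the figure above
```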
## Hierarchical clustering
grp <- cutree(hc1n, k = 3)
list(first = names(grp[grp == 1]), second = names(grp[grp == 2]), third = names(grp[grp ==
    3]))
## $first
## [1] "Amsterdam" "Brussels" "Chicago" "Copenhagen" "Dublin"
## [6] "Dusseldorf" "Frankfurt" "Geneva" "Helsinki" "Houston"
## [11] "London" "Los Angeles" "Luxembourg" "Madrid" "Milan"
## [16] "Montreal" "New York" "Oslo" "Paris" "Stockholm"
## [21] "Sydney" "Tokyo" "Toronto" "Vienna" "Zurich"
##
## $second
##  [1] "Athens"         "Buenos Aires"   "Johannesburg"   "Lagos"
##  [5] "Lisbon"         "Mexico City"    "Nairobi"        "Nicosia"
##  [9] "Rio de Janeiro" "Sao Paulo"      "Seoul"
##
## $third
## [1] "Bogota" "Bombay" "Caracas" "Hong Kong" "Kuala Lumpur"
## [6] "Manila" "Panama" "Singpore" "Taipei" "Tel Aviv"
## K-means
km <- kmeans(dat, center = 3, nstart = 20)
grp2 <- km$cluster
list(first = names(grp2[grp2 == 1]), second = names(grp2[grp2 == 2]), third = names(grp2[grp2 ==
    3]))
## $first
## [1] "Amsterdam" "Brussels" "Chicago" "Copenhagen" "Dublin"
## [6] "Dusseldorf" "Frankfurt" "Geneva" "Helsinki" "Houston"
## [11] "London" "Los Angeles" "Luxembourg" "Madrid" "Milan"
## [16] "Montreal" "New York" "Oslo" "Paris" "Stockholm"
## [21] "Sydney" "Tokyo" "Toronto" "Vienna" "Zurich"
##
## $second
##  [1] "Bogota"       "Bombay"       "Caracas"      "Hong Kong"    "Kuala Lumpur"
##  [6] "Manila"       "Panama"       "Singpore"     "Taipei"       "Tel Aviv"
##
## $third
## [1] "Athens" "Buenos Aires" "Johannesburg" "Lagos"
## [5] "Lisbon" "Mexico City" "Nairobi" "Nicosia"
## [9] "Rio de Janeiro" "Sao Paulo" "Seoul"
(c) Find a “good” linear regression model with Salary as response variable. Interpret the results and discuss
the model fit.
Solution:
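The two summaries below belong to the models later compared with anova(mod2, mod1); the fitting code is missing from the excerpt, so here is a sketch reconstructed from the Call lines (model names taken from the anova call):

```r
# Full model with both predictors, and the reduced model with Price only;
# lm() drops the two rows with missing values automatically.
mod1 <- lm(Salary ~ Work + Price, data = ec)
mod2 <- lm(Salary ~ Price, data = ec)
summary(mod1)
summary(mod2)
```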
##
## Call:
## lm(formula = Salary ~ Work + Price, data = ec)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.392 -10.615 1.819 8.081 34.279
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.09421 31.32573 0.322 0.749
## Work -0.01673 0.01422 -1.177 0.246
## Price 0.86876 0.11587 7.497 2.47e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.83 on 43 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.6572, Adjusted R-squared: 0.6412
## F-statistic: 41.21 on 2 and 43 DF, p-value: 1.009e-10
##
## Call:
## lm(formula = Salary ~ Price, data = ec)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.679 -10.279 -0.861 11.280 32.635
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -25.6766 7.6007 -3.378 0.00154 **
## Price 0.9304 0.1038 8.963 1.75e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.89 on 44 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.6461, Adjusted R-squared: 0.6381
## F-statistic: 80.34 on 1 and 44 DF, p-value: 1.746e-11
anova(mod2, mod1, test = "F")
require(MASS)
step <- stepAIC(mod1, direction = "both")
## Start: AIC=250.99
## Salary ~ Work + Price
##
## Df Sum of Sq RSS AIC
## - Work 1 304.5 9760.5 250.44
## <none> 9456.0 250.99
## - Price 1 12361.2 21817.2 287.44
##
## Step: AIC=250.44
## Salary ~ Price
##
## Df Sum of Sq RSS AIC
## <none> 9760.5 250.44
## + Work 1 304.5 9456.0 250.99
## - Price 1 17822.0 27582.5 296.23
summary(step)
##
## Call:
## lm(formula = Salary ~ Price, data = ec)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.679 -10.279 -0.861 11.280 32.635
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -25.6766 7.6007 -3.378 0.00154 **
## Price 0.9304 0.1038 8.963 1.75e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.89 on 44 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.6461, Adjusted R-squared: 0.6381
## F-statistic: 80.34 on 1 and 44 DF, p-value: 1.746e-11
[Figure: diagnostic plots of the selected model (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance); Stockholm and Oslo are flagged as unusual observations]
(d) Use PCA and display the results in a biplot. Interpret the results.
Solution:
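The biplot call below uses an object pca that is not defined in the excerpt; a plausible construction (the restriction to complete cases and the scaling to unit variance are assumptions):

```r
# PCA on the complete cases of the numeric variables,
# scaled to unit variance so all three contribute comparably.
pca <- prcomp(na.omit(ec[, c("Work", "Price", "Salary")]), scale. = TRUE)
summary(pca)  # proportion of variance explained per component
```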
biplot(pca)
[Figure: PCA biplot (PC1 vs PC2) with the cities as points and arrows for the variables Work, Price and Salary; Hong Kong, Taipei and Manila lie toward the Work arrow, the high-salary European and North American cities toward Price and Salary]
(e) Use linear and quadratic discriminant analysis to discriminate the three salary levels of SalaryCat based on
the variables Work and Price. Display the results graphically. Compare the results of the linear and the
quadratic approach.
Solution:
require(MASS)
ec.noNA <- na.omit(ec)  # drop the rows with missing values
lda1 <- lda(SalaryCat ~ Work + Price, data = ec.noNA)
Work <- with(ec.noNA, seq(min(Work), max(Work), length = 100))
Price <- with(ec.noNA, seq(min(Price), max(Price), length = 100))
grid <- expand.grid(Work = Work, Price = Price)
pred.lda1 <- predict(lda1, grid)$class
qda1 <- qda(SalaryCat ~ Work + Price, data = ec.noNA)
pred.qda1 <- predict(qda1, grid)$class
par(mfrow = c(1, 2))
cols <- c("white", rgb(0.9, 0.9, 0.8), rgb(0.8, 0.9, 0.9))
image(Work, Price, array(as.numeric(pred.lda1), c(100, 100)), col = cols)
with(ec, points(Work, Price, col = SalaryCat))
image(Work, Price, array(as.numeric(pred.qda1), c(100, 100)), col = cols)
with(ec, points(Work, Price, col = SalaryCat))
[Figure: classification regions of the LDA fit (left) and the QDA fit (right) in the Work-Price plane, with the observed cities plotted as points coloured by SalaryCat]
(f) Compare the findings from the different approaches above. Make five distinct statements and refer to all
points (a) to (e) at least once.
Solution:
Keywords: PCA and clustering make no assumption about the data's distribution; they are unsupervised
algorithms that give no prediction, just description. PCA supplies the 'why' behind the clustering's result.
Linear regression and discriminant analysis are predictive methods with a Gaussian assumption.
Problem 20 (Do NOT use R for the following tasks)
(a) Consider the simple linear regression Ỹ_i = β0 + β1 X̃_i + U_i with X̃_i ∼ N(µ, σ_x̃²) and U_i ∼ N(0, σ_u²), where X̃_i and
U_i are independent and i = 1, . . . , n.
Solution:
(a)
(i)
Ỹ_i ∼ N(β0 + β1 µ, σ_u² + β1² σ_x̃²)
(ii)
ε_i ∼ N(0, σ_u² + σ_v²)
(iii)
ε_i ∼ N(0, σ_u² + β1² σ_w²)
(b) • Consider a real-valued statistic Tn = h(X1:n ), based on a random sample X1:n from a distribution with
probability mass or density function f (x; θ) where θ is an unknown scalar parameter. If the random
variable Tn is used to make inference about θ, then it is called an estimator. We may simply
write T rather than Tn if the sample size n is not important. The particular value t = h(x1:n ) that an
estimator takes for a realization x1:n of the random sample X1:n is called an estimate.
(c) Keywords: bootstrapping; resample the data with replacement B times (B large), each resample of the original
size n; derive B estimates and construct the confidence interval from the empirical quantiles of those B estimates.
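Although this problem is to be solved without R, the percentile-bootstrap recipe in the keywords can be sketched in a few lines for illustration (synthetic data; the sample, the estimator mean() and the resample count B are all assumptions of the sketch):

```r
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)  # synthetic sample
B <- 2000                          # number of bootstrap resamples

# Resample the data with replacement B times and recompute the estimate each time.
boot.means <- replicate(B, mean(sample(x, replace = TRUE)))

# 95% percentile bootstrap confidence interval for the mean.
ci <- quantile(boot.means, c(0.025, 0.975))
```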
(d) Keywords: estimation of the true MSE by leave-one-out cross-validation; leave one observation yᵢ out, predict ŷᵢ
from the model fitted to the remaining data, repeat over all observations, and calculate the estimated MSE
from the prediction errors.
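The leave-one-out procedure described above can likewise be sketched, again outside the scope of the R-free exam task (synthetic data loosely mirroring the Salary ~ Price regression from part (c); all names and coefficients here are illustrative):

```r
set.seed(1)
n <- 40
price  <- runif(n, 30, 115)
salary <- -25 + 0.93 * price + rnorm(n, sd = 15)
d <- data.frame(price, salary)

# Leave-one-out cross-validation: refit without observation i, predict y_i.
errs <- sapply(seq_len(n), function(i) {
  fit <- lm(salary ~ price, data = d[-i, ])
  d$salary[i] - predict(fit, newdata = d[i, , drop = FALSE])
})
mse.loocv <- mean(errs^2)  # leave-one-out estimate of the true MSE
```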
http://www.math.uzh.ch/sta121.1|07exercise.pdf 2014-10-13