Department of Statistics: Course Stats 330

This document provides model answers for an assignment analyzing Titanic passenger data. It includes instructions on loading the data, checking for errors, creating additional variables, and performing analyses to understand the relationship between survival and other factors like age, gender, and class. Tables and plots are used to examine how the fraction of passengers surviving depends on these factors.


Department of Statistics

COURSE STATS 330

Model answers for Assignment 1, 2007.

The data set in the file titanic.txt (available on the course web page) contains some
data on 633 passengers on the liner Titanic, which sank in the North Atlantic on 15th
April 1912 after striking an iceberg.

The data set has 5 variables and 633 cases. The variables are

age.group: The age group of the passenger (0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60+), treated as a factor;

age: The age of the passenger, treated as a continuous variable;

survived: 0 = died, 1 = survived. This is a numeric variable.

pclass: The passenger class (1st, 2nd, 3rd), treated as a factor;

sex: The gender (female, male) of the passenger.

Questions

1. Load the data into R, and make a data frame titanic.df to contain the data.
Check for any typographical errors. [5 marks]

There are several ways to do this. You can download the file titanic.txt onto your
computer, and set the R directory to point to the folder containing the data. You set
the R directory by pulling down the File menu in R, choosing Change dir, and
navigating to the correct folder. Having set the directory, type

titanic.df = read.table("titanic.txt", header=TRUE)

A more convenient way is to load the data directly from the web site. (You have
to be connected to the internet to do this.) Type

titanic.df = read.table("http://www.stat.auckland.ac.nz/~lee/330/datasets.dir/titanic.txt", header=TRUE)

You can cut and paste the URL for the data directly from the browser. To check for
typographical errors you could proofread the file, but a simpler way is to inspect
the unique (i.e. distinct) values of each variable. Type

> unique(titanic.df$age.group)
[1] 20-29 0-9 30-39 40-49 60-69 50-59 70-79 10-19 100-19
Levels: 0-9 10-19 100-19 20-29 30-39 40-49 50-59 60-69 70-79

Here we see that there is a typo: 100-19 is not a valid age group. It probably should
be 10-19, so we will correct it to that. Which case is the offending one?

By typing

> titanic.df[titanic.df$age.group=="100-19",]
pclass survived age sex age.group
633 3rd 0 19 male 100-19

we see that it is case number 633, whose age (19) confirms that the group should be
10-19. Correct the original file and read the data in again. Alternatively, we can
correct it by

titanic.df[633,5]="10-19"

since age.group is the 5th variable in the data frame. We also have to readjust the
factor to eliminate the incorrect level:

titanic.df$age.group = factor(titanic.df$age.group)

The other variables can be checked similarly, but are OK.
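
The same check can be run over every column at once. The sketch below is a minimal illustration on three invented rows (the column layout is assumed from the output above), not part of the model answer. It passes stringsAsFactors=TRUE explicitly: from R 4.0.0 onward read.table no longer converts character columns to factors by default, so without it unique() would print no Levels line.

```r
# Invented rows in an assumed layout; stands in for titanic.txt
txt = "pclass survived age sex age.group
1st 1 34 female 30-39
2nd 0 8 male 0-9
3rd 0 19 male 100-19"
toy.df = read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Tabulate every non-continuous column; typos show up as odd, rare values
lapply(toy.df[c("pclass", "survived", "sex", "age.group")], table)

# Cross-check age.group against age by recomputing the decade from age
decade = cut(toy.df$age, breaks = seq(0, 80, by = 10), right = FALSE,
             labels = c("0-9", "10-19", "20-29", "30-39",
                        "40-49", "50-59", "60-69", "70-79"))
bad = which(as.character(decade) != as.character(toy.df$age.group))
bad    # rows where the two disagree

# Renaming the level fixes the value and drops the bad level in one step
levels(toy.df$age.group)[levels(toy.df$age.group) == "100-19"] = "10-19"
levels(toy.df$age.group)
```

Renaming the level is an alternative to assigning into row 633 and then calling factor() again; either route should leave no 100-19 level behind.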

2. Make an additional variable by turning the numeric variable survived into a
factor with levels survived and died. [5 marks]

Let's call the new variable survival. Make it by typing

survival = factor(titanic.df$survived, labels=c("Died", "Survived"))

Add the new variable to the data frame, calling the result titanic2.df:

titanic2.df = data.frame(survival, titanic.df)
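
Because the labels are matched to the sorted values of survived, their order matters: Died must come first so that it pairs with 0. A quick cross-tabulation confirms the mapping. The vector below is invented for illustration, and spelling out levels=c(0, 1) is an optional safeguard rather than something the answer above requires.

```r
survived = c(0, 1, 1, 0, 1)    # invented 0/1 values
survival = factor(survived, levels = c(0, 1), labels = c("Died", "Survived"))

# Every 0 should fall under Died and every 1 under Survived
table(survived, survival)
```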

3. What is the relationship between survival and age? Does it depend on class
and gender? Draw a suitable trellis plot to answer this question. Don't try to
fit any models. [10 marks, 5 for the plot and 5 for the discussion]

We will draw a trellis plot of survival versus age, using gender and class as
conditioning factors.

library(lattice) # need to load the lattice library

# trellis.par.set is the method for changing the background
# colours and plot symbol colours (this is optional, but these colours
# are easier to see when printed)
trellis.par.set(background = list(col = "white"),
                dot.symbol = list(col = "darkblue"))
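
The plotting call itself is not shown above. One call that would give a plot of this kind, offered as an assumption rather than as the original code, is a dotplot of survival against age conditioned on sex and pclass. The data frame toy.df below is randomly generated so that the sketch runs on its own.

```r
library(lattice)

# Randomly generated stand-in for titanic2.df (not the real data)
set.seed(1)
toy.df = data.frame(
  age      = round(runif(60, 1, 70)),
  survival = factor(sample(c("Died", "Survived"), 60, replace = TRUE)),
  sex      = factor(rep(c("female", "male"), 30)),
  pclass   = factor(rep(c("1st", "2nd", "3rd"), each = 20))
)

# One panel per sex/class combination, age on the x-axis
p = dotplot(survival ~ age | sex * pclass, data = toy.df,
            xlab = "Age", ylab = "Survival",
            strip = function(...) strip.default(..., strip.names = TRUE))
```

Typing p (or print(p)) draws the plot. With the real data, toy.df would be replaced by titanic2.df.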

[Figure: trellis plot of Survival (Died, Survived) against Age (0 to 60), with one panel for each combination of pclass (1st, 2nd, 3rd) and sex (female, male).]

Interpretation: For the third class passengers, the ages seem similar for those who
died and those who survived. Note that relatively few third class passengers survived.
For the second and first class passengers, the survivors tended to be younger than
those who died, with the exception of female first class passengers. (There are too
few of these who died for any pattern to emerge.)

4. For each combination of age group, class and gender, calculate the fraction of
passengers that survived. Present your results in a table. [10 marks]

The R function table is useful for this.

my.table =
table(titanic2.df$age.group,titanic2.df$pclass,titanic2.df$sex,
titanic2.df$survival)

> my.table

, , = female, = Died

1st 2nd 3rd

0-9 1 0 5
10-19 0 1 7
20-29 1 4 8
30-39 1 3 5
40-49 0 1 4
50-59 1 1 0
60-69 1 0 0
70-79 0 0 0

, , = male, = Died

1st 2nd 3rd

0-9 0 0 7
10-19 2 10 21
20-29 10 43 53
30-39 18 30 23
40-49 22 13 14
50-59 16 8 1
60-69 11 1 1
70-79 3 1 0

, , = female, = Survived

1st 2nd 3rd

0-9 0 9 4
10-19 13 11 11
20-29 20 23 4
30-39 19 19 8
40-49 19 9 1
50-59 18 4 0
60-69 7 0 0
70-79 0 0 0

, , = male, = Survived

1st 2nd 3rd

0-9 3 11 6
10-19 3 1 2
20-29 10 4 6
30-39 12 4 3
40-49 10 1 1
50-59 4 0 0
60-69 1 0 0
70-79 0 0 0

The object my.table is an array: this is like a matrix, but in this case it has 4
dimensions. We can make separate tables of just the survivors and just those
who died by subsetting:

> survivors=my.table[,,,2] # survivors are second level

> died = my.table[,,,1] # died are first level
> fraction = survivors/(died + survivors)
> round(fraction,3) # rounds to 3 decimal places

, , = female

1st 2nd 3rd

0-9 0.000 1.000 0.444
10-19 1.000 0.917 0.611
20-29 0.952 0.852 0.333
30-39 0.950 0.864 0.615
40-49 1.000 0.900 0.200
50-59 0.947 0.800 NaN
60-69 0.875 NaN NaN
70-79 NaN NaN NaN

, , = male

1st 2nd 3rd

0-9 1.000 1.000 0.462
10-19 0.600 0.091 0.087
20-29 0.500 0.085 0.102
30-39 0.400 0.118 0.115
40-49 0.312 0.071 0.067
50-59 0.200 0.000 0.000
60-69 0.083 0.000 0.000
70-79 0.000 0.000 NaN
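
The NaN entries mark combinations with no passengers at all, where the fraction is 0/0. The sketch below reproduces this on a six-row invented data set and shows that prop.table, taken over the first three margins, gives the same fractions without separate subsetting. The name toy.df and its values are assumptions made for illustration.

```r
# Invented data; pclass keeps an unused level "3rd" to create empty cells
toy.df = data.frame(
  age.group = factor(c("0-9", "0-9", "10-19", "10-19", "10-19", "0-9")),
  pclass    = factor(c("1st", "1st", "1st", "2nd", "2nd", "2nd"),
                     levels = c("1st", "2nd", "3rd")),
  sex       = factor(c("female", "female", "male", "male", "male", "female")),
  survival  = factor(c("Died", "Survived", "Survived",
                       "Died", "Survived", "Survived"))
)
tab = table(toy.df$age.group, toy.df$pclass, toy.df$sex, toy.df$survival)

# Same calculation as in the text: survivors over total in each cell
frac1 = tab[, , , 2] / (tab[, , , 1] + tab[, , , 2])

# Alternative: proportions within each age group/class/sex combination
frac2 = prop.table(tab, margin = 1:3)[, , , "Survived"]

round(frac1, 3)    # empty cells print as NaN
```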

An alternative way to make the table is to use the fact that, for binary
(0/1) data like the variable survived, the proportion of ones is just
the mean. We can calculate the mean (i.e. the proportion surviving) for
each age group/class/sex combination by using the R function tapply:

> survival.frac = tapply(titanic2.df$survived,

list(titanic2.df$age.group,titanic2.df$pclass,titanic2.df$sex), mean)
> round(survival.frac,3)

, , female

1st 2nd 3rd

0-9 0.000 1.000 0.444
10-19 1.000 0.917 0.611
20-29 0.952 0.852 0.333
30-39 0.950 0.864 0.615
40-49 1.000 0.900 0.200
50-59 0.947 0.800 NA
60-69 0.875 NA NA
70-79 NA NA NA

, , male

1st 2nd 3rd

0-9 1.000 1.000 0.462
10-19 0.600 0.091 0.087
20-29 0.500 0.085 0.102
30-39 0.400 0.118 0.115
40-49 0.312 0.071 0.067
50-59 0.200 0.000 0.000
60-69 0.083 0.000 0.000
70-79 0.000 0.000 NA
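
As a quick check that the mean of 0/1 data is the proportion of ones, and as a reminder that some of these fractions rest on very few passengers, the sketch below computes both the fraction and the cell size with tapply. The vectors are invented for illustration. Unlike the division used earlier, tapply returns NA rather than NaN for empty cells, because mean is never called on them.

```r
survived = c(1, 0, 1, 1, 0)                 # invented outcomes
group = factor(c("a", "a", "a", "b", "b"),
               levels = c("a", "b", "c"))   # level "c" has no cases

frac = tapply(survived, group, mean)        # proportion surviving per group
n = tapply(survived, group, length)         # number of cases behind each one
cbind(frac, n)
```

Printing the counts next to the fractions makes it easier to see which values, such as a 0.000 or 1.000 based on one or two passengers, should not be over-interpreted.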

5. How does the fraction surviving depend on age group, gender and class? Draw
another Trellis plot to explore this. [10 marks, 5 for the plot and 5 for the
discussion]

First, we need to convert the table's entries into a data frame, with extra variables
indicating the factor levels. The R function expand.grid does this job nicely:

survival.frac.df = data.frame(frac = as.vector(survival.frac), expand.grid(

age.group=c("0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79"),
pclass=c("1st","2nd","3rd"),
sex=c("female", "male")))

> survival.frac.df
frac age.group pclass sex
1 0.000 0-9 1st female
2 1.000 10-19 1st female
3 0.952 20-29 1st female
4 0.950 30-39 1st female
5 1.000 40-49 1st female
6 0.947 50-59 1st female
7 0.875 60-69 1st female
8 NA 70-79 1st female
9 1.000 0-9 2nd female
10 0.917 10-19 2nd female
11 0.852 20-29 2nd female
12 0.864 30-39 2nd female
13 0.900 40-49 2nd female
14 0.800 50-59 2nd female
15 NA 60-69 2nd female
16 NA 70-79 2nd female
17 0.444 0-9 3rd female
... (48 lines in all)
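
Typing the level names by hand works, but the labels must be listed in exactly the order of the array's dimensions. An alternative sketch, assuming the array carries named dimnames (here supplied through a named list in tapply), lets R build the labels itself with as.data.frame(as.table(...)). The data are invented for illustration.

```r
# Invented data; pclass keeps an unused level "3rd" to create empty cells
toy.df = data.frame(
  survived  = c(0, 1, 1, 0, 1, 1),
  age.group = factor(c("0-9", "0-9", "10-19", "10-19", "10-19", "0-9")),
  pclass    = factor(c("1st", "1st", "1st", "2nd", "2nd", "2nd"),
                     levels = c("1st", "2nd", "3rd")),
  sex       = factor(c("female", "female", "male", "male", "male", "female"))
)

# Naming the list elements gives the result named dimnames
toy.frac = tapply(toy.df$survived,
                  list(age.group = toy.df$age.group,
                       pclass = toy.df$pclass,
                       sex = toy.df$sex),
                  mean)

# One row per cell, with factor columns taken from the dimnames
toy.frac.df = as.data.frame(as.table(toy.frac), responseName = "frac")
head(toy.frac.df)
```

The resulting data frame has the columns age.group, pclass, sex and frac, so it can be passed to dotplot in the same way as survival.frac.df.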

Then we draw the dot plot

dotplot(frac~age.group|sex*pclass,
data=survival.frac.df,
xlab="Age",
ylab="Survival fraction",
strip=function(...)strip.default(..., strip.names=T))

[Figure: trellis dot plot of Survival fraction (0.0 to 1.0) against Age group (0-9 to 70-79), with one panel for each combination of pclass (1st, 2nd, 3rd) and sex (female, male).]

Discussion: It is clear from the plot that survival was much higher for females than
males for all classes, although the higher the class, the higher the survival. Survival
was more likely for younger persons. This trend is particularly pronounced for the first
class males. Age did not have such a strong effect on survival for the other
categories.

Sounds like the movie got it right!
