FIT3152 Data Analytics. Tutorial 01: Introduction To R. Review of Basic Statistics

This document provides an introduction to the R programming language and basic statistics concepts for a data analytics tutorial. It includes example analysis activities and challenges using various datasets. Tips are provided for each activity to help students complete the tasks in R. The additional notes section defines key statistical concepts like hypothesis testing, regression, histograms, and correlation.


FIT3152 Data analytics.

Tutorial 01:
Introduction to R. Review of basic statistics
Note: Hints and tips will be added to this document during Weeks 1 and 2.

Group Questions:
1. Tutorial Activity as a class or in groups: Compare the two figures in Lecture 1 showing
Internet activity over 2018/2019, and 2020 (Slides 5 and 6). Answer the following:
• What are the trends – that is, what types of online activities are increasing in
prevalence?
• Looking at a particular activity, what types of data could be collected?
• What could that data be used to study?
• What changes might be due to COVID-19?

Tips For example, think about Instagram.

The data collected is: time spent scrolling, number of likes, number of
followers, number following, number of posts made, frequency of posting,
amount of private messaging, ...
Time spent scrolling could be used as an indicator of how much free time
that person has.
The number of followers might be a guide to the size of their friendship
circle.
...

2. Tutorial Activity as a class or in groups: Using the examples of applications of data science
in the real world as inspiration, find a recent application of data science from the media.
Answer the following:
• What is the problem to be solved?
• What type of data is collected?
• What type of analysis is performed?
• What is the outcome?
• How might you use this data to investigate another aspect of (human) activity?

Present your findings in Tutorial 1 as a 2–3 minute verbal report as part of the group discussion.

Tips Sources of news articles you might find useful are:

https://www.theage.com.au/
https://www.nature.com/
https://www.news.com.au/

Individual/Small Group Questions:
Note: much of the data for the following questions has been sourced from
http://www.statsci.org/datasets.html and the links within it.

1. Using the data sets provided as csv files and the lecture notes, try to reproduce all of the
statistics and graphics from Lecture 1.

Files are: {InvestA, InvestB, Toothbrush, Workers, Food Retail 2014-2020 time series}.csv

Tips If you are having trouble reading files into R, put your data file onto
the desktop and set the desktop as your working directory.
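As a minimal sketch of that workflow: the commented lines assume the tutorial files sit on a desktop at `~/Desktop` (a hypothetical path, so adjust it for your machine). The runnable part writes a small demo file into R's temporary folder so it works anywhere. `setwd()` returns the previous directory, which lets you restore it afterwards.

```r
# Typical use (hypothetical path: adjust to where your files are saved):
# setwd("~/Desktop")
# invest_a <- read.csv("InvestA.csv")

# Self-contained demo: write a small CSV into R's temporary folder, make that
# folder the working directory, then read the file back by name alone
old_wd <- setwd(tempdir())   # setwd() returns the previous directory
write.csv(data.frame(x = 1:3, y = c(2.5, 3.1, 4.8)),
          "demo.csv", row.names = FALSE)
demo <- read.csv("demo.csv") # found because it is in getwd()
getwd()
str(demo)
setwd(old_wd)                # restore the original working directory
```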

2. The following data record the lengths of rivers in the South Island of New Zealand, in
kilometres. The data are grouped by where each river flows out to. Source:
http://www.statsci.org/data/oz/nzrivers.html

Pacific Ocean:
209, 48, 169, 138, 64, 97, 161, 95, 145, 90, 121, 80, 56, 64, 209, 64, 72, 288, 322.

Tasman Sea:
76, 64, 68, 64, 37, 32, 32, 51, 56, 40, 64, 56, 80, 121, 177, 56, 80, 35, 72, 72, 108, 48.

(a) Calculate the summary statistics for each group of rivers. Draw a boxplot.
(b) Test the hypothesis that rivers flowing into the Tasman Sea are shorter on average than
those flowing into the Pacific Ocean. Use a significance level of 1%.

Tips Will be added here.
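A minimal sketch for question 2, one possible approach rather than a model answer. It assumes a two-sample t-test is appropriate; `t.test()` defaults to Welch's version (unequal variances), and `alternative = "less"` makes the test one-sided with H1: mean(Tasman) < mean(Pacific). The object name `rivers_test` is illustrative.

```r
# River lengths (km) from the tutorial, grouped by where each river flows to
pacific <- c(209, 48, 169, 138, 64, 97, 161, 95, 145, 90, 121, 80, 56, 64,
             209, 64, 72, 288, 322)
tasman  <- c(76, 64, 68, 64, 37, 32, 32, 51, 56, 40, 64, 56, 80, 121, 177,
             56, 80, 35, 72, 72, 108, 48)

# (a) Summary statistics per group, plus standard deviations
summary(pacific)
summary(tasman)
sapply(list(Pacific = pacific, Tasman = tasman), sd)

# A named list gives one box per group
boxplot(list(Pacific = pacific, Tasman = tasman),
        ylab = "Length (km)", main = "South Island rivers")

# (b) One-sided test, H1: mean(Tasman) < mean(Pacific)
rivers_test <- t.test(tasman, pacific, alternative = "less")
rivers_test
rivers_test$p.value < 0.01   # TRUE means reject H0 at the 1% level
```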

3. When anthropologists analyse human skeletal remains, an important piece of information is
living stature. Since estimates of stature are commonly based on statistical methods that use
measurements of small bones, the following data were presented in a paper in the American
Journal of Physical Anthropology to validate one such method. Variables are: MetaCarp –
metacarpal bone length in mm, Stature (height of the skeleton) in cm. Source:
http://www.statsci.org/data/general/stature.html

MetaCarp Stature
45 171
51 178
39 157
41 163
48 172
49 183
46 173
43 175
47 173

Draw a scatterplot of the data with Stature as the vertical axis. Calculate the regression
equation predicting Stature from MetaCarp. Comment on the accuracy of the model.
Superimpose the line of best fit on your scatterplot.

Tips If you are having trouble adding an extra line on scatter plot, check out
the abline function.
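A sketch for question 3 using the table above. The names `stature_df` and `fit` are illustrative choices. Besides the coefficients, `summary(fit)` reports R-squared and the residual standard error, which are one way to comment on the accuracy of the model.

```r
# Metacarpal length and stature (cm) for the nine skeletons in the table
stature_df <- data.frame(
  MetaCarp = c(45, 51, 39, 41, 48, 49, 46, 43, 47),
  Stature  = c(171, 178, 157, 163, 172, 183, 173, 175, 173)
)

# Least-squares fit predicting Stature from MetaCarp (response ~ predictor)
fit <- lm(Stature ~ MetaCarp, data = stature_df)
coef(fit)               # intercept and slope of the regression equation
summary(fit)            # R-squared, residual standard error, p-values

# Scatterplot with Stature on the vertical axis, then the fitted line
plot(Stature ~ MetaCarp, data = stature_df,
     xlab = "MetaCarp", ylab = "Stature (cm)")
abline(fit, col = "red")  # abline() accepts a fitted lm object directly
```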

4. The ocean swell produces spectacular eruptions of water through a hole in the cliff at Kiama,
about 120 km south of Sydney, known as the Blowhole. The times at which 65 successive
eruptions occurred from 1340 hours on 12 July 1998 were observed using a digital watch.
Source: http://www.statsci.org/data/oz/kiama.html

Challenge: download the data into R directly from http://www.statsci.org/data/oz/kiama.txt
(see ATHR page 18), or alternatively use the data file kiama.txt.

Read these data into R, creating a vector named ‘kiama’. Calculate the mean and standard
deviation. Draw the default histogram. Using help, try to draw an improved histogram of
your own design by changing the range, class intervals, colour, etc.

Tips Will be added here.
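A sketch for question 4. Reading the real file assumes `kiama.txt` has a one-line header and a single column of values, so check the file and adjust `header` if not. So that the sketch runs without a network connection, it fills `kiama` with simulated values; these are a stand-in, not the blowhole data. The `breaks`, `xlim`, `col` and `border` arguments of `hist()` control the range, class intervals and colours.

```r
# Reading the real data (assumption: a one-line header and one column of
# values; open kiama.txt first and adjust `header` if that is not the case):
# kiama <- read.table("http://www.statsci.org/data/oz/kiama.txt",
#                     header = TRUE)[[1]]

# Stand-in so the sketch runs offline: simulated values, NOT the Kiama data
set.seed(1)
kiama <- round(rexp(65, rate = 1 / 40))

mean(kiama)
sd(kiama)

hist(kiama)  # default histogram; class intervals chosen by Sturges' rule

# A customised version: explicit range, class width and colours
brks <- seq(0, max(kiama) + 10, by = 10)  # class intervals of width 10
hist(kiama, breaks = brks, xlim = range(brks),
     col = "lightblue", border = "white",
     main = "Simulated stand-in for the Kiama data", xlab = "Value")
```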

5. The timber data are for specimens of 50 varieties of timber, giving the modulus of rigidity, modulus
of elasticity and air-dried density, arranged in increasing order of density.
Source: http://www.statsci.org/data/oz/timber.html

Read these data into R, creating a data frame named ‘timber’. You can use the data file
timber.txt or load it directly from http://www.statsci.org/data/oz/timber.txt

(a) Which variable, elasticity or density, is the better predictor of rigidity?

(b) Using your chosen variable, calculate the regression equation predicting rigidity and draw a
scatterplot of the data showing the fitted model.

(c) Challenge: calculate the regression equation predicting rigidity as a function of both
elasticity and density. Comment on the quality of your model compared with the single-predictor model in (b).

Tips If you want to load the data from an online data source, you can use the
path to the file on the web (such as
“http://www.statsci.org/data/oz/timber.txt”) together with the same
functions that are used to load a local file into R.

The correlation between a predictor and the response variable can be a good
measure of the quality of the predictor.

Also, to assess the quality of a model, one option is to consider the R-squared
value, which shows how well the model fits the data.
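A sketch for question 5 following the tips above. It assumes the three columns of `timber.txt` are, in order, rigidity, elasticity and density, and renames them; check `names(timber)` first. The helper `compare_timber_models()` is hypothetical. For a single predictor, R-squared equals the squared correlation, so (a) and (b) can be read off the same output. Adding a predictor never lowers R-squared, so the adjusted R-squared is included as a fairer comparison for (c).

```r
# Reading the data (assumed column order: rigidity, elasticity, density):
# timber <- read.table("http://www.statsci.org/data/oz/timber.txt",
#                      header = TRUE)
# names(timber) <- c("rigidity", "elasticity", "density")

# Hypothetical helper: correlations and R-squared for each candidate model
compare_timber_models <- function(timber) {
  fit_elast <- lm(rigidity ~ elasticity, data = timber)
  fit_dens  <- lm(rigidity ~ density, data = timber)
  fit_both  <- lm(rigidity ~ elasticity + density, data = timber)  # part (c)
  list(
    correlation = c(elasticity = cor(timber$rigidity, timber$elasticity),
                    density    = cor(timber$rigidity, timber$density)),
    r_squared = c(elasticity = summary(fit_elast)$r.squared,
                  density    = summary(fit_dens)$r.squared,
                  both       = summary(fit_both)$r.squared),
    # adjusted R-squared penalises the extra predictor in the (c) model
    adj_r_squared = c(elasticity = summary(fit_elast)$adj.r.squared,
                      density    = summary(fit_dens)$adj.r.squared,
                      both       = summary(fit_both)$adj.r.squared),
    fits = list(elasticity = fit_elast, density = fit_dens, both = fit_both)
  )
}
```

With the real data, `res <- compare_timber_models(timber)` followed by `res$correlation` and `res$r_squared` addresses (a) and (c); for (b), `plot(rigidity ~ density, data = timber)` and `abline(res$fits$density)` would draw the fit if density turns out to be the better predictor (swap in elasticity otherwise).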

6. Challenge: Using the data in InvestA.csv, draw a boxplot. You will need to use the help file to
work out the syntax – try ?boxplot as a starting point...

Using the data: InvestA.csv, now use the ‘aggregate’ function to calculate the mean of each
group. This is similar to the ‘tapply’ function but returns a data frame. Use help to work out
the syntax...

Tips If you are having trouble finding out how to group data inside the
boxplot, have a closer look at the “formula” argument of the boxplot
function. You can also use a formula with the aggregate
function, in the same way as with the boxplot function.
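A sketch of the formula interface the tips describe. The column names `value` and `group` and the six values are hypothetical placeholders, so check `names()` on InvestA.csv and substitute its columns. If the file instead stores each group in its own column, `boxplot()` on the whole data frame draws one box per column.

```r
# Hypothetical long-format data: one numeric column and one grouping column
invest <- data.frame(
  value = c(1, 2, 3, 4, 5, 6),
  group = c("a", "a", "b", "b", "c", "c")
)

# Formula interface: "value ~ group" means "value split by group"
boxplot(value ~ group, data = invest)

# aggregate() with the same formula returns a data frame of group means
means_df <- aggregate(value ~ group, data = invest, FUN = mean)
means_df

# tapply() computes the same means but returns a named vector instead
tapply(invest$value, invest$group, mean)
```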

7. Analyse Victorian Retail Turnover: for Food retailing; Household goods retailing; Clothing,
footwear and personal accessory retailing; Department stores; Other retailing; Cafes,
restaurants and takeaway food service for the period Jan 2010 – Dec 2020 using Australian
Bureau of Statistics data. You will need to copy the data from the Excel file: 8501.0 Retail
Trade, Australia.xls.

Draw a time series plot of the data and plot the time series decomposition. Comment on the
main elements in the time series.

Can you see any COVID-19 effects during 2020?

Tips The .xls file has more than one sheet. If you are having trouble
loading the data on the second sheet, you might try copying the
data to the clipboard and loading it into R from the clipboard. This operation
works differently on Mac, Windows and Linux, so a quick Google search can be
the way to go!
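A sketch of the clipboard route and the time series steps. The clipboard commands for Windows and macOS are shown as comments (Linux typically needs an external tool such as `xclip`), and the column layout of the pasted data is an assumption. The runnable part uses a simulated monthly series as a stand-in for the ABS figures, so the plots show the method, not the real retail trends.

```r
# Clipboard route for data copied from the Excel sheet (tab-separated):
#   Windows: retail <- read.table("clipboard", header = TRUE, sep = "\t")
#   macOS:   retail <- read.table(pipe("pbpaste"), header = TRUE, sep = "\t")
# Assumption: one row per month from Jan 2010, one column per category,
# e.g. if column 2 held Food retailing:
#   food_ts <- ts(retail[, 2], start = c(2010, 1), frequency = 12)

# Stand-in so the sketch runs offline: simulated data, NOT the ABS series
set.seed(3)
n_months <- 132                                  # Jan 2010 to Dec 2020
month_idx <- seq_len(n_months)
food <- 1500 + 5 * month_idx +                   # upward trend
  100 * sin(2 * pi * month_idx / 12) +           # yearly seasonal cycle
  rnorm(n_months, sd = 20)                       # irregular noise

# ts() attaches the calendar: monthly data (frequency 12) starting Jan 2010
food_ts <- ts(food, start = c(2010, 1), frequency = 12)
plot(food_ts, ylab = "Turnover", main = "Food retailing (simulated)")

# Classical decomposition into trend, seasonal and random components
food_dc <- decompose(food_ts)
plot(food_dc)
```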

Additional Statistics Notes:

Q2) Hypothesis Testing and p-value
In a statistical hypothesis test, a hypothesis about a population parameter is tested using
sample data, usually from one or two groups.
The tested (alternative) hypothesis is evaluated against a null hypothesis that assumes equality, or
no difference, between the group(s) and the value being tested.
The p-value is the probability, assuming the null hypothesis is true, of obtaining sample data at
least as extreme as those observed.
A high p-value means the sample data are consistent with a true null hypothesis, while a low p-value
means data like these would be unlikely if the null hypothesis were true.
Therefore, a low p-value is evidence against the null hypothesis, in favour of the tested hypothesis.
Compare the p-value with the chosen significance level (for example 1%): if the p-value is below it,
reject the null hypothesis; otherwise there is not enough evidence to reject it. Note that the
p-value is not the probability that either hypothesis is true.
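To make the definition of a p-value concrete, here is a sketch using the river data from question 2 and a permutation test, a resampling technique swapped in here for illustration (the tutorial itself does not prescribe it). If H0 holds, the Pacific/Tasman labels are interchangeable, so the p-value is estimated as the proportion of random relabellings that give a difference in means at least as extreme as the one observed. The seed and the 10,000 shuffles are arbitrary choices.

```r
# River lengths (km) from question 2
pacific <- c(209, 48, 169, 138, 64, 97, 161, 95, 145, 90, 121, 80, 56, 64,
             209, 64, 72, 288, 322)
tasman  <- c(76, 64, 68, 64, 37, 32, 32, 51, 56, 40, 64, 56, 80, 121, 177,
             56, 80, 35, 72, 72, 108, 48)
alpha <- 0.01

# Observed test statistic: difference in means (negative = Tasman shorter)
observed <- mean(tasman) - mean(pacific)

# Under H0 the group labels are interchangeable: shuffle them many times and
# record the difference in means that each random relabelling produces
set.seed(42)
pooled <- c(tasman, pacific)
n_tasman <- length(tasman)
perm_diffs <- replicate(10000, {
  idx <- sample(length(pooled), n_tasman)
  mean(pooled[idx]) - mean(pooled[-idx])
})

# One-sided p-value: share of relabellings at least as extreme (as low) as
# the observed difference
p_value <- mean(perm_diffs <= observed)
p_value
if (p_value < alpha) "reject H0 at the 1% level" else "do not reject H0"
```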

Q3) Regression
Regression provides a model for the relationship between variables, often with the primary aim of
prediction.
Simple linear regression models the relationship between one dependent (response) variable and one
independent (predictor) variable.
The fitted equation gives the expected value of the dependent variable for any value of the
independent variable, so it can be plotted as a line over the data.
The line of best fit is found by minimising the sum of squared errors (squared residuals); this is
the least-squares method.
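The least-squares line has a closed form, which can be checked against `lm()`. This sketch reuses the metacarpal and stature values from question 3: the slope is b1 = Sxy / Sxx (the sum of cross-products of deviations divided by the sum of squared x deviations) and the intercept is b0 = mean(y) - b1 * mean(x).

```r
# Metacarpal length and stature (cm) from question 3
x <- c(45, 51, 39, 41, 48, 49, 46, 43, 47)
y <- c(171, 178, 157, 163, 172, 183, 173, 175, 173)

# Closed-form least-squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# Residuals and the sum of squared errors that the fit minimises
ls_residuals <- y - (b0 + b1 * x)
sse <- sum(ls_residuals^2)

c(intercept = b0, slope = b1, SSE = sse)
coef(lm(y ~ x))   # lm() should report the same intercept and slope
```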

Q4) Histogram
A histogram graphically displays the shape and spread of data.
It groups the data into a number of class intervals (bins) and counts the frequency in each.
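A small sketch of what grouping into class intervals means, on a hypothetical toy vector: `hist(..., plot = FALSE)` returns the counts without drawing, and `cut()` plus `table()` reproduces them by hand. Both treat intervals as right-closed, with the lowest break included.

```r
x <- c(1, 2, 2, 3, 5, 7, 8, 9)   # toy data
brks <- c(0, 3, 6, 9)            # class intervals [0,3], (3,6], (6,9]

h <- hist(x, breaks = brks, plot = FALSE)
h$counts                         # 4 1 3

# The same grouping by hand: cut() assigns each value to an interval
table(cut(x, breaks = brks, include.lowest = TRUE))
```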

Q5) Correlation and Prediction

Correlation indicates the degree of association between two variables.
The (Pearson) correlation coefficient quantifies the strength and direction of the linear
relationship between the variables, on a scale from -1 to +1.
Positive correlation: one variable tends to increase as the other increases.
Negative correlation: one variable tends to decrease as the other increases.
A stronger association (correlation closer to -1 or +1) indicates better predictive capability,
at least for a linear model.
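A sketch computing the Pearson correlation coefficient from its definition (the sum of cross-products of deviations, scaled by both spreads) and checking it against base R's `cor()`. The function name `pearson_r` and the toy vectors are illustrative.

```r
# Pearson correlation from its definition
pearson_r <- function(x, y) {
  sum((x - mean(x)) * (y - mean(y))) /
    sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
}

x <- c(1, 2, 3, 4, 5)
pearson_r(x, 2 * x + 1)          # perfect positive linear relationship: 1
pearson_r(x, -3 * x + 10)        # perfect negative linear relationship: -1
pearson_r(x, c(2, 1, 4, 3, 5))   # weaker positive association: 0.8
cor(x, c(2, 1, 4, 3, 5))         # base R's cor() agrees
```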
