Afin8015 Topic 3 2022 v1

1. Data science is applied across many domains including finance. 2. A typical data science project involves defining the goal, collecting and managing data, building and evaluating models, presenting results, and deploying models. 3. Important steps include defining quantifiable goals, exploring available data, describing data through statistics and visualizations, and ensuring data quality and quantity are sufficient to address the defined goal.

Uploaded by

Kritika Rajput

Financial Data Science Topic-3

AFIN-8015: Financial Data Science

Topic-3: Data Science and Machine Learning Methods (I)

Readings
Chapter-1: Nina Zumel & John Mount (2019). Practical Data Science with R, Second Edition. Manning Publications.
https://multisearch.mq.edu.au/permalink/f/1od1ft6/TN_safari_s9781617295874
Chapter-1 and Chapter-2: Ozdemir, S. (2016). Principles of data science: Learn the techniques and math you need to start making sense of your data.
https://multisearch.mq.edu.au/permalink/f/i7uiug/MQ_ALMA51204622540002171
Chapter-1 and Chapter-2: Boehmke, Brad and Greenwell, Brandon M. (2019). Hands-on machine learning with R. CRC Press.
https://bradleyboehmke.github.io/HOML/
Chapter-1: Sunila Gollapudi (2016). Practical Machine Learning. Packt Publishing.
https://multisearch.mq.edu.au/permalink/f/1lmkbbh/TN_pq_ebook_centralEBC4520739
Chapter-9 and Chapter-10: Ruppert, D. (2015). Statistics and Data Analysis for Financial Engineering with R examples, Second Edition.
https://multisearch.mq.edu.au/permalink/f/i7uiug/MQ_ALMA51175555040002171
Contents

1 Background
1.1 Active Data Science Domains
1.1.1 Finance
1.2 Life Cycle of a Data Science Project
1.2.1 Defining the Goal
1.2.2 Collect & Manage Data
1.2.3 Build the model
1.2.4 Evaluate & Critique the Model
1.2.5 Present Results & Document
1.2.6 Model Deployment

2 Let’s talk Data
2.1 What is Data?
2.2 Types of Data (Chapter-2 Ozdemir (2016))

3 Introduction to Machine Learning
3.1 What is Machine Learning?
3.2 ML Process
3.3 Models
3.4 Types of Learning Problems
3.5 Machine Learning Algorithms
3.6 Subfields of Machine Learning
3.6.1 Supervised Learning
3.6.2 Unsupervised Learning

4 Linear Regression
4.1 Introduction
4.2 OLS

References
Part 1

Background


• We first cover some background theory before going into the details of the methods.

1.1 Active Data Science Domains

• Figure 1.1 shows the active data science domains.

Figure 1.1: Data Science Domains (Dasgupta et al., 2018)


1.1.1 Finance
• Trading in finance has been using data science for decades.

• Investment banks, hedge funds, etc., have been using complex models to analyse data and make decisions for some time.

• Some examples of data science use cases are:

– Credit Risk Modelling and Management

– Loan Fraud and default detection
– Market Basket Analysis
– High Frequency Trading (HFT)
– Forecasting risk and return using machine learning methods
– Using alternative data, such as text data, for financial modelling


1.2 Life Cycle of a Data Science Project

• Figure 1.2 depicts a typical data science process (Chapter-1, Mount & Zumel, 2019).

Figure 1.2: Data Science Project- Life cycle


1.2.1 Defining the Goal

• The first task in data science is to define a measurable and quantifiable goal. For example, "forecast the n-day-ahead movement in oil prices" is a general goal that still needs to be made quantifiable and measurable.

• As per Mount & Zumel (2019) (Chapter-1), which is the main reference for Section 1.2, the questions to ask at this stage include:
– Why do the sponsors want the project in the first place?
– What do they lack, and what do they need? What are they doing to solve the problem now, and why isn’t that
good enough?
– What resources will you need: what kind of data and how much staff?
– Will you have domain experts to collaborate with, and what are the computational resources?
– How do the project sponsors plan to deploy your results?
– What are the constraints that have to be met for successful deployment?


1.2.2 Collect & Manage Data

• This step is about identifying the data required to achieve the goal.

• This is the stage to initially explore the data, describe it (descriptive statistics) and visualise it (plots for understanding).

• One of the most important steps.

• Typical questions to ask in this step:

– What data is available to me?

– Will it help me solve the problem?
– Is it enough?
– Is the data quality good enough?


1.2.3 Build the model

• This is the analysis stage, where statistics and machine learning are applied.

• There may be overlap and back-and-forth between the modelling stage and the data-cleaning stage as you try
to find the best way to represent the data and the best form in which to model it.

• The most common data science modelling tasks are:

Classifying— Deciding if something belongs to one category or another

Scoring— Predicting or estimating a numeric value, such as a price or probability (also referred to as predictive
analysis task)
Ranking— Learning to order items by preferences
Clustering— Grouping items into most-similar groups
Finding relations— Finding correlations or potential causes of effects seen in the data
Characterizing— Very general plotting and report generation from data

• There are several possible methods and approaches for these tasks.

• For example, for classification tasks, some common approaches are logistic regression and tree-based methods. Neural-network-based forecasting is an example for predictive tasks. We will cover some of these in this unit.

• This lecture will cover the basics of some broad categories of these methods.


1.2.4 Evaluate & Critique the Model

• Evaluate the model to check if it satisfies the goal requirements.

– Is it accurate enough for your needs? Does it generalize well?

– Does it perform better than “the obvious guess”? Better than whatever estimate you currently use?
– Do the results of the model (coefficients, clusters, rules, confidence intervals, significances, and diagnostics)
make sense in the context of the problem domain?

• Evaluation relies on various measures of accuracy, model fit, predictive power, etc.
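The question "does it perform better than the obvious guess?" can be made concrete with a small sketch. The data below are simulated (an assumption for illustration, not the unit's data set), the "obvious guess" is taken to be the training-sample mean, and the helper name rmse is hypothetical.

```r
# Simulated data (assumption for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Hold out the last 50 observations for evaluation
train <- dat[1:150, ]
test <- dat[151:200, ]

# Candidate model: simple linear regression fitted on the training data
fit <- lm(y ~ x, data = train)

# Root mean squared error (hypothetical helper)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Compare the model with the "obvious guess" (the training mean of y)
rmse_model <- rmse(test$y, predict(fit, newdata = test))
rmse_naive <- rmse(test$y, mean(train$y))

c(model = rmse_model, naive = rmse_naive)
```

If the model's out-of-sample error is not clearly below the naive benchmark, the model adds little for the stated goal.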


1.2.5 Present Results & Document

• Present your results to your project sponsor and other stakeholders.

• You must also document the model for those in the organization who are responsible for using, running, and
maintaining the model once it has been deployed.

• Reproducibility is a key component here.

• A presentation for the model’s end users would instead emphasize how the model will help them do their job
better:

– How should they interpret the model?

– What does the model output look like?
– If the model provides a trace of which rules in the decision tree executed, how do they read that?
– If the model provides a confidence score in addition to a classification, how should they use the confidence
score?
– When might they potentially overrule the model?


1.2.6 Model Deployment

• Putting the model into action (operation).

• The model should run smoothly and should not result in disastrous decisions.

• Usually initial deployment will happen at a smaller scale.


Part 2

Let’s talk Data

2.1 What is Data?

• Collection of information in either an organised or unorganised format (Chapter-1, Ozdemir (2016)).

Organised data: This refers to data that is sorted into a row/column structure, where every row represents a single observation and the columns represent the characteristics of that observation.
Unorganised data: This is data in free form, usually text or raw audio/signals, that must be parsed further to become organised.


2.2 Types of Data (Chapter-2 Ozdemir (2016))

• Structured Data: Usually organised in a table format with rows and columns, holding observations and their characteristics. For example, stock price data.

– Generally thought to be much easier to work with.

• Unstructured Data: Does not follow any standard organisation or structure. For example, free-form text data such as Twitter posts, Facebook posts, etc.

– Very common.
– Exists in many forms: tweets, emails, literature, news articles, server logs, etc.

• Quantitative Data: Data described using numbers and mathematics. For example, annual revenue data for a company.

– Discrete data: Data which is counted. For example, the outcome of rolling a die.
– Continuous data: Data which is measured rather than counted, and can take any value within a range.

• Qualitative Data: Data which cannot be described using numbers and basic mathematics. For example, personal particulars of the board members of a company.
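As a rough illustration of how these data types appear in R, the sketch below builds a small structured data set. The firm labels and all values are hypothetical and chosen only to show the column types.

```r
# Hypothetical firms and values, for illustration only
firms <- data.frame(
  ticker = c("AAA", "BBB", "CCC"),                 # qualitative: free labels
  sector = factor(c("Materials", "Financials",
                    "Health Care")),               # qualitative: categories
  board_size = c(11L, 10L, 9L),                    # quantitative, discrete (counted)
  revenue_bn = c(65.1, 27.3, 10.6),                # quantitative, continuous (measured)
  stringsAsFactors = FALSE
)

# Each row is an observation; each column is a characteristic
str(firms)

# Storage class of each column
sapply(firms, class)
```

Arithmetic such as mean(firms$revenue_bn) is meaningful for the quantitative columns, whereas for the qualitative columns only counts, for example table(firms$sector), make sense.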


Part 3

Introduction to Machine Learning

Data science is a superset of Machine learning, data mining, and related subjects. It extensively covers the
complete process starting from data loading until production.

3.1 What is Machine Learning?

Main reference: Chapter-1 of Gollapudi (2016). The whole of Chapter-1 is relevant.

• Fig-3.1 presents an example concept map representing the key aspects of machine learning (ML).

Figure 3.1: Concept Map Gollapudi (2016)

• There are various definitions for machine learning.


"A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, 1997.
Machine Learning, McGraw Hill)
• As per Wikipedia

"Machine learning is a scientific discipline that is concerned with the design and development of algorithms that
allow computers to evolve behaviours based on empirical data, such as from sensor data or databases."
• The primary goal of an ML implementation is to develop a general-purpose algorithm that solves a practical and focused problem.

• Important aspects in the process include data, time, and space requirements.

• The goal of a learning algorithm is to produce a result that is a rule and is as accurate as possible.


3.2 ML Process

• Types of datasets required: training set, validation set (may come from the initial data) and testing set.

• Training set: data examples that are used to learn or build a classifier.

• Validation set: data examples that are checked against the built classifier and can help tune the accuracy of the output.

• Testing set: data examples that help assess the performance of the classifier.

Phase 1 - Training Phase: The training data, consisting of inputs paired with their expected outputs, is used to train the model. The output is the learned model.

Phase 2 - Validation/Test Phase: Measuring the validity and fit of the model. How good is the model? Uses the validation dataset, which can be a subset of the initial dataset.

Phase 3 - Application Phase: Run the model with real-world data to generate results.
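The three datasets above can be sketched in base R as follows. The data are simulated and the 60/20/20 split is an assumption made for illustration; the notes do not prescribe particular proportions.

```r
# Simulated data (assumption for illustration only)
set.seed(42)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 + 1.5 * dat$x + rnorm(n)

# Shuffle the row indices, then cut them into three non-overlapping sets
idx <- sample(seq_len(n))
train_idx <- idx[1:600]      # 60%: used to fit the model
valid_idx <- idx[601:800]    # 20%: used to tune / compare models
test_idx <- idx[801:1000]    # 20%: used once, for the final assessment

# Phase 1: training
fit <- lm(y ~ x, data = dat[train_idx, ])

# Mean squared error on a given data set (hypothetical helper)
mse <- function(model, newdata) mean((newdata$y - predict(model, newdata))^2)

# Phase 2: validation and test
valid_mse <- mse(fit, dat[valid_idx, ])
test_mse <- mse(fit, dat[test_idx, ])

c(validation = valid_mse, test = test_mse)
```

Note that a purely random split suits cross-sectional data; for time series such as prices, a chronological split is usually more appropriate.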

• Fig-3.2 shows an example flowchart of how learning can be applied to prediction.


Figure 3.2: Example Flowchart for predictive ML workflow


3.3 Models

• Central to any ML implementation

• At a high level

– Logical : Rule based (if else...), for example, decision trees.

– Geometric: Use geometric concepts like lines, planes etc. Linear transformations are often used.
– Probabilistic: Statistical models. Defines relationship between two variables.


3.4 Types of Learning Problems

Figure 3.3: Learning Problems Categories


3.5 Machine Learning Algorithms

• Decision tree based algorithms

• Association rule based learning algorithms

• Bayesian method based algorithms

• Kernel method based algorithms

• Clustering methods

• Artificial neural networks

• Dimensionality reduction

• Ensemble methods (combining multiple methods)

• Instance based learning algorithms

• Regression analysis based algorithms

Figure 3.4: Machine learning algorithms/methods (Gollapudi, 2016)


3.6 Subfields of Machine Learning

Figure 3.5: Subfields of ML


3.6.1 Supervised Learning

Also review Chapter-1 and Chapter-2 from Boehmke & Greenwell (2019): https://bradleyboehmke.github.io/HOML/

• Construct predictive models.

• Prediction of a given output (or target) using other variables (or features) in the data set.

• Supervision refers to the fact that the target values play a supervisory role: they indicate to the learner the task it needs to learn.

• Uses labelled data.

• Most supervised learning problems are either regression or classification.


3.6.2 Unsupervised Learning

• Statistical tools to conduct descriptive analysis, for a better understanding of the data.

• No specific target to predict; for example, clustering to identify groups.

• Unsupervised learning is often performed as part of an exploratory data analysis (EDA).

• Unlabelled dataset
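A minimal sketch of the clustering example mentioned above, using kmeans from base R on simulated, unlabelled data. The two well-separated groups and the choice of two centres are assumptions made for illustration.

```r
# Simulated unlabelled data: two well-separated groups of 50 points each
set.seed(7)
g1 <- matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2)
g2 <- matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
X <- rbind(g1, g2)

# k-means with two centres; nstart repeats the random initialisation
km <- kmeans(X, centers = 2, nstart = 10)

# Cluster sizes and estimated cluster centres
table(km$cluster)
km$centers
```

No target variable is supplied to kmeans; the groups are identified from the features alone, which is what distinguishes this from the supervised setting.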


Regression Methods

• This part will discuss two regression methods: linear regression and logistic regression.

• We will start with Linear regression in this lecture and continue with Logistic regression in week-4.

Part 4

Linear Regression

4.1 Introduction

• Regression analysis is one of the most widely used tools in quantitative research for analysing the relationship between variables.

• One or more variables are considered to be explanatory variables, and the other is considered to be the dependent variable.

• In general, linear regression is used to predict a continuous dependent variable (regressand) from a number of independent variables (regressors), assuming that the relationship between the dependent and independent variables is linear.

4.2 OLS

Reading: Statistics and Data Analysis for Financial Engineering with R examples, Second Edition (Chapter-9 and Chapter-10) (Ruppert, 2015)

• The regression model with only one independent variable is called simple linear regression, and the model with more than one independent variable is known as multiple linear regression.

• Suppose we have a dependent (or response) variable Y_i which is related to a predictor variable X_i. The simple regression model is given by

$$Y_i = \alpha + \beta X_i + \epsilon_i \qquad (4.1)$$

• Here, the error terms ϵ_i are assumed to be i.i.d. and independent of X_i. This model describes Y lying on a straight line with slope β, also called the regression coefficient, and intercept α. Here Y and X are assumed to have a bivariate normal distribution.

• The intercept α and the slope β can be estimated using the method of Ordinary Least Squares (OLS). The basic optimisation problem minimises the sum of squared residuals

$$\mathrm{SumRes} = \sum_i \left(Y_i - (\alpha + \beta X_i)\right)^2 \qquad (4.2)$$
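Minimising the sum of squared residuals in (4.2) has the standard closed-form solution: the slope estimate is the sum of (X_i − mean(X))(Y_i − mean(Y)) divided by the sum of (X_i − mean(X))^2, and the intercept estimate is mean(Y) minus the slope estimate times mean(X). The sketch below checks this against lm on simulated data; the data and the names alpha_hat and beta_hat are assumptions for illustration.

```r
# Simulated data (assumption for illustration only)
set.seed(123)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

# Closed-form OLS estimates for the simple regression model (4.1)
beta_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same estimates from lm
fit <- lm(y ~ x)

# The two sets of estimates should agree up to numerical precision
rbind(closed_form = c(alpha_hat, beta_hat), lm = unname(coef(fit)))
```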

• R has the function lm (linear model) for linear regression.

• The main arguments to the function lm are a formula and the data. lm takes the model definition as a formula, which is an object of class formula. (A formula object is also used in other statistical functions such as glm, nls, rq, etc.)

library(readxl)
# change the working directory to the folder containing the file
data1 = read_excel("PriceHistory_lec-3_afin8015.xlsx", skip = 1)  # import data

# convert to a data frame
data1 = as.data.frame(data1)

• Sort the data from old to new

data2 = data1[order(as.Date(data1$Date, format = "%Y-%m-%d")), ]
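The sorting idiom above can be tried on a small toy data frame. The dates and prices below are hypothetical and only illustrate how order combined with as.Date rearranges the rows.

```r
# Hypothetical toy data, deliberately out of date order
toy <- data.frame(
  Date = c("2015-08-10", "2015-08-06", "2015-08-07"),
  Price = c(101.2, 100.0, 97.2),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts the parsed dates
toy_sorted <- toy[order(as.Date(toy$Date, format = "%Y-%m-%d")), ]
toy_sorted
```

The original row names are kept after sorting, which is why the row labels in later output need not run from 1 upwards.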

colnames(data2)

# [1] "Date"
# [2] "Composite"
# [3] "ASX All Ordinaries (180334)"
# [4] "Scentre Group (SCG-AU)"
# [5] "S&P ASX 50 (180520)"
# [6] "Australia and New Zealand Banking Group Limited (ANZ-AU)"
# [7] "Westpac Banking Corporation (WBC-AU)"
# [8] "Telstra Corporation Limited (TLS-AU)"
# [9] "BHP Group Ltd (BHP-AU)"
# [10] "CSL Limited (CSL-AU)"
# [11] "Transurban Group Ltd. (TCL-AU)"
# [12] "Commonwealth Bank of Australia (CBA-AU)"
# [13] "Rio Tinto Limited (RIO-AU)"
# [14] "Aristocrat Leisure Limited (ALL-AU)"
# [15] "Insurance Australia Group Limited (IAG-AU)"
# [16] "Suncorp Group Limited (SUN-AU)"
# [17] "National Australia Bank Limited (NAB-AU)"
# [18] "Newcrest Mining Limited (NCM-AU)"
# [19] "Wesfarmers Limited (WES-AU)"
# [20] "Woodside Petroleum Ltd (WPL-AU)"
# [21] "Woolworths Group Ltd (WOW-AU)"
# [22] "Goodman Group (GMG-AU)"
# [23] "Brambles Limited (BXB-AU)"
# [24] "Macquarie Group Limited (MQG-AU)"

head(data2)

# Date Composite ASX All Ordinaries (180334)

# 1260 2015-08-06 100.0000 5368.639
# 1267 2015-08-06 100.0000 5600.117
# 1828 2015-08-06 100.0000 5600.117
# 1259 2015-08-07 97.1721 5309.430
# 1266 2015-08-07 97.1721 5472.331
# 1827 2015-08-07 97.1721 5472.331
# Scentre Group (SCG-AU) S&P ASX 50 (180520)
# 1260 3.84 5492.369
# 1267 4.02 5757.634
# 1828 4.02 5757.634
# 1259 3.80 5419.528
# 1266 3.96 5606.614
# 1827 3.96 5606.614
# Australia and New Zealand Banking Group Limited (ANZ-AU)
# 1260 29.52
# 1267 32.58
# 1828 32.58
# 1259 28.97
# 1266 30.14
# 1827 30.14
# Westpac Banking Corporation (WBC-AU)
# 1260 31.76512
# 1267 33.25691
# 1828 33.25691
# 1259 31.48665
# 1266 32.17287
# 1827 32.17287
# Telstra Corporation Limited (TLS-AU) BHP Group Ltd (BHP-AU)
# 1260 6.09 25.20
# 1267 6.40 26.69
# 1828 6.40 26.69

# 1259 6.12 24.98
# 1266 6.29 25.93
# 1827 6.29 25.93
# CSL Limited (CSL-AU) Transurban Group Ltd. (TCL-AU)
# 1260 93.87 9.560097
# 1267 98.45 9.813731
# 1828 98.45 9.813731
# 1259 93.49 9.472299
# 1266 96.84 9.667402
# 1827 96.84 9.667402
# Commonwealth Bank of Australia (CBA-AU)
# 1260 81.27000
# 1267 84.18964
# 1828 84.18964
# 1259 76.91000
# 1266 80.95350
# 1827 80.95350
# Rio Tinto Limited (RIO-AU) Aristocrat Leisure Limited (ALL-AU)
# 1260 50.89 8.53
# 1267 53.55 8.71
# 1828 53.55 8.71
# 1259 50.59 8.51

# 1266 53.27 8.70
# 1827 53.27 8.70
# Insurance Australia Group Limited (IAG-AU)
# 1260 5.942622
# 1267 6.055327
# 1828 6.055327
# 1259 5.891393
# 1266 6.014343
# 1827 6.014343
# Suncorp Group Limited (SUN-AU)
# 1260 13.77961
# 1267 14.84037
# 1828 14.84037
# 1259 13.66632
# 1266 14.70649
# 1827 14.70649
# National Australia Bank Limited (NAB-AU)
# 1260 31.17465
# 1267 32.45990
# 1828 32.45990
# 1259 30.69147
# 1266 31.71581

# 1827 31.71581
# Newcrest Mining Limited (NCM-AU) Wesfarmers Limited (WES-AU)
# 1260 11.44 28.59431
# 1267 11.26 30.48146
# 1828 11.26 30.48146
# 1259 10.86 28.45798
# 1266 10.96 30.08681
# 1827 10.96 30.08681
# Woodside Petroleum Ltd (WPL-AU) Woolworths Group Ltd (WOW-AU)
# 1260 32.16218 26.90
# 1267 33.98789 28.12
# 1828 33.98789 28.12
# 1259 31.62927 26.76
# 1266 33.52406 27.73
# 1827 33.52406 27.73
# Goodman Group (GMG-AU) Brambles Limited (BXB-AU)
# 1260 6.35 10.12
# 1267 6.39 10.54
# 1828 6.39 10.54
# 1259 6.30 10.18
# 1266 6.26 10.49
# 1827 6.26 10.49

# Macquarie Group Limited (MQG-AU)
# 1260 78.98
# 1267 81.00
# 1828 81.00
# 1259 78.85
# 1266 80.08
# 1827 80.08

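The data file above contains prices, so returns are computed from them before estimation. The 'market model' regression can be written as

$$R_i = \alpha + \beta_i R_M + \epsilon \qquad (4.3)$$

where R_i is the return on stock i and R_M is the return on the market.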
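Because the market model is stated in returns, prices are first converted to percentage log returns using 100 * diff(log(price)). A minimal sketch on a hypothetical price series (the four prices are made up for illustration):

```r
# Hypothetical price series, for illustration only
prices <- c(100, 102, 101, 105)

# Percentage log returns: 100 * (log(P_t) - log(P_{t-1}))
log_ret <- 100 * diff(log(prices))

# Percentage simple returns, for comparison
simple_ret <- 100 * (prices[-1] / prices[-length(prices)] - 1)

# One observation is lost by differencing
length(prices)
length(log_ret)

# For small price changes the two definitions are close
cbind(log_ret, simple_ret)
```

Log returns add up over time: their sum equals 100 times the log of the last price over the first, which is one reason they are convenient in this setting.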

The following example estimates the OLS regression coefficients for BHP returns on ASX All Ordinaries returns, with returns computed as percentage log differences of prices.
ret_bhp = 100 * diff(log(data2$`BHP Group Ltd (BHP-AU)`))
ret_asx = 100 * diff(log(data2$`ASX All Ordinaries (180334)`))
lreg1 = lm(formula = ret_bhp ~ ret_asx)
lreg1

#
# Call:
# lm(formula = ret_bhp ~ ret_asx)
#
# Coefficients:
# (Intercept) ret_asx
# 0.01241 1.63913

• The result in the above example is an lm object which can be used with extractor functions like summary to
provide more information.

summary(lreg1)

#
# Call:
# lm(formula = ret_bhp ~ ret_asx)
#
# Residuals:
# Min 1Q Median 3Q Max
# -17.8179 -1.0931 -0.0124 1.0826 17.5035
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.01241 0.07196 0.172 0.863
# ret_asx 1.63913 0.04451 36.822 <2e-16 ***

# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 3.076 on 1825 degrees of freedom
# Multiple R-squared: 0.4263,Adjusted R-squared: 0.4259
# F-statistic: 1356 on 1 and 1825 DF, p-value: < 2.2e-16

• We get more information about the regression model using summary.

• There are other generic functions which can be used to get more information from lreg1 and similar regression objects. Table 4.1 gives a list of some such functions.

Table 4.1: List of generic functions to extract more information

Generic Function    Use
summary()           Returns a summary of the fitted model
coef()              Estimated model parameters
resid()             The model residuals
fitted()            The fitted values of the model
deviance()          The residual sum of squares
anova()             An ANOVA table
predict()           Returns predictions
plot()              Used for creating plots
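The extractor functions in Table 4.1 can be tried on any lm object. The sketch below uses a model fitted to simulated data (an assumption, so that it runs without the unit's data file); the same calls apply to lreg1.

```r
# Simulated data and fit (assumption for illustration only)
set.seed(2022)
x <- rnorm(50)
y <- 1 + 0.8 * x + rnorm(50, sd = 0.5)
fit <- lm(y ~ x)

coef(fit)            # estimated intercept and slope
head(resid(fit))     # first few residuals
head(fitted(fit))    # first few fitted values
rss <- deviance(fit) # residual sum of squares
anova(fit)           # ANOVA table

# Predictions at new values of the regressor
new_pred <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)))
new_pred

# Individual statistics can be pulled out of the summary object
r2 <- summary(fit)$r.squared
r2
```

For an lm object, deviance returns the same quantity as sum(resid(fit)^2), which ties it back to the OLS objective in (4.2).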

The following example shows how to create plots for the lreg1 object.

# the plot function for an lm object creates 4 plots, so we first set
# the graphical parameters to a 2 x 2 layout
par1 = par()
par(mfrow = c(2, 2))
plot(lreg1)

Figure 4.1: Linear Regression Diagnostic Plots (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage)

• The upper left plot in Figure 4.1 shows the residuals plotted against the fitted values.

• The plot in the upper right is a standard Q-Q plot of the standardized residuals; if the residuals are normally distributed, the points lie close to the reference line.

• The scale-location plot in the lower left shows the square root of the absolute standardized residuals as a function of the fitted values.

• The fourth plot in the lower right shows each point's leverage, a measure of the point's importance in determining the regression result.

• The contour lines on the plot are for Cook's distance, which is another measure of the importance of each observation to the regression. A smaller distance means that removing the observation has little effect on the regression results.

Any single plot out of the four can also be generated using the argument which in the function plot.

Sometimes it is only required to plot the regression line over the data points. The following example demonstrates how to add the regression line using the function abline.

# first plot BHP and ASX returns

plot(ret_asx, ret_bhp)
# add the regression line
abline(lreg1, col = "blue")

Figure 4.2: Regression Fit (scatter plot of ret_bhp against ret_asx with the fitted regression line)
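The leverage and Cook's distance values discussed for the diagnostic plots can also be extracted as numbers, and a single diagnostic panel can be requested through the which argument of plot. The sketch below uses simulated data (an assumption for illustration); the same calls apply to lreg1.

```r
# Simulated data and fit (assumption for illustration only)
set.seed(99)
x <- rnorm(100)
y <- 0.5 + 1.2 * x + rnorm(100)
fit <- lm(y ~ x)

# Leverage of each observation (diagonal of the hat matrix)
lev <- hatvalues(fit)

# Cook's distance of each observation
cd <- cooks.distance(fit)

# Indices of the three most influential observations by Cook's distance
head(order(cd, decreasing = TRUE), 3)

# Only the first diagnostic panel (Residuals vs Fitted)
plot(fit, which = 1)
```

In plot for lm objects, which = 1, 2, 3 and 5 correspond to the four default panels (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage).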

The function lm can handle multiple linear regression as well as simple linear regression. We will discuss multiple linear regression when we cover factor models.
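As a brief preview, multiple regression with lm only requires adding further regressors to the right-hand side of the formula. The sketch below uses simulated data; the variable names mkt and f2 and the coefficient values are hypothetical and do not correspond to any particular factor model.

```r
# Simulated returns driven by two hypothetical factors
set.seed(10)
n <- 300
mkt <- rnorm(n)   # hypothetical market factor
f2 <- rnorm(n)    # hypothetical second factor
ret <- 0.1 + 1.2 * mkt + 0.4 * f2 + rnorm(n)

# Multiple linear regression: regressors are joined with + in the formula
fit_multi <- lm(ret ~ mkt + f2)

# One intercept plus one slope per regressor
coef(fit_multi)
```

The extractor functions such as summary, coef and resid work on fit_multi in the same way as for a simple regression.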

• Plot using ggplot2

data_ggplot = data.frame(ASX = ret_asx, BHP = ret_bhp)

library(ggplot2)
p1 = ggplot(data_ggplot, aes(ASX, BHP))
p1 + geom_point(color = "blue") + stat_smooth(method = "lm", color = "red") +
theme_bw() + labs(title = "Market Model BHP")

Figure: Market Model BHP (BHP against ASX, with the fitted regression line)

• Table of output in LaTeX

library(stargazer)
stargazer(lreg1, summary = TRUE, title = "OLS Results", type = "latex",
no.space = TRUE)

Table 4.2: OLS Results

                       Dependent variable:
                       ret_bhp
ret_asx                1.639***
                       (0.045)
Constant               0.012
                       (0.072)
Observations           1,827
R2                     0.426
Adjusted R2            0.426
Residual Std. Error    3.076 (df = 1825)
F Statistic            1,355.879*** (df = 1; 1825)
Note:                  *p<0.1; **p<0.05; ***p<0.01

• Table output in Word


library(stargazer)
stargazer(lreg1, summary = TRUE, title = "OLS Results", type = "html",
out = "bhp_capm.doc", no.space = TRUE)


Next Time

• Logistic Regressions

• Machine Learning Methods (Continued...)

References

Boehmke, Brad, & Greenwell, Brandon M. 2019. Hands-on machine learning with R. CRC Press.

Dasgupta, Nataraj, Farias, Ricardo Anjoleto, & Lanzetta, Vitor Bianchi. 2018. Hands-On Data Science with R. Packt Publishing.

Gollapudi, Sunila. 2016. Practical Machine Learning. Packt Publishing.

Mount, John, & Zumel, Nina. 2019. Practical Data Science with R, Second Edition. Manning Publications.

Ozdemir, Sinan. 2016. Principles of data science : learn the techniques and math you need to start making sense of your data. Packt Publishing.

Ruppert, David. 2015. Statistics and data analysis for financial engineering. 2 edn. Vol. 13. Springer.

No ratings yet
Data Science
64 pages