
INFO-F-422: Statistical foundations of machine learning
Linear regression

Gianluca Bontempi
Machine Learning Group
Computer Science Department
mlg.ulb.ac.be

Beyond parameter estimation

So far, a simple task: estimation of parameters of univariate distributions.
More complex estimation tasks:
◮ Parameters of multivariate distributions: consider for example multivariate gaussians.
◮ More complex functionals.
◮ Discrete conditional distributions: pattern recognition or pattern classification.
◮ Continuous conditional distributions: regression.

Bivariate continuous random scatterplot / Bivariate density distribution

[Figure: scatterplot of an input/output training set, y versus x.]
Prediction problems

◮ Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack, on the basis of demographic, diet and clinical measurements.
◮ Predict the price of a stock in 6 months from now, on the basis of company performance measures and economic data.
◮ Identify the risk factors for breast cancer, based on clinical, demographic and genetic variables.
◮ Classify the category of a text email (spam or not) on the basis of its text content.
◮ Characterize the mechanical property of a steel plate on the basis of its physical and chemical composition.

Input/output problems

All the previous examples are characterized by
1. an outcome measurement, also called output, usually quantitative (like a stock price) or categorical (like heart attack/no heart attack);
2. a set of features or inputs, also quantitative or categorical, that we wish to use to predict the output.

Assumption: the input variables provide some explanation for the variability of the output.

Having collected a set of input/output data (training set), we use statistical methods to build a prediction model (learner) to predict the outcome for new, unseen objects.

Supervised learning

[Diagram: an INPUT feeds an UNKNOWN DEPENDENCY producing an OUTPUT; a TRAINING DATASET is used to fit a MODEL whose PREDICTION is compared with the output, giving a PREDICTION ERROR.]

The learning is called supervised because of the presence of the outcome variable which guides the learning process. Collecting a set of training data is like having a teacher suggesting the correct answer for each input.

Regression and classification

According to the type of output, there are two prediction tasks:
◮ Regression: quantitative outputs, e.g. real or integer numbers.
◮ Classification (or pattern recognition): qualitative or categorical outputs which take values in a finite set of classes (e.g. black, white and red) where there is no explicit ordering. Qualitative variables are also referred to as factors.
Simple linear model

The simplest regression model is the linear model

  y = β0 + β1 x + w

where
◮ x ∈ R is the regressor (or independent) variable,
◮ y ∈ R is the measured response (or dependent) variable,
◮ β0 is the intercept, β1 is the slope,
◮ E[w] = 0, where w is the model error.

This implies that

  E[y|x] = f(x) = β0 + β1 x,    Var[y|x] = Var[w]

Linear regression function

The function f(x) = E[y|x] is also known as the regression function.

[Figure: a regression function f(x) with the conditional expectations E[y|x1] and E[y|x2] marked at two input values x1 and x2.]
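A minimal R sketch of this generative model and of its regression function f(x) = E[y|x]; the values β0 = 1, β1 = 0.5 and σw = 0.3 are illustrative assumptions, not taken from the course material:

  ## Simulate N draws from y = beta0 + beta1*x + w with E[w] = 0
  set.seed(1)                      # for reproducibility
  N     <- 100
  beta0 <- 1; beta1 <- 0.5         # assumed "true" parameters (illustrative)
  sigw  <- 0.3                     # standard deviation of the model error w
  x <- runif(N, 0, 10)             # regressor values
  w <- rnorm(N, mean = 0, sd = sigw)
  y <- beta0 + beta1 * x + w       # conditional mean E[y|x] = beta0 + beta1*x
  plot(x, y, main = "Input/output training set")
  abline(beta0, beta1, lty = 2)    # the regression function f(x) = E[y|x]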

What does “linear” mean?

In the following, a linear model is any input/output relationship which is linear in the model parameters and not necessarily in the dependent variables. This means that
1. any value of the response variable y is described by a linear combination of a series of parameters (regression slopes, intercept);
2. no parameter appears as an exponent or is multiplied or divided by another parameter.

Example of linear models

According to our definition of linear model,
◮ y = β0 + β1 x is a linear model.
◮ y = β0 + β1 x² is again a linear model: simply by making the transformation X = x², the model can be put in the linear form y = β0 + β1 X.
◮ y = B0 x^β1 can be studied as a linear model between Y = log(y) and X = log(x), with β0 = log(B0), thanks to the equality
    log(y) = log(B0) + β1 log(x)  ⇔  Y = β0 + β1 X
  (see the R sketch after this list).
◮ The relationship y = β0 + β1 β2^x cannot be linearized.
◮ Let z be a categorical variable taking 4 possible values {c1, ..., c4}. It is possible to model a linear dependence with y by creating four binary variables xj such that xj = 1 ⇔ z = cj.
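A minimal R sketch of the log-log linearization mentioned above; the values B0 = 2, β1 = 0.7 and the multiplicative noise are illustrative assumptions:

  ## Fit y = B0 * x^beta1 by regressing log(y) on log(x)
  set.seed(2)
  N  <- 100
  x  <- runif(N, 1, 10)
  B0 <- 2; beta1 <- 0.7                      # assumed "true" parameters
  y  <- B0 * x^beta1 * exp(rnorm(N, 0, 0.1)) # multiplicative noise keeps y > 0
  fit <- lm(log(y) ~ log(x))                 # Y = beta0 + beta1*X with Y = log(y), X = log(x)
  exp(coef(fit)[1])                          # estimate of B0 = exp(beta0hat)
  coef(fit)[2]                               # estimate of beta1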
Model estimation

◮ N i.i.d. pairs of observations DN = {⟨xi, yi⟩}, i = 1, ..., N.
◮ Data generated by the stochastic process

    yi = β0 + β1 xi + wi,   i = 1, ..., N

  where
  1. wi are i.i.d. realizations of the r.v. w having mean zero and constant variance σ²w (homoscedasticity),
  2. xi are non-random and observed with negligible error.
◮ Then the unknown parameters (also known as regression coefficients) β0 and β1 can be estimated by the least-squares method.

Least squares formulation

The method of least squares is designed to provide
1. estimations β̂0 and β̂1 of β0 and β1,
2. the fitted values of the response y

    ŷi = β̂0 + β̂1 xi,   i = 1, ..., N

so that the residual sum of squares

  SSEemp(b0, b1) = Σ_{i=1}^N (yi − b0 − b1 xi)²

is minimized.

See the Shiny script leastsquares.R.
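As an illustration of the criterion (this is not the leastsquares.R script, which is not reproduced here), a minimal R sketch that minimizes SSEemp numerically and compares the result with lm(); the simulated data reuse the illustrative values β0 = 1, β1 = 0.5:

  ## Minimize SSEemp(b0, b1) numerically and compare with lm()
  set.seed(3)
  N <- 50
  x <- runif(N, 0, 10)
  y <- 1 + 0.5 * x + rnorm(N, 0, 0.3)          # assumed generating process
  SSEemp <- function(b) sum((y - b[1] - b[2] * x)^2)
  opt <- optim(c(0, 0), SSEemp)                # generic numerical minimization
  opt$par                                      # close to the least-squares solution
  coef(lm(y ~ x))                              # closed-form least-squares estimates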

Least-squares solution

Since the error function SSEemp(b0, b1) is a quadratic function of the coefficients b0 and b1, the minimization of the error function has a unique solution which can be found in closed form. This is called the least-squares solution:

  {β̂0, β̂1} = arg min_{b0,b1} SSEemp(b0, b1) = arg min_{b0,b1} Σ_{i=1}^N (yi − b0 − b1 xi)²

Empirical risk

From SSEemp we can define the term

  MISEemp = min_{b0,b1} SSEemp(b0, b1) / N = SSEemp(β̂0, β̂1) / N = Σ_{i=1}^N (yi − β̂0 − β̂1 xi)² / N

which is called the empirical risk or training error.

Note that the term SSEemp is a function of the training set and as such it can be considered as a realization of a random variable.
Univariate least-squares solution

It can be shown that the least-squares solution is

  β̂1 = Sxy / Sxx,    β̂0 = ȳ − β̂1 x̄

where

  x̄ = Σ_{i=1}^N xi / N,    ȳ = Σ_{i=1}^N yi / N

  Sxy = Σ_{i=1}^N (xi − x̄) yi

  Sxx = Σ_{i=1}^N (xi − x̄)² = Σ_{i=1}^N (xi − x̄) xi

Properties of the least-squares estimators

If the dependency underlying the data is linear then the estimators are unbiased. Since x is non-random and Σ_{i=1}^N (xi − x̄) = 0,

  E_DN[β̂1] = E_DN[Sxy / Sxx] = Σ_{i=1}^N (xi − x̄) E[yi] / Sxx
           = (1/Sxx) ( Σ_{i=1}^N (xi − x̄) β0 + Σ_{i=1}^N (xi − x̄) β1 xi ) = β1 Sxx / Sxx = β1

Also it can be shown that

  Var[β̂1] = σ²w / Sxx,    E[β̂0] = β0,    Var[β̂0] = σ²w (1/N + x̄²/Sxx)
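A minimal R check of the closed-form solution and of Var[β̂1] = σ²w / Sxx by Monte Carlo; the data-generating values are the same illustrative ones used in the earlier sketches:

  ## Closed-form univariate least squares and a Monte Carlo check of Var[beta1hat]
  set.seed(4)
  N <- 50; beta0 <- 1; beta1 <- 0.5; sigw <- 0.3
  x   <- runif(N, 0, 10)                       # kept fixed across replications
  Sxx <- sum((x - mean(x))^2)
  one.fit <- function() {
    y   <- beta0 + beta1 * x + rnorm(N, 0, sigw)
    Sxy <- sum((x - mean(x)) * y)
    b1  <- Sxy / Sxx
    c(b0 = mean(y) - b1 * mean(x), b1 = b1)
  }
  B <- replicate(5000, one.fit())
  rowMeans(B)                                  # approx (beta0, beta1): unbiasedness
  var(B["b1", ])                               # approx sigw^2 / Sxx
  sigw^2 / Sxx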

Properties of the least-squares estimators (II)

◮ It can be shown that the error mean square

    σ̂²w = Σ_{i=1}^N (yi − ŷi)² / (N − 2)

  is an unbiased estimator of σ²w under the (strong) assumption that the linear model is correct.
◮ The denominator is often referred to as the residual degrees of freedom, also denoted by df.
◮ The degrees of freedom can be seen as the number N of samples reduced by the number p of parameters estimated (slope and intercept).
◮ The estimate of the variance σ²w allows the estimation of the variance of the intercept and slope, respectively.

Sample correlation coefficient

The usual estimator of the correlation

  ρ(x, y) = Cov[x, y] / sqrt(Var[x] Var[y])

between two r.v. x and y is

  ρ̂ = Sxy / sqrt(Sxx Syy)

Note that since β̂1 = Sxy / Sxx, the following relation holds:

  ρ̂² = β̂1 Sxy / Syy
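A quick R check of the relation ρ̂² = β̂1 Sxy / Syy on arbitrary simulated data (the numbers below are illustrative; any pair of numeric vectors of equal length would do):

  ## Verify rhohat^2 = beta1hat * Sxy / Syy
  set.seed(5)
  x <- runif(50, 0, 10); y <- 1 + 0.5 * x + rnorm(50, 0, 0.3)
  Sxy <- sum((x - mean(x)) * y)
  Sxx <- sum((x - mean(x))^2)
  Syy <- sum((y - mean(y)) * y)                # same centering identity as for Sxx
  cor(x, y)^2                                  # sample squared correlation
  (Sxy / Sxx) * Sxy / Syy                      # beta1hat * Sxy / Syy, same value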
Variance of the response

◮ Since β̂0 = ȳ − β̂1 x̄,

    ŷ = β̂0 + β̂1 x = ȳ − β̂1 x̄ + β̂1 x = ȳ + β̂1 (x − x̄)

  is the estimator of the conditional expectation in x.
◮ Under the linear hypothesis, we have for a specific x = x0

    E[ŷ|x0] = E[β̂0] + E[β̂1] x0 = β0 + β1 x0 = E[y|x0]

◮ Since Var[β̂1] = σ²w / Sxx and Cov[ȳ, β̂1] = 0, the variation of ŷ at x0, if repeated data collection and consequent regressions were conducted, is

    Var[ŷ|x0] = Var[ȳ + β̂1 (x0 − x̄)] = σ²w (1/N + (x0 − x̄)²/Sxx)

  where x̄ = Σ_{i=1}^N xi / N.

Multiple linear dependency

◮ Consider a linear relation between an independent variable x ∈ X ⊂ R^n and a dependent random variable y ∈ Y ⊂ R:

    y = β0 + β1 x·1 + β2 x·2 + · · · + βn x·n + w

  where w represents a random variable with mean zero and constant variance σ²w.
◮ In matrix notation the equation can be written as

    y = x^T β + w

  where x stands for the [p × 1] vector x = [1, x·1, x·2, ..., x·n]^T, β = [β0, ..., βn]^T is the vector of parameters and p = n + 1 is the total number of model parameters.
◮ NB: in the following x·j (and xj) will denote the jth (j = 1, ..., n) variable of the vector x, while xi (i = 1, ..., N) will denote the ith observation of the vector x.

The multiple linear regression model with n = 2

[Figure: linear regression fit with two inputs (n = 2), excerpt from "The Elements of Statistical Learning" book.]

The multiple linear regression model

Consider N observations DN = {⟨xi, yi⟩ : i = 1, ..., N}, where xi = (1, xi1, ..., xin), generated according to the previous model. We suppose that the following multiple linear relation holds:

  Y = Xβ + W

where Y is the [N × 1] response vector, X is the [N × p] data matrix whose (j+1)th column contains the readings on the jth regressor, and β is the [p × 1] vector of parameters:

  Y = [y1, y2, ..., yN]^T

  X = [ 1  x11  x12  ...  x1n
        1  x21  x22  ...  x2n
        ...
        1  xN1  xN2  ...  xNn ]  =  [x1^T; x2^T; ...; xN^T]

  β = [β0, β1, ..., βn]^T,    W = [w1, w2, ..., wN]^T

where the wi are assumed uncorrelated, with mean zero and constant variance σ²w (homogeneous variance). Then Var[w1, ..., wN] = σ²w I_N.
The least-squares solution

The least-squares estimator β̂ is

  β̂ = arg min_b Σ_{i=1}^N (yi − xi^T b)² = arg min_b (Y − Xb)^T (Y − Xb)

Given β̂ we obtain

  SSEemp = (Y − X β̂)^T (Y − X β̂) = e^T e

where SSEemp represents the residual sum of squares for linear models and e is the [N × 1] vector of residuals. We define also the empirical (or training) error quantity

  MISEemp = SSEemp / N

The vector β̂ must satisfy

  ∂/∂β̂ [(Y − X β̂)^T (Y − X β̂)] = 0  ⇔  −2 X^T (Y − X β̂) = 0

Normal equations

Differentiating the residual sum of squares we obtain the least-squares normal equations

  (X^T X) β̂ = X^T Y

As a result, assuming X is of full column rank,

  β̂ = (X^T X)^{-1} X^T Y

where the X^T X matrix is a positive definite symmetric [p × p] matrix which plays an important role in multiple linear regression. The predicted values for the training set are

  Ŷ = X β̂ = X (X^T X)^{-1} X^T Y

where H = X (X^T X)^{-1} X^T is the Hat matrix.

In R notation:

  betahat=solve(t(X)%*%X) %*%t(X)%*%Y
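A minimal R sketch around the betahat line above; the design matrix and the "true" parameter vector below are simulated for illustration, only the normal-equations formula itself comes from the slides:

  ## Normal equations, hat matrix and comparison with lm() on simulated data
  set.seed(6)
  N <- 100; n <- 3                              # n regressors, p = n + 1 parameters
  X <- cbind(1, matrix(rnorm(N * n), N, n))     # [N x p] data matrix with intercept column
  beta <- c(1, 0.5, -0.2, 0.8)                  # assumed "true" parameters (illustrative)
  Y <- drop(X %*% beta + rnorm(N, 0, 0.3))
  betahat <- solve(t(X) %*% X) %*% t(X) %*% Y   # least-squares solution
  H    <- X %*% solve(t(X) %*% X) %*% t(X)      # hat matrix
  Yhat <- H %*% Y                               # fitted values, equal to X %*% betahat
  cbind(betahat, coef(lm(Y ~ X - 1)))           # same estimates as lm (X already has the intercept)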

R function lm

  summary(lm(Y~X))

  Call:
  lm(formula = Y ~ X)

  Residuals:
       Min       1Q   Median       3Q      Max
  -0.40141 -0.14760 -0.02202  0.03001  0.43490

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  1.09781    0.11748   9.345 6.26e-09 ***
  X            0.02196    0.01045   2.101   0.0479 *
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 0.2167 on 21 degrees of freedom
  Multiple R-Squared: 0.1737, Adjusted R-squared: 0.1343
  F-statistic: 4.414 on 1 and 21 DF, p-value: 0.0479

◮ R script Linear/bv_mult.R.

Analysis of the LS estimate

If the linear dependency assumption holds:
◮ If E[w] = 0 then β̂ is an unbiased estimator of β.
◮ The residual mean square estimator

    σ̂² = (Y − X β̂)^T (Y − X β̂) / (N − p)

  is an unbiased estimator of the error variance σ²w.
◮ If the wi are uncorrelated and have common variance, the variance-covariance matrix of β̂ is given by

    Var[β̂] = σ²w (X^T X)^{-1}
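A short R check of the last two bullets (simulated data again; σ̂² replaces the unknown σ²w in the covariance formula, exactly as lm does internally):

  ## Residual mean square and variance-covariance matrix of betahat
  set.seed(6)
  N <- 100; X <- cbind(1, matrix(rnorm(N * 3), N, 3)); p <- ncol(X)
  Y <- drop(X %*% c(1, 0.5, -0.2, 0.8) + rnorm(N, 0, 0.3))   # illustrative data
  fit     <- lm(Y ~ X - 1)                     # X already contains the intercept column
  betahat <- solve(t(X) %*% X) %*% t(X) %*% Y
  e       <- Y - X %*% betahat                 # residual vector
  sigma2  <- drop(t(e) %*% e) / (N - p)        # unbiased estimate of sigma_w^2
  c(sigma2, summary(fit)$sigma^2)              # same value
  max(abs(sigma2 * solve(t(X) %*% X) - vcov(fit)))   # estimated Var[betahat] matches vcov(lm)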
Variance of the prediction

◮ The prediction ŷ for a generic input value x = x0 is unbiased:

    E[ŷ|x0] = x0^T β

◮ The variance of the prediction ŷ for a generic input value x = x0 is given by

    Var[ŷ|x0] = σ²w x0^T (X^T X)^{-1} x0

◮ Assuming a normal error w, the 100(1 − α)% confidence bound for the regression value E[y|x = x0] is given by

    ŷ(x0) ± t_{α/2, N−p} σ̂w sqrt(x0^T (X^T X)^{-1} x0)

  where t_{α/2, N−p} is the upper α/2 percent point of the t-distribution with N − p degrees of freedom and the quantity σ̂w sqrt(x0^T (X^T X)^{-1} x0) is the standard error of prediction for multiple regression. (An R sketch of this bound follows after the next slide.)

Generalization error of the linear model

◮ The linear predictor ŷ = x^T β̂ has been estimated by using the training dataset DN = {⟨xi, yi⟩ : i = 1, ..., N}. Hence β̂ is a r.v.
◮ Now we want to use it to predict, for a test input x, the future output y(x).
◮ The test output y(x) is independent of the training set DN.
◮ Which precision can we expect from ŷ(x) = x^T β̂ on average?
◮ A measure of error is the MSE

    MSE(x) = E_{DN,y}[(y(x) − x^T β̂)²] = σ²w + E_DN[(x^T β − x^T β̂)²]

  where y(x) is independent of DN, and its integrated version

    MISE = ∫_X MSE(x) p(x) dx

◮ How can we estimate this quantity?
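A minimal R sketch of the confidence bound, checked against predict(); the univariate data, the query point x0 = 5 and the level 95% are illustrative assumptions:

  ## 95% confidence bound for the regression value at x0, by formula and by predict()
  set.seed(7)
  N <- 30; x <- runif(N, 0, 10); y <- 1 + 0.5 * x + rnorm(N, 0, 0.3)
  fit <- lm(y ~ x)
  X  <- cbind(1, x); p <- ncol(X)
  x0 <- c(1, 5)                                   # query point, with intercept term
  yhat0  <- sum(x0 * coef(fit))
  sigmaw <- summary(fit)$sigma                    # estimate of sigma_w
  se0    <- sigmaw * sqrt(drop(t(x0) %*% solve(t(X) %*% X) %*% x0))
  yhat0 + c(-1, 1) * qt(0.975, N - p) * se0       # bound from the formula above
  predict(fit, newdata = data.frame(x = 5), interval = "confidence", level = 0.95)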

The expected empirical error

◮ Is the empirical risk a good estimate of the MISE generalization error?
◮ The expectation of the residual sum of squares can be written as¹

    E_DN[MISEemp] = E_DN[ Σ_{i=1}^N (yi − xi^T β̂)² / N ]
                  = (N − p)/N · E_DN[ Σ_{i=1}^N (yi − xi^T β̂)² / (N − p) ] = (N − p)/N σ²w

◮ This is the expectation of the error made by a linear model trained on DN to predict the value of the output in DN.

¹ derivation in the handbook

MISE error

◮ Let us compute now the expected prediction error of a linear model trained on DN when this is used to predict a set of test outputs distributed according to the same linear dependency but independent of the training set.
◮ It can be shown² that in case of linear dependency

    MISE = E_{DN,y}[ Σ_{i=1}^N (yi − xi^T β̂)² / N ] = (N + p)/N σ²w

  Note that in the MISE formula the y distribution is independent of DN and then of β̂.

² derivation in the handbook
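A Monte Carlo sketch of the two expectations, assuming a simulated fixed design reused for training and test outputs (only the (N − p)/N and (N + p)/N factors come from the slides; all numbers are illustrative):

  ## Monte Carlo check: E[MISEemp] ~ (N-p)/N * sigw^2, MISE ~ (N+p)/N * sigw^2
  set.seed(8)
  N <- 30; p <- 2; sigw <- 0.5
  x <- runif(N, 0, 10); X <- cbind(1, x)               # fixed design for train and test
  one.rep <- function() {
    ytr <- drop(X %*% c(1, 0.5) + rnorm(N, 0, sigw))   # training outputs
    yts <- drop(X %*% c(1, 0.5) + rnorm(N, 0, sigw))   # independent test outputs, same inputs
    betahat <- solve(t(X) %*% X, t(X) %*% ytr)
    yhat <- drop(X %*% betahat)
    c(emp = mean((ytr - yhat)^2), test = mean((yts - yhat)^2))
  }
  R <- replicate(10000, one.rep())
  rowMeans(R)                                          # compare with the theoretical values:
  c((N - p)/N, (N + p)/N) * sigw^2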
MISE error

Then it follows that the empirical error returns a biased estimate of MISE, that is

  E_DN[MISEemp] = (N − p)/N σ²w  ≠  MISE = (N + p)/N σ²w

If we replace MISEemp with

  MISEemp + 2 (p/N) σ²w

we obtain an unbiased estimator of the quantity MISE (see the R file Linear/ee.R).

Nevertheless, this estimator requires an estimate of the noise variance.

The PSE and the FPE

◮ Given the estimate σ̂²w we have the Predicted Square Error (PSE) criterion

    PSE = MISEemp + 2 σ̂²w p/N

◮ Taking as estimate of σ²w

    σ̂²w = SSEemp / (N − p)

  we have the Final Prediction Error (FPE)

    FPE = (1 + p/N)/(1 − p/N) MISEemp

◮ See the R script Linear/fpe.R.
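A small R sketch computing the two criteria for a fitted lm model (an illustration, not the Linear/fpe.R script; data simulated as before):

  ## PSE and FPE criteria for a fitted linear model
  set.seed(9)
  N <- 30; x <- runif(N, 0, 10); y <- 1 + 0.5 * x + rnorm(N, 0, 0.3)
  fit <- lm(y ~ x)
  p <- length(coef(fit))
  SSEemp  <- sum(residuals(fit)^2)
  MISEemp <- SSEemp / N                       # empirical risk (training error)
  sig2hat <- SSEemp / (N - p)                 # noise variance estimate
  PSE <- MISEemp + 2 * sig2hat * p / N
  FPE <- (1 + p/N) / (1 - p/N) * MISEemp      # with this sig2hat, PSE and FPE coincide
  c(MISEemp = MISEemp, PSE = PSE, FPE = FPE)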
