
Linear Algebra for Business Analytics

Marcel Scharth
The University of Sydney Business School

This version: 11/8/2017

This reference material for Business Analytics students covers basic concepts from
linear algebra that are helpful in applications. You can use this guide either to learn
the essentials, or as a reference that you can always go back to as you come across
these concepts. This material draws on Klein (2013) and Boyd and Vandenberghe
(2016), with the latter being particularly well suited as a complete resource for business
analytics applications.
Here, we follow a practical approach and do not cover abstract linear algebra, which
includes concepts that would be part of a traditional linear algebra course such as vector
spaces. While powerful, this level of abstraction is not a requirement for our purposes.
Instead, you will find that investing in the material below considerably simplifies things
from the perspective of understanding practical methods.

Contents

1. Vectors
1.1. What is a vector?
1.2. Special vectors
1.3. Vector operations
1.4. Inner Product
1.5. Linear functions
1.6. Norm
1.7. Distance
1.8. Orthogonal vectors
2. Matrices
2.1. What is a matrix?
2.2. Row and column vectors
2.3. Transpose
2.4. Addition and scalar multiplication
2.5. Matrix-vector multiplication
2.6. Matrix-matrix multiplication
2.7. Square matrices
2.8. Identity and diagonal matrices
2.9. Systems of Linear Equations
2.10. Matrix inverse
2.11. Trace†
3. Differentiation
4. Random vectors
5. Application: linear regression and least squares
5.1. Multiple Linear Regression (MLR) model
5.2. Least squares
5.3. Sampling properties
References

The † symbol indicates subsections that are less important and can be initially skipped.
You can always go back to them when you come across these concepts.

1. Vectors

1.1. What is a vector?

A vector is an ordered finite list of numbers. We typically write vectors as vertical
arrays, surrounded by square or curved brackets, as in
\[
\begin{bmatrix} -1 \\ 0 \\ 2.5 \\ -7.2 \end{bmatrix}
\quad \text{or} \quad
\begin{pmatrix} -1 \\ 0 \\ 2.5 \\ -7.2 \end{pmatrix}
\]
\[
a = \begin{bmatrix} 5 \\ -2 \\ -3 \end{bmatrix}, \qquad
a = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
We also write vectors as numbers separated by commas.

a = (5, −2, −3), a = (1, 1).

The elements or entries of a vector are the values in the array.

The size (or dimension) of a vector is the number of elements it contains.

A vector of size n is called an n−vector.
\[
a = \begin{bmatrix} a_1 \\ \vdots \\ a_i \\ \vdots \\ a_n \end{bmatrix},
\qquad
a = (a_1, \ldots, a_i, \ldots, a_n)
\]

A vector with n entries, each belonging to R (the set of real numbers), is called an
n-vector over R. We denote the set of n-vectors over R as Rn.

a = (a1, a2, a3) ∈ (R, R, R) ≡ R3

We can also define a vector as a function from a finite set D to R. For example, a
function from D = {0, 1, 2, . . . , d − 1} to R. The vector

a = (6, −4, −3.7)

is the function

0 ↦ 6
1 ↦ −4
2 ↦ −3.7,

where ↦ reads “maps to”.

This last definition is useful as it matches how we work with vectors in Python.
For example, if we store the above vector as a Python list called a, the command
a[0] returns 6, the first element of the vector. More formally, we say that the above
definition lends itself to representation in a data structure (a format for organising and
storing data). Python objects such as lists, dictionaries, and NumPy arrays are data
structures that can represent vectors.
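
As a brief sketch (the variable names below are ours, purely for illustration), this is how
the vector above can be stored and indexed in Python:

    import numpy as np

    # The vector a = (6, -4, -3.7) as a Python list and as a NumPy array.
    a_list = [6, -4, -3.7]
    a_array = np.array([6, -4, -3.7])

    print(a_list[0])     # 6, the value that index 0 maps to
    print(a_array[2])    # -3.7
    print(a_array.size)  # 3, the size (dimension) of the vector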

1.2. Special vectors.

Zero vector. A zero vector has all elements equal to zero, 0 = (0, 0, ..., 0). 0n indicates
a zero vector with dimension n.

Unit vector. A unit vector has all elements equal to zero, except one element which is
equal to one: ei = [0, ..., 0, 1, 0, ..., 0]T (all zeros except for a 1 in the i-th position).

Ones vector. A ones vector has all elements equal to one, 1 = [1, 1, ..., 1]T . 1n indicates
a ones vector with dimension n. We also use the notation ι for this type of vector.

Sparsity. A vector is said to be sparse if many of its elements are equal to zero.

1.3. Vector operations.

Vector equality. a = b ⇐⇒ ai = bi for all i = 1, 2, ..., n.

Scalar-vector multiplication. Let α denote a scalar. The vector αa is the vector
with elements {αai }. For example, let a = (5, −2, −3). Then

0.5 a = (0.5 × 5, 0.5 × −2, 0.5 × −3) = (2.5, −1, −1.5)

Addition. Let a and b be two vectors with the same size n. The sum c = a + b is the
vector with elements ci = ai + bi .

Let a = (5, −2, −3) and b = (−1, 2, 4). Then,

c = a + b = (5, −2, −3) + (−1, 2, 4) = (5 − 1, −2 + 2, −3 + 4) = (4, 0, 1).

Linear combination. Let a and b be n−vectors and β1 and β2 be scalars. The
n−vector
β1 a + β2 b
is called a linear combination of a and b. The scalars β1 and β2 are the coefficients
of the linear combination.

Let a = (5, −2, −3), b = (−1, 2, 4), β1 = 2, and β2 = 3.

2a + 3b = (2 × 5, 2 × −2, 2 × −3) + (3 × −1, 3 × 2, 3 × 4) = (7, 2, 18)
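
As an illustrative sketch (the variable names are ours), the operations above can be
reproduced with NumPy arrays:

    import numpy as np

    a = np.array([5, -2, -3])
    b = np.array([-1, 2, 4])

    print(0.5 * a)        # [ 2.5 -1.  -1.5], scalar-vector multiplication
    print(a + b)          # [4 0 1], elementwise addition
    print(2 * a + 3 * b)  # [ 7  2 18], linear combination with coefficients 2 and 3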

1.4. Inner Product.

We define the dot or inner product of two n-dimensional vectors a and b as
\[
a^T b = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n = \sum_{i=1}^{n} a_i b_i
\]

Example: a = (2, −1, 3) and b = (5, −2, −3), then

aT b = 2 × 5 + (−1) × (−2) + 3 × (−3) = 3

Some authors use the notation ⟨a, b⟩ for inner products.

Properties. The following are useful properties of inner products that follow easily
from the definition.

(αa)T b = α(aT b)

aT b = bT a

aT (b + c) = aT b + aT c

Examples.

Sum. ιT a = a1 + a2 + . . . + an is the sum of the elements of a.

Average. (1/n)(ιT a) is the average of the elements of a.

Sum of squares. aT a = a1² + . . . + an² is the sum of squares of the elements of a.
\[
n = 4, \quad x = \begin{bmatrix} 3 \\ 4 \\ 2 \\ 7 \end{bmatrix}, \quad
\iota = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\quad \Rightarrow \quad
\frac{1}{4}\,\iota^T x = \frac{1}{4}(1 \times 3 + 1 \times 4 + 1 \times 2 + 1 \times 7) = 4.
\]
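
A short sketch of these inner product examples with NumPy (the variable names are ours):

    import numpy as np

    a = np.array([2, -1, 3])
    b = np.array([5, -2, -3])
    x = np.array([3, 4, 2, 7])
    iota = np.ones(4)

    print(a @ b)         # 3, the inner product a^T b
    print(iota @ x)      # 16.0, the sum of the elements of x
    print(iota @ x / 4)  # 4.0, the average of the elements of x
    print(x @ x)         # 78, the sum of squares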

1.5. Linear functions.

The notation f : Rn −→ R means that f is a function that maps an n−vector to a
real number. If x is an n−vector, then f (x) (a scalar) is the value of the function at
x. In this setting, we refer to x as the argument of the function.

Let x and y be n−vectors and α and β be scalars. A linear function is a function
that satisfies the property
\[
f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)
\]
We can always represent a linear function as an inner product. Let a be an n−vector.
Then we can write any linear function in the form
\[
f(x) = a^T x = a_1 x_1 + a_2 x_2 + \ldots + a_n x_n,
\]
where x is an n−vector. Here, a is fixed, and the argument x can be any n−vector.

For example, in a linear regression model f (x) is the regression function, x are the
predictor values, and a are the model parameters.

An affine function is a linear function plus a constant, that is

f (x) = aT x + b,

for a scalar b.
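
The sketch below illustrates a linear and an affine function written as inner products;
the coefficient values are arbitrary choices of ours, not part of the text:

    import numpy as np

    a = np.array([1.0, -2.0, 0.5])   # fixed coefficient vector
    b = 3.0                          # constant of the affine function

    def f_linear(x):
        return a @ x        # f(x) = a^T x

    def f_affine(x):
        return a @ x + b    # f(x) = a^T x + b

    x = np.array([2.0, 1.0, 4.0])
    y = np.array([0.0, 1.0, 1.0])

    # Linearity: f(2x + 3y) equals 2 f(x) + 3 f(y).
    print(f_linear(2 * x + 3 * y), 2 * f_linear(x) + 3 * f_linear(y))
    print(f_affine(x))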

1.6. Norm.

The Euclidean norm or ℓ2 -norm of a vector is
\[
\lVert a \rVert_2 = \sqrt{a^T a} = \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2}.
\]
This is the distance from the origin to the point a, or the length of the vector.

The normalized vector a / ∥a∥2 has unit norm.

Example.

Let x be a vector with sample average zero. Then the squared norm ∥x∥2² is the sum of
squares of x and s²x = ∥x∥2²/n is the sample variance.

General definition. A norm ∥ · ∥ is a function that satisfies the following properties:

(1) ∥a∥ ≥ 0 (non-negativity).
(2) ∥a∥ = 0 only if a = 0 (definiteness).
(3) ∥αa∥ = |α| × ∥a∥ (homogeneity).
(4) ∥a + b∥ ≤ ∥a∥ + ∥b∥ (triangle inequality).

The ℓ1 norm of a vector is
\[
\lVert a \rVert_1 = |a_1| + |a_2| + \ldots + |a_n| = \sum_{i=1}^{n} |a_i|.
\]

The Chebyshev or ℓ∞ norm is given by

∥a∥∞ = max{|a1 |, |a2 |, . . . , |an |}.

The Minkowski norm of order p is
\[
\lVert a \rVert_p = \left( \sum_{i=1}^{n} |a_i|^p \right)^{1/p},
\]
for p ≥ 1. This is a generalisation of the previous norms. We include this here because
the scikit-learn package in Python sometimes refers to the Minkowski norm by default,
even though with p = 2 it coincides with the Euclidean norm.
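
These norms are available through numpy.linalg.norm; a minimal sketch (the vector is our
own example):

    import numpy as np

    a = np.array([1.0, -3.0])

    print(np.linalg.norm(a))              # 3.16..., the Euclidean (l2) norm
    print(np.linalg.norm(a, ord=1))       # 4.0, the l1 norm
    print(np.linalg.norm(a, ord=np.inf))  # 3.0, the Chebyshev norm
    print(np.linalg.norm(a, ord=3))       # Minkowski norm of order p = 3
    print(a / np.linalg.norm(a))          # normalized vector with unit norm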

1.7. Distance.

The Euclidean distance between two vectors x and y is the norm of the difference
vector x − y:
\[
\mathrm{dist}(x, y) = \lVert x - y \rVert_2 = \sqrt{(x-y)^T (x-y)} = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}
\]

Every norm ∥ · ∥ induces a distance metric ∥x − y∥.

1.8. Orthogonal vectors.

Two vectors are orthogonal, written a ⊥ b, if and only if their inner product is zero,
aT b = 0.

Example:
\[
a = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad
\lVert a \rVert = \lVert b \rVert = \sqrt{2}, \quad a^T b = 0.
\]
\[
c = \begin{bmatrix} 1 \\ -3 \end{bmatrix}, \quad
d = \begin{bmatrix} 0.6 \\ 0.2 \end{bmatrix}, \quad
\lVert c \rVert = \sqrt{10} \approx 3.16, \quad
\lVert d \rVert = \sqrt{0.4} \approx 0.63, \quad
c^T d = 0, \quad a^T c = -2.
\]
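
A quick numerical check of the distance and orthogonality examples (a sketch using the
vectors defined above):

    import numpy as np

    a = np.array([1.0, 1.0])
    b = np.array([1.0, -1.0])
    c = np.array([1.0, -3.0])
    d = np.array([0.6, 0.2])

    print(np.linalg.norm(a - b))  # 2.0, the Euclidean distance between a and b
    print(a @ b)                  # 0.0, so a and b are orthogonal
    print(c @ d)                  # 0.0, so c and d are orthogonal
    print(a @ c)                  # -2.0, a and c are not orthogonal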

2. Matrices

2.1. What is a matrix?

A matrix is a rectangular two-dimensional array of numbers such as
\[
A = \begin{bmatrix}
0 & 1 & -2.3 & 0.1 \\
1.3 & 4 & -0.1 & 7 \\
4.1 & -1 & 0 & 1.7
\end{bmatrix}
\]
The size (or dimensions) of a matrix are the number of rows and columns. The
matrix above has 3 rows and 4 columns, so the size is 3 × 4 (it reads 3-by-4).

We represent an (m × n) matrix as
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix},
\]
with A ∈ Rm×n .

We also represent a matrix as A = {aij }. In a design matrix in regression analysis, the
index i = 1, 2, ..., m refers to the statistical units, and the index j = 1, 2, ..., n to the
variables or attributes.
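
A sketch of the 3 × 4 example matrix as a NumPy array (two-dimensional arrays are the
natural data structure for matrices):

    import numpy as np

    A = np.array([[0.0, 1.0, -2.3, 0.1],
                  [1.3, 4.0, -0.1, 7.0],
                  [4.1, -1.0, 0.0, 1.7]])

    print(A.shape)  # (3, 4), the size of the matrix
    print(A[1, 3])  # 7.0, the element in row 2 and column 4 (indices start at 0)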

2.2. Row and column vectors.

A column vector is an m × 1 matrix. We do not distinguish between vectors and column
vectors.
\[
a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
\]
In the same way, a row vector is a 1 × m matrix.
\[
a = \begin{bmatrix} a_1 & a_2 & \ldots & a_m \end{bmatrix}
\]
The transpose of a column vector is the corresponding row vector and vice-versa.
\[
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}^T
= \begin{bmatrix} a_1 & a_2 & \ldots & a_m \end{bmatrix}
\]
\[
\begin{bmatrix} a_1 & a_2 & \ldots & a_m \end{bmatrix}^T
= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
\]

We can represent a matrix X as a partitioned matrix whose generic block is the 1 × n
row vector xiT = [xi1 , xi2 , ..., xij , ..., xin ], which contains the profile of the i-th row unit,
\[
X = \begin{bmatrix} x_1^T \\ \vdots \\ x_i^T \\ \vdots \\ x_m^T \end{bmatrix}.
\]
Alternatively, we can partition it as
\[
X = [x_1, x_2, ..., x_j, ..., x_n],
\]
where xj is the m × 1 column vector referring to the j-th variable or attribute.

2.3. Transpose.

The transpose of an m × n matrix A yields an n × m matrix that interchanges the
rows and columns of A.
\[
A^T = \begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{i1} & \cdots & a_{m1} \\
a_{12} & a_{22} & \cdots & a_{i2} & \cdots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{1j} & a_{2j} & \cdots & a_{ij} & \cdots & a_{mj} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{in} & \cdots & a_{mn}
\end{bmatrix}
\]
The transpose has the property that (AT )T = A.

Example.
\[
A = \begin{bmatrix}
2 & 3 & 4 \\
1 & 2 & -1 \\
3 & -2 & 5 \\
-2 & 4 & 1
\end{bmatrix}, \qquad
A^T = \begin{bmatrix}
2 & 1 & 3 & -2 \\
3 & 2 & -2 & 4 \\
4 & -1 & 5 & 1
\end{bmatrix}
\]

2.4. Addition and scalar multiplication.

Scalar multiplication. αA = {αaij }.

Matrix addition. If A is an m × n matrix and B an m × n matrix, then

A + B = {aij + bij } .

This can only be performed if A and B have the exact same dimensions.

Note that (A + B)T = AT + BT .

Example.
\[
\begin{bmatrix} 2 & 3 \\ 3 & -2 \end{bmatrix}
+ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{bmatrix} 2+1 & 3+2 \\ 3+3 & -2+4 \end{bmatrix}
= \begin{bmatrix} 3 & 5 \\ 6 & 2 \end{bmatrix}
\]

2.5. Matrix-vector multiplication.

The product of an m × n matrix A with an n-vector b is an m-vector c with element
i equal to the inner product of row i of A with b,
\[
c_i = a_i^T b = \sum_{j=1}^{n} a_{ij} b_j,
\]
where aTi denotes the i-th row of A.

Example.
\[
\begin{bmatrix} 1 & 4 \\ 7 & -3 \\ 2 & -5 \end{bmatrix}
\begin{bmatrix} 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \times 2 + 4 \times 1 \\ 7 \times 2 - 3 \times 1 \\ 2 \times 2 - 5 \times 1 \end{bmatrix}
= \begin{bmatrix} 6 \\ 11 \\ -1 \end{bmatrix}
\]
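
The same matrix-vector product in NumPy, as a sketch:

    import numpy as np

    A = np.array([[1, 4],
                  [7, -3],
                  [2, -5]])
    b = np.array([2, 1])

    print(A @ b)  # [ 6 11 -1], each entry is the inner product of a row of A with b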

2.6. Matrix-matrix multiplication.

The product of an m × p matrix A with a p × n matrix B is an m × n matrix C with
element ij equal to the inner product of row i of A with column j of B,
\[
c_{ij} = a_i^T b_j = \sum_{k=1}^{p} a_{ik} b_{kj},
\]
where aTi denotes the i-th row of A and bj denotes the j-th column of B.

The matrix partitions that we use in the multiplication are
\[
A = \begin{bmatrix} a_1^T \\ \vdots \\ a_i^T \\ \vdots \\ a_m^T \end{bmatrix},
\qquad
B = [b_1, ..., b_j, ..., b_n].
\]
The multiplication AB is only defined when the column dimension of A (an m × p
matrix) equals the row dimension of B (a p × n matrix).

Example.
\[
\begin{bmatrix} 2 & 3 \\ 3 & -2 \end{bmatrix}
\times \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{bmatrix} 2 \times 1 + 3 \times 3 & 2 \times 2 + 3 \times 4 \\ 3 \times 1 - 2 \times 3 & 3 \times 2 - 2 \times 4 \end{bmatrix}
= \begin{bmatrix} 11 & 16 \\ -3 & -2 \end{bmatrix}
\]

Properties of matrix multiplication.

(AB)T = BT AT
(AB)C = A(BC)
A(B + C) = AB + AC
(A + B)C = AC + BC

Unlike in scalar multiplication, the order of multiplication matters for matrices: in
general, AB ≠ BA. Moreover, remember that if m ≠ n, BA is not even defined.

Example.
\[
A = \begin{bmatrix} 1 & 0 \\ 5 & -1 \\ 3 & 2 \end{bmatrix}, \qquad
B = \begin{bmatrix} 2 & -1 \\ 3 & 6 \end{bmatrix}, \qquad
\iota_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\]
\[
C = AB = \begin{bmatrix} 2 & -1 \\ 7 & -11 \\ 12 & 9 \end{bmatrix}, \qquad
\iota_3^T C = \begin{bmatrix} 21 & -3 \end{bmatrix}.
\]
BA is not defined.

Vector outer product. If a is an m-vector and b is an n-vector, the outer product
abT is the m × n matrix
\[
ab^T = \begin{bmatrix}
a_1 b_1 & a_1 b_2 & \ldots & a_1 b_n \\
a_2 b_1 & a_2 b_2 & \ldots & a_2 b_n \\
\vdots & \vdots & & \vdots \\
a_m b_1 & a_m b_2 & \ldots & a_m b_n
\end{bmatrix}
\]
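
A sketch reproducing the matrix-matrix example above and illustrating the outer product
(the vectors in the outer product are our own):

    import numpy as np

    A = np.array([[1, 0],
                  [5, -1],
                  [3, 2]])
    B = np.array([[2, -1],
                  [3, 6]])

    C = A @ B
    print(C)               # [[ 2 -1] [ 7 -11] [12 9]]
    print(np.ones(3) @ C)  # [21. -3.], the column totals iota^T C

    a = np.array([1, 2])
    b = np.array([3, 4, 5])
    print(np.outer(a, b))  # the 2-by-3 outer product a b^T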

2.7. Square matrices.

A square matrix has the same number of rows and columns, m = n.

Symmetric matrix. A square matrix A is symmetric if AT = A.

Quadratic form. Let A be an n-dimensional square matrix and x an n × 1 vector.
The scalar xT Ax is called a quadratic form.

A symmetric matrix A with the property that xT Ax > 0 for any nonzero vector x is
said to be positive definite.

2.8. Identity and diagonal matrices.

The diagonal elements of a matrix are the elements aij such that i = j (same row and
column index).

An identity matrix of order n is a matrix with all diagonal elements equal to one
(aii = 1 for i = 1, . . . , n), and all non-diagonal elements equal to zero, that is
\[
I_n = \begin{bmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
0 & 0 & \cdots & 0 & 1
\end{bmatrix} = \mathrm{diag}(1, ..., 1)
\]
For example,
\[
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Properties. Let A be an m × n matrix.

In² = In
Im A = A
AIn = A

Diagonal matrix. A diagonal matrix is a square matrix with zeros in all the non-
diagonal positions.
\[
D = \begin{bmatrix}
d_1 & 0 & \cdots & 0 & 0 \\
0 & d_2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & d_{n-1} & 0 \\
0 & 0 & \cdots & 0 & d_n
\end{bmatrix} = \mathrm{diag}(d_1, ..., d_n)
\]
Let D be an n × n diagonal matrix and A an n × p matrix. The operation DA
multiplies each row i of A by the diagonal element di of D.
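
A sketch of identity and diagonal matrices in NumPy (the matrices are our own examples):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

    I3 = np.eye(3)                   # 3-by-3 identity matrix
    D = np.diag([1.0, 10.0, 100.0])  # diagonal matrix diag(1, 10, 100)

    print(np.allclose(I3 @ A, A))  # True: I_m A = A
    print(D @ A)                   # row i of A is scaled by the diagonal element d_i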

2.9. Systems of Linear Equations.

Consider a system of m linear equations in n variables:
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \ldots + a_{mn} x_n &= b_m
\end{aligned}
\]

This system has a compact representation in matrix notation,
\[
Ax = b,
\]
where
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
\]

The study of linear systems is a fundamental part of linear algebra, which allows us to
determine whether a system has a unique solution, infinitely many solutions, or no
solution, and to obtain a solution if one exists.

Example.

We can write the system
\[
\begin{aligned}
2x_1 + 2x_2 + x_3 &= 9 \\
2x_1 - x_2 + 2x_3 &= 6 \\
x_1 - x_2 + 2x_3 &= 5
\end{aligned}
\]
as
\[
Ax = b,
\]
where
\[
A = \begin{bmatrix} 2 & 2 & 1 \\ 2 & -1 & 2 \\ 1 & -1 & 2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 9 \\ 6 \\ 5 \end{bmatrix}.
\]
The unique solution is x = (1, 2, 3).
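
A sketch of how this system can be solved numerically:

    import numpy as np

    A = np.array([[2.0, 2.0, 1.0],
                  [2.0, -1.0, 2.0],
                  [1.0, -1.0, 2.0]])
    b = np.array([9.0, 6.0, 5.0])

    x = np.linalg.solve(A, b)  # solves Ax = b without explicitly forming an inverse
    print(x)                   # [1. 2. 3.]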

2.10. Matrix inverse.

An n × n matrix A is invertible if there exists a matrix B such that

AB = In

If that is the case, then we call B the inverse of A and use the notation A−1 .

There are several methods for calculating a matrix inverse, but we will leave the details
in the background. In practice, we often do not need to explicitly compute the matrix
inverse to evaluate expressions in which it appears (for example, the OLS formula below).

Properties.

(A−1 )−1 = A
(αA)−1 = (1/α)A−1
(AT )−1 = (A−1 )T
(AB)−1 = B−1 A−1
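
A sketch computing the inverse of the matrix from the previous example and checking the
definition (in practice, np.linalg.solve is usually preferred to forming the inverse
explicitly):

    import numpy as np

    A = np.array([[2.0, 2.0, 1.0],
                  [2.0, -1.0, 2.0],
                  [1.0, -1.0, 2.0]])
    b = np.array([9.0, 6.0, 5.0])

    A_inv = np.linalg.inv(A)
    print(np.allclose(A @ A_inv, np.eye(3)))  # True: A A^{-1} = I_n
    print(A_inv @ b)                          # [1. 2. 3.], the same solution as before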

2.11. Trace† .

The trace of a square matrix is the sum of its diagonal elements. If A is n × n,
\[
\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}.
\]

Properties.

tr(αA) = α tr(A)
tr(A + B) = tr(A) + tr(B)
tr(AT ) = tr(A)
tr(AB) = tr(BA)

3. Differentiation

Let f : Rn −→ R be a function. The gradient of f is the vector of partial derivatives
of the function with respect to each of its arguments.
\[
\nabla f(x) = \begin{bmatrix}
\dfrac{\partial f(x)}{\partial x_1} \\[4pt]
\dfrac{\partial f(x)}{\partial x_2} \\
\vdots \\
\dfrac{\partial f(x)}{\partial x_n}
\end{bmatrix}
\]
We also use the notation:
\[
\frac{d f(x)}{dx} = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)
\]

There are several convenient rules for differentiating linear algebra operations with
respect to vectors. The two following rules appear in the derivation of the least squares
estimates of a linear regression.

Let x and a be n−vectors and A an n × n matrix. Then,
\[
\frac{d(x^T a)}{dx} = a
\]
\[
\frac{d(x^T A x)}{dx} = (A + A^T)x
\]

The Matrix Cookbook, which is freely available online, contains a comprehensive catalogue of such rules.
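
As a quick sanity check of the second rule (a sketch with arbitrary test values generated
by us), we can compare the analytical gradient with a finite-difference approximation:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    A = rng.normal(size=(n, n))
    x = rng.normal(size=n)

    def f(x):
        return x @ A @ x  # quadratic form x^T A x

    # Analytical gradient from the rule d(x^T A x)/dx = (A + A^T) x.
    grad_analytical = (A + A.T) @ x

    # Numerical gradient by central finite differences.
    eps = 1e-6
    grad_numerical = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                               for e in np.eye(n)])

    print(np.allclose(grad_analytical, grad_numerical, atol=1e-5))  # True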

4. Random vectors

A random vector or multivariate random variable is a vector with entries that
are scalar-valued random variables.

Let X = (X1 , X2 , . . . , Xn ) be a random vector. The mean vector, or expected value,
of X is an n-vector over R defined as
\[
E(X) = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{bmatrix}
\]

Let a and b be non-random scalars and Y be another random vector with dimension
n. Then
E(aX + bY ) = aE(X) + bE(Y ),
which follows from the linearity of expectations.

For a non-random n−vector a,

E(aT X) = aT E(X).

Let A be a non-random matrix with n columns.

E(AX) = AE(X)

We define the variance of the random vector as the square matrix
\[
\mathrm{Var}(X) = \begin{bmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \ldots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \ldots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \ldots & \mathrm{Var}(X_n)
\end{bmatrix}
\]
Var(X) is also known as the variance-covariance or covariance matrix of X.

Var(X) = E(XX T ) − E(X)E(X)T

Let a be a non-random vector and b a scalar. Then,

Var(a + bX) = b2 Var(X)

For a non-random n−vector a,

Var(aT X) = aT Var(X)a.

For a non-random matrix A with n columns,

Var(AX) = AVar(X)AT .
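
A sketch checking these identities on simulated data (the mean vector, covariance matrix,
and transformation A are arbitrary choices of ours):

    import numpy as np

    rng = np.random.default_rng(1)

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    X = rng.multivariate_normal(mu, Sigma, size=100_000)  # draws of a 2-dimensional random vector

    A = np.array([[1.0, 1.0],
                  [2.0, -1.0]])
    AX = X @ A.T  # each row is A applied to one draw of X

    # Sample estimates of E(AX) and Var(AX) versus the identities A E(X) and A Var(X) A^T.
    print(AX.mean(axis=0), A @ mu)
    print(np.cov(AX.T))
    print(A @ Sigma @ A.T)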

5. Application: linear regression and least squares

5.1. Multiple Linear Regression (MLR) model.

The classical MLR model is characterised by the following set of assumptions.

1. Linearity: if X = x, then

Y = β0 + β1 x1 + . . . + βp xp + ε

for some population parameters β0 , β1 , . . . , βp and a random error ε.


2. The conditional mean of ε given X is zero, E(ε|X) = 0.
3. Constant error variance: Var(ε|X) = σ 2 .
4. Independence: the observations are independent.
5. The distribution of X1 , . . . , Xp is arbitrary.
6. There is no perfect multicollinearity (no column of X is a linear combination
of other columns).

The model equation for an observation indexed by i is

Yi = β0 + β1 xi1 + β2 xi2 + . . . + βp xip + εi



We can therefore write the model of a sample of n observations as

Y1 = β0 + β1 x11 + β2 x12 + . . . + βp x1p + ε1


Y2 = β0 + β1 x21 + β2 x22 + . . . + βp x2p + ε2
..
.
Yn = β0 + β1 xn1 + β2 xn2 + . . . + βp xnp + εn

Compact matrix notation:
\[
Y = X\beta + \varepsilon,
\]
where
\[
Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \ldots & x_{1p} \\
1 & x_{21} & x_{22} & \ldots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \ldots & x_{np}
\end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]

Assumptions 3 and 4 imply that

Var(ε) = σ 2 In

5.2. Least squares.

Let {(yi , xi )}, i = 1, . . . , n, be a sample. The ordinary least squares (OLS) method obtains
the coefficient values that minimise the residual sum of squares (RSS):
\[
\widehat{\beta} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
\]

The partial derivatives of the RSS with respect to the coefficients are
\[
\begin{aligned}
\frac{\partial \mathrm{RSS}(\beta)}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right) \\
\frac{\partial \mathrm{RSS}(\beta)}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_{i1} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right) \\
\frac{\partial \mathrm{RSS}(\beta)}{\partial \beta_2} &= -2 \sum_{i=1}^{n} x_{i2} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right) \\
&\;\;\vdots \\
\frac{\partial \mathrm{RSS}(\beta)}{\partial \beta_p} &= -2 \sum_{i=1}^{n} x_{ip} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)
\end{aligned}
\]

Note that each partial derivative j contains a sum that is the inner product of the j-th
column of X with the vector of residuals (y − Xβ). We can therefore write the above
equations using the compact notation

\[
\nabla \mathrm{RSS}(\beta) = -2 X^T (y - X\beta),
\]
where
\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.
\]

We obtain the first order condition by setting the gradient to zero,
\[
\nabla \mathrm{RSS}(\beta) = -2 X^T y + 2 X^T X \beta = 0.
\]
The least squares estimate β̂ therefore satisfies the system of linear equations:
\[
X^T X \widehat{\beta} = X^T y
\]

Note that X T X is a (p + 1) × (p + 1) matrix and X T y is a (p + 1)-vector, such that


this expression has the form of Section 2.9.

If (X T X) is invertible, which is the case if Assumption 6 of no perfect multicollinearity


is satisfied, left multiplication with (X T X)−1 gives the unique solution

βb = (X T X)−1 X T y.
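
As a sketch (with data simulated by us), the estimate can be computed by solving the
normal equations X^T X β = X^T y, which avoids explicitly forming the inverse:

    import numpy as np

    rng = np.random.default_rng(42)

    # Simulate a small regression problem: n observations, p predictors.
    n, p = 200, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept column
    beta_true = np.array([1.0, 2.0, -1.0, 0.5])
    y = X @ beta_true + rng.normal(scale=0.5, size=n)

    # Solve the normal equations X^T X beta = X^T y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)

    # Equivalent, more numerically stable least squares routine.
    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(beta_hat, beta_lstsq))  # True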

Solution using vector differentiation rules.
\[
\begin{aligned}
\mathrm{RSS}(\beta) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 \\
&= (y - X\beta)^T (y - X\beta) \\
&= y^T y - 2\beta^T X^T y + \beta^T X^T X \beta
\end{aligned}
\]

The gradient is
\[
\frac{d(\mathrm{RSS}(\beta))}{d\beta}
= \frac{d(y^T y)}{d\beta} - \frac{d(2\beta^T X^T y)}{d\beta} + \frac{d(\beta^T X^T X \beta)}{d\beta}
= 0 - 2X^T y + 2X^T X \beta
\]

The first order condition is therefore
\[
\frac{d(\mathrm{RSS}(\beta))}{d\beta} = -2X^T y + 2X^T X\beta = 0.
\]
Therefore, as above,
\[
X^T X \widehat{\beta} = X^T y,
\]
leading to
\[
\widehat{\beta} = (X^T X)^{-1} X^T y.
\]

5.3. Sampling properties.

We first obtain the following convenient representation of the estimator.


\[
\begin{aligned}
\widehat{\beta} &= (X^T X)^{-1} X^T Y \\
&= (X^T X)^{-1} X^T (X\beta + \varepsilon) \\
&= (X^T X)^{-1} X^T X \beta + (X^T X)^{-1} X^T \varepsilon \\
&= \beta + (X^T X)^{-1} X^T \varepsilon
\end{aligned}
\]

Below, all the results are conditional on the predictor values in the X matrix. We omit
this conditioning from the notation for simplicity.

Expected value.
\[
\begin{aligned}
E(\widehat{\beta}) &= E\!\left(\beta + (X^T X)^{-1} X^T \varepsilon\right) \\
&= \beta + E\!\left[(X^T X)^{-1} X^T \varepsilon\right] \\
&= \beta + (X^T X)^{-1} X^T E(\varepsilon) \\
&= \beta + (X^T X)^{-1} X^T 0 \\
&= \beta
\end{aligned}
\]
The least squares estimator is unbiased under the model assumptions.

Variance.
\[
\begin{aligned}
\mathrm{Var}(\widehat{\beta}) &= \mathrm{Var}\!\left(\beta + (X^T X)^{-1} X^T \varepsilon\right) \\
&= \mathrm{Var}\!\left((X^T X)^{-1} X^T \varepsilon\right) \\
&= E\!\left((X^T X)^{-1} X^T \varepsilon \varepsilon^T X (X^T X)^{-1}\right) \\
&= (X^T X)^{-1} X^T E(\varepsilon \varepsilon^T) X (X^T X)^{-1} \\
&= (X^T X)^{-1} X^T (\sigma^2 I) X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1} \\
&= \sigma^2 (X^T X)^{-1}
\end{aligned}
\]

References

Boyd, S. and L. Vandenberghe (2016). Vectors, matrices, and least squares. Available:
stanford.edu/class/ee103/mma.pdf.
Klein, P. N. (2013). Coding the matrix: Linear algebra through applications to computer
science. Newtonian Press.
