CSE3506 - Essentials of Data Analytics: Facilitator: DR Sathiya Narayanan S
Email: [email protected]
Handphone No.: +91-9944226963
Education
Experience
Y ≈ β0 + β1 X
sales ≈ β0 + β1 TV
β̂0 and β̂1 are the least squares coefficient estimates for simple linear
regression, and they give the best linear fit (in the squared-error sense)
on the given training data.
Figure 2 shows the simple linear regression fit to the Advertising data,
where β̂0 = 7.03 and β̂1 = 0.0475.
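As a rough illustration of how such estimates are obtained and used, the sketch below applies the standard closed-form least squares formulas, β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β̂0 = ȳ − β̂1 x̄. The function name, the toy data and the TV value of 100 are assumptions made for illustration only; the Advertising data itself is not reproduced here.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals of y ~ b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data lying exactly on y = 1 + 2x (NOT the Advertising data).
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_linear_regression(toy_x, toy_y)
print("toy fit:", b0, b1)

# Prediction using the coefficients reported for the Advertising fit.
beta0_hat, beta1_hat = 7.03, 0.0475
tv_budget = 100.0  # hypothetical TV value, in the same units as the data
predicted_sales = beta0_hat + beta1_hat * tv_budget
print("predicted sales:", round(predicted_sales, 2))
```

With the reported coefficients, each additional unit of TV is associated with an increase of about 0.0475 units of sales.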
Facilitator: Dr Sathiya Narayanan S VIT-Chennai - SENSE Winter Semester 2020-21 11 / 36
Module 1: Regression Analysis
Question 1.1
Question 1.2
Consider the following five training examples
X = [2 3 4 5 6]
Y = [12.8978 17.7586 23.3192 28.3129 32.1351]
We want to learn a function f(x) of the form f(x) = ax + b, which is
parameterized by (a, b). Using squared error as the loss function, which of
the following parameter pairs (a, b) would you use to model this function?
(a) (4 3)
(b) (5 3)
(c) (5 1)
(d) (1 5)
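One way to check the options is to evaluate the squared-error loss for each candidate pair and keep the smallest. The sketch below does that with the training examples given above; the helper name `squared_error` and the dictionary layout are illustrative choices, not part of the original question.

```python
X = [2, 3, 4, 5, 6]
Y = [12.8978, 17.7586, 23.3192, 28.3129, 32.1351]


def squared_error(a, b):
    """Sum of squared residuals of f(x) = a*x + b on the training examples."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(X, Y))


# Candidate (a, b) pairs, keyed by option label.
candidates = {"a": (4, 3), "b": (5, 3), "c": (5, 1), "d": (1, 5)}
losses = {label: squared_error(a, b) for label, (a, b) in candidates.items()}

# The preferred option is the one with the smallest loss.
best_label = min(losses, key=losses.get)
for label, loss in sorted(losses.items()):
    print(label, candidates[label], round(loss, 4))
print("lowest squared error:", best_label)
```

The same comparison could be made by fitting the least squares line first and picking the option closest to the fitted (a, b).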
Question 1.3
Question 1.4
When you perform multiple linear regression, which among the following
are questions you will be interested in?
where $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares. Note that
the $R^2$ statistic is independent of the scale of Y, and it always takes a
value between 0 and 1.
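The scale independence can be checked numerically. The sketch below assumes the standard definition R² = 1 − RSS/TSS, with RSS the residual sum of squares of the least squares line, and uses the five training examples from the questions in this module. The function name `r_squared` and the rescaling factor of 1000 are illustrative assumptions.

```python
def r_squared(xs, ys):
    """R^2 = 1 - RSS/TSS for the least squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual SS
    tss = sum((y - y_bar) ** 2 for y in ys)                      # total SS
    return 1 - rss / tss


xs = [2, 3, 4, 5, 6]
ys = [12.8978, 17.7586, 23.3192, 28.3129, 32.1351]

r2 = r_squared(xs, ys)
# Rescale Y (for example, a change of units): RSS and TSS both scale by
# the factor squared, so their ratio, and hence R^2, is unchanged.
r2_scaled = r_squared(xs, [1000 * y for y in ys])
print(r2, r2_scaled)
```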
Question 1.5
Consider the following five training examples
X = [2 3 4 5 6]
Y = [12.8978 17.7586 23.3192 28.3129 32.1351]
We want to learn a function f (x) of the form f (x) = ax + b which is
parameterized by (a, b).
Bias-Variance Tradeoff
Correlation
When comparing two random variables, say x1 and x2, covariance
Cov(x1, x2) is used to determine how much these two vary together,
whereas correlation Corr(x1, x2) is used to gauge how strongly a
change in one variable is associated with a change in the other.
For multiple data points, the covariance matrix is given by
$$C = \frac{(X - m)(X - m)^T}{n},$$
where X = [x1 x2 ...] is the data matrix with n columns (each column
is one data point) and m is the mean vector of the data points.
Correlation, a normalized version of the covariance, is expressed as
$$\mathrm{Corr}(x_1, x_2) = \frac{\mathrm{Cov}(x_1, x_2)}{\sigma_{x_1} \sigma_{x_2}}.$$
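A minimal sketch of these two quantities, using only the Python standard library, is given below. It follows the divide-by-n convention of the covariance matrix formula above; the helper names and the two small variables x1 and x2 are hypothetical and chosen so that the result is easy to verify by hand.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(u, v):
    """Cov(u, v) with the divide-by-n convention used for the matrix C."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def correlation(u, v):
    """Covariance normalized by the two standard deviations."""
    sigma_u = math.sqrt(covariance(u, u))
    sigma_v = math.sqrt(covariance(v, v))
    return covariance(u, v) / (sigma_u * sigma_v)


def covariance_matrix(rows):
    """rows[i] holds variable i across all n data points (row i of X)."""
    return [[covariance(r, s) for s in rows] for r in rows]


# Hypothetical data: x2 is exactly twice x1, so the correlation should be 1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]

C = covariance_matrix([x1, x2])
corr = correlation(x1, x2)
print(C)
print(corr)
```

Note that the covariance changes when a variable is rescaled, while the correlation stays between -1 and 1.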
Correlation
Time series modeling deals with time-based data. Time can be measured in
years, days, hours, minutes, etc.
Time series forecasting involves fitting a model to time-based data
and using it to predict future observations.
Time series forecasting serves two purposes: understanding the
pattern/trend in the time series data and forecasting/extrapolating
its future values. The forecast package in R contains functions
which serve these purposes.
In time series forecasting, the AutoRegressive Integrated Moving
Average (ARIMA) model is fitted to the time series data either to
better understand the data or to predict future points in the series.
Components of a time series are level, trend, seasonal, cyclical and
noise/irregular (random) variations.
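To make the components concrete, the sketch below builds a synthetic daily series from a level, a linear trend and a weekly seasonal pattern, then separates them again with a centered moving average. This is a simple decomposition illustration, not an ARIMA fit and not the forecast package; all values are hypothetical, and the noise component is left out so the result is exact.

```python
period = 7                            # hypothetical weekly seasonality on daily data
level, slope = 50, 2                  # hypothetical level and trend per time step
seasonal = [3, 1, -2, -4, -1, 0, 3]   # seasonal pattern, sums to zero over a period

# Additive series: level + trend + seasonal (no noise term in this sketch).
series = [level + slope * t + seasonal[t % period] for t in range(28)]


def centered_moving_average(values, window):
    """Average each point with its neighbours; window must be odd."""
    half = window // 2
    return {
        t: sum(values[t - half:t + half + 1]) / window
        for t in range(half, len(values) - half)
    }


# Averaging over one full period cancels the seasonal part,
# leaving level + trend at the window centre.
smoothed = centered_moving_average(series, period)

# Subtracting the smoothed series recovers the seasonal component.
seasonal_estimate = {t: series[t] - smoothed[t] for t in smoothed}

print(smoothed[10], seasonal_estimate[10])
```

With real data the irregular component does not cancel exactly, so the smoothed series is only an estimate of level plus trend.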
H0 : µ1 = µ2 = µ3 = ...
Ha : µ1 ≠ µ2 ≠ µ3 ≠ ...
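Hypotheses of this form are commonly tested with a one-way ANOVA F statistic, which compares the variation between group means with the variation within groups. The slides do not name the test at this point, so treating it as one-way ANOVA is an assumption, as are the function name and the three small samples below, which are made up for illustration.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    return (ssb / (k - 1)) / (ssw / (n_total - k))


# Hypothetical samples from three groups (not data from the slides).
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_anova_f(samples)
print(f_stat)
```

A large F value relative to the critical value of the F distribution with (k − 1, N − k) degrees of freedom is evidence against H0; looking up that critical value is outside this sketch.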
Question 1.6
Assume there are 3 canteens in a college and the sale of an item in those
canteens during first week of February-2021 is as follows:
Module-1 Summary