DA Unit 3 Trio
Regression – Concepts, BLUE property assumptions, Least Squares Estimation, Variable Rationalization,
and Model Building. Logistic Regression: Model Theory, Model Fit Statistics, Model Construction, and
analytics applications to various business domains.
The simple linear regression model is

y = β₀ + β₁X + ε,

where y is termed the dependent or study variable and X is termed the independent or explanatory
variable. The terms β₀ and β₁ are the parameters of the model: β₀ is termed the intercept
term and β₁ is termed the slope parameter. These parameters are usually called regression
coefficients. The unobservable error component ε accounts for the failure of the data to lie on the straight
line and represents the difference between the true and observed realizations of y. There can be several
reasons for such a difference, e.g., the effect of all variables deleted from the model, variables that may be
qualitative, inherent randomness in the observations, etc. We assume that ε is an independent and
identically distributed random variable with mean zero and constant variance σ². Later, we will
additionally assume that ε is normally distributed.
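As a quick illustration of this model, the following sketch (with hypothetical parameter values chosen only for demonstration) simulates observations from y = β₀ + β₁X + ε with i.i.d. zero-mean normal errors:

```python
import random

# Hypothetical parameter values, chosen for illustration only.
beta0, beta1, sigma = 2.0, 0.5, 1.0  # intercept, slope, error std. deviation

random.seed(42)

# Generate n observations from y = beta0 + beta1*x + epsilon,
# where epsilon ~ N(0, sigma^2), i.i.d. with mean zero.
n = 1000
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]

# Because the errors have mean zero, the observed y values scatter
# symmetrically around the true line.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
avg_err = sum(residuals) / n
print(abs(avg_err) < 0.2)  # the average error is close to zero
```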
Var(y | x) = σ².

When the values of β₀, β₁ and σ² are known, the model is completely described. In practice, the parameters
β₀, β₁ and σ² are generally unknown, and ε is unobserved. The determination of the statistical model
y = β₀ + β₁X + ε therefore depends on the determination (i.e., estimation) of β₀, β₁ and σ². In order to know the
values of these parameters, n pairs of observations (xᵢ, yᵢ), i = 1, ..., n, on (X, y) are observed/collected
and are used to estimate these unknown parameters.
BLUE: Best Linear Unbiased Estimator
B – BEST
L – LINEAR
U – UNBIASED
E – ESTIMATOR
Linearity
An estimator is said to be a linear estimator of β if it is a linear function of the sample observations.
The sample mean

X̄ = (X₁ + X₂ + ... + Xₙ) / n

is a linear estimator because it is a linear function of the X values.
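The linearity of the sample mean can be made concrete: it is the linear combination a₁X₁ + ... + aₙXₙ with every fixed weight aᵢ = 1/n. A minimal sketch (with made-up sample values):

```python
# A linear estimator has the form a1*X1 + a2*X2 + ... + an*Xn,
# where the weights a_i are fixed constants, not functions of the data.
# The sample mean is the special case with every weight a_i = 1/n.
xs = [4.0, 7.0, 1.0, 8.0]  # hypothetical sample observations
n = len(xs)

weights = [1.0 / n] * n
linear_form = sum(w * x for w, x in zip(weights, xs))
sample_mean = sum(xs) / n

# The two computations agree: the mean is a linear function of the X values.
print(abs(linear_form - sample_mean) < 1e-12)
```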
UNBIASEDNESS
A desirable property of a distribution of estimates is that its mean equals the true mean of the
variable being estimated. Formally, an estimator is an unbiased estimator if the expected value of its
sampling distribution equals the true value of the population parameter.
If this is not the case, we say that the estimator is biased.
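Unbiasedness can be checked by simulation: draw many samples, compute the estimator for each, and verify that the average of the estimates sits on the true parameter. A sketch, with an assumed population mean of 5.0:

```python
import random

random.seed(0)
mu = 5.0  # true population mean (assumed for this illustration)

# Draw many samples and compute the sample mean of each. For an unbiased
# estimator the sampling distribution is centred on the true parameter.
estimates = []
for _ in range(5000):
    sample = [random.gauss(mu, 2.0) for _ in range(10)]
    estimates.append(sum(sample) / len(sample))

mean_of_estimates = sum(estimates) / len(estimates)
print(abs(mean_of_estimates - mu) < 0.05)  # centred on mu, as expected
```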
POINT ESTIMATORS
A point estimate is a single value used to estimate a population parameter. For example, the sample
mean x̄ is a point estimate of the population mean μ. Similarly, the sample proportion p is a point
estimate of the population proportion P.
Interval Estimators
An interval estimate is defined by two numbers, between which a population parameter is said to
lie. For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the
population mean is greater than a but less than b.
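A common way to produce such an interval is the approximate 95% confidence interval x̄ ± 1.96·s/√n. A minimal sketch, using a simulated sample (the true mean 50 and standard deviation 8 are assumptions of this example):

```python
import math
import random

random.seed(1)

# Simulated sample; in practice these would be the observed data.
sample = [random.gauss(50.0, 8.0) for _ in range(40)]
n = len(sample)

xbar = sum(sample) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample sd

# Approximate 95% interval estimate (a, b) for the population mean,
# using the normal critical value 1.96 since n is moderately large.
half_width = 1.96 * s / math.sqrt(n)
a, b = xbar - half_width, xbar + half_width
print(f"{a:.2f} < mu < {b:.2f}")
```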
The method of least squares estimates the parameters β₀ and β₁ by minimizing the sum of squares of the
differences between the observations and the line in the scatter diagram. This idea can be viewed from
different perspectives. When the vertical difference between the observations and the line in the scatter
diagram is considered and its sum of squares is minimized to obtain the estimates of β₀ and β₁, the method
is known as direct regression.
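In the direct regression method, minimizing the sum of squared vertical differences S(β₀, β₁) = Σᵢ (yᵢ − β₀ − β₁xᵢ)² with respect to β₀ and β₁ gives the standard closed-form least squares estimates:

```latex
\hat{\beta}_1 \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}} \;=\; \frac{s_{xy}}{s_{xx}}, \qquad
\hat{\beta}_0 \;=\; \bar{y}-\hat{\beta}_1\bar{x}.
```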
[Figure: Direct regression – vertical distances between the observed points (xᵢ, yᵢ) and the line Y = β₀ + β₁X.]
Alternatively, the sum of squares of the differences between the observations and the line in the horizontal
direction in the scatter diagram can be minimized to obtain the estimates of β₀ and β₁. This is known as the
reverse (or inverse) regression method.
[Figure: Reverse regression method – horizontal distances between the observed points (xᵢ, yᵢ) and the line Y = β₀ + β₁X.]
Instead of horizontal or vertical errors, if the sum of squares of the perpendicular distances between the
observations and the line in the scatter diagram is minimized to obtain the estimates of β₀ and β₁, the
method is known as major axis regression.

[Figure: Major axis regression method – perpendicular distances between the observed points (xᵢ, yᵢ) and the line.]
Instead of minimizing the distance, the area can also be minimized. The reduced major axis regression
method minimizes the sum of the areas of rectangles defined between the observed data points and the
nearest point on the line in the scatter diagram to obtain the estimates of regression coefficients. This is
shown in the following figure:
[Figure: Reduced major axis method – areas of the rectangles between the observed points (xᵢ, yᵢ) and the line Y = β₀ + β₁X.]
The method of least absolute deviation regression considers the sum of the absolute deviations of the
observations from the line in the vertical direction in the scatter diagram, as in the case of direct
regression, to obtain the estimates of β₀ and β₁.
No assumption is required about the form of the probability distribution of εᵢ in deriving the least squares
estimates. For the purpose of deriving statistical inferences only, we assume that the εᵢ's are random
variables with E(εᵢ) = 0, Var(εᵢ) = σ² and Cov(εᵢ, εⱼ) = 0 for all i ≠ j (i, j = 1, 2, ..., n). This assumption is
needed to find the mean, variance and other properties of the least squares estimates. The assumption that
the εᵢ's are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals
of the parameters.
Data Analytics UNIT - III
Based on these approaches, different estimates of β₀ and β₁ are obtained which have different
statistical properties. Among them, the direct regression approach is the most popular. Generally, the
direct regression estimates are referred to as the least squares estimates or ordinary least squares
(OLS) estimates.
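The direct (ordinary) least squares estimates can be computed in a few lines. The data below are made up purely for illustration:

```python
# Direct (ordinary) least squares for y = beta0 + beta1*x + error,
# implementing the textbook formulas beta1 = s_xy / s_xx and
# beta0 = ybar - beta1 * xbar. Data values are hypothetical.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 3.6, 4.4, 5.2]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)

beta1_hat = sxy / sxx                 # estimated slope
beta0_hat = ybar - beta1_hat * xbar   # estimated intercept

print(round(beta0_hat, 3), round(beta1_hat, 3))
```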
LOGISTIC REGRESSION
Logistic regression, also called the logit model, is a regression model where the
dependent variable (DV) is categorical. Logistic regression was developed by statistician David Cox in
1958.
> Ordinary Least Square
> Maximum Likelihood Estimation
The ordinary least squares, or OLS, can also be called the linear least squares
approximation. Through a simple formula, you can express the resulting estimation of the linear
regression model.
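For logistic regression itself, however, the coefficients are normally fitted by maximum likelihood rather than OLS. The following is a minimal sketch of that idea using plain gradient ascent on the log-likelihood; the tiny dataset is hypothetical (x = hours studied, y = 1 if the student passed, 0 otherwise):

```python
import math

# Hypothetical training data for illustration only.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

b0, b1 = 0.0, 0.0   # coefficients, initialised at zero
lr = 0.1            # learning rate
for _ in range(5000):
    # Gradient of the log-likelihood with respect to b0 and b1.
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in data)
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in data)
    b0 += lr * g0
    b1 += lr * g1

# The fitted probability of y = 1 increases with x, crossing 0.5
# between the failing and passing groups.
p_low, p_high = sigmoid(b0 + b1 * 1.0), sigmoid(b0 + b1 * 6.0)
print(p_low < 0.5 < p_high)
```

In practice a library implementation (e.g. a dedicated statistics package) with a proper convergence criterion would be used instead of this fixed-iteration loop.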
Report
A statistical model embodies a set of assumptions concerning the generation of the observed data,
and similar data from a larger population. A model represents, often in considerably idealized form, the
data-generating process. Signal processing is an enabling technology that encompasses the fundamental
theory, applications, algorithms, and implementations of processing or transferring information contained
in many different physical, symbolic, or abstract formats broadly designated as signals. It uses
mathematical, statistical, computational, heuristic, and linguistic representations, formalisms, and
techniques for representation, modelling, analysis, synthesis, discovery, recovery, sensing, acquisition,
extraction, learning, security, or forensics. In manufacturing, statistical models are used to define
warranty policies, solve various conveyor-related issues, implement statistical process control, etc.
Databases & Type of data and variables:
A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a
"centralized repository of information about data such as meaning, relationships to other data, origin,
usage, and format".
Data can be categorized on various parameters, such as category and type.
Data is of two types: numeric and character. Numeric data can be further divided into the
subgroups discrete and continuous.
Again, data can be divided into two categories: nominal and ordinal.
Also, based on usage, data is divided into two categories: quantitative and qualitative.
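These categories can be summarised with a small example. The variable names and classifications below are hypothetical, chosen only to illustrate the taxonomy:

```python
# Mapping example variables onto the data-type taxonomy above.
# All names and values are illustrative, not from a real dataset.
variables = {
    "num_children":    {"kind": "numeric",   "subtype": "discrete"},    # counts
    "height_cm":       {"kind": "numeric",   "subtype": "continuous"},  # measurements
    "blood_group":     {"kind": "character", "subtype": "nominal"},     # no natural order
    "education_level": {"kind": "character", "subtype": "ordinal"},     # ordered levels
}

for name, meta in variables.items():
    print(f"{name}: {meta['kind']} / {meta['subtype']}")
```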