Relationship Analysis
Hypothesis Testing Strategies
Parametric Tests: Applications
Parametric tests usually assume certain properties of the population from which we draw samples, e.g.:
• Observations come from a normal population
Hypothesis Testing: Non-Parametric Tests
Non-parametric tests:
o Do not rely on assumptions about the population distribution
o Require only nominal or ordinal data
Note: Non-parametric tests need the entire population (or a very large sample size).
Relationship Analysis
Example: Wage Data
A large dataset of wages for a group of employees from the eastern region of India is given.
Relationship Analysis
Example: Wage Data
Employee's age and wage: how do wages vary with age?
Interpretation: On average, wage increases with age until about 60 years of age, at which point it begins to decline.
Relationship Analysis
Example: Wage Data
How do wages vary with time?
Relationship Analysis
Example: Wage Data
Wage and calendar year: how do wages vary across years?
Interpretation: There is a slow but steady increase in the average wage between 2010 and 2016.
Relationship Analysis
Example: Wage Data
Are wages related to education?
Relationship Analysis
Example: Wage Data
Wage and education level: do wages vary with employees' education levels?
Relationship Analysis
Given an employee's wage, can we predict their age?
Does wage have any association with both year and education level?
And so on.
An Open Challenge!
Suppose there are countably infinite points in the . We need a huge memory to store all
such points.
Is there any way out to store this information with a least amount of memory?
Say, with two values only.
Yahoo!
y = ax + b
Note: Here, the trick was to find a relationship among all the points; the two values a and b suffice.
Measures of Relationship
Univariate population: a population consisting of only one variable.
Measures of Relationship
Multivariate population: if the data happen to be on more than two variables.
[Figure: a 3-D plot with axes Volume, Temperature, and Pressure]
Measures of Relationship
In the case of bivariate and multivariate populations, we usually have to answer two types of questions:
Q1: Does there exist a correlation (i.e., association) between two (or more) variables?
If yes, of what degree?
Q2: Is there any cause-and-effect relationship between the two variables (in the case of a bivariate population), or between one variable on one side and two or more variables on the other side (in the case of a multivariate population)?
If yes, of what degree and in which direction?
Correlation Analysis
Correlation Analysis
In statistics, the word correlation is used to denote some form of
association between two variables.
Example: Weight is correlated with height
Example:
[Figure: scatter plot with values from 10 to 100 on the y-axis versus hours of study (1 to 7) on the x-axis]
Correlation Analysis
Do you find any correlation between X and Y as shown in the table?
[Table: # CD versus # Cigarette; values elided]
Note:
In data analytics, correlation analysis makes sense only when the relationship makes sense: there should be a cause-and-effect relationship.
Correlation Analysis
[Figure: three scatter plots illustrating positive correlation, negative correlation, and zero correlation]
Correlation Coefficient
The correlation coefficient is used to measure the degree of association.
It is usually denoted by r.
Correlation Coefficient
[Figure: four scatter plots, two labelled "High Positive Correlation" and "Low Positive Correlation"]
Correlation Coefficient
[Figure: scatter plots annotated with correlation coefficients R = +0.60, R = +0.80, R = +0.80, and R = +0.40]
Measuring Correlation Coefficients
There are three methods known to measure the correlation coefficients
Pearson’s Correlation Coefficient
Karl Pearson's Correlation Coefficient
This is also called Pearson's product-moment correlation. For n pairs $(x_i, y_i)$, it is defined as

$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \; \sum_{i=1}^{n}(y_i-\bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the means of x and y, respectively.
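As a quick illustration, here is a minimal sketch of computing Pearson's r directly from the formula above, in plain Python; the sample data are made up for demonstration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for demonstration:
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # ~0.77
```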
Karl Pearson’s coefficient of Correlation
Example 7.1: Correlation of Gestational Age and Birth Weight
A small study is conducted involving 17 infants to investigate the association between
gestational age at birth, measured in weeks, and birth weight, measured in grams.
Karl Pearson’s coefficient of Correlation
Example 7.1: Correlation of Gestational Age and Birth Weight
We wish to estimate the association between gestational age and infant birth weight.
In this example, birth weight is the dependent variable and gestational age is the
independent variable. Thus Y = birth weight and X = gestational age.
The data are displayed in a scatter diagram in the figure below.
Karl Pearson's coefficient of Correlation
Example 7.1: Correlation of Gestational Age and Birth Weight
For the given data, it can be shown that

$$ r = 0.82 $$
Karl Pearson's coefficient of Correlation
Example 7.1: Correlation of Gestational Age and Birth Weight
Significance Test
To test whether the association is merely apparent and might have arisen by chance, use the t-test with the following calculation:

$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$

The number of pairs of observations is 17. Hence,

$$ t = 0.82\sqrt{\frac{17-2}{1-0.82^2}} \approx 5.55 $$

Consulting the t-table at 15 degrees of freedom for $\alpha = 0.05$, we find t = 1.753. Since the computed value far exceeds this, the Pearson's correlation coefficient in this case may be regarded as highly significant.
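The significance calculation above can be checked with a few lines of Python; the values r = 0.82 and n = 17 are the ones given in the example.

```python
from math import sqrt

r, n = 0.82, 17
t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(t, 2))  # ~5.55, well above t = 1.753 at 15 degrees of freedom
```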
Rank Correlation Coefficient
Charles Spearman's Correlation Coefficient
This correlation measurement is also called rank correlation. It is computed on the ranks assigned to the observations rather than on their raw values:

$$ r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} $$

where $d_i$ is the difference between the ranks of the i-th pair and n is the number of pairs.
Example:
[Table: observations with their assigned ranks]
Charles Spearman's Coefficient of Correlation
Example 7.2: The hypothesis is that the depth of a river does not progressively increase with the width of the river.
A sample of size 10 is collected to test the hypothesis using Spearman's correlation coefficient.
Charles Spearman's Coefficient of Correlation
Step 1: Assign a rank to each observation. It is customary to assign rank 1 to the largest value, rank 2 to the next largest, and so on.
Note: If two or more observations have the same value, the mean rank should be used.
Charles Spearman's Coefficient of Correlation
Step 2: The table of ranks and their differences will look like the following, from which

$$ r_s = 0.9757 $$
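For reference, here is a sketch of the Spearman computation with mean ranks for ties, following Steps 1 and 2; the width/depth values below are hypothetical stand-ins, since the lecture's sample data are not reproduced here.

```python
def ranks(values):
    # Rank 1 goes to the largest value; tied values share the mean rank.
    order = sorted(values, reverse=True)
    return [sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
            for x in values]

def spearman_rs(xs, ys):
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical river measurements (width, depth), sample size 10:
width = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
depth = [1.0, 1.2, 1.1, 1.4, 1.6, 1.9, 2.0, 2.2, 2.5, 2.4]
print(round(spearman_rs(width, depth), 4))  # close to +1: strong rank agreement
```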
Charles Spearman's Coefficient of Correlation
Step 3: To see whether this value is significant, Spearman's rank significance table (or graph) must be consulted.
[Graph: Spearman's rank correlation coefficient (0.1 to 1.0 on the y-axis) versus sample size (2 to 10 on the x-axis), with significance curves at 0.1%, 1%, and 5%]
Charles Spearman's Coefficient of Correlation
Step 4: Final conclusion
From the graph, we see that $r_s = 0.9757$ lies above the 0.1% significance curve (at 8 degrees of freedom). Hence, there is a greater than 99.9% chance that the relationship is significant (i.e., not random), and the hypothesis should be rejected.
Thus, we reject the hypothesis and conclude that, in this case, the depth of a river progressively increases with its width.
χ²-Correlation Analysis
Chi-Squared Test of Correlation
This method is also termed Pearson's χ²-test, or simply the χ²-test.
It is applicable to categorical (discrete) data only.
χ²-Test Methodology
Contingency Table
Given a data set, it is customary to draw a contingency table, whose structure is given below.
χ²-Test Methodology
Entry in the Contingency Table: Observed Frequency
In the contingency table, an entry $o_{ij}$ denotes the observed frequency of the event that attribute A takes the value $a_i$ and attribute B takes the value $b_j$ (i.e., $A = a_i$, $B = b_j$).
χ²-Test Methodology
Entry in the Contingency Table: Expected Frequency
In the contingency table, an entry $e_{ij}$ denotes the expected frequency, which can be calculated as

$$ e_{ij} = \frac{Count(A=a_i)\times Count(B=b_j)}{Grand\ Total} = \frac{A_i \times B_j}{N} $$
χ²-Test
The χ² statistic aggregates, over all cells, the discrepancy between the observed and expected frequencies:

$$ \chi^2 = \sum_{i}\sum_{j}\frac{(o_{ij}-e_{ij})^2}{e_{ij}} $$
χ²-Test
The cells that contribute the most to the χ² value are those whose observed count is very different from the expected count.
χ²-Test
Example 7.3: Survey on Gender versus Hobby
Suppose a survey was conducted among a population of size 1500. In this survey, the gender of each person and their hobby, either "book" or "computer", was noted.
The survey results were obtained in a table like the following.
We have to find whether there is any association between the gender and the hobby of a person; that is, we are to test whether "gender" and "hobby" are correlated.
χ²-Test
Example 7.3: Survey on Gender versus Hobby
From the survey table, the observed frequencies are counted and entered into the contingency table, which is shown below.
[Contingency table: HOBBY (Book, Computer) × GENDER (Male, Female), with row and column totals; observed counts elided]
χ²-Test
Example 7.3: Survey on Gender versus Hobby
From the survey table, the expected frequencies are calculated and entered into the contingency table, which is shown below.
[Contingency table: HOBBY (Book, Computer) × GENDER (Male, Female), with row and column totals; expected counts elided]
χ²-Test
Using the equation for χ² computation, we sum the four cell contributions $(o_{ij}-e_{ij})^2 / e_{ij}$ to obtain the χ² value.
This value needs to be compared with the tabulated value of χ² (available in any standard book on statistics) with 1 degree of freedom (for a table of m × n, the degrees of freedom is (m − 1) × (n − 1); here m = 2, n = 2).
For 1 degree of freedom, the χ² value needed to reject the hypothesis at the 0.001 significance level is 10.828. Since the computed value is above this, we reject the hypothesis that "Gender" and "Hobby" are independent and conclude that the two attributes are strongly correlated for the given group of people.
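As a sketch of the whole procedure, the following Python fragment computes the expected frequencies and the χ² statistic for a 2 × 2 table; the observed counts are hypothetical, chosen only to be consistent with a survey of 1500 people.

```python
# Hypothetical observed counts: rows = Hobby (Book, Computer),
# columns = Gender (Male, Female); total 1500.
observed = [[250, 200],
            [50, 1000]]

row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / n      # expected frequency e_ij
        chi2 += (o - e) ** 2 / e

print(round(chi2, 2))  # compare with 10.828 (df = 1, 0.001 level)
```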
χ²-Test
Example 7.4: Hypothesis on "accident proneness" versus "driver's handedness"
Consider the following data on car accidents among left-handed and right-handed drivers, with a sample size of 175.
The hypothesis is that "fatality of accidents is independent of the driver's handedness".
[Contingency table: accident fatality × HANDEDNESS, with a Fatal row and totals; counts elided]
Find the correlation between fatality and handedness, and test the significance of the correlation at the 0.1% significance level.
Regression Analysis
Regression Analysis
The regression analysis is a statistical method to deal with the formulation of
mathematical model depicting relationship amongst variables, which can be used
for the purpose of prediction of the values of dependent variable, given the values
of independent variables.
Classification of Regression Analysis Models
Linear regression models
1. Simple linear regression
2. Multiple linear regression
Non-linear regression models
Y Y Y
X X X
Simple linear regression Z Multiple linear regression Non-linear regression
Simple Linear Regression Model
In simple linear regression, we have only two variables:
Dependent variable (also called the response), usually denoted Y.
Independent variable (alternatively called the regressor), usually denoted x.
A reasonable form of relationship between the response and the regressor is the linear relationship, that is, of the form

$$ Y = \alpha + \beta x $$

[Figure: the line Y = α + βx, with intercept α and slope β = tan θ]
Note:
There are an infinite number of such lines (and hence of (α, β) pairs).
Regression analysis deals with finding the best relationship between Y and x (and hence the best-fitted values of α and β), quantifying the strength of that relationship.
Regression Analysis
Given a set of data involving n pairs of values, our objective is to find the "true" or population regression line

$$ Y = \alpha + \beta x + \varepsilon $$

Here, ε is a random variable with $E(\varepsilon) = 0$ and $Var(\varepsilon) = \sigma^2$. The quantity σ² is often called the error variance.
Note:
$E(\varepsilon) = 0$ implies that at a specific x, the y values are distributed around the "true" regression line (i.e., the positive and negative errors around the true line balance out).
α and β are called regression coefficients.
The fitted line $\hat{Y} = a + bx$ estimates the true line $Y = \alpha + \beta x$.
Least Squares Method to Estimate a and b
This method uses the concept of a residual. A residual is essentially an error in the fit of the model $\hat{Y} = a + bx$. Thus, the i-th residual is

$$ e_i = y_i - \hat{y}_i = y_i - (a + b x_i) $$

[Figure: fitted line Ŷ = a + bx and true line Y = α + βx, showing the residual eᵢ and the model error εᵢ]
Least Squares Method
The residual sum of squares is often called the sum of squares of the errors about the fitted line and is denoted SSE:

$$ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2 $$

We minimize SSE to determine the parameters a and b.
Least Square method to estimate
Thus we set
+b=
These two equations can be solved to determine the values of and b, and it can be
calculated that
R²: Measure of Quality of Fit
A quantity R², called the coefficient of determination, is used to measure the proportion of variability explained by the fitted model. We have

$$ R^2 = 1 - \frac{SSE}{SST}, \qquad SST = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2 $$

Note:
If the fit is perfect, all residuals are zero and thus R² = 1.0 (very good fit).
If SSE is only slightly smaller than SST, then R² ≈ 0 (very poor fit).
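Putting the last few slides together, here is a minimal least-squares fit in plain Python that estimates a and b from the closed-form solution above and reports R²; the data points are illustrative.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing SSE for the model y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

def r_squared(xs, ys, a, b):
    """Coefficient of determination R^2 = 1 - SSE/SST."""
    my = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]                 # illustrative data
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
a, b = fit_line(xs, ys)
print(a, b, r_squared(xs, ys, a, b))  # near-perfect linear fit, R^2 close to 1
```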
Multiple Linear Regression
When more than one variable are independent variable, then the regression
can be estimated as a multiple regression model
When this model is linear in coefficients, it is called multiple linear regression
model
If k-independent variables , …………, are associated, the multiple linear
regression model is given by
++
++
Multiple Linear Regression
Estimating the coefficients
Let the data points given to us be

$$ (x_{1i}, x_{2i}, \ldots, x_{ki}, y_i), \quad i = 1, 2, \ldots, n $$

Thus,

$$ y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \varepsilon_i \quad\text{and}\quad y_i = b_0 + b_1 x_{1i} + \cdots + b_k x_{ki} + e_i $$

where $\varepsilon_i$ and $e_i$ are the random error and the residual error, respectively, associated with the true response and the fitted response.
Using the least squares method to estimate $b_0, b_1, \ldots, b_k$, we minimize the expression

$$ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_{1i} - \cdots - b_k x_{ki}\right)^2 $$
Multiple Linear Regression
Differentiating SSE in turn with respect to and equating to zero, we generate the set of
(k+1) normal estimation equations for multiple linear regression.
++
+
… … … … … …
… … … … … …
+
The system of linear equations can be solved for by any appropriate method for solving
system of linear equations.
Hence, the multiple linear regression model can be built.
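A compact way to solve the normal equations is in matrix form, $(X^\top X)\,b = X^\top y$. The sketch below uses numpy; the two-regressor data are illustrative.

```python
import numpy as np

# Illustrative data: two regressors x1, x2 and a response y.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 5.9, 11.2, 11.8, 16.1])

Xd = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)    # solve the normal equations
print(b)  # [b0, b1, b2]
```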
Non Linear Regression Model
When the regression equation is in terms of r-degree, r>1, then it is called nonlinear
regression model. When more than one independent variables are there, then it is
called Multiple Non linear Regression model. Also, alternatively termed as
polynomial regression model. In general, it takes the form
++
++
Solving for Polynomial Regression Model
Given that (); i = 1,2,…,n are n pairs of observations. Each observations would satisfy the
equations:
++
and ++ +
where, r is the degree of polynomial
= is the random error
= is the residual error
Note: The number of observations, n, must be at least as large as r+1, the number of
parameters to be estimated.
The polynomial model can be transformed into a general linear regression model setting ,
…, = . Thus, the equation assumes the form:
++
++r +
This model then can be solved using the procedure followed for multiple linear
regression model.
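The transformation to a linear model is exactly what the short numpy sketch below does: it builds a design matrix with columns 1, x, x², …, x^r and solves by least squares. The data and the degree r = 2 are illustrative.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 0.9, 2.8, 7.1, 13.0, 21.2])  # roughly 1 - x + x^2
r = 2                                           # degree of the polynomial

# Columns 1, x, x^2, ..., x^r: the substitution x_j = x^j from above.
Xd = np.vander(x, r + 1, increasing=True)
coeffs, *_ = np.linalg.lstsq(Xd, y, rcond=None)
print(coeffs)  # [a, b1, ..., br]
```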
Auto-Regression Analysis
Auto-Regression Analysis
Regression analysis for time-ordered data is known as auto-regression analysis.
Time-series data are data collected on the same observational unit at multiple time periods.
Auto-Regression Analysis
Examples: Which of the following are time-series data?
Aggregate consumption and GDP for a country (for example, 20 years of quarterly observations = 80 observations)
Yen/$, pound/$, and euro/$ exchange rates (daily data for 1 year = 365 observations)
Cigarette consumption per capita in a state, by year
Auto-Regression Analysis
Examples: Which of the following graphs is due to time-series data?
Use of Time Series Data
To develop forecast model
What will the rate of inflation be next year?
Rates of inflation and unemployment in the country can be observed only over
time!
Modeling with Time-Series Data
Correlation over time: serial correlation, also called autocorrelation
Calculating standard errors
How to estimate the model?
Forecasting models
Auto-Regression Model for Forecasting
Some Notations and Concepts
Yt = Value of Y in a period t
Data set [Y1, Y2, … YT-1, YT]: T observations on the time series random variable
Y
Assumptions
We consider only consecutive, evenly spaced observations
For example, monthly, 2000-2015, no missing months
A time series Yt is stationary if its probability distribution does not change over
time, that is, if the joint distribution of (Yi+1, Yi+2, …, Yi+T) does not depend on i.
Stationary property implies that history is relevant. In other words, Stationary requires the
future to be like the past (in a probabilistic sense).
Auto Regression analysis assumes that Yt is stationary.
Some Notations and Concepts
There are four ways to have the time series data for AutoRegression analysis
Difference: The fist difference of a series, Yt is its change between period t and t-
1, that is, yt = Yt - Yt-1
Percentage:
Some Notations and Concepts
Autocorrelation
The correlation of a series with its own lagged values is called autocorrelation (also called serial correlation). The j-th autocorrelation is the correlation between $Y_t$ and $Y_{t-j}$.
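The quantities above are easy to compute directly; in the sketch below the series ys is made up, and the lag-1 autocorrelation is approximated with numpy's correlation coefficient on the lagged slices.

```python
import numpy as np

ys = np.array([100.0, 102.0, 101.0, 105.0, 108.0, 107.0, 111.0, 115.0])

diff = ys[1:] - ys[:-1]          # first difference: Y_t - Y_{t-1}
pct = 100.0 * diff / ys[:-1]     # percentage change

def autocorr(y, j):
    # Correlation of the series with itself lagged by j periods.
    return np.corrcoef(y[j:], y[:-j])[0, 1]

print(diff)
print(pct)
print(autocorr(ys, 1))
```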
Auto-Regression Model for Forecasting
A natural starting point for a forecasting model is to use past values of Y, that is, $Y_{t-1}, Y_{t-2}, \ldots$, to predict $Y_t$.
p-th Order Auto-Regression Model
Definition 7.5: p-th order auto-regression model
The p-th order auto-regression model, AR(p), expresses $Y_t$ as a linear function of its p most recent values:

$$ Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon_t $$
Computing AR Coefficients
A number of techniques known for computing the AR coefficients
The most common method is called Least Squares Method (LSM)
The LSM is based upon the Yule-Walker equations
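To make the AR(p) idea concrete, here is a least-squares sketch: each row of the design matrix holds the p lagged values of the (made-up) series, and the fitted coefficients give a one-step-ahead forecast. This is the plain regression route, not the Yule-Walker route.

```python
import numpy as np

def fit_ar(ys, p):
    """Fit Y_t = b0 + b1*Y_{t-1} + ... + bp*Y_{t-p} by least squares."""
    rows = [[1.0] + [ys[t - k] for k in range(1, p + 1)]
            for t in range(p, len(ys))]
    X = np.array(rows)
    y = np.array(ys[p:])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # [b0, b1, ..., bp]

ys = [1.0, 1.3, 1.1, 1.6, 1.4, 1.9, 1.7, 2.2, 2.0, 2.5]  # made-up series
b = fit_ar(ys, p=2)
print(b)
# One-step-ahead forecast from the two most recent observations:
print(b[0] + b[1] * ys[-1] + b[2] * ys[-2])
```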