0% found this document useful (0 votes)

56 views30 pages

Data Science With Python Relationship

1) Understanding relationships between variables is a critical step in data analysis. Relationships can be measured using summary tables, calculations, and visualizations like scatterplots. 2) Scatterplots show relationships between continuous variables and can reveal if the relationship is positive, negative, linear, or nonlinear. Summary tables describe the central tendency and dispersion of data. 3) Common metrics for measuring relationships include correlation coefficients, t-tests, ANOVA, and chi-square tests. Correlation coefficients measure the strength and direction of linear relationships between variables.

Uploaded by

Raj Chouhan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

56 views30 pages

Data Science With Python Relationship

Uploaded by

Raj Chouhan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 30

DATA SCIENCE WITH PYTHON

UNDERSTANDING RELATIONSHIPS

3/22/19
UNDERSTANDING RELATIONSHIPS

3/22/19
OVERVIEW

• A critical step in making sense of data is an understanding of the relationships between different
variables.

• E.g., is there a relationship between interest rates and inflation or education level and income?

• The existence of an association between variables does not imply that one variable causes another

• How to measure?
• Summary tables
• Specific calculations
• Visualization tools

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 3

SCATTERPLOTS

• For continuous variables

• Positive or negative or no
relationship at all

• linear or nonlinear relationships

• Outlier detection

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 4

SUMMARY TABLES – DESCRIBE METHOD

• Generates descriptive statistics that

summarize the central tendency,
dispersion and shape of a dataset’s
distribution, excluding NaN values.

• Analyzes both numeric and object

series, as well as DataFrame column
sets of mixed data types.

• The output will vary depending on what

is provided.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 5

METRICS ABOUT RELATIONSHIPS

• Correlation Coefficients
• Pearson’s … developed by Karl Pearson over 120 years
• Spearman
• Kendall Tau

• t-Tests Comparing Two Groups

• ANOVA
• ANOVA-1 way
• ANOVA-2 way
• ANOVA-N way

• Chi-Square tests

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 6

COVARIANCE - COV

Covariance provides insight into how two variables are related to one another.

More precisely, covariance refers to the measure of how two random variables in a data set will change
together.

A positive covariance means that the two variables at hand are positively related, and they move in the
same direction.

A negative covariance means that the variables are inversely related, or that they move in opposite
directions.

Covariance always has units In a finance context, covariance is the term used to
describe how two stocks will move together.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 7

CALCULATE COVARIANCE - COV

In this formula,
X represents the independent variable,
Y represents the dependent variable,
N represents the number of data points in the sample,

x-bar represents the mean of the X and

y-bar represents the mean of the dependent variable Y.

- Covariance values are not standardized. COV values ranges from -∞ to +∞

- With covariance, there is no minimum or maximum value, so the values are more difficult to
interpret. For example, a covariance of 50 may show a strong or weak relationship; this depends on
the units in which covariance is measured.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 8

CORRELATION

Definition
• Correlation is used to test relationships between quantitative variables or categorical variables.

• In other words, it’s a measure of how things are related.

• The study of how variables are correlated is called correlation analysis.

Example:-
• Your caloric intake and your weight.
• Your eye color and your relatives’ eye colors.
• The amount of time your study and your GPA.

Some examples of data that have a low correlation (or none at all):
• A dog’s name and the type of dog biscuit they prefer.
• The cost of a car wash and how long it takes to buy a soda inside the station.
3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 9
CORRELATION

Correlation is defined as covariance normalized by the product of standard deviations, so the correlation
between X and Y

Correlation coefficients are standardized.. value is always between –1 and 1

Correlation does not have units.

For example,
• a correlation of 0.9 indicates a very strong relationship in which two variables nearly always move in
the same direction;

• a correlation of –0.1 shows a very weak relationship in which there is a slight tendency for two
variables to move in opposite directions.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 10

FORMULA

• r = correlation coefficient

• The formula used to calculate r is

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 11

CORR - EX

Correlation can have a value:

• 1 is a perfect positive correlation

• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation

3/22/19 Slide no. 12

KEY POINTS

X1 X2 X3 X4 … … y

- Relationship between the predictors and target variables should be strong

- Relationship amongst the predictors – indicates multi-collinearity and redundancy
- Is a HUGE practical problem
- Inflates the coefficients of the variables
- How to handle?
- Feature selection to restrict the columns
- Use of advanced ML techniques (like Ridge, Lasso, XGB etc ..)
- Pearson’s corr coeff – good for continuous variables
- assumes a linear relationship between the two variables
- sensitive to outliers

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 13

RELATIONSHIP WITH R2

• Another way to interpret Pearson correlation is to use the coefficient of determination, also knows as
R2.

• While ρ is unitless, its square is interpreted at the proportion of variance of Y explained by X.

• ρ = -0.65 implies that (-0.652)*100 = 42% of variation in Y can be explained by X.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 14

USAGE

Types of research questions a Pearson correlation can examine:

• Is there a statistically significant relationship between age, as measured in years, and height, measured
in inches?
• Is there a relationship between temperature, measured in degrees Fahrenheit, and ice cream sales,
measured by income?
• Is there a relationship between job satisfaction, as measured by the JSS, and income, measured in
dollars?

Assumptions
• For the Pearson r correlation, both variables should be normally distributed (normally distributed
variables have a bell-shaped curve).
• Other assumptions include linearity and homoscedasticity.
• Linearity assumes a straight line relationship between each of the two variables and homoscedasticity
assumes that data is equally distributed about the regression line.
3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 15
SPEARMAN'S CORRELATION

• special case of Pearson ρ applied to ranked (sorted) variables.

• Unlike Pearson, Spearman's correlation is not restricted to linear relationships.

• Instead, it measures monotonic association (only strictly increasing or decreasing, but not
mixed) between two variables and relies on the rank order of values.

• In other words, rather than comparing means and variances, Spearman's coefficient looks
at the relative order of values for each variable.

• This makes it appropriate to use with both continuous and discrete data.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 16

FORMULA

• The formula for Spearman's coefficient looks very similar to that of Pearson, with the
distinction of being computed on ranks instead of raw scores:

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 17

KENDALL TAU (RANK CORRELATION)

• based on a ranking of the observations for two variables. Kendalls' τ does not take into account the
difference between ranks — only directional agreement. Therefore, this coefficient is more
appropriate for discrete data.

• based on counts of concordant and discordant pairs of observations.

• observations A and B, Variables X, Y

concordant discordant
difference of the values for Variable X
(XB − XA or 2 − 1 = 1)

difference of the values for Variable Y

(YB − YA or 4 − 2 = 2).
Since these differences are in the same direction— A discordant pair occurs when the differences of the two
1 and 2 are both positive—the observations A and B are concordant variables’ values move in opposite directions.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 18

EXPLANATION

Observation Variable X Variable Y Concordant Discordant • the observations A–J are ordered using Variable X, and each
name
unique pair of observations is compared.
A 1 2 8 1
• A is compared with all other observations (B, C, . . . , J) and
B 2 4 6 2
the number of concordant and discordant pairs are counted
C 3 1 7 0
D 4 3 6 0 • For observations A, there are eight concordant pairs (A–B,
A–D, A–E, A–F, A–G, A–H, A–I, A–J) and one discordant pair
E 5 6 4 1 (A–C).
F 6 5 4 0
• repeated for all other observations
G 7 7 2 1
H 8 8 2 0 • Observation B is compared to observations C through J,
observation C compared to D though J, and so on.
I 9 10 0 1
J 10 9 0 0 • Range between -1 to 1
SUM 39 6 • 1 indicating a perfect ranking
• −1 a perfect disagreement of the rankings.
𝜏A = (39 − 6) / 45 • A zero value— assigned when the ranks are tied
𝜏A = 0.73 —indicates a lack of association,
• Cannot SQ the coeff to get coeff of determination

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 19

INTERPRETATION

• The Kendall’s rank correlation coefficient can be calculated in Python using the kendalltau() SciPy
function.

• The test takes the 2 data samples as arguments and returns

• the correlation coefficient and
• the p-value.

• As a statistical hypothesis test, the method assumes (H0) that there is no association between the two
samples.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 20

T-TESTS COMPARING TWO GROUPS

• The t test (also called Student’s T Test) compares two means and tells us if they are different from
each other. The t test also tells us how significant the differences are

• This concept can be extended to compare the mean values of two subsets. We can explore if the
means of two groups are different enough to say the difference is significant differentiability

• binary classification problem,

• each observation can be classified either into class C1 or class C2,

• t-Statistics helps us evaluate if the values of a particular feature for class C1 is significantly different
from values of same feature for class C2.

• If this holds, then the feature can helps us to better differentiate our data.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 21

EXPLANATION

• Does the salary of a person impact his chances to get a loan ?

• Calculate mean and variance

• Salaries of individuals when the loan was approved
• Salaries of individuals when the loan was not approved

• Use t-statistics to check whether these two samples are significantly different or not.

• t- Statistics is computed using:

• calculate the t-Statistic for each feature,

• sort these values in descending order in order to select the important features.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 22

INTERPRETATION

• If abs(t-statistic) <= critical value Accept null hypothesis that the means are equal.

• If abs(t-statistic) > critical value Reject the null hypothesis that the means are equal.
• the first mean is smaller or greater than the second mean.

• If p > alpha Accept null hypothesis that the means are equal.
• If p <= alpha Reject null hypothesis that the means are equal.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 23

T-TEST ASSUMPTIONS

• the scale of measurement applied to the data collected follows a continuous or ordinal scale, such as
the scores for an IQ test.

• the data is collected from a representative, randomly selected portion of the total population.

• when plotted, results in a normal distribution, bell-shaped distribution curve.

• The fourth assumption is a reasonably large sample size is used. A larger sample size means the
distribution of results should approach a normal bell-shaped curve.

• The final assumption is the homogeneity of variance. Homogeneous, or equal, variance exists when
the standard deviations of samples are approximately equal.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 24

CHI-SQUARE TEST

• for use with variables measured on a categorical (nominal or ordinal) features.

• When to Use the Chi-Square Goodness of Fit Test

• The sampling method is simple random sampling.
• The variable under study is categorical.
• The expected value of the number of sample observations in each level of the variable is at
least 5.

3/22/19 Slide no. 25

CHI-SQUARE TEST

• allows an analysis of whether there is a relationship between two categorical variables.

• Null Hypothesis: The two categorical variables are independent.

• Alternative Hypothesis: The two categorical variables are dependent.

• The chi-square test statistic is calculated by using the formula:

• O represents the observed frequency.

• E is the expected frequency under the null hypothesis & computed by:

3/22/19 Slide no. 26

CHI-SQUARE TEST

• Interpretation

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 27

ANOVA (ONE-WAY)

• The analysis of variance (ANOVA) can be thought of as an extension to the t-test.

• The independent t-test is used to compare the means of a condition between 2 groups but sometimes we want to
compare more than 2 groups
• E.g. to test whether voter age differs based on some categorical variable like race/Education level,
• compare the means of each level.
• Alternatively, a separate t-test for each pair of groups,
• The analysis of variance or ANOVA is a statistical inference test that lets you compare multiple groups at the
same time.

• H0: No difference between means, i.e. ͞x1 = ͞x2 = ͞x3

• Ha: Difference between means exist somewhere, i.e. ͞x1 ≠ ͞x2 ≠ ͞x3, or ͞x1 = ͞x2 ≠ ͞x3, or ͞x1 ≠ ͞x2 = ͞x3

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 28

ASSUMPTION

• There are 3 assumptions that need to be met for the results of an ANOVA test to be considered accurate
and trust worthy. It’s important to note the the assumptions apply to the residuals and not the variables
themselves.

• The ANOVA assumptions are the same as for linear regression and are:

• Normality - Caveat to this is, if group sizes are equal, the F-statistic is robust to violations of normality

• Homogeneity of variance - Same caveat as above, if group sizes are equal, the F-statistic is robust to this violation

• Independent observations

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 29

1-WAY, 2-WAY, N-WAY ANOVA

• a one-way ANOVA should be used if you have 1 categorical independent variable (IV) with 2+ categories
or groups and 1 continuous dependent variable (DV);

• this is a 1 factor design.

• The two-way ANOVA is an extension to the one-way ANOVA and should be used if you have 2
categorical IVs with 2+ groups, and 1 continuous DV;

• this is a multi-factor design, specifically a 2 factor design. It’s a 2 factor design, because there are 2 IVs.

• In the ANOVA framework, IVs are often called factors and each category/group within an IV is called a
level. Just as with a one-way ANOVA, a two-way ANOVA tests if there is a difference between the
means, but it does not tell which groups differ.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 30

Biostatistics PPT - 6
No ratings yet
Biostatistics PPT - 6
35 pages
Correlation
No ratings yet
Correlation
83 pages
Examining Relationships in Quantitative Research
No ratings yet
Examining Relationships in Quantitative Research
9 pages
Coorelation
No ratings yet
Coorelation
8 pages
Pearson and Spearman Correlation
No ratings yet
Pearson and Spearman Correlation
50 pages
PRP1001 JXH1003 Week 7 2024 No Notes
No ratings yet
PRP1001 JXH1003 Week 7 2024 No Notes
49 pages
Correlation and Regression
100% (5)
Correlation and Regression
49 pages
Correlation Covariance & Model Selection
No ratings yet
Correlation Covariance & Model Selection
29 pages
Correlation Coefficient
No ratings yet
Correlation Coefficient
14 pages
Inbound 1633146381504097453
No ratings yet
Inbound 1633146381504097453
36 pages
06 Correlation and Regression
No ratings yet
06 Correlation and Regression
63 pages
Correlation
No ratings yet
Correlation
38 pages
DSUR I Chapter 06 (Correlation)
No ratings yet
DSUR I Chapter 06 (Correlation)
42 pages
DAV Practical 4
No ratings yet
DAV Practical 4
6 pages
Using Statistical Techniq Ues in Analyzing Data
100% (1)
Using Statistical Techniq Ues in Analyzing Data
40 pages
Correlation
No ratings yet
Correlation
20 pages
WINSEM2023 WINSEM2023 24 - BITE304L - TH - VL2023240503885 - DA 1 QP - KEY 01 22 - Reference Material I
No ratings yet
WINSEM2023 WINSEM2023 24 - BITE304L - TH - VL2023240503885 - DA 1 QP - KEY 01 22 - Reference Material I
57 pages
Unit 2
No ratings yet
Unit 2
17 pages
Correlation & Regression
100% (1)
Correlation & Regression
23 pages
Pearson and Correlation
No ratings yet
Pearson and Correlation
8 pages
Correlation
No ratings yet
Correlation
59 pages
May 8 2023
No ratings yet
May 8 2023
39 pages
Lesson 10 Relationship Between Variables
No ratings yet
Lesson 10 Relationship Between Variables
85 pages
L3 Correlation
No ratings yet
L3 Correlation
101 pages
Chapter 14 Summary
No ratings yet
Chapter 14 Summary
2 pages
Allahabad High Court RO ARO Exam Preparation Guide
No ratings yet
Allahabad High Court RO ARO Exam Preparation Guide
11 pages
Correlation
No ratings yet
Correlation
8 pages
Lecture 10 Correlation
No ratings yet
Lecture 10 Correlation
32 pages
202003241550009941rajeev Pandey Correlation Research
No ratings yet
202003241550009941rajeev Pandey Correlation Research
87 pages
Correlation and Regration
No ratings yet
Correlation and Regration
57 pages
Correlation and Dependence: Navigation Search
No ratings yet
Correlation and Dependence: Navigation Search
7 pages
Correlation Rank - Correlation Curve - Fitting For Student
No ratings yet
Correlation Rank - Correlation Curve - Fitting For Student
26 pages
11 Correlation
No ratings yet
11 Correlation
28 pages
Correlations
No ratings yet
Correlations
30 pages
Correlation Constant
No ratings yet
Correlation Constant
23 pages
STAT22209 - Chapter 01-Correlation Analyisis - 2022
No ratings yet
STAT22209 - Chapter 01-Correlation Analyisis - 2022
53 pages
Correlation & Regression-I
No ratings yet
Correlation & Regression-I
43 pages
Correleation and PMCC
No ratings yet
Correleation and PMCC
12 pages
Correlation
No ratings yet
Correlation
6 pages
Sq3r Méthode
No ratings yet
Sq3r Méthode
1 page
Manuals Ecdis-2000 HAGENUK PDF
100% (1)
Manuals Ecdis-2000 HAGENUK PDF
197 pages
Correlation: By: Nathaniel S. Antero
No ratings yet
Correlation: By: Nathaniel S. Antero
13 pages
Astrology & Palmistry Vol 1
100% (4)
Astrology & Palmistry Vol 1
42 pages
FODS Unit-3
No ratings yet
FODS Unit-3
25 pages
Correlation & Regression
No ratings yet
Correlation & Regression
26 pages
Correlation New
No ratings yet
Correlation New
37 pages
Correlation Analysis - Final
No ratings yet
Correlation Analysis - Final
40 pages
Cce 68 D 4 CC 4
No ratings yet
Cce 68 D 4 CC 4
28 pages
Correlation 26-2-24
No ratings yet
Correlation 26-2-24
16 pages
Correlation Notes
No ratings yet
Correlation Notes
15 pages
MRS - Diana-Correlation Analysis-Notes
No ratings yet
MRS - Diana-Correlation Analysis-Notes
16 pages
Presentation On: Correlation and Rank Correlation: Submitted To
100% (3)
Presentation On: Correlation and Rank Correlation: Submitted To
23 pages
Correlation Coefficient
No ratings yet
Correlation Coefficient
8 pages
Correlation
No ratings yet
Correlation
14 pages
Psych Assess Chap 4
No ratings yet
Psych Assess Chap 4
5 pages
Chapter 6 Correlation and Regression
No ratings yet
Chapter 6 Correlation and Regression
29 pages
Correlation
No ratings yet
Correlation
46 pages
Up-Ending The Tea Table: Race and Culture in Mary Zimmerman's The Jungle Book
100% (2)
Up-Ending The Tea Table: Race and Culture in Mary Zimmerman's The Jungle Book
23 pages
LESSON 3 - Correlation Analysis
No ratings yet
LESSON 3 - Correlation Analysis
32 pages
Correlation and Its Significance
No ratings yet
Correlation and Its Significance
15 pages
Concepts of God in The Traditional Faith of The Meru People of Kenya - Marete Dedan Gitari
No ratings yet
Concepts of God in The Traditional Faith of The Meru People of Kenya - Marete Dedan Gitari
138 pages
Chapter 8 - PSYC 284
No ratings yet
Chapter 8 - PSYC 284
7 pages
Food and Human Nutrition For Ordinary Secondary Education
100% (1)
Food and Human Nutrition For Ordinary Secondary Education
42 pages
Chapter - Six
No ratings yet
Chapter - Six
8 pages
Oe Statistics Notes
No ratings yet
Oe Statistics Notes
32 pages
Full Download Shakespeare Among The Courtesans Prostitution Literature and Drama 1500 1650 1st Edition Duncan Salkeld Ebook PDF & DOCX All Chapters
No ratings yet
Full Download Shakespeare Among The Courtesans Prostitution Literature and Drama 1500 1650 1st Edition Duncan Salkeld Ebook PDF & DOCX All Chapters
67 pages
How To Write A Recommendation Report
No ratings yet
How To Write A Recommendation Report
3 pages
How To Pull Women - Ed West
No ratings yet
How To Pull Women - Ed West
122 pages
Correlation: Some Commonly Used Jargons
No ratings yet
Correlation: Some Commonly Used Jargons
19 pages
Reach For The Top Extra Question Answer Class 9 (2) (1)
No ratings yet
Reach For The Top Extra Question Answer Class 9 (2) (1)
29 pages
All My Sons
No ratings yet
All My Sons
17 pages
DLL-WK 1-LC 1
No ratings yet
DLL-WK 1-LC 1
12 pages
Operators and Statements in VHDL
No ratings yet
Operators and Statements in VHDL
24 pages
Thesis Lu Dai
No ratings yet
Thesis Lu Dai
85 pages
DIGPRA2 Prac Tutorial Prac1
No ratings yet
DIGPRA2 Prac Tutorial Prac1
10 pages
3 Writing Business Letters and Memos
No ratings yet
3 Writing Business Letters and Memos
22 pages
Unit I
No ratings yet
Unit I
9 pages
Hadith Handwriting and Colouring Book (Girls)
No ratings yet
Hadith Handwriting and Colouring Book (Girls)
18 pages
Dudaram PSCP Assignment
No ratings yet
Dudaram PSCP Assignment
26 pages
Various Kinds of Ownership
No ratings yet
Various Kinds of Ownership
7 pages
Amanda Mclean
No ratings yet
Amanda Mclean
2 pages
Depressed Skull Fracture
No ratings yet
Depressed Skull Fracture
15 pages
Comprehension Strategy Summarize The Text John Munro
No ratings yet
Comprehension Strategy Summarize The Text John Munro
15 pages
Spania
No ratings yet
Spania
7 pages
Diagram Power Supply: Electronic Devices and Circuit Theory, 10/e Robert L. Boylestad and Louis Nashelsky
No ratings yet
Diagram Power Supply: Electronic Devices and Circuit Theory, 10/e Robert L. Boylestad and Louis Nashelsky
10 pages
Something Just Like This
No ratings yet
Something Just Like This
4 pages
A. Generic Name: B. Brand Name: C. Date and Time Ordered: D. Classification: E. Dosage, Route, Frequency: F. Mechanisms of Action
No ratings yet
A. Generic Name: B. Brand Name: C. Date and Time Ordered: D. Classification: E. Dosage, Route, Frequency: F. Mechanisms of Action
2 pages
ACCCOB1 APPROVED Mechanics and Rubric For VIDEO Final Exam Group Project May 2020 PDF
No ratings yet
ACCCOB1 APPROVED Mechanics and Rubric For VIDEO Final Exam Group Project May 2020 PDF
3 pages
Vishal Solanki
No ratings yet
Vishal Solanki
2 pages
Correlation and Regression: Six Sigma Thinking, #8
From Everand
Correlation and Regression: Six Sigma Thinking, #8
Sumeet Savant
5/5 (1)

Data Science With Python Relationship

Uploaded by

Data Science With Python Relationship

Uploaded by

DATA SCIENCE WITH PYTHON

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 3

• For continuous variables

• linear or nonlinear relationships

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 4

• Generates descriptive statistics that

• Analyzes both numeric and object

• The output will vary depending on what

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 5

• t-Tests Comparing Two Groups

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 6

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 7

x-bar represents the mean of the X and

- Covariance values are not standardized. COV values ranges from -∞ to +∞

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 8

• In other words, it’s a measure of how things are related.

• The study of how variables are correlated is called correlation analysis.

Correlation coefficients are standardized.. value is always between –1 and 1

Correlation does not have units.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 10

• The formula used to calculate r is

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 11

Correlation can have a value:

• 1 is a perfect positive correlation

3/22/19 Slide no. 12

- Relationship between the predictors and target variables should be strong

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 13

• While ρ is unitless, its square is interpreted at the proportion of variance of Y explained by X.

• ρ = -0.65 implies that (-0.652)*100 = 42% of variation in Y can be explained by X.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 14

Types of research questions a Pearson correlation can examine:

• special case of Pearson ρ applied to ranked (sorted) variables.

• Unlike Pearson, Spearman's correlation is not restricted to linear relationships.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 16

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 17

• based on counts of concordant and discordant pairs of observations.

• observations A and B, Variables X, Y

difference of the values for Variable Y

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 18

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 19

• The test takes the 2 data samples as arguments and returns

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 20

• binary classification problem,

• each observation can be classified either into class C1 or class C2,

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 21

• Does the salary of a person impact his chances to get a loan ?

• Calculate mean and variance

• t- Statistics is computed using:

• calculate the t-Statistic for each feature,

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 22

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 23

• when plotted, results in a normal distribution, bell-shaped distribution curve.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 24

• for use with variables measured on a categorical (nominal or ordinal) features.

• When to Use the Chi-Square Goodness of Fit Test

3/22/19 Slide no. 25

• allows an analysis of whether there is a relationship between two categorical variables.

• Null Hypothesis: The two categorical variables are independent.

• The chi-square test statistic is calculated by using the formula:

• O represents the observed frequency.

3/22/19 Slide no. 26

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 27

• The analysis of variance (ANOVA) can be thought of as an extension to the t-test.

• H0: No difference between means, i.e. ͞x1 = ͞x2 = ͞x3

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 28

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 29

• this is a 1 factor design.

3/22/19 – content to be used for explanation/reference educational purposes, Slide no. 30

You might also like