Lecture 5 - Scatter Plots

Uploaded by

Samuel LI
2181S Introduction to Data Mining
Lecture 5: Scatter Plots

Scatter Plots
A scatter plot (or scatter diagram) shows the position of all of the cases in an x-y coordinate system. The relationship between two interval variables can be identified from a scatter plot.
  - The independent variable is plotted on the x-axis (the horizontal axis).
  - The dependent variable is plotted on the y-axis (the vertical axis).
Each dot in the body of the chart represents one case, placed where its x value and its y value intersect.

Use of Scatter Plots
+ The purpose of a scatter plot:
  - Perform regression analysis and correlation checks
+ The information obtained from these techniques answers the following questions about the relationship between two variables:
  - Is there a relationship between the two variables? (The two variables usually have a range of values on a scale or ordinal measure.)
  - How strong is the relationship?
  - What is the direction of the relationship?

Finding Out the Relationship
+ The overall pattern of data points summarizes the nature of the relationship between the two variables.
+ The clarity of the pattern formed by the data points can be enhanced by drawing a straight line through the cluster such that the line touches every dot or comes as close to doing so as possible.
+ This summarizing line is called the "regression line."
  - It is the line that best predicts Y from values of X

Sample Scatter Plot
The pattern of the points on the scatter plot shows the relationship between the variables. The regression line makes it easier to understand the relationship.

Direction of Relationship
+ Positive vs Negative Relations
  - The dependent variable increases when the independent variable increases -> Positive relation
  - The dependent variable decreases when the independent variable increases -> Negative relation
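The positive/negative distinction can be checked numerically without drawing the plot. The sketch below is illustrative only: the function name `direction` and the three small data sets are assumptions made up for this example, not part of the lecture. It relies on the fact that the least-squares slope has the same sign as the sum of products of deviations from the means.

```python
def direction(xs, ys):
    """Classify the direction of a relationship from paired data.

    Uses the sign of sum((x - mean_x) * (y - mean_y)), which is also
    the sign of the least-squares slope of y on x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_dev > 0:
        return "positive"
    if co_dev < 0:
        return "negative"
    return "none"


# Hypothetical data sets, invented for illustration.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # y tends to increase with x
falling = [9, 7, 6, 4, 1]   # y tends to decrease with x
flat = [3, 3, 3, 3, 3]      # y does not change with x

print(direction(xs, rising))
print(direction(xs, falling))
print(direction(xs, flat))
```

This only reports the direction. It says nothing about how strong the relationship is, which is what the slope, the correlation coefficient, and R² are for.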
[Figure: two sample scatter plots with regression lines, labelled "Positive Relation" and "Negative Relation"]

Evidence of Relationship
The angle between the regression line and the horizontal x-axis provides evidence of a relationship. If there is no relationship, the regression line will be parallel to the x or y-axis.

Strength of Relationship
+ The relationship between the two variables can be read from the scatter plot
  - A horizontal or vertical line means no relation
  - A very flat or very steep line usually means a slight relation
[Figure: example scatter plots, one panel labelled "Perfect Relation"; the other panel labels are illegible]
+ However, if you judge by the angle of the regression line, you can easily be misled by the scaling of the graph!
  - Next slide shows an example

Effect of Scaling on Scatter Plot
+ The scale of the x and y-axis can affect the view of the scatter plot!
[Figure: scatter plots drawn with different axis scales; the caption is partly illegible and ends "... should be the same. Scaling can give you the wrong view!"]

Slope/Intercept of Regression Line
+ The value of the slope b for the regression line can be found by the formula:

$$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

  where (x_i, y_i) is the value of data point i, and n is the number of data points
+ The y-intercept a can be found by:

$$a = \frac{\sum y_i - b\sum x_i}{n}$$

Equation of the Regression Line
+ A more objective measure of the relation between the two variables is the equation of the regression line
  - We know the line can be expressed as y = a + bx
  - a and b are constants, where b is the slope and a is the y-intercept (the value of y is a when x is 0)
  - The slope gives a more objective measure of the steepness of the graph
  - A positive relation has a positive slope, a negative relation has a negative slope

Important Note
+ If we interchange the X and Y data, the scatter plot will be transposed. However, the regression line is not an exact transpose.
  - The transpose of y = 1.6x + 1 is y = 0.625x - 0.625, but the regression line is y = 0.597x - 0.4627

What is R²?
+ You may notice that R² appears in Excel and SPSS
  - It is a measure of how well the values are fitted to the regression line (goodness of fit)
  - A value closer to 1 means the data points are closer to the regression line.
  - A value of 1 means all the data points are on the regression line.
+ In fact, R² is the square of the correlation coefficient
  - R² has a value between 0 and 1

Correlation Coefficient
+ The (Pearson) correlation coefficient (also written as r) was covered in Lecture 4:
  - Correlation(x, y):

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

    where $\bar{x}$ and $\bar{y}$ are the means of X and Y respectively
+ The correlation coefficient has a value between -1 and +1

Meaning of Correlation Coefficient

Correlation Coefficient | Meaning
[illegible]             | No correlation
[illegible]             | Slight correlation
[illegible]             | Low/Small correlation
0.41-0.70               | Moderate to Substantial correlation
0.71-0.90               | Strong/High correlation
0.91-0.99               | Near Perfect correlation
1.00                    | Perfect correlation

+ Similar for negative values

What do all these things mean?
+ The correlation coefficient r and R² give the relation between 2 variables
  - r also shows the direction of the relationship
  - R² closer to 1 -> stronger relation between the 2 variables
+ The regression line is the "best fit" line of the points on the scatter plot
  - We would also like to know the average error of each point with respect to the regression line

How to find the Error?
+ The y error for every data point is found as follows:
  - For every data point, take the x-coordinate and compute the expected y value from the regression line. The error is the difference between the expected y value and the actual y value.
  - Example: for the data point with x = 1 and actual y = 2, the expected y is 1.6(1) + 1 = 2.6, so the error is 2.6 - 2 = 0.6

Summing the Errors
+ Note that the summation of y errors for every data point in the scatter plot is always 0

Standard Error of Regression Line
+ The Standard Error $S_{x,y}$ of the regression line is defined by the formula:

$$S_{x,y} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

  where $\hat{y}_i$ is the expected y value for data point i
+ In the previous example, the standard error of the regression line is sqrt(1.20/3) = 0.632
+ Question: What is the standard error for the transpose of the scatter plot (i.e., interchanging X and Y)?
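The slope, intercept, correlation coefficient, R², and standard error described in this lecture can be checked with a few lines of Python. This is a minimal sketch, not the lecture's own code: the function names are assumptions chosen for this example, and the data are the five points of the lecture's worked example, x = 1..5 with actual y = 2, 5, 6, 7, 9.

```python
import math


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def standard_error(xs, ys):
    """sqrt(sum of squared y errors / (n - 2)) around the fitted line."""
    a, b = fit_line(xs, ys)
    squared_errors = sum(((a + b * x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_errors / (len(xs) - 2))


# Data points from the lecture's worked example.
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 6, 7, 9]

a, b = fit_line(xs, ys)
r = pearson_r(xs, ys)
se = standard_error(xs, ys)

print(f"regression line: y = {a:.3f} + {b:.3f}x")
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")
print(f"standard error = {se:.3f}")
```

For this data the fitted line is y = 1 + 1.6x and the standard error is about 0.632, matching the lecture. R² comes out at about 0.955, so the points lie close to the line.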
Transpose of the Scatter Plot
+ The transpose of the scatter plot has a regression line with equation y = 0.597x - 0.4627

Errors for the original scatter plot (regression line y = 1.6x + 1; error = expected y - actual y):

x     | Expected y | Actual y | Error | Error²
1     | 2.6        | 2.0      |  0.6  | 0.36
2     | 4.2        | 5.0      | -0.8  | 0.64
3     | 5.8        | 6.0      | -0.2  | 0.04
4     | 7.4        | 7.0      |  0.4  | 0.16
5     | 9.0        | 9.0      |  0.0  | 0.00
Total |            |          |  0.0  | 1.20

Standard Error
Errors for the transposed scatter plot (regression line y = 0.597x - 0.4627):

x     | Expected y | Actual y | Error   | Error²
2     | 0.7313     | 1.0      | -0.2687 | 0.0722
5     | 2.5223     | 2.0      |  0.5223 | 0.2728
6     | 3.1193     | 3.0      |  0.1193 | 0.0142
7     | 3.7163     | 4.0      | -0.2837 | 0.0805
9     | 4.9103     | 5.0      | -0.0897 | 0.0080
Total |            |          |  0.0    | 0.4477

+ Standard error is sqrt(0.4477/3) = 0.3863
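The difference between transposing the line and refitting the transposed data can be reproduced directly. The sketch below is an illustration under stated assumptions: the helper names are invented for this example, and the slope is computed in mean-centred form, which is algebraically the same as the least-squares formula. It fits y on x, inverts that line algebraically, and then refits with X and Y interchanged.

```python
import math


def fit_line(xs, ys):
    """Least-squares (a, b) for y = a + b*x, in mean-centred form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def standard_error(xs, ys):
    """sqrt(sum of squared y errors / (n - 2)) around the fitted line."""
    a, b = fit_line(xs, ys)
    squared_errors = sum(((a + b * x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_errors / (len(xs) - 2))


# Data points from the lecture's worked example.
x = [1, 2, 3, 4, 5]
y = [2, 5, 6, 7, 9]

# Regression of y on x (the original scatter plot).
a_orig, b_orig = fit_line(x, y)

# Algebraic transpose: solve y = a + b*x for x.
inv_slope = 1 / b_orig
inv_intercept = -a_orig / b_orig

# Regression on the transposed scatter plot (X and Y interchanged).
a_trans, b_trans = fit_line(y, x)
se_trans = standard_error(y, x)

print(f"algebraic transpose: y = {inv_slope:.4f}x + ({inv_intercept:.4f})")
print(f"refitted transpose:  y = {b_trans:.4f}x + ({a_trans:.4f})")
print(f"standard error of refitted transpose = {se_trans:.4f}")
print(f"product of the two slopes = {b_orig * b_trans:.4f}")
```

The two lines differ because each regression minimises errors in its own vertical direction. The product of the two fitted slopes equals R² (about 0.955 here), so the refitted line coincides with the algebraic transpose only when R² is 1.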
