
Data Mining CSE-443: Ayesha Aziz Prova Lecturer, Dept. of CSE CWU

This document discusses various methods for graphically displaying statistical data, including histograms, boxplots, quantile plots, scatter plots, and Loess curves. It provides examples and explanations of each method. Histograms display the frequency of data values using bars. Boxplots show the minimum, three quartiles (25th, 50th, 75th), and maximum of a distribution. Quantile plots pair data values with their percentile. Scatter plots show the relationship between two variables, and Loess curves add smooth lines to scatter plots. The document also covers correlation analysis and the correlation coefficient.

Uploaded by

Dipty Sarker

DATA MINING

CSE-443

Ayesha Aziz Prova


Lecturer,
Dept. of CSE
CWU
GRAPHIC DISPLAYS OF BASIC STATISTICAL DESCRIPTIONS

• Histogram
• Boxplot (covered before)
• Quantile plot: each value xi is paired with fi, indicating that approximately 100 fi % of the data are ≤ xi
• Quantile-quantile (q-q) plot: graphs the quantiles of one univariate distribution against the corresponding quantiles of another
• Scatter plot: each pair of values is treated as a pair of coordinates and plotted as a point in the plane
• Loess (local regression) curve: adds a smooth curve to a scatter plot to provide better perception of the pattern of dependence
BOXPLOT ANALYSIS

• Five-number summary of a distribution:
Minimum, Q1, M (median), Q3, Maximum

• Boxplot
– Data is represented with a box
– The ends of the box are at the first and third quartiles, i.e., the height of the box is the interquartile range (IQR = Q3 – Q1)
– The median is marked by a line within the box
– Whiskers: two lines outside the box extend to the Minimum and Maximum
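
As a rough illustration of the five-number summary, the sketch below computes the values a boxplot would draw. It uses the final grades from the Application slide as sample data. The "inclusive" quartile method is an assumption; the slides do not say which quartile convention is intended, and other conventions give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Quartiles use the 'inclusive' interpolation method, which is only
    one of several common conventions (an assumption of this sketch).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Final grades from the Application slide
grades = [78, 92, 90, 58, 43, 74, 81]

low, q1, median, q3, high = five_number_summary(grades)
iqr = q3 - q1  # height of the box in the boxplot

print("min:", low, "Q1:", q1, "M:", median, "Q3:", q3, "max:", high)
print("IQR:", iqr)
```

The box would span Q1 to Q3, the line inside it would sit at M, and the whiskers would reach the minimum and maximum.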
HISTOGRAM ANALYSIS
• Graph displays of basic statistical class descriptions
• Frequency histograms
– A univariate graphical method
– Consists of a set of rectangles that reflect the counts or frequencies of the classes present in the given data
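
The counting step behind a frequency histogram can be sketched as follows. The function name `frequency_table` and the class width of 10 are illustrative choices, not something the slides prescribe, and the sketch assumes integer data.

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count how many integer values fall in each class [k*w, (k+1)*w).

    Returns a dict mapping the lower bound of each class to its count,
    including empty classes between the smallest and largest one.
    """
    counts = Counter(v // bin_width for v in values)
    first, last = min(counts), max(counts)
    return {k * bin_width: counts.get(k, 0) for k in range(first, last + 1)}

# Final grades from the Application slide
grades = [78, 92, 90, 58, 43, 74, 81]
table = frequency_table(grades, 10)

# Text rendering: one bar of '#' per class, bar length = frequency
for start, count in table.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Each printed bar plays the role of one rectangle in the histogram; its length is the frequency of that class.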
QUANTILE PLOT
• Displays all of the data (allowing the user to assess both the overall behavior and unusual occurrences)
• Plots quantile information
– For data xi sorted in increasing order, fi indicates that approximately 100 fi% of the data are below or equal to the value xi
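
A minimal sketch of how the (xi, fi) pairs of a quantile plot could be computed. The slides do not give a formula for fi; the sketch assumes the common choice fi = (i – 0.5)/N for the i-th smallest of N values.

```python
def quantile_pairs(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / N.

    The formula for f_i is an assumption of this sketch (a common
    convention); roughly 100 * f_i percent of the data are <= x_i.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, (i - 0.5) / n) for i, x in enumerate(xs, start=1)]

# Final grades from the Application slide
grades = [78, 92, 90, 58, 43, 74, 81]
pairs = quantile_pairs(grades)

for x, f in pairs:
    print(f"x = {x:>3}   f = {f:.3f}")
```

Plotting fi on the horizontal axis against xi on the vertical axis gives the quantile plot; with this convention the median lands at f = 0.5.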
QUANTILE-QUANTILE (Q-Q) PLOT

• Graphs the quantiles of one univariate distribution against the corresponding quantiles of another
• Allows the user to view whether there is a shift in going from one distribution to another
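
The pairing of quantiles in a q-q plot can be sketched as below. The two samples are hypothetical (the second is the first shifted up by 5), and the "inclusive" quartile method is again an assumed convention.

```python
import statistics

def qq_pairs(sample_a, sample_b, n=4):
    """Pair the n-quantile cut points of two samples.

    Each returned tuple is one point of the q-q plot:
    (quantile of sample_a, corresponding quantile of sample_b).
    """
    qa = statistics.quantiles(sample_a, n=n, method="inclusive")
    qb = statistics.quantiles(sample_b, n=n, method="inclusive")
    return list(zip(qa, qb))

# Hypothetical data: sample_2 is sample_1 shifted upward by 5 units
sample_1 = [40, 45, 50, 55, 60, 65, 70, 75, 80]
sample_2 = [v + 5 for v in sample_1]

pairs = qq_pairs(sample_1, sample_2)
shifts = [b - a for a, b in pairs]

print(pairs)
print("shift at each quantile:", shifts)
```

If the two distributions were identical, every point would lie on the line y = x. Here every point sits the same distance above that line, which is how a q-q plot reveals a shift.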
SCATTER PLOT
• Provides a first look at bivariate data to see clusters of points, outliers, etc.
• Each pair of values is treated as a pair of coordinates and plotted as a point in the plane
• A scatter plot (or scatter diagram) is used to show the relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
• The pattern of the data indicates the type of relationship between the two variables:

– positive relationship
– negative relationship
– no relationship

POSITIVELY AND NEGATIVELY CORRELATED DATA
• Positive correlation: the correlation is positive if the values of the two variables change in the same direction. Example: height and weight.

[Figure: scatter plot of height (cm) against age (weeks)]
POSITIVELY AND NEGATIVELY CORRELATED DATA
• Negative correlation: the correlation is negative when the values of the two variables change in opposite directions.
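LINEAR CORRELATION
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]
LINEAR CORRELATION
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]
LINEAR CORRELATION
[Figure: scatter plot of Y against X showing no relationship]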
CORRELATION COEFFICIENT “R”
A measure of the strength and direction of a linear relationship between two variables

The range of r is from –1 to 1.

• If r is close to –1, there is a strong negative correlation.
• If r is close to 0, there is no linear correlation.
• If r is close to 1, there is a strong positive correlation.
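
To make the three cases concrete, here is a small sketch that computes r from its mean-centered definition on three made-up data sets: a rising line, a falling line, and a symmetric curve. The data and the function name `pearson_r` are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from mean-centered sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]

r_up = pearson_r(xs, [2 * x + 1 for x in xs])   # y rises with x
r_down = pearson_r(xs, [-3 * x for x in xs])    # y falls as x rises
r_curve = pearson_r(xs, [x * x for x in xs])    # symmetric curve

print("rising line: ", r_up)
print("falling line:", r_down)
print("parabola:    ", r_curve)
```

The last case is worth noting: y depends entirely on x, yet r is 0, because r only measures the linear part of a relationship. This is why a value near 0 is read as "no linear correlation" rather than "no relationship".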
APPLICATION
Absences (x)   Final grade (y)
     8              78
     2              92
     5              90
    12              58
    15              43
     9              74
     6              81

[Figure: scatter plot of final grade (vertical axis, 40 to 95) against absences (horizontal axis, 0 to 16)]
COMPUTATION OF R
  #     x     y     xy    x²     y²
  1     8    78    624    64   6084
  2     2    92    184     4   8464
  3     5    90    450    25   8100
  4    12    58    696   144   3364
  5    15    43    645   225   1849
  6     9    74    666    81   5476
  7     6    81    486    36   6561
  Σ    57   516   3751   579  39898
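COMPUTATION OF R

r = (nΣxy – ΣxΣy) / (√(nΣx² – (Σx)²) · √(nΣy² – (Σy)²))
  = (7·3751 – 57·516) / (√(7·579 – 57²) · √(7·39898 – 516²))
  = –3155 / (√804 · √13030)
  ≈ –0.975

r ≈ –0.975 → strong negative linear association between x and y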
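
The column sums for the absences and final-grade data can be checked with a short script. This is a sketch using the standard computational formula r = (nΣxy – ΣxΣy) / (√(nΣx² – (Σx)²) · √(nΣy² – (Σy)²)); the function name is illustrative.

```python
import math

# Data from the Application slide
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

def pearson_r_from_sums(xs, ys):
    """Pearson r via the column sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_from_sums(absences, grades)
print(round(r, 3))
```

With these seven pairs the sums are Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579 and Σy² = 39898, and the script prints –0.975. The negative sign matches the downward trend in the scatter plot: more absences go with lower final grades.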
LOESS CURVE
• Adds a smooth curve to a scatter plot in order to provide better perception of the pattern of dependence
• A loess curve is fitted by setting two parameters: a smoothing parameter and the degree of the polynomials fitted by the regression
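
The idea of local regression can be sketched in a few lines. This is a simplified illustration, not a full loess implementation: the polynomial degree is fixed at 1 (local straight lines), `span` plays the role of the smoothing parameter (the fraction of points used in each local fit), and tricube weights are assumed. All names are hypothetical.

```python
import math

def tricube(u):
    """Tricube weight: largest at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def loess_at(x0, xs, ys, span=0.5):
    """Smoothed value at x0 from a weighted straight-line fit.

    span is the smoothing parameter: the fraction of the data that
    forms the local neighborhood around x0. Degree is fixed at 1.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))     # neighborhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to k-th neighbor
    if h == 0:  # at least k points sit exactly on x0: average them
        same = [y for x, y in zip(xs, ys) if x == x0]
        return sum(same) / len(same)
    # Weighted least squares on d = x - x0, so the intercept is the fit at x0
    sw = swd = swy = swdd = swdy = 0.0
    for x, y in zip(xs, ys):
        w = tricube((x - x0) / h)
        d = x - x0
        sw += w
        swd += w * d
        swy += w * y
        swdd += w * d * d
        swdy += w * d * y
    if sw == 0:      # degenerate window: fall back to the global mean
        return sum(ys) / n
    denom = sw * swdd - swd ** 2
    if denom == 0:   # weighted points share one x: use the weighted mean
        return swy / sw
    return (swdd * swy - swd * swdy) / denom

# Sanity check on exactly linear data: the smooth reproduces the line
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
smooth = [loess_at(x, xs, ys, span=0.5) for x in xs]
print([round(s, 6) for s in smooth])
```

A larger `span` averages over more neighbors and gives a smoother curve; a smaller one follows the data more closely. Fitting local quadratics instead of lines would correspond to raising the degree parameter.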
THANKS

Any Question???
