Data Mining CSE-443: Ayesha Aziz Prova Lecturer, Dept. of CSE CWU
Histogram
Boxplot: (covered before)
Quantile plot: each value xi is paired with fi, indicating that approximately 100 fi%
of the data are less than or equal to xi
Quantile-quantile (q-q) plot: graphs the quantiles of one univariate distribution
against the corresponding quantiles of another
Scatter plot: each pair of values is a pair of coordinates and plotted as points in the
plane
Loess (local regression) curve: add a smooth curve to a scatter plot to provide better
perception of the pattern of dependence
BOXPLOT ANALYSIS
• Boxplot
– Data is represented with a box
– The ends of the box are at the first and third quartiles,
i.e., the height of the box is the interquartile range (IQR)
– The median is marked by a line within the box
– Whiskers: two lines outside the box extend to
Minimum and Maximum
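The five values a boxplot draws can be computed directly. The following is a minimal sketch that reuses the final grades from the APPLICATION slide as sample data. The quartile convention (`method="inclusive"`) is an assumption; other conventions give slightly different Q1 and Q3.

```python
from statistics import quantiles

# Final grades from the APPLICATION slide, used here only as sample data
grades = [78, 92, 90, 58, 43, 74, 81]

# Quartiles: Q1 and Q3 are the ends of the box, the median is the line inside it.
# method="inclusive" is one of several quartile conventions (an assumption here).
q1, median, q3 = quantiles(grades, n=4, method="inclusive")

iqr = q3 - q1  # height of the box

# Whiskers extend to the minimum and maximum, as described on the slide
five_number_summary = (min(grades), q1, median, q3, max(grades))

print("min, Q1, median, Q3, max:", five_number_summary)
print("IQR:", iqr)
```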
HISTOGRAM ANALYSIS
Graph displays of basic statistical class descriptions
Frequency histograms
A univariate graphical method
Consists of a set of rectangles that reflect the counts or
frequencies of the classes present in the given data
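The counts behind a frequency histogram can be sketched as follows. The data are the final grades from the APPLICATION slide, and the class width of 10 is an assumed choice for illustration, not something the slides specify.

```python
from collections import Counter

# Final grades from the APPLICATION slide, used here only as sample data
grades = [78, 92, 90, 58, 43, 74, 81]

width = 10  # assumed class width

# Map each grade to the lower bound of its class, then count per class
counts = Counter((g // width) * width for g in grades)

# One text "rectangle" per class; its length reflects the class frequency
for low in range(min(counts), max(counts) + width, width):
    print(f"{low}-{low + width - 1}: {'#' * counts[low]}")
```

Classes with no observations (here 60-69) print an empty bar, since a `Counter` returns 0 for missing keys.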
QUANTILE PLOT
Displays all of the data (allowing the user to assess both the overall behavior
and unusual occurrences)
Plots quantile information
For data xi sorted in increasing order, fi indicates that
approximately 100 fi% of the data are below or equal to the
value xi
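The (xi, fi) pairs of a quantile plot can be sketched as below. The slides do not give a formula for fi; this sketch assumes the common convention fi = (i - 0.5) / n for the i-th smallest of n values, and uses the final grades from the APPLICATION slide as sample data.

```python
# Final grades from the APPLICATION slide, used here only as sample data
grades = [78, 92, 90, 58, 43, 74, 81]

sorted_grades = sorted(grades)
n = len(sorted_grades)

# Assumed convention: f_i = (i - 0.5) / n, with i starting at 1
pairs = [(x, (i - 0.5) / n) for i, x in enumerate(sorted_grades, start=1)]

# Each pair is one point of the quantile plot: f_i on one axis, x_i on the other
for x, f in pairs:
    print(f"x = {x:3d}   f = {f:.3f}")
```

With seven values, the middle value (78) gets f = 0.5, matching its role as the median.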
QUANTILE-QUANTILE (Q-Q) PLOT
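A q-q plot pairs the quantiles of one univariate distribution with the corresponding quantiles of another. The sketch below computes such pairs for the three quartiles only. Both samples are made-up values for illustration, and the `"inclusive"` quantile method is an assumed convention.

```python
from statistics import quantiles

# Hypothetical samples (made-up values, not from the lecture)
sample_a = [40, 43, 47, 52, 56, 60, 65, 70, 74, 78]
sample_b = [42, 48, 50, 55, 61, 66, 72, 80]

# Same quantile levels (25%, 50%, 75%) for both samples,
# so the samples may differ in size
q_a = quantiles(sample_a, n=4, method="inclusive")
q_b = quantiles(sample_b, n=4, method="inclusive")

# Each pair is one point of the q-q plot
qq_points = list(zip(q_a, q_b))

for level, (a, b) in zip((25, 50, 75), qq_points):
    print(f"{level}% quantile: sample_a = {a}, sample_b = {b}")
```

Points lying near the line y = x suggest that the two distributions are similar at those quantile levels.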
SCATTER PLOT
[Figure: scatter plots showing a positive relationship, a negative relationship, and no relationship]
POSITIVELY AND NEGATIVELY CORRELATED DATA
Positive correlation: two variables are positively correlated when their values
change in the same direction. Example: height and weight
[Figure: scatter plot of Height in CM against Age in Weeks]
POSITIVELY AND NEGATIVELY CORRELATED DATA
Negative correlation: two variables are negatively correlated when their values
change in opposite directions.
LINEAR CORRELATION
[Figure: scatter plots of Y against X showing linear relationships, curvilinear relationships, and no relationship]
CORRELATION COEFFICIENT “R”
A measure of the strength and direction of a linear
relationship between two variables
r ranges from –1 to 1:
If r is close to –1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
If r is close to 1, there is a strong positive correlation.
APPLICATION
Absences (x)   Final Grade (y)
      8              78
      2              92
      5              90
     12              58
     15              43
      9              74
      6              81
[Figure: scatter plot of Final Grade against Absences]
COMPUTATION OF R
  i      x      y      xy     x^2     y^2
  1      8     78     624      64    6084
  2      2     92     184       4    8464
  3      5     90     450      25    8100
  4     12     58     696     144    3364
  5     15     43     645     225    1849
  6      9     74     666      81    5476
  7      6     81     486      36    6561
Sum     57    516    3751     579   39898
COMPUTATION OF R
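The column totals from the table (Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, Σy² = 39898, n = 7) are exactly the inputs of the standard computational formula for Pearson's r. The formula does not survive in the slide text, so its use here is an assumption, sketched as:

```python
from math import sqrt

# Absences (x) and final grades (y) from the APPLICATION slide
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
n = len(x)

# Column totals, matching the last row of the table
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Computational formula for Pearson's r:
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.3f}")  # approximately -0.975
```

Since r is close to –1, the data show a strong negative linear correlation: final grades tend to fall as absences rise.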
Any questions?