Chapter 02 - Fundamentals of Data Visualization
Learning Outcomes
At the end of this lesson you will be able to:
1. Apply basic statistics in analyzing business problems.
Data Visualization Techniques based on Variables
Variable Type
• One Categorical Variable
• One Numerical Variable
One Way Frequency Table
Categorical Variable
Numerical Variable
Bar Chart
• In a bar chart, each bar represents one category level. The bars can be
drawn vertically or horizontally.
• There are several types of bar charts. Some of them are:
1. Simple bar chart
2. Multiple/Clustered bar chart
3. Stacked/Component bar chart
4. Percentage component bar chart
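As a minimal sketch of a simple bar chart, the snippet below counts each category level and draws one horizontal text bar per level. The function name `text_bar_chart` and the survey answers are hypothetical, chosen only for illustration.

```python
from collections import Counter


def text_bar_chart(values, symbol="#"):
    """Return one text line per category level; bar length equals its count."""
    counts = Counter(values)
    # Pad labels so that all bars start in the same column.
    width = max(len(str(level)) for level in counts)
    lines = []
    for level in sorted(counts):
        bar = symbol * counts[level]
        lines.append(f"{str(level).ljust(width)} | {bar} ({counts[level]})")
    return lines


# Hypothetical survey answers (not from the lesson).
answers = ["Bus", "Car", "Bus", "Train", "Bus", "Car"]
for line in text_bar_chart(answers):
    print(line)
```

Each printed line plays the role of one bar; a plotting library would draw the same counts as rectangles.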
1. Simple Bar Chart
2. Multiple/Clustered Bar Chart
3. Stacked/Component Bar Chart
4. Percentage Component Bar Chart
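A percentage component bar chart is a stacked bar chart in which every bar is rescaled to a total of 100%. The sketch below shows only that rescaling step; the function name `to_percentage_components` and the sales figures are hypothetical.

```python
def to_percentage_components(table):
    """Rescale the components of each bar so that they sum to 100."""
    result = {}
    for bar, components in table.items():
        total = sum(components.values())
        result[bar] = {
            name: 100 * value / total for name, value in components.items()
        }
    return result


# Hypothetical sales by product for two years (not from the lesson).
sales = {
    "2022": {"A": 30, "B": 50, "C": 20},
    "2023": {"A": 45, "B": 30, "C": 75},
}
percentages = to_percentage_components(sales)
for year, shares in percentages.items():
    print(year, shares)
```

After rescaling, all bars have the same height, so the chart compares the composition of each bar rather than its total.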
Example 01
The following table shows the sales of newspapers over four years. Plot a
bar chart for the data set:
Example 02
Plot a bar chart for the following data set:
Pie Chart
Histogram
• First, divide the given data set into a suitable number of classes
(intervals/categories) of equal width.
• The classes together with their frequencies (counts) are called a
frequency distribution.
• Frequency, relative frequency or percentage can be used for the y
axis, while the x axis represents the classes of the variable.
• In a histogram, each bar represents one class, and the height of the bar
is proportional to the frequency of that class.
• In a histogram, bars are drawn adjacent to each other (no gaps
between bars).
Example 04
Draw a histogram for the following data set:
42 74 40 60 82 115 41 61 75 83 63
53 110 76 84 50 57 78 77 63 65 95
68 69 104 80 79 79 54 73 59 81 100
56 49 77 90 84 76 42 64 69 70 80
72 50 79 52 103 96 51 86 73 94 71
Step 01:
Step 02:
• Calculate the range of the data set.
• The range is the difference between the largest and the smallest
observation.
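For the Example 04 data, the range can be checked with a short sketch like the one below. The variable names are illustrative.

```python
# Example 04 data set, copied row by row.
data = [
    42, 74, 40, 60, 82, 115, 41, 61, 75, 83, 63,
    53, 110, 76, 84, 50, 57, 78, 77, 63, 65, 95,
    68, 69, 104, 80, 79, 79, 54, 73, 59, 81, 100,
    56, 49, 77, 90, 84, 76, 42, 64, 69, 70, 80,
    72, 50, 79, 52, 103, 96, 51, 86, 73, 94, 71,
]

smallest = min(data)
largest = max(data)
data_range = largest - smallest  # range = largest - smallest observation
print(smallest, largest, data_range)  # 40 115 75
```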
Step 03: Estimate the number of class intervals
Step 04: Estimate the Class Width
$$\text{Class Width} = \frac{\text{Range}}{\text{No. of Intervals}}$$
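The class width formula can be sketched as follows. Two things here are assumptions rather than part of the lesson: the result is rounded up to a whole number so that the classes cover the full range, and the 8 intervals are an illustrative choice, since the result of Step 03 is not shown. The range of 75 is the Example 04 value (115 - 40).

```python
import math


def class_width(data_range, intervals):
    """Class width = range / number of intervals, rounded up (assumption)."""
    return math.ceil(data_range / intervals)


# Range of the Example 04 data with an illustrative choice of 8 intervals.
print(class_width(75, 8))  # 75 / 8 = 9.375, rounded up to 10
```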
Step 05: Class intervals
Step 06: Frequency table
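A frequency table for the Example 04 data might be built as in the sketch below. The class intervals used here (eight classes of width 10 starting at 40, that is 40-49, 50-59, ..., 110-119) are an assumption for illustration; the intervals chosen in the lesson are not shown. The function name `frequency_table` is also hypothetical.

```python
def frequency_table(data, start, width, classes):
    """Count the integer observations in each class [lower, lower + width - 1]."""
    counts = [0] * classes
    for value in data:
        index = (value - start) // width
        if not 0 <= index < classes:
            raise ValueError(f"{value} falls outside the class intervals")
        counts[index] += 1
    rows = []
    for i, count in enumerate(counts):
        lower = start + i * width
        # Each row: lower limit, upper limit, frequency, relative frequency.
        rows.append((lower, lower + width - 1, count, count / len(data)))
    return rows


# Example 04 data set.
data = [
    42, 74, 40, 60, 82, 115, 41, 61, 75, 83, 63,
    53, 110, 76, 84, 50, 57, 78, 77, 63, 65, 95,
    68, 69, 104, 80, 79, 79, 54, 73, 59, 81, 100,
    56, 49, 77, 90, 84, 76, 42, 64, 69, 70, 80,
    72, 50, 79, 52, 103, 96, 51, 86, 73, 94, 71,
]

# Assumed classes: 40-49, 50-59, ..., 110-119.
table = frequency_table(data, start=40, width=10, classes=8)
for lower, upper, count, relative in table:
    print(f"{lower}-{upper}: {count} ({relative:.1%})")
# Frequencies: 5, 9, 9, 15, 8, 4, 3, 2
```

The frequencies, or the relative frequencies in the last column, give the bar heights of the histogram.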
Step 08: Mid point
Class Interval | Class Boundaries | Class Mid Values (xi) | Frequency (fi)
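The columns of this table can be filled in with a sketch like the following. It assumes class intervals 40-49, 50-59, ..., 110-119 for the Example 04 data (an illustrative choice, not taken from the lesson); the frequencies listed are the counts of the Example 04 data in those classes. Class boundaries are placed halfway across the gap of 1 between consecutive classes.

```python
def class_boundaries(lower, upper, gap=1):
    """Extend the class limits by half the gap between consecutive classes."""
    return (lower - gap / 2, upper + gap / 2)


def mid_value(lower, upper):
    """Class mid value = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


# Assumed class limits and the Example 04 frequencies for those classes.
limits = [(40 + 10 * i, 49 + 10 * i) for i in range(8)]
frequencies = [5, 9, 9, 15, 8, 4, 3, 2]

print("Interval | Boundaries | Mid value (xi) | Frequency (fi)")
for (lower, upper), frequency in zip(limits, frequencies):
    low_b, high_b = class_boundaries(lower, upper)
    print(f"{lower}-{upper} | {low_b}-{high_b} | {mid_value(lower, upper)} | {frequency}")
```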
Frequency Polygon
• This is another way of displaying data graphically.
Step 02: Plot the frequency against the class mid point.
Step 03: Add one class to the left of the first mid point and one class to the right of the last mid point.
Step 04: Connect the mid points with straight lines, so that the polygon begins and ends
with a frequency of zero.
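The steps above can be sketched as a list of points to be joined by straight lines. The mid points and frequencies below assume classes 40-49, ..., 110-119 for the Example 04 data, which is an illustrative choice; the function name is hypothetical.

```python
def frequency_polygon_points(mid_points, frequencies, width):
    """Return (mid point, frequency) pairs, closed with zero-frequency ends."""
    left = (mid_points[0] - width, 0)    # one class left of the first mid point
    right = (mid_points[-1] + width, 0)  # one class right of the last mid point
    return [left] + list(zip(mid_points, frequencies)) + [right]


width = 10
mid_points = [44.5 + width * i for i in range(8)]  # 44.5, 54.5, ..., 114.5
frequencies = [5, 9, 9, 15, 8, 4, 3, 2]

points = frequency_polygon_points(mid_points, frequencies, width)
for x, y in points:
    print(x, y)
```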
Ogive Curve (Cumulative Frequency Curve)
• There are two types of ogive:
1. Less than ogive
2. More than ogive
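As a sketch of both ogives, the code below computes the cumulative frequencies at each class boundary. It assumes class boundaries 39.5, 49.5, ..., 119.5 and the matching Example 04 frequencies (an illustrative choice of classes). The less than ogive plots how many observations lie below each boundary; the more than ogive plots how many lie above it.

```python
from itertools import accumulate

width = 10
boundaries = [39.5 + width * i for i in range(9)]  # 39.5, 49.5, ..., 119.5
frequencies = [5, 9, 9, 15, 8, 4, 3, 2]
total = sum(frequencies)

# Less than ogive: running total of frequencies below each boundary.
less_than = [0] + list(accumulate(frequencies))
# More than ogive: observations remaining above each boundary.
more_than = [total - count for count in less_than]

less_than_points = list(zip(boundaries, less_than))
more_than_points = list(zip(boundaries, more_than))
print(less_than_points)
print(more_than_points)
```

The less than curve rises from 0 to the total number of observations, and the more than curve falls from the total to 0.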
Example 05
Draw a frequency polygon, a less than ogive and a more than ogive for
the data set in Example 04.
Box Plot
• First, identify the five-number summary and any outliers of the variable.
• A box is used to represent the middle half of the data.
Outliers
• Outliers should be identified before drawing the box plot.
• Values outside the range are considered outliers and are marked with
asterisks (*).
• Q1, the median and Q3 are marked as a box.
• The minimum and maximum values that are not outliers are the end points
of the whiskers of the box plot.
Example 06
Draw a box plot for the following data set:
53, 43, 30, 38, 30, 42, 12, 46, 39, 37, 34, 46, 32, 18, 5
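The quantities needed for this box plot can be sketched as below. Two assumptions are made that the lesson does not confirm: quartiles use the default ("exclusive") method of Python's `statistics.quantiles`, and outliers are values more than 1.5 times the interquartile range beyond Q1 or Q3, a common convention. Other quartile methods or outlier rules can give slightly different results.

```python
import statistics


def box_plot_summary(data, k=1.5):
    """Five-number summary, whisker ends and outliers (1.5 x IQR rule assumed)."""
    ordered = sorted(data)
    # Default 'exclusive' quartile method; other conventions may differ.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "minimum": ordered[0], "q1": q1, "median": median, "q3": q3,
        "maximum": ordered[-1],
        "whiskers": (inside[0], inside[-1]),  # extreme values that are not outliers
        "outliers": outliers,
    }


# Example 06 data set.
data = [53, 43, 30, 38, 30, 42, 12, 46, 39, 37, 34, 46, 32, 18, 5]
summary = box_plot_summary(data)
print(summary)
```

Under these assumptions the box runs from 30 to 43 with the median at 37, the whiskers end at 12 and 53, and 5 is marked as an outlier.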
Stem and Leaf Plot
• A stem and leaf plot is useful when the data set is very small.
• Next, each data value is split into two parts known as the “stem” and the “leaf”.
• The other digits to the left of the “leaf” form the “stem”.
Example 07
Draw a stem and leaf plot for the following data set:
53, 43, 30, 38, 30, 42, 12, 46, 39, 37, 34, 46, 32, 18, 5
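A stem and leaf plot for this data set might be produced as in the sketch below. It assumes non-negative whole numbers, takes the last digit as the leaf and the remaining digits as the stem, and lists stems with no leaves as empty rows; the function name is hypothetical.

```python
def stem_and_leaf(data):
    """Return text rows 'stem | leaves' for non-negative integers."""
    stems = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # leaf = last digit, stem = other digits
        stems.setdefault(stem, []).append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even if empty.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Example 07 data set.
data = [53, 43, 30, 38, 30, 42, 12, 46, 39, 37, 34, 46, 32, 18, 5]
for line in stem_and_leaf(data):
    print(line)
```

Here the value 5 appears with stem 0 and leaf 5, and the stem 2 has no leaves.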
In Next Chapter…
• Descriptive Statistics will be discussed
Thank You
Rajika Gunarathne
Email: [email protected]