Chapter1 Introduction Data visualization

Data visualization

Chapter 01

Introduction to Data Visualization
Tiết Gia Hồng

Faculty of Information Technology (Khoa Công Nghệ Thông Tin)
University of Science (Trường Đại Học Khoa Học Tự Nhiên)

Content
• What is visualization?
• Visualization Areas
• Visualization workflow
• Simple charts
• Multiple series
• Confidence Intervals

Data visualization

• Definition 1
“The use of computer-supported, interactive, visual
representations of abstract data to amplify cognition” [1]

• Definition 2
“Computer-based visualization systems provide visual
representations of datasets designed to help people carry
out tasks more effectively”
Tamara Munzner, 2014

[1] Readings in Information Visualization: Using Vision to Think

What’s the difference here?

Illustration
• A visual representation (a picture or diagram) used to make
some subject more pleasing or easier to understand

Visualization
• Visualization communicates data.
• It turns abstract (invisible) things into visible things.
• Visualization is about MAPPING.

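As a minimal sketch of what “mapping” can mean in practice, the Python snippet below converts data values into bar lengths with a linear scale. The function name `linear_scale`, the sales figures, and the 40-character bar width are illustrative assumptions, not part of the slides.

```python
def linear_scale(value, domain, rng):
    """Map `value` from the data interval `domain` to the visual interval `rng`."""
    d0, d1 = domain
    r0, r1 = rng
    if d1 == d0:
        raise ValueError("domain must have non-zero width")
    t = (value - d0) / (d1 - d0)   # position of the value inside the domain, 0..1
    return r0 + t * (r1 - r0)      # same relative position inside the visual range


# Hypothetical data (made-up values): the abstract numbers we want to make visible.
sales = {"Jan": 20, "Feb": 35, "Mar": 50}

# Mapping step: each number becomes a bar between 0 and 40 characters long.
for month, amount in sales.items():
    length = round(linear_scale(amount, (0, 50), (0, 40)))
    print(f"{month} {'#' * length} {amount}")
```

The same idea carries over to other visual channels: position, size, or color can each be the target range of such a scale.
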
Visualization
• Visualization
▪ The static or interactive visual representation of
abstract or spatial data to reinforce human
cognition

Examples

The Commercial and Political Atlas (1786) – William Playfair


Examples

The decimation of Napoleon's army during his Russian campaign in 1812 – Charles Minard

Examples

Florence Nightingale’s “coxcomb diagrams” of British casualties in the Crimean War (1853-1856)

Examples
Washington DC Metro Map

Examples

China Air Pollution Map (Amar Toor, 2015)


Examples

Nitrogen Dioxide Reduction Across the United States


Visualization Areas
Many areas fall under “visualization”:
• Data visualization
• Information visualization
• Scientific visualization
• Algorithm visualization
• Software visualization
•…

Data visualization
•Data visualization communicates information
using graphical representations
▪ For example, statistical graphs and plots

Information visualization
•Information visualization is the study of visual
representation of abstract data to reinforce
human cognition

Scientific visualization
•Scientific visualization is primarily concerned
with the visualization of three-dimensional
phenomena. The emphasis is on the realistic
rendering of volumes, surfaces, illumination
sources, and so on, perhaps with a dynamic (time)
component

Active brain visualization Edward Tejnil, NASA/Ames
Algorithm visualization
•Algorithm visualization shows all the states of
the data structures during the execution of an
algorithm
▪ Usually uses animation
•For example, visualizing a cryptography algorithm
helps the user see every step of the procedure
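One way to picture “all the states of the data structures” is to record a snapshot after every change and replay the snapshots as animation frames. The sketch below does this for bubble sort; the choice of algorithm and the helper name `bubble_sort_states` are assumptions made for illustration only.

```python
def bubble_sort_states(values):
    """Return the list of array states: the input, then one snapshot per swap."""
    data = list(values)
    states = [list(data)]                  # frame 0: the initial state
    n = len(data)
    for end in range(n - 1, 0, -1):        # the largest remaining item bubbles to `end`
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                states.append(list(data))  # copy, so later swaps do not alter the frame
    return states


# Each printed line corresponds to one frame of a possible animation.
for step, state in enumerate(bubble_sort_states([3, 1, 2])):
    print(step, state)
```
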

Software visualization
•Software visualization is the practice of
creating visual tools to map software elements
and/or display various aspects of the source code
of a software system
▪ Show architecture, running behavior, development
process

Why is data visualization important?

• Graphics and images are easier to digest
▪ A quick and easy way to convey concepts
• A good visualization
▪ Shows potential connections and relationships that are
not as obvious in non-visual quantitative data

Examples
•1854 cholera outbreak in London
▪ Killed more than 600 people
•John Snow – a physician
▪ Used data visualization to show that cholera
cases clustered around Broad Street

Map drawn and lithographed by Charles Cheffins


Visualization workflow

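[Figure: the visualization workflow drawn as a cycle. Start with “What data do I have?” (best data), then “What do I want to know?” (formulate our question), then “What vis method should I use?” (charts), then “Does it make sense?”. If not, try again.]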

Visualization workflow
• What data do I have?
▪ Often a very difficult task, because data sets can be hard
to collect
o EX: collect student grade records to study grade
distributions of all CS courses
▪ Never design a visual before getting the needed
data

Visualization workflow
• What do I want to know?
▪ Data vis helps us understand and explore data sets.
However, we must specify what we actually want to
know.
▪ EX: a journalist wanted to know
“How big a city would have to be to house the world’s 7
billion people if it were as dense as … ”
=> How would we answer this question?

Visualization workflow

Density (per sq mile)    Area (sq miles)
54,156                   127,930
28,256                   250,404
21,646                   379,069
17,246                   397,975
14,550                   553,745
3,842                    1,769,085

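The journalist’s question reduces to one division: area = population / density. The sketch below applies it to the densities listed above, assuming a population of exactly 7 billion as stated in the question. The helper name `area_needed` is hypothetical, and the results need not match the listed areas exactly, since those may rest on a different population figure.

```python
WORLD_POPULATION = 7_000_000_000  # "7 billion" from the question; an assumed round figure


def area_needed(density_per_sq_mile, population=WORLD_POPULATION):
    """Area in square miles needed to house `population` at the given density."""
    if density_per_sq_mile <= 0:
        raise ValueError("density must be positive")
    return population / density_per_sq_mile


# Densities (people per square mile) taken from the slide.
densities = [54_156, 28_256, 21_646, 17_246, 14_550, 3_842]

for density in densities:
    print(f"{density:>6,} per sq mile -> {area_needed(density):>12,.0f} sq miles")
```
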
Visualization workflow
• What vis method should I use?
▪ The chosen method depends on the data and the
questions
o Sometimes there are many methods available
o EX: the same data set can be presented as:
➢ Table
➢ Line graph
➢ Bar chart
➢…
o Use multiple charts to compare all variables

Visualization workflow
• Does it make sense?
▪ Look for something interesting, such as a pattern, trend,
or outlier
▪ Ask ourselves these questions
o Does what we see make any sense?
o Why does it make sense?
▪ Try to explain this “sense” and/or explore further
▪ If our visuals do not reveal anything
important/interesting
o Ask ourselves: what went wrong?

Some simple charts
•Software tools (e.g., Excel) offer many different
graphics for charting.
•The table on the right shows the grade distribution of a
class
▪ Data values are grouped into 8 categories:
o A
o AB
o B
o BC
o C
o CD
o D
o F
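A grade distribution table like this is just a count per category. The sketch below builds one from a list of letter grades. It assumes the eight categories are A, AB, B, BC, C, CD, D, and F, and the sample grades are made up for illustration.

```python
from collections import Counter

# Assumed category order (eight letter grades).
CATEGORIES = ["A", "AB", "B", "BC", "C", "CD", "D", "F"]


def grade_distribution(grades):
    """Count how many grades fall into each category, keeping the category order."""
    counts = Counter(grades)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown grade(s): {sorted(unknown)}")
    return {category: counts.get(category, 0) for category in CATEGORIES}


# Hypothetical class of ten students (made-up data).
sample = ["A", "AB", "B", "B", "BC", "C", "C", "C", "D", "F"]

for category, count in grade_distribution(sample).items():
    print(f"{category:<2} {count}")
```

Keeping empty categories in the result (count 0) matters for charting: every chart of the same course then shows the same axis.
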

Some simple charts
• Line charts are the most commonly used.
• Choosing the scale is usually a tricky issue

Some simple charts
• An area chart is just a different form of a line chart.
▪ Scale and large areas of the same color can be problems

Some simple charts
•The bar chart in its various forms is another form
of a line chart.

Some simple charts
•The point/bubble chart is yet another form of a
line chart

Some simple charts
• The pie chart is different from the line chart.
▪ Hard to see thin slices

Where is D? The use of colors can be a serious problem!

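The “thin slice” problem is easy to quantify: each slice’s angle is its share of the total times 360 degrees. The sketch below computes the angles for a made-up grade distribution; the counts and the helper name `slice_angles` are assumptions for illustration.

```python
def slice_angles(counts):
    """Return the pie-slice angle in degrees for each category."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {label: 360 * value / total for label, value in counts.items()}


# Hypothetical distribution of 100 grades (made-up data).
grades = {"A": 30, "B": 45, "C": 20, "D": 1, "F": 4}

for label, angle in slice_angles(grades).items():
    print(f"{label}: {angle:5.1f} degrees")
```

With one grade in a hundred, category D receives only 1/100 of the circle, which is hard to see regardless of the color used.
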
Some simple charts
•The doughnut chart is a variation of the pie
chart.

Some simple charts
•The radar chart combines the pie chart and the
line chart

Multiple series
•Challenges occur when building a chart for two
or more series:
▪ Occlusion can be a big problem
▪ The meaning of each series must be consistent

Multiple series
• Lines are always the easiest.
• But too many lines can be a problem

Multiple series
• Bars share the problems of the lines.
• A very busy bar chart can be difficult to read.

Multiple series
•Pie charts are for a single series. Use a doughnut
chart for multiple series

Multiple series
•Sometimes a radar chart can be effective for
comparison

Multiple series
• This is an area chart
▪ Occlusion problem

Multiple series
•One may also stack bars together. What does this
mean? It depends.

Multiple series
• A typical bar chart is very busy

Multiple series
•A typical line chart is also busy and difficult to
read

Multiple series
• Scatter plots are not better

Multiple series
• Radar charts are busy too

Multiple series
• Now try the scaled bar chart. Isn’t it easier to read?

Multiple series
• Here is a variation

Mosaic plot
•A mosaic plot (Mekko chart) visualizes the
relationship between two or more categorical
variables
•It may be considered a combination of a 100%
stacked column chart and a 100% stacked
horizontal bar chart, each of which uses a different
variable

Mosaic plot
• Consider a table first
• Divide a square into bars, and each bar into cells, according
to the proportions in the table
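The division can be sketched in a few lines: the width of each bar is that row’s share of the grand total, and each bar is then cut into cells whose heights are the cell’s share of its row. The function name `mosaic_rectangles`, the unit-square coordinates, and the small table are assumptions made for illustration.

```python
def mosaic_rectangles(table):
    """Lay out a two-way table as rectangles inside the unit square.

    `table` maps a row label to {column label: count}.
    Returns tuples (row_label, col_label, x, y, width, height).
    """
    grand_total = sum(sum(cells.values()) for cells in table.values())
    if grand_total <= 0:
        raise ValueError("table must contain positive counts")
    rectangles = []
    x = 0.0
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        width = row_total / grand_total           # bar width: share of all observations
        y = 0.0
        for col_label, count in cells.items():
            height = count / row_total if row_total else 0.0  # share within the bar
            rectangles.append((row_label, col_label, x, y, width, height))
            y += height
        x += width
    return rectangles


# Hypothetical 2 x 2 table (made-up counts).
table = {"a": {"x": 1, "y": 3}, "b": {"x": 2, "y": 2}}
for rectangle in mosaic_rectangles(table):
    print(rectangle)
```

Because width times height equals count divided by the grand total, the area of every cell is proportional to its count.
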

Mosaic plot
• This is again a 2D table with two factors
▪ Hair color
▪ Eye color

Mosaic plot
•The shading is designed to visualize the
differences between the observed frequency and
the expected frequency
•The measure of difference often used is the
Pearson standardized residual, which can be
computed in most systems
•Some systems may report the p-value. A small p-value
(e.g., < 0.05) indicates strong evidence
against the null hypothesis (i.e., observed =
expected)
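As a rough sketch of the quantity behind the shading, the code below computes the plain Pearson residual (observed − expected) / √expected for every cell, where the expected count under independence is row total × column total / grand total. Some systems apply a further adjustment to obtain standardized residuals; that adjustment is not shown here. The function name and the table are assumptions, all expected counts are assumed to be positive, and the p-value is omitted because it requires a chi-square distribution function that the Python standard library does not provide.

```python
from math import sqrt


def pearson_residuals(observed):
    """Return (residuals, chi_square) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    chi_square = 0.0
    for i, row in enumerate(observed):
        residual_row = []
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            r = (obs - expected) / sqrt(expected)         # assumes expected > 0
            residual_row.append(r)
            chi_square += r * r                           # chi-square = sum of squared residuals
        residuals.append(residual_row)
    return residuals, chi_square


# Hypothetical 2 x 2 table (made-up counts).
residuals, chi_square = pearson_residuals([[10, 20], [30, 40]])
for row in residuals:
    print([round(r, 3) for r in row])
print("chi-square:", round(chi_square, 4))
```

Cells with large positive residuals hold more observations than independence predicts, and large negative residuals mean fewer.
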

Treemap
•Treemaps are very useful for showing large
amounts of hierarchically structured data
•The root receives a big rectangle
•The rectangle of a node is split into smaller
rectangles, one for each of that node’s child nodes
•Each rectangle has an area proportional to the
amount of data it represents
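The splitting rule can be sketched with the simple “slice-and-dice” layout, which alternates between vertical and horizontal cuts at each level of the tree. The slides do not say which layout they use, so this choice, the `(name, size, children)` tuple format, and the example tree are all assumptions.

```python
def subtree_size(node):
    """Size of a leaf, or the sum of the sizes of all leaves below an inner node."""
    _, size, children = node
    if not children:
        return size
    return sum(subtree_size(child) for child in children)


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns a list of (name, x, y, width, height)."""
    if out is None:
        out = []
    name, _, children = node
    out.append((name, x, y, w, h))
    total = subtree_size(node)
    offset = 0.0                               # fraction of the rectangle already used
    for child in children:
        share = subtree_size(child) / total if total else 0.0
        if depth % 2 == 0:                     # even depth: children placed side by side
            treemap(child, x + offset * w, y, w * share, h, depth + 1, out)
        else:                                  # odd depth: children stacked vertically
            treemap(child, x, y + offset * h, w, h * share, depth + 1, out)
        offset += share
    return out


# Hypothetical tree: the sizes of inner nodes are ignored, leaves carry the data.
tree = ("root", 0, [("a", 1, []), ("b", 0, [("b1", 1, []), ("b2", 2, [])])])
for rectangle in treemap(tree, 0.0, 0.0, 4.0, 3.0):
    print(rectangle)
```

In this example the leaves `a`, `b1`, and `b2` have sizes 1, 1, and 2 and receive areas 3, 3, and 6, so area stays proportional to size.
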

Treemap
• Consider the following tree
▪ Ignore the “node size” issue

Parallel coordinates
•A data table has multiple rows each of which has
multiple attributes
•Suppose each row has attributes (x1, x2,…, xn).
Then, n vertical lines are drawn, one for each
attribute
•A parallel coordinate plot maps each row in the
data table to a polyline
▪ Each attribute value of a row is represented by a point
on the vertical line corresponding to that attribute

Parallel coordinates
•The values of a parallel coordinate plot are always
normalized
•On each vertical axis, the point on the horizontal axis is 0%
and the highest value in that column is set to 100%
•Therefore, do not compare “heights” across columns, because
each column has its own scale
•Each column in the plot only shows each value as a
proportion of that column’s highest value
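A minimal sketch of this normalization: every value is divided by the largest value in its own column and expressed as a percentage. The function name `normalize_columns` and the sample table are assumptions, and the values are assumed to be non-negative.

```python
def normalize_columns(rows):
    """Rescale each column so that its highest value becomes 100 (percent)."""
    column_max = [max(column) for column in zip(*rows)]
    return [
        [100 * value / top if top else 0.0 for value, top in zip(row, column_max)]
        for row in rows
    ]


# Hypothetical table (made-up data): the two columns use very different units.
rows = [[1, 200], [2, 50], [4, 100]]

# Each normalized row is one polyline; its points are (axis index, percent).
for row in normalize_columns(rows):
    print(list(enumerate(row)))
```

After this step a value of 50 on one axis and 50 on another only means “half of that column’s maximum”, which is why heights on different axes should not be compared.
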

Parallel coordinates
• Consider the following table
• For each food type, we want to plot a profile of
how the carbohydrates are distributed

Confidence Intervals
•When you have a set of numbers, the first thing
to do is calculate the following:
▪ Mean: m
▪ Standard deviation: s
▪ Standard error of the mean: s/√n, where n is the
sample size
▪ Median
▪ Quartiles
▪ Confidence interval at 95%: α = 0.05
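These quantities can all be obtained with Python’s standard `statistics` module. The sketch below is one possible way to collect them; the function name `summarize` and the sample values are assumptions, `stdev` is the sample standard deviation (n − 1 in the denominator), and quartile values depend on the interpolation convention used.

```python
import statistics
from math import sqrt


def summarize(values):
    """Return the basic descriptive statistics of a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = statistics.mean(values)
    s = statistics.stdev(values)                         # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)        # default "exclusive" method
    return {
        "n": n,
        "mean": m,
        "stdev": s,
        "sem": s / sqrt(n),                              # standard error of the mean
        "median": statistics.median(values),
        "quartiles": (q1, q3),
    }


# Hypothetical sample (made-up data).
for name, value in summarize([2, 4, 4, 4, 5, 5, 7, 9]).items():
    print(f"{name}: {value}")
```
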

Confidence Intervals
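•The confidence interval of mean m, standard
deviation s, and sample size n at confidence level
95% is commonly computed as
m ± t × s/√n
where t is the critical value of Student’s t distribution
with n − 1 degrees of freedom (about 1.96 for large n)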
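A 95% interval of this kind is often approximated as m ± z × s/√n with z ≈ 1.96. The sketch below uses that normal approximation, because the Python standard library has no Student’s t quantile function; for small samples a t critical value would give a somewhat wider interval. The function name and the sample values are assumptions.

```python
import statistics
from math import sqrt


def confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = statistics.mean(values)
    s = statistics.stdev(values)                         # sample standard deviation
    alpha = 1 - confidence                               # 0.05 for a 95% interval
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    half_width = z * s / sqrt(n)                         # z times the standard error
    return m - half_width, m + half_width


# Hypothetical sample (made-up data).
low, high = confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```
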

Box-Whisker Plot
•The five-number summary is a set of descriptive
statistics that provide summary information of a
dataset
•It consists of the following five numbers:
▪ Minimum
▪ First quartile
▪ Median
▪ Third quartile
▪ Maximum
•The box-whisker plot is a visual display of the
five-number summary
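The five numbers can be computed directly, as in the hedged sketch below. Quartile conventions differ between tools; this version uses the “inclusive” method of `statistics.quantiles`, which is only one of several common choices. The function name and the sample are assumptions.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical sample (made-up data).
labels = ["min", "Q1", "median", "Q3", "max"]
for label, value in zip(labels, five_number_summary([7, 1, 3, 9, 5, 4, 2])):
    print(f"{label}: {value}")
```

In a box-whisker plot the box spans Q1 to Q3, a line inside the box marks the median, and the whiskers extend toward the minimum and maximum.
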

Box-Whisker Plot
• The following shows a typical box-whisker plot.
• It could be horizontal or vertical.
• We may also add the mean value.

End
