Unit III

Data Visualizations

Data Visualization
• As data and insights grow in number, a new requirement is the ability
of the executives and decision makers to absorb this information in
real time.
• There is a limit to human comprehension and visualization capacity.
• That is a good reason to prioritize and manage with fewer but key
variables that relate directly to the Key Result Areas (KRAs) of a role.
• Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
• Additionally, it provides an excellent way for employees or business
owners to present data to non-technical audiences without
confusion.
• In the world of Big Data, data visualization tools and technologies are
essential to analyze massive amounts of information and make data-
driven decisions.
Considerations
• Here are a few considerations when presenting data:
1. Present the conclusions and not just report the data.
2. Choose wisely from a palette of graphs to suit the data.
3. Organize the results to make the central point stand out.
4. Ensure that the visuals accurately reflect the numbers. Inappropriate visuals
can create misinterpretations and misunderstandings.
5. Make the presentation unique, imaginative and memorable.
Data Visualization History
• A classic example is the presentation of the story of Napoleon’s march to
Russia in 1812, drawn by the French cartographer Joseph Minard.
• It covers about six dimensions.
• Time is on the horizontal axis. The geographical coordinates and rivers are
mapped in. The thickness of the bar shows the number of troops at
each mapped point in time. One color is used for the onward
march and another for the retreat. The temperature at each
time is shown in the line graph at the bottom.
Advantages of data visualization
• Share information easily.
• Interactively explore opportunities.
• Visualize patterns and relationships.
Disadvantages
• Biased or inaccurate information.
• Correlation doesn’t always mean causation.
• Core messages can get lost in translation.
Why is data visualization important?
• It helps people see, interact with, and better understand data.
Whether simple or complex, the right visualization can bring everyone
onto the same page, regardless of their level of expertise.
• Every STEM field benefits from understanding data—and so do fields
in government, finance, marketing, history, consumer goods, service
industries, education, sports, and so on.
• Data visualization is one of the steps of the data science process,
which states that after data has been collected, processed and
modeled, it must be visualized for conclusions to be made.

Data Science vs. Data Analytics
• While both fields involve working with data to gain insights, data
science often involves using data to build models that can predict
future outcomes, while data analytics tends to focus more on
analyzing past data to inform decisions in the present.
• Data science makes heavy use of machine learning algorithms to get
insights. Data analytics typically relies less on machine learning to
get insight from data.
Why is data visualization important?
• Data visualization discovers the trends in data
• Data visualization provides a perspective on the data
• Data visualization puts the data into the correct context
• Data visualization saves time
• Data visualization tells a data story
General Types of Visualizations
• Chart: Information presented in a tabular, graphical form with data displayed
along two axes. Can be in the form of a graph, diagram, or map.
• Table: A set of figures displayed in rows and columns.
• Graph: A diagram of points, lines, segments, curves, or areas that represents
certain variables in comparison to each other, usually along two axes at a right
angle.
• Geospatial: A visualization that shows data in map form using different shapes
and colors to show the relationship between pieces of data and specific
locations.
• Infographic: A combination of visuals and words that represent data. Usually uses
charts or diagrams.
• Dashboards: A collection of visualizations and data displayed in one place to help
with analyzing and presenting data.
Categories of Data Visualization
Numerical Data
• Numerical data is also known as quantitative data. It is any data that
represents an amount, such as a person's height, weight, or age.
Numerical data is divided into two categories:
• Continuous Data –
• It can take any value within a range (Example: height measurements).
• Discrete Data –
• It takes only distinct, countable values (Example: the number of cars or
children a household has).
• The visualization techniques used to represent numerical data are
charts and numerical values. Examples are pie charts, bar charts,
averages, scorecards, etc.
Categorical Data
• Categorical data is also known as Qualitative data. Categorical data is any data
where data generally represents groups. It simply consists of categorical variables
that are used to represent characteristics such as a person’s ranking, a person’s
gender, etc. Categorical data visualization is all about depicting key themes,
establishing connections, and lending context. Categorical data is classified into
three categories :
• Binary Data –
• In this, classification is based on positioning (Example: Agrees or Disagrees).
• Nominal Data –
• In this, classification is based on attributes (Example: Male or Female).
• Ordinal Data –
• In this, classification is based on ordering of information (Example: Timeline or processes).
• The visualization techniques used to represent categorical data are
graphics, diagrams, and flowcharts. Examples are word clouds, sentiment
mapping, Venn diagrams, etc.
Top Data Visualization Tools
• The following are ten widely used data visualization tools:
• Tableau
• Looker
• Zoho Analytics
• Sisense
• IBM Cognos Analytics
• Qlik Sense
• Domo
• Microsoft Power BI
• Klipfolio
• SAP Analytics Cloud
Spatial Visualization Techniques
• Univariate data -- one-dimensional data
• A single value can be displayed
• as the number itself -- a string of digits
• as a dial (such as an altimeter, speedometer, or gauge)
• as a slider or thermometer
• Maximization (of the data-ink ratio)
Use the least amount of "ink", or non-background pixels, and leverage our
pre-attentive vision to fill in the area.
Example: a Tukey plot as typically presented, compared with a revised,
minimized version of the same plot.
• Information in the axes
Example: a histogram with the y axis removed; the axis values are aligned
with the pre-attentive "white" lines through the data.
• Sparklines

• Sparklines are examples of high data-ink ratios. They are typically a time series
and can be used to represent visually the sequence in a very dense and compact
manner. They may be small enough to just be included in the flow of the text
rather than having to refer to a separate figure.
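The sparkline idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sparkline`, the Unicode block characters used as glyphs, and the sample data are all hypothetical and not taken from the slides.

```python
def sparkline(values):
    """Render a sequence of numbers as a one-line Unicode sparkline."""
    blocks = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series: use a middle-height block for every point.
        return blocks[len(blocks) // 2] * len(values)
    chars = []
    for v in values:
        # Scale each value to an index in 0..7.
        idx = int((v - lo) / span * (len(blocks) - 1))
        chars.append(blocks[idx])
    return "".join(chars)


monthly_sales = [3, 5, 4, 8, 12, 9, 6]  # hypothetical data
line = sparkline(monthly_sales)
print(f"Sales over seven months {line} fit inside a sentence.")
```

Because the result is a short string, it can sit in the flow of the text, which is the property the slide describes.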
• One Dimensional Data as Spatial Data
• Time is now displayed on the x axis and the data values on the y axis
• Two Dimensional Data as Spatial Data
• Mapping spatial attributes of the data to the screen.
• We really are working in three dimensions now.
• Two dimensions specify the location
• A third dimension is then plotted, possibly with several other dimensions
(for example, height and color on a map).
• Scatterplot -- discrete data values are mapped to a location (pixel or dot) and marked by
color, shape or size; result is 2D
• Image -- each point is mapped to a pixel location and intermediate pixels that are
unmapped are interpolated for color or brightness according to neighboring mapped
pixels; result is 2D; often referred to as a "heat map"
• Rubber sheet -- each point is mapped to an image pixel and it has a third value that
controls a height. Missing points are also interpolated to make a smooth surface. Result is
3D
• 3D Data as spatial
• Visualizing the surface
• Visualizing the volume
Visualizing Geospatial Data on a Map
• Point map
A point map is one of the simplest ways to visualize geospatial data.
Basically, you place a point at any location on the map that corresponds to
the variable you’re trying to measure (such as a building, e.g. a hospital).
• Proportional symbol map
This is a variation of the point map. It uses a circle or other shape to
represent data at a particular location. However, based on the point's size
and/or color, it can be used to represent multiple other variables at once
(such as population and/or average age).
• Cluster map
This is a proportional symbol map with a twist. It features a similar
concept of using points of varying sizes and colors to represent multiple
types of data at a location at once. However, these larger points serve as
stand-ins for smaller points, which become visible if you increase the
map’s scale. This gets around the main issue of overcrowding in point maps,
but requires special geospatial data visualization tools such as GIS
software.
• Choropleth map
It’s made by separating the area being mapped, such as by geographic or
political boundaries, and then filling each resulting section with a
different color or shade.
• Cartogram map
This variation of the choropleth map is a hybrid of a map and a chart. It
involves taking a land area map of a geographic region and dividing it into
segments in such a way that sizes and/or distances are proportional to the
values of the variable being measured.
• Hexagonal binning map
• Heat map
• Topographic map
• Flow map
Flow maps, also known as ‘path’ maps, are more specialized versions of line
maps. Instead of focusing on physical features of the earth, they are used
to represent the movement of things across the earth over time.
• Spider map
The spider map is a variation of the flow map. Instead of focusing on
discrete pairs of origin and destination data points, the spider map looks
at the relationships between origin points and multiple destination points,
some of which may be held in common.
• Time-space distribution map
This is an advanced form of geospatial data mapping that combines the
precision of a point map with the dynamism of a flow map. It seeks to
accurately determine the locations of objects at any point in time as they
move.
• Data space distribution map
This is another variant of the flow map that aims to represent not only the
movement of things over time, but also how variables dependent on that
movement change over time.
Time Oriented Visualizations
• Time can be simply viewed as linear and chosen as the x-axis in most
visualizations.
1. Scale
• How is time measured? When are the data measurements/samples
taken?
• Ordinal -- before, during, after
• Discrete -- clear intervals (seconds, minutes, hours, ...)
• Continuous -- mapping to the real numbers. Discrete values can be
interpolated
2. Scope
• The range of time associated with a measurement/sample
• point -- the sample is from a point in time that has no duration
• interval-based -- there is a duration; a start and an end
These time primitives can be anchored (absolute) or unanchored (relative).
• We can also recognize determinacy:
• determinate -- all aspects of time are known and fixed
• indeterminate -- there may be some uncertainty. Intervals are sometimes
used here to compensate.
3. Arrangement
Time often has a cyclical nature, in contrast to the linear nature described
above:
• hourly cycle
• 24 hours in a daily cycle
• 7 days in a weekly cycle (Mon->Tues....Sun->Mon)
• ~30 days in a monthly cycle
• lunar cycle
• quarterly/seasonal cycle (financial, astronomical, meteorological)
• 365 days, 52 weeks, 12 months in a yearly cycle
• decades
The different units suggest granularity. How you represent a
visualization may vary (interactively) by granularity (zoom in, zoom out).
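As a rough sketch of the cyclical arrangement and granularity ideas above, the Python snippet below locates one timestamp within several cycles and "zooms out" by truncating it. The helper names `cycle_positions` and `truncate`, the chosen cycles, and the sample timestamp are assumptions made for illustration; only the standard `datetime` module is used.

```python
from datetime import datetime


def cycle_positions(ts):
    """Return the position of a timestamp within several common time cycles."""
    return {
        "hour_of_day": ts.hour,              # 0..23, daily cycle
        "day_of_week": ts.weekday(),         # 0 = Monday .. 6 = Sunday
        "day_of_month": ts.day,              # 1..31, monthly cycle
        "month_of_year": ts.month,           # 1..12, yearly cycle
        "quarter": (ts.month - 1) // 3 + 1,  # 1..4, calendar quarters
        "day_of_year": ts.timetuple().tm_yday,
    }


def truncate(ts, granularity):
    """Zoom out: drop all detail finer than the requested granularity."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


sample = datetime(2018, 12, 31, 14, 45, 10)  # hypothetical measurement time
print(cycle_positions(sample))
print(truncate(sample, "month"))
```

Grouping samples by one of these cycle positions (for example, hour of day) is one way to turn a linear time axis into a cyclical view; truncation is one way to move between granularities.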
Characteristics of Time-Oriented Data
• This is more of a reminder of the data typing we have discussed
earlier in the course
Multivariate Data
• Univariate statistics summarize only one variable at a time. Bivariate
statistics compare two variables. Multivariate statistics compare more
than two variables.

• Multivariate visualizations can be done by adding more than one
visual variable to a simple renderer. Common combinations include:
1. Color and size
2. Size and rotation
3. Size, rotation, and color
• Common multivariate visualizations: scatter plot, correlation matrix,
heatmap, parallel coordinate plot, bubble chart
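To make the correlation matrix concrete, here is a minimal sketch that computes pairwise Pearson correlations in plain Python. The slides only name the chart type, so the choice of Pearson correlation, the function names, and the sample columns are all assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every pair of column names to their correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


data = {  # hypothetical measurements
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [52, 58, 69, 77, 90],
    "age_years": [34, 25, 41, 29, 38],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in matrix])
```

Coloring each cell of this matrix by its value gives the heatmap form of the same information.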
Graphs, Trees, and How to Visualize Them
• Here, graphs, networks, and trees are meant in the mathematical
sense: a model for representing items and the relationships between
those items
• Social / friendship networks
• Computer networks
• Energy or transportation grids
• Organizational structures
• Etc.
Node-link tree diagrams
• Nodes are distributed in space, connected by straight or curved lines
• Typical approach is to use 2D space to break apart breadth and depth
• Often, space is used to communicate hierarchical orientation
Tidy Tree
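A minimal sketch of a node-link layout follows, using one spatial dimension for depth and the other for breadth as described above. It is a simplified layered layout (leaves spaced evenly, each parent centered over its children), not a full tidy-tree algorithm; the function name `layout_tree` and the sample hierarchy are hypothetical.

```python
def layout_tree(tree, root):
    """Assign (x, depth) to every node.

    `tree` maps a node to its list of children. Leaves receive consecutive
    x positions; each parent is centered over its first and last child.
    """
    positions = {}
    next_leaf_x = 0

    def place(node, depth):
        nonlocal next_leaf_x
        children = tree.get(node, [])
        if not children:
            x = float(next_leaf_x)
            next_leaf_x += 1
        else:
            child_xs = [place(child, depth + 1) for child in children]
            x = (child_xs[0] + child_xs[-1]) / 2
        positions[node] = (x, depth)
        return x

    place(root, 0)
    return positions


org_chart = {  # hypothetical organizational structure
    "CEO": ["CTO", "CFO"],
    "CTO": ["Dev lead", "QA lead"],
    "CFO": ["Accountant"],
}
for node, (x, depth) in layout_tree(org_chart, "CEO").items():
    print(f"{node}: x={x}, depth={depth}")
```

Drawing a line between each parent position and its children's positions would complete the node-link diagram.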
Text and Document Visualization
• Here we consider visualizing the text within a document, and collections of
documents which are likely related (corpus).

• Difficulties in analysis include the loose structure, varied vocabulary, and
optional metadata such as author(s), date, modification dates, comments,
keywords, catalog codes, and citations.
• Levels of text to be represented:
• Lexical level -- Simple grouping of characters into "tokens" which are typically words,
but word stems, phrases, word n-grams and character n-grams may be beneficial
• Syntactic level --Parsing purpose of token, grammatical category, tense, plurality, in
the context of the phrase, sentence and paragraph
• Semantic level -- Extract meaning of the syntactic structure with the tokens using
fuller analysis of the context.
Vector Space Model
• Analyze the words in a document to determine their contribution
and significance to the document.
• Typical preprocessing removes noise words ("a", "an", "the", "that") and
punctuation, and applies stemming (reducing words to their roots).
• Simple frequency counts of the significant words, ordered by decreasing
frequency, form a simple vector.
• https://wordcounter.net/
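The preprocessing and counting steps of the vector space model can be sketched as follows. The stop-word list is deliberately tiny and the suffix-stripping "stemmer" is a crude stand-in for a real stemming algorithm; both, along with the function names and sample text, are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative noise-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "that", "and", "of", "in", "to", "is", "are"}


def crude_stem(word):
    """Very rough stemming: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequencies(text):
    """Return (term, count) pairs ordered by decreasing frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())  # also drops punctuation
    terms = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(terms).most_common()


sample_text = (
    "The charts show trends. A chart shows the trend in the data, "
    "and the data drives decisions."
)
for term, count in term_frequencies(sample_text):
    print(term, count)
```

The resulting ordered list of counts is the "simple vector" the slide refers to.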

Term Frequency--Inverse Document Frequency
Mapping vector space models to the document
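The slides give only the name of the technique, so the sketch below uses one common TF-IDF variant as an assumption: term frequency is the count divided by document length, and inverse document frequency is the natural log of (number of documents / documents containing the term). Other weightings exist; the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [  # hypothetical, already tokenized documents
    ["data", "bar", "chart"],
    ["data", "line", "chart"],
    ["data", "heat", "map"],
]
for doc_weights in tf_idf(docs):
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

A term that appears in every document (here "data") gets weight zero, while a term unique to one document gets the highest weight in that document.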
Single Document Visualization
• Tag clouds size each word according to its frequency. The sample input is
again the opening intro section.
• tagcrowd.com
Wordle
• Creates a visualization with word size based on frequency.
• http://wordle.net
Word Tree
• https://www.jasondavies.com/wordtree/
TextArc
Music theme visualization example
Literature fingerprinting
• Here we look at word n-grams to match patterns of the author.
• An n-gram is a sequence of N words. For example, “Medium blog” is a
2-gram (a bigram), “Write on Medium” is a 3-gram (a trigram), and “A
Medium blog post” is a 4-gram. N-grams become more interesting once
probabilities are attached to them.
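A short sketch of n-gram extraction, plus the simplest probability estimate built on bigrams, is shown below. The function names and the sample sentence are hypothetical, and the count-ratio estimate of P(second | first) is just one basic approach.

```python
from collections import Counter


def ngrams(words, n):
    """Return all runs of n consecutive words as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def bigram_probability(words, first, second):
    """Estimate P(second | first) as count(first, second) / count(first, anything)."""
    bigram_counts = Counter(ngrams(words, 2))
    first_total = sum(c for (w, _), c in bigram_counts.items() if w == first)
    if first_total == 0:
        return 0.0
    return bigram_counts[(first, second)] / first_total


words = "a medium blog post is a blog post on medium".split()
print(ngrams(words, 3))
print(bigram_probability(words, "a", "blog"))
```

Comparing the most frequent n-grams of two texts is one way to look for an author's recurring patterns.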
Document Collection Visualizations
• Goal is to place similar documents close together.
• graph spring layouts,
• multi-dimensional scaling
• clustering (K-means, hierarchical)
• self-organizing maps
• Self-organizing maps -- use the vectors from each document to calculate distances from
each other. Higher weights draw the documents closer together. Randomly start with
one document.
• Stream Graph
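All of the layout methods listed above need some measure of how similar two documents are. The slides do not name one, so the sketch below assumes cosine similarity over word-count vectors; the document names and texts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: count} vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = {  # hypothetical mini-corpus
    "doc_charts": "bar chart pie chart line chart",
    "doc_graphs": "line chart scatter plot bar chart",
    "doc_query": "power query m language expression",
}
vectors = {name: Counter(text.split()) for name, text in docs.items()}


def most_similar(name):
    """Return the name of the other document closest to `name`."""
    others = [
        (cosine_similarity(vectors[name], vectors[other]), other)
        for other in vectors
        if other != name
    ]
    return max(others)[1]


print(most_similar("doc_charts"))
```

A layout method such as multi-dimensional scaling or clustering would then take these pairwise similarities as input and place similar documents near each other.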
Power Query & M Language
• Power Query is built on what was then a new query language called
M. It is a mashup language (hence the letter M) designed to create
queries that mix together data.
Advanced Editor
Case Sensitivity
• One of the first things someone needs to be aware of when
writing M code is that it is a case-sensitive language.
Expressions And Values In Power Query
• An expression is something that can be evaluated to return a value in
Power Query. For example, 1 + 1 is an expression that evaluates to the value 2.
• A value is a single piece of data. Values can be single values such as
numbers, text, logical, null, binary, date, time, datetime,
datetimezone, or durations.
• Values can also have more complex structures than single
values such as lists, records, and tables.
• You can also have values that are a combination of lists,
records, and tables. Lists of lists, tables of lists, tables of tables,
etc. These are all possible value structures.
Single Literal Values
• Single literal values are the basic building blocks of all the other values.
• 123.45 is a number value.
• "Hello World!" is a text value.
• true is a logical value.
• null represents the absence of a value.
Single Intrinsic Values
• #time(hours, minutes, seconds)
• #date(years, months, days)
• #datetime(years, months, days, hours, minutes, seconds)
• #datetimezone(years, months, days, hours, minutes, seconds, offset-hours, offset-minutes)
• #duration(days, hours, minutes, seconds)
• For example, to construct the date 2018-12-31 you would need to
construct it using the #date(2018, 12, 31) intrinsic function.
Structured Values
• Lists
• Records
• Tables
Expressions
• An expression is anything that can be evaluated to a value.
• This is true of values themselves. For example, the expression 1 evaluates
to the value 1.
• More typically, though, expressions are made up of more complex
operations or functions.
• For example:
• The expression 1 + 1 evaluates to 2.
• The expression 3 > 2 evaluates to true.
• The expression "Hello " & "World" evaluates to "Hello World".
• The expression Text.Upper("Hello World") evaluates to "HELLO WORLD".
Operators
• Arithmetic +, -, *, and /
• Comparison <, >, <=, >=, =, <>.
• Concatenation and Merger &
• Logical not, and, and or
If Then Else Statements