Unit III
Data Visualization
• As data and insights grow in number, executives and decision makers increasingly need to be able to absorb this information in real time.
• There is a limit to human comprehension and visualization capacity.
• That is a good reason to prioritize and manage with a smaller set of key variables that relate directly to the Key Result Areas (KRAs) of a role.
Data Visualization
• Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
• Additionally, it provides an excellent way for employees or business
owners to present data to non-technical audiences without
confusion.
• In the world of Big Data, data visualization tools and technologies are
essential to analyze massive amounts of information and make data-
driven decisions.
Considerations
• Here are a few considerations when presenting data:
1. Present the conclusions and not just report the data.
2. Choose wisely from a palette of graphs to suit the data.
3. Organize the results to make the central point stand out.
4. Ensure that the visuals accurately reflect the numbers. Inappropriate visuals
can create misinterpretations and misunderstandings.
5. Make the presentation unique, imaginative and memorable.
Data Visualization
History
• A classic example is French cartographer Joseph Minard’s depiction of Napoleon’s march to Russia in 1812.
• It encodes about six dimensions of data.
• The army’s route is drawn against geography: the geographical coordinates and rivers are mapped in. The thickness of the band shows the number of troops at each point along the route. One color is used for the onward march and another for the retreat. A line graph at the bottom shows the temperature on dates during the retreat, which ties the map to time.
Advantages of data visualization
• Easily sharing information.
• Interactively explore opportunities.
• Visualize patterns and relationships.
Disadvantages
• Biased or inaccurate information.
• Correlation doesn’t always mean causation.
• Core messages can get lost in translation.
Why is data visualization important?
• It helps people see, interact with, and better understand data. Whether simple or complex, the right visualization can get everyone on the same page, regardless of their level of expertise.
• Every STEM field benefits from understanding data—and so do fields
in government, finance, marketing, history, consumer goods, service
industries, education, sports, and so on.
• Data visualization is one of the steps of the data science process: after data has been collected, processed, and modeled, it must be visualized so that conclusions can be drawn.
Data Science
• Data science and data analytics both involve working with data to gain insights. However, data science often uses data to build models that can predict future outcomes, while data analytics tends to focus on analyzing past data to inform decisions in the present.
• Data science makes use of machine learning algorithms to get insights, whereas data analytics typically does not rely on machine learning.
Why is data visualization important?
• It reveals the trends in data.
• It provides a perspective on the data.
• It puts the data into the correct context.
• It saves time.
• It tells a data story.
General Types of Visualizations
• Chart: Information presented in graphical form, with data displayed along two axes. It can take the form of a graph, diagram, or map.
• Table: A set of figures displayed in rows and columns.
• Graph: A diagram of points, lines, segments, curves, or areas that represents
certain variables in comparison to each other, usually along two axes at a right
angle.
• Geospatial: A visualization that shows data in map form using different shapes
and colors to show the relationship between pieces of data and specific
locations.
• Infographic: A combination of visuals and words that represent data. Usually uses
charts or diagrams.
• Dashboards: A collection of visualizations and data displayed in one place to help
with analyzing and presenting data.
Categories of Data Visualization
Numerical Data
• Numerical data is also known as quantitative data. It is any data that represents an amount, such as a person’s height, weight, or age. Numerical data is divided into two categories:
• Continuous Data –
• It can take any value within a range and is measured rather than counted (Example: height measurements).
• Discrete Data –
• It takes only distinct, countable values (Example: the number of cars or children a household has).
• Numerical data is typically represented with charts and numerical values, for example pie charts, bar charts, averages, and scorecards.
Categorical Data
• Categorical data is also known as qualitative data. It is any data that represents groups, and it consists of categorical variables used to represent characteristics such as a person’s ranking or gender. Categorical data visualization is all about depicting key themes, establishing connections, and lending context. Categorical data is classified into three categories:
• Binary Data –
• Each value is one of exactly two options (Example: agrees or disagrees).
• Nominal Data –
• Classification is based on attributes that have no inherent order (Example: male or female).
• Ordinal Data –
• Classification is based on the ordering of information (Example: timeline or processes).
• The visualization techniques used to represent categorical data are graphics, diagrams, and flowcharts, for example word clouds, sentiment mapping, and Venn diagrams.
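Most charts for categorical data start from a frequency table. As a minimal sketch (the function name `category_counts` and the sample answers are hypothetical), nominal categories can be listed from most to least frequent, while ordinal categories should keep their natural order so that the chart reads correctly:

```python
from collections import Counter

def category_counts(values, order=None):
    """Count occurrences of each category.

    Nominal data has no inherent order, so categories are returned from most to
    least frequent. Ordinal data has a natural order, so when `order` is given the
    counts follow that order (levels that never occur get a count of zero).
    """
    counts = Counter(values)
    if order is None:
        return counts.most_common()
    return [(level, counts[level]) for level in order]


# Hypothetical survey answers.
colors = ["Red", "Blue", "Red", "Green", "Red", "Blue"]      # nominal
ratings = ["Low", "High", "Medium", "Low", "High", "Low"]    # ordinal

print(category_counts(colors))
print(category_counts(ratings, order=["Low", "Medium", "High"]))
```

The same counts can feed a bar chart directly; for ordinal data, sorting by frequency instead of by level would scramble the scale.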
Top Data Visualization Tools
• The following are ten widely used data visualization tools:
• Tableau
• Looker
• Zoho Analytics
• Sisense
• IBM Cognos Analytics
• Qlik Sense
• Domo
• Microsoft Power BI
• Klipfolio
• SAP Analytics Cloud
Spatial Visualization Techniques
• Univariate data -- one-dimensional data
A histogram with the y-axis removed: the axis values are aligned with the pre-attentive "white" lines drawn through the data.
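The step behind any histogram is binning: univariate values are grouped into equal-width intervals and counted. A minimal sketch, assuming half-open bins and hypothetical age data; the text rendering encodes each count as bar length and prints no count axis, loosely echoing the idea of removing the y-axis:

```python
from collections import Counter

def histogram_bins(values, bin_width):
    """Return (bin_start, count) pairs for equal-width, half-open bins.

    Each bin covers [bin_start, bin_start + bin_width). The first bin starts at
    the largest multiple of bin_width that does not exceed min(values).
    """
    if not values:
        return []
    first = (min(values) // bin_width) * bin_width
    counts = Counter(int((v - first) // bin_width) for v in values)
    return [(first + i * bin_width, counts[i]) for i in range(max(counts) + 1)]


ages = [23, 25, 31, 34, 35, 42, 47, 51]  # hypothetical sample
width = 10
for start, count in histogram_bins(ages, width):
    # Bar length encodes the count, so no separate y-axis is printed.
    print(f"{start}-{start + width - 1} {'#' * count}")
```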
• Sparklines
• Sparklines are an example of a high data-ink ratio. They typically show a time series and represent the sequence visually in a very dense, compact form. They may be small enough to be included in the flow of the text rather than requiring a separate figure.
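Because a sparkline only needs to convey the shape of a series, it can even be rendered as a short run of text characters that sits inside a sentence. A minimal sketch, assuming Unicode block characters as the eight height levels (the function name and the sample series are hypothetical):

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest

def sparkline(series):
    """Render a numeric series as one block character per value.

    Values are scaled linearly between the series minimum and maximum, so the
    line shows the shape of the series rather than its absolute level.
    """
    if not series:
        return ""
    lo, hi = min(series), max(series)
    if hi == lo:  # a flat series: draw every point at the lowest level
        return BLOCKS[0] * len(series)
    steps = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / (hi - lo) * steps)] for v in series)


weekly_visits = [12, 15, 11, 18, 25, 22, 30]  # hypothetical data
print(f"Visits this week {sparkline(weekly_visits)} peaked on day 7.")
```

Scaling to the series' own range keeps the sparkline dense, at the cost of hiding absolute values, which the surrounding text is expected to supply.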
• One-dimensional data as spatial data (choropleth map)
A choropleth map is made by dividing the area being mapped, for example by geographic or political boundaries, and then filling each resulting section with a different color or shade.
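Before regions can be filled, each value must be assigned to a shade. One common approach is equal-interval classification, which splits the value range into classes of equal width. A minimal sketch, assuming hypothetical region names and values and a four-step light-to-dark palette (the hex colors are only an example):

```python
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]  # light to dark (example)

def choropleth_fills(region_values, palette=PALETTE):
    """Map each region to a fill color using equal-interval classes.

    The range [min, max] is split into len(palette) classes of equal width;
    larger values get darker shades. The maximum value falls in the last class.
    """
    lo, hi = min(region_values.values()), max(region_values.values())
    n = len(palette)
    width = (hi - lo) / n or 1  # avoid dividing by zero when all values match
    fills = {}
    for region, value in region_values.items():
        cls = min(int((value - lo) / width), n - 1)
        fills[region] = palette[cls]
    return fills


population_density = {"North": 10, "South": 40, "East": 25, "West": 70}  # hypothetical
for region, color in choropleth_fills(population_density).items():
    print(region, color)
```

Equal intervals are easy to read but can leave most regions in one class when the data is skewed; quantile classes are a common alternative.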
Visualizing Geospatial Data on a Map
• Cartogram map
Text and Document Visualization
• Here we consider visualizing the text within a single document and within a collection of related documents (a corpus).
• Text is difficult to analyze because of its loose structure, varied vocabulary, and optional metadata such as author(s), date, modification dates, comments, keywords, catalog codes, and citations.
• Levels of text to be represented:
• Lexical level -- characters are grouped into "tokens", which are typically words, although word stems, phrases, word n-grams, and character n-grams may also be beneficial.
• Syntactic level -- parsing identifies each token’s purpose, such as its grammatical category, tense, and plurality, in the context of the phrase, sentence, and paragraph.
• Semantic level -- the meaning of the syntactic structure is extracted from the tokens through a fuller analysis of the context.
Term Frequency--Inverse Document Frequency
• Maps each document to a vector in a vector space model
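TF-IDF turns each document into a vector of term weights: a term scores highly when it is frequent in one document but rare across the collection. A minimal sketch, assuming whitespace tokenization, relative term frequency, and the unsmoothed idf = log(N / df) (libraries often use smoothed variants); the three sample documents are hypothetical:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dictionary per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    weight   = tf * idf
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["the cat sat", "the dog sat", "the cat ran"]
for vector in tf_idf(docs):
    print({term: round(weight, 3) for term, weight in vector.items()})
```

Here "the" occurs in every document, so its weight is 0, while "dog" occurs in only one document and gets the largest weight in the second vector.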
Single Document Visualization
• Tag clouds visualize the words of a text by sizing each word according to its frequency.
• tagcrowd.com
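The core of a tag cloud is a mapping from word frequency to font size. A minimal sketch, assuming linear scaling between a hypothetical 10 pt minimum and 40 pt maximum (tools such as tagcrowd.com or Wordle apply their own scaling, stop-word lists, and layout, none of which is reproduced here):

```python
from collections import Counter

def tag_cloud_sizes(words, min_pt=10, max_pt=40, top=None):
    """Map each word to a font size that grows with its frequency.

    Sizes are scaled linearly from min_pt (least frequent) to max_pt (most
    frequent). `top` keeps only the N most frequent words.
    """
    items = Counter(words).most_common(top)
    if not items:
        return {}
    lo = min(count for _, count in items)
    hi = max(count for _, count in items)
    span = hi - lo
    return {
        word: min_pt if span == 0 else round(min_pt + (count - lo) / span * (max_pt - min_pt))
        for word, count in items
    }


text = "data viz data chart data viz"
print(tag_cloud_sizes(text.split()))
```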
Wordle
• Creates a visualization with size based on frequency.
• http://wordle.net
Word Tree
• https://www.jasondavies.com/wordtree/
TextArc
Music theme visualization example
Literature fingerprinting
• Here we use word n-grams to match the writing patterns of an author.
• An n-gram is a sequence of N consecutive words. For example, “Medium blog” is a 2-gram (a bigram), “Write on Medium” is a 3-gram (a trigram), and “A Medium blog post” is a 4-gram. N-grams are also used to estimate probabilities, such as the probability that a word follows a given preceding word.
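Word n-grams and the probabilities built from them can be computed directly. A minimal sketch, assuming the maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(bigrams starting with w1) with no smoothing; the function names and sample sentence are hypothetical. Counting an author's most frequent n-grams in the same way is one simple basis for a literature fingerprint:

```python
from collections import Counter

def word_ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigram_probability(tokens, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from bigram counts (no smoothing)."""
    bigrams = Counter(word_ngrams(tokens, 2))
    starts_with_w1 = sum(count for (first, _), count in bigrams.items() if first == w1)
    return bigrams[(w1, w2)] / starts_with_w1 if starts_with_w1 else 0.0


print(word_ngrams("A Medium blog post".split(), 4))  # a single 4-gram
tokens = "write on medium and read on medium or on paper".split()
print(bigram_probability(tokens, "on", "medium"))
```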
Document Collection Visualizations
• The goal is to place similar documents close together, using techniques such as:
• graph spring layouts
• multi-dimensional scaling
• clustering (K-means, hierarchical)
• self-organizing maps
• Self-organizing maps -- use each document’s vector to calculate distances between documents. Higher weights draw documents closer together. The process starts randomly with one document.
• Stream Graph
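Of the document-placement techniques listed above, K-means clustering is the simplest to sketch (it is not a self-organizing map). Each document is assumed to already be a numeric vector over a shared vocabulary, for example TF-IDF weights. To keep results repeatable, this sketch seeds the centroids with the first k vectors, whereas practical implementations usually choose starting points randomly or with k-means++. The two-term vectors below are hypothetical:

```python
import math

def kmeans(vectors, k, iterations=20):
    """Cluster equal-length numeric vectors into k groups.

    Returns (labels, centroids), where labels[i] is the cluster of vectors[i].
    """
    centroids = [list(v) for v in vectors[:k]]  # deterministic seeding
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, new_labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical weights for the terms ("data", "music") in four documents.
doc_vectors = [[0.9, 0.1], [0.1, 0.8], [0.8, 0.0], [0.0, 0.9]]
labels, centroids = kmeans(doc_vectors, 2)
print(labels)
```

Documents that share a label would then be drawn near each other in the collection view.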
Power Query & M Language
• Power Query is built on what was then a new query language called
M. It is a mashup language (hence the letter M) designed to create
queries that mix together data.
Advanced Editor
Case Sensitivity
• One of the first things someone needs to be aware of when
writing M code is that it is a case-sensitive language.
Expressions And Values In Power Query
• An expression is something that can be evaluated to return a value in Power Query. For example, 1 + 1 is an expression that evaluates to the value 2.
• A value is a single piece of data. Values can be single values such as
numbers, text, logical, null, binary, date, time, datetime,
datetimezone, or durations.
• Values can also have more complex structures than single
values such as lists, records, and tables.
• You can also have values that are a combination of lists,
records, and tables. Lists of lists, tables of lists, tables of tables,
etc. These are all possible value structures.
Single Literal Values
• Single literal values are the basic building block of all the other values.
• For example:
• The expression 1 + 1 evaluates to 2.
• The expression 3 > 2 evaluates to true.
• The expression "Hello " & "World" evaluates to "Hello World".
• The expression Text.Upper("Hello World") evaluates to "HELLO WORLD".
Operators
• Arithmetic: +, -, *, and /
• Comparison: <, >, <=, >=, =, and <>
• Concatenation and merging: &
• Logical: not, and, and or
If Then Else Statements
References
• 12 Methods for Visualizing Geospatial Data on a Map | SafeGraph