Chapter 2 Visualization of Data
Objectives
After completing this chapter, you will be able to:
Introduction
Researchers agree that vision is our dominant sense: 80–85% of information we
perceive, learn or process is mediated through vision. It is even more so when we are
trying to understand and interpret data or when we are looking for relationships among
hundreds or thousands of variables to determine their relative importance. One of the
most effective ways to discern important relationships is through advanced analysis and
easy-to-understand visualizations.
Data visualization is applied in practically every field of knowledge. Scientists in
various disciplines use computer techniques to model complex events and visualize
phenomena that cannot be observed directly, such as weather patterns, medical
conditions or mathematical relationships.
Data visualization provides an important suite of tools and techniques for gaining a
qualitative understanding. The basic techniques are the following plots:
Line Plot
The simplest technique, a line plot, shows the relationship or dependence of one
variable on another. In most plotting libraries, plotting the relationship between two
variables takes a single call to the plot function.
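As an illustration of a single plot call, here is a minimal sketch. It assumes the Matplotlib library, which the text does not name; the free-fall data and the helper name draw_line_plot are hypothetical and chosen only for the example.

```python
# Hypothetical sample data: distance fallen (m) after t seconds, d = 0.5 * g * t^2.
G = 9.81
times = [0.0, 0.5, 1.0, 1.5, 2.0]
distances = [0.5 * G * t ** 2 for t in times]


def draw_line_plot(xs, ys, path="line_plot.png"):
    """Draw ys against xs as a line plot and save the figure to `path`."""
    # Assumption: Matplotlib is installed. It is imported inside the function
    # so that the data preparation above runs even without it.
    import matplotlib
    matplotlib.use("Agg")  # off-screen backend, no display window required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")  # one call draws the dependence of ys on xs
    ax.set_xlabel("time (s)")
    ax.set_ylabel("distance (m)")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling draw_line_plot(times, distances) writes the figure to line_plot.png in the current directory.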
ENGINEERING DATA ANALYSIS | MODULE 2
Bar Chart
Bar charts are used for comparing the quantities of different categories or
groups. Values of a category are represented with the help of bars and they can be
configured with vertical or horizontal bars, with the length or height of each bar
representing the value.
Pie charts, by contrast, can be difficult to interpret because the human eye has a hard
time estimating areas and comparing visual angles.
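The idea that the length of each bar represents the value of its category can be sketched without any plotting library. The following is a minimal text-only sketch; the function name text_bar_chart and the defect counts are hypothetical, and it assumes at least one value is positive.

```python
def text_bar_chart(values, width=40):
    """Return one line per category; bar length is proportional to the value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale so that the largest category fills the full width.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Hypothetical counts of defects per production line.
defects = {"Line A": 12, "Line B": 30, "Line C": 21}
for line in text_bar_chart(defects):
    print(line)
```

Because every bar starts from the same baseline, the categories can be compared by length alone.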
Histogram Plot
A histogram, representing the distribution of a continuous variable over a given
interval or period of time, is one of the most frequently used data visualization
techniques in machine learning. It plots the data by chunking it into intervals called
‘bins’. It is used to inspect the underlying frequency distribution, outliers, skewness, and
so on.
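To make the binning step concrete, the sketch below counts how many values fall into each of a fixed number of equal-width bins. It is only one possible binning rule; the function name histogram_counts and the sample values are hypothetical, and it assumes the data contain at least two distinct values.

```python
def histogram_counts(values, num_bins):
    """Split [min, max] into equal-width bins and count the values in each."""
    low, high = min(values), max(values)
    bin_width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((v - low) / bin_width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * bin_width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical measurements with one value far from the rest.
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(samples, num_bins=4)
print(edges)
print(counts)
```

In this sample most values pile up in the first two bins while the last bin holds a single distant value, which is the kind of skewness and outlier a histogram is meant to expose.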
Scatter Plot
Another common visualization technique is the scatter plot, a two-dimensional plot
representing the joint variation of two data items. Each marker (a symbol such as a dot,
square or plus sign) represents an observation, and the marker's position indicates the
values of that observation. When you assign more than two measures, a scatter plot
matrix is produced: a series of scatter plots displaying every possible pairing of the
measures assigned to the visualization. Scatter plots are used for examining the
relationship, or correlation, between X and Y variables.
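The pairings that a scatter plot matrix draws can be enumerated directly. This is a small sketch using the standard library; the measure names are hypothetical, and it lists each unordered pair once, whereas a full matrix usually shows each pair twice, mirrored across the diagonal.

```python
from itertools import combinations


def scatter_matrix_pairs(measures):
    """Return every unordered pairing of the given measures."""
    return list(combinations(measures, 2))


# Hypothetical measures assigned to the visualization.
measures = ["temperature", "pressure", "flow_rate"]
pairs = scatter_matrix_pairs(measures)
for x_name, y_name in pairs:
    print(f"scatter plot of {y_name} against {x_name}")
```

With n measures there are n(n - 1)/2 such pairings, so the number of panels grows quickly as measures are added.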
Correlation Matrices
A correlation matrix allows quick identification of relationships between variables,
even in large data sets. It is a table of correlation coefficients between variables: each
cell in the table represents the relationship between two variables. Correlation matrices
are used to summarize data, as an input into a more advanced analysis, and as a
diagnostic for advanced analyses.
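As a sketch of how such a table can be built, the code below fills each cell with the Pearson correlation coefficient of two variables. The text does not say which coefficient is meant, so Pearson is an assumption here; the function names and the sample data are hypothetical, and each variable is assumed to vary (a constant column would cause a division by zero).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical test data: applied load, measured deflection, fatigue life.
data = {
    "load": [1, 2, 3, 4, 5],
    "deflection": [2.1, 3.9, 6.2, 7.8, 10.0],
    "life": [9, 7, 6, 4, 2],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

The diagonal is always 1 and the table is symmetric, so only the cells on one side of the diagonal carry new information.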
Data visualization can be a valuable addition to any presentation and the
quickest path to understanding your data. The process of visualizing data can also
be both enjoyable and challenging. However, with so many techniques available, it is
easy to end up presenting the information with the wrong tool. To choose the most
appropriate visualization technique, you need to understand the data, its type and
composition, what information you are trying to convey to your audience, and how
viewers process visual information. Sometimes a simple line plot does the job and
saves the time and effort of plotting the data with advanced Big Data techniques.
Understand your data, and it will reveal its hidden value to you.
A company which neglects proper descriptive statistics and data analysis already
finds itself at a disadvantage compared to its competition. With data quickly turning into
the lifeblood of the business world, it must be put to good use for a business to remain
relevant and successful. The first step is collecting the data, but in many ways, that’s
the easy part. Once all that information has been gathered, what should companies do
next? How does one make use of the large sets of data now at hand? That’s where
descriptive statistics and data visualization come in.
If you’ve ever seen a pie graph (and that’s probably a given), then you know what
this looks like in action. Pie graphs are very simple examples of visualization, but they’re
very effective in what they do. Think of bar charts, line graphs, spider charts, scatter
plots, and diagrams and all the information they can convey in a moment. Think of it like
the ultimate visual aid. It’s easy to see why data visualization is a key ingredient in
interpreting data.
The Importance of Data Visualization
From a business perspective, data visualization is indispensable. Data scientists
may be able to look at raw data and discover key findings, but communicating what data
says to those who lack expertise in data science will always be needed. If you need to
get a point across in a short amount of time, data visualization is the way to do it. It
makes the data clear and cohesive, eliminating the fluff and showing the most important
points. With good data visualization, there is no dispute over what the data says; the
only discussion left is what to do with it.
Data Visualization and Descriptive Statistics in Business
Combining both descriptive statistics and data visualization transforms them into
a valuable asset for any company. One of the most important functions they serve is to
help company leaders in making key business decisions. Data has normally been used
when coming to a crucial business decision and the use of descriptive statistics and
data visualization only amplifies that effectiveness.
There are many ways in which the two are used to inform business decisions.
Through data visualization, it’s easier to notice patterns and identify how various data
points relate to each other. Business leaders can also look at recent historical trends
and determine where those trends might go and how best the company can capitalize
on them. With raw data, many of these instances would be hard to figure out, but after
employing descriptive statistics and utilizing data visualization, the correlations can
quickly become evident.
With these vital pieces, businesses suddenly become much more versatile. With
the data visually displayed for everyone to understand, companies can identify
untapped markets where their products or services might flourish. They can determine
which parts of a company’s operations can be made more efficient, thus cutting down
on costs and optimizing overall performance. They may also identify ways to improve
the customer experience by getting real time feedback from customers. Businesses can
even prepare for future growth or possible downturns, keeping organizations ahead of
the curve and ready to handle all the opportunities and challenges that await them.
The Right Data Visualization Tool
All of this requires the use of an effective and versatile data visualization tool, the
exact kind that Import.io provides. With this tool, you can become proficient in
understanding the data that you collect. A good data visualization tool like this is
extremely helpful in turning abstract data into something much easier to grasp. As part
of turning data into a visual element, the data gets cleaned: it is shaped into a
manageable form, and values that would unnecessarily interfere with the message are
filtered out. Only through this process does data visualization turn data into something
you can use to help strategize and plan ahead.
2. Arrangement
The next term Evergreen and Emery discuss is arrangement. Improper arrangement of
graph elements can confuse readers at best and mislead them at worst. The goal of
arrangement is to get the viewer to focus on the substance of the visualization
rather than on how the visualization was developed. We will illustrate the argument
with two examples: the first shows disagreement, where graph elements are not clearly
outlined, and the second shows agreement.
i. Example of disagreement
3. Colors
Colors are an important part of any visualization. We must think of colors when we
apply visualization to statistical analysis. Colors are the visual perceptual properties
corresponding to the categories called red, blue, yellow, and others. According to
Evergreen and Emery (2014), colors are used to highlight key patterns. Action colors
should guide the viewer to key parts of the display. Less important or supporting
information should be in muted colors: mix the color with white or grey to make it
less bright.
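Muting a color by mixing it with white or grey can be sketched as a simple linear blend of RGB components. This is one possible way to do it, not a rule from Evergreen and Emery; the function names, the default blend amount, and the sample color are hypothetical.

```python
def mute(rgb, amount=0.6, toward=(255, 255, 255)):
    """Blend an (R, G, B) color toward white (or a grey); amount is in [0, 1]."""
    # amount = 0 keeps the original color, amount = 1 returns `toward`.
    return tuple(round(c + (t - c) * amount) for c, t in zip(rgb, toward))


def to_hex(rgb):
    """Format an (R, G, B) tuple as a hex color string such as '#99c1e9'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical action color for the key series, and its muted counterpart.
action_blue = (0, 100, 200)
muted_blue = mute(action_blue)
print(to_hex(action_blue), to_hex(muted_blue))
```

The action color would be reserved for the series the viewer should notice first, with the muted version used for supporting series. Passing a grey such as (128, 128, 128) as `toward` mixes with grey instead of white.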
4. Lines
Lines are also an important part of the visualization. Excessive use of lines (gridlines,
borders, tick marks, and axes) can add clutter or noise to a graph, so eliminate them
whenever they are not useful for interpreting data.
Our first example below consists of gridlines that, according to Evergreen and
Emery, need to be muted.
Overall meaning:
While the meaning of visualization is still a difficult subject to determine, Evergreen
and Emery recommend we provide more details in order to help the user to better
understand the visualization. An important goal of any research scientist is the
publication of the results of a completed study. Most academic and professional
publications in our field require the researcher to provide a written document, based on
their style that includes specifics of data analysis and data collection methods, in order
for it to be accepted for review. Although the written word is still the dominant platform
for reporting statistical
If the function in question, f(x), is a continuous one, then the average value of the
function between x = a and x = b is simply
\frac{1}{b - a} \int_a^b f(x) \, dx.
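The average value of a continuous function f between x = a and x = b is its integral over that interval divided by b - a. The sketch below approximates it numerically with the midpoint rule, which is one of several possible approximations; the function name average_value and the choice of test function are hypothetical.

```python
def average_value(f, a, b, n=10_000):
    """Approximate (1 / (b - a)) * integral of f from a to b (midpoint rule)."""
    width = (b - a) / n
    # Sum f at the midpoint of each of the n equal sub-intervals.
    total = sum(f(a + (i + 0.5) * width) for i in range(n))
    return total * width / (b - a)


# Check against a case that can be done by hand: for f(x) = x^2 on [0, 3],
# the integral is 9, so the average value is 9 / 3 = 3.
approx = average_value(lambda x: x * x, 0.0, 3.0)
print(approx)
```

Increasing n tightens the approximation, at the cost of more function evaluations.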
Video Links:
Visualization of Data
• https://www.youtube.com/watch?v=hEWY6kkBdpo
• https://www.youtube.com/watch?v=SKv7xUvJSpk
References
• https://datajournalism.com/read/handbook/one/understanding-data/using-data-visualization-to-find-insights-in-data
• https://www.kdnuggets.com/2019/04/best-data-visualization-techniques.html
• https://www.import.io/post/what-is-descriptive-statistics-and-how-data-visualization-can-transform-your-data/