Bda Unit 5
UNIT V
Predictive Analytics and Visualizations: Predictive Analytics, Simple linear regression, Multiple
linear regression, Interpretation of regression coefficients, Visualizations, Visual data analysis
techniques, interaction techniques, Systems and application
Time series models use various data inputs at a specific time frequency, such as daily, weekly,
monthly, et cetera. It is common to plot the dependent variable over time to assess the data for
seasonality, trends, and cyclical behavior, which may indicate the need for specific transformations
and model types. Autoregressive (AR), moving average (MA), ARMA, and ARIMA models are
all frequently used time series models. As an example, a call center can use a time series model to
forecast how many calls it will receive per hour at different times of day.
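The AR idea described above can be sketched in a few lines. The following is a minimal illustration, not a production forecasting model: it generates a synthetic AR(1) series with a known coefficient and recovers that coefficient by ordinary least squares, then makes a one-step-ahead forecast (a full ARIMA workflow would use a library such as statsmodels).

```python
import numpy as np

# Synthetic AR(1) series with a known coefficient (values are illustrative).
rng = np.random.default_rng(0)
true_phi, n = 0.7, 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + rng.normal()

# Regress y[t] on y[t-1] to estimate the AR(1) coefficient.
X = y[:-1]
phi_hat = (X @ y[1:]) / (X @ X)

# One-step-ahead forecast from the last observation.
forecast = phi_hat * y[-1]
```

With 500 observations the least-squares estimate lands close to the true coefficient of 0.7, which is why plotting and checking the series first (for trend and seasonality) matters: the simple AR form only fits data that actually behaves this way.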
Predictive analytics industry use cases
Predictive analytics can be deployed across various industries for different business problems.
Below are a few industry use cases to illustrate how predictive analytics can inform decision-
making within real-world situations.
• Banking: Financial services use machine learning and quantitative tools to predict credit
risk and detect fraud. As an example, BondIT is a company that specializes in fixed-income
asset-management services. Predictive analytics allows them to support dynamic market
changes in real-time in addition to static market constraints. This use of technology allows
it to both customize personal services for clients and to minimize risk.
• Healthcare: Predictive analytics in health care is used to detect and manage the care of
chronically ill patients, as well as to track specific infections such as sepsis. Geisinger
Health used predictive analytics to mine health records to learn more about how sepsis is
diagnosed and treated. Geisinger created a predictive model based on health records for
more than 10,000 patients who had been diagnosed with sepsis in the past. The model
yielded impressive results, correctly predicting patients with a high rate of survival.
• Human resources (HR): HR teams use predictive analytics and employee survey metrics
to match prospective job applicants, reduce employee turnover and increase employee
engagement. This combination of quantitative and qualitative data allows businesses to
reduce their recruiting costs and increase employee satisfaction, which is particularly
useful when labor markets are volatile.
• Marketing and sales: While marketing and sales teams are very familiar with business
intelligence reports to understand historical sales performance, predictive analytics enables
companies to be more proactive in the way that they engage with their clients across the
customer lifecycle. For example, churn predictions can enable sales teams to identify
dissatisfied clients sooner, enabling them to initiate conversations to promote retention.
Marketing teams can leverage predictive data analysis for cross-sell strategies, and this
commonly manifests itself through a recommendation engine on a brand’s website.
• Supply chain: Businesses commonly use predictive analytics to manage product inventory
and set pricing strategies. This type of predictive analysis helps companies meet customer
demand without overstocking warehouses. It also enables companies to assess the cost and
return on their products over time. If one part of a given product becomes more expensive
to import, companies can project the long-term impact on revenue if they do or do not pass
on additional costs to their customer base. For a deeper look at a case study, you can read
more about how FleetPride used this type of data analytics to inform their decision making
on their inventory of parts for excavators and tractor trailers. Past shipping orders enabled
them to plan more precisely to set appropriate supply thresholds based on demand.
To decide whether simple linear regression is appropriate, first compute the correlation coefficient r between X (the independent variable) and Y (the dependent variable). If r is greater than 0.85, choose simple linear regression.
If r < 0.85, apply a transformation to the data to increase the value of "r" and then build a simple linear regression model on the transformed data.
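The r-based decision rule above can be expressed directly. The calorie and weight-gain numbers below are hypothetical, chosen only to illustrate the check:

```python
import numpy as np

# Hypothetical calories/weight-gain sample (illustrative values only).
calories = np.array([1500, 1800, 2000, 2200, 2600, 3000])
weight_gain = np.array([1.1, 1.9, 2.4, 3.0, 3.9, 4.8])

# Pearson correlation coefficient r between X and Y.
r = np.corrcoef(calories, weight_gain)[0, 1]

# Decision rule from the text: r > 0.85 -> use simple linear regression;
# otherwise transform the data (e.g. log) and re-check r.
use_simple_lr = r > 0.85
```

For this near-linear sample, r comes out well above 0.85, so simple linear regression would be selected without any transformation.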
Steps to Implement Simple Linear Regression:
1. Analyze data (analyze scatter plot for linearity)
2. Get sample data for model building
3. Then design a model that explains the data
4. And use the same developed model on the whole population to make predictions.
The simple linear regression equation that represents how an independent variable X is related to a dependent variable Y is Y = β0 + β1X + ε, where β0 is the intercept, β1 is the slope, and ε is the error term.
Example:
Let us understand simple linear regression by considering an example. Suppose we want to predict weight gain based on calories consumed alone, using the data given below.
Now, if we want to predict weight gain when you consume 2500 calories. Firstly, we need to
visualize data by drawing a scatter plot of the data to conclude that calories consumed is the best
independent variable X to predict dependent variable Y.
As r = 0.9910422, which is greater than 0.85, we consider calories consumed as the best independent variable (X) for predicting the dependent variable, weight gain (Y).
Now, try to imagine a straight line drawn in a way that should be close to every data point in the
scatter diagram.
To predict the weight gain for consumption of 2500 calories, you can simply extend the straight line and read off the y value at x = 2,500. This projected y value gives you the approximate weight gain. This straight line is the regression line.
Similarly, we can substitute x = 2500 into the equation of the regression model:
So, weight gain predicted by our simple linear regression model is 4.49Kgs after consumption of
2500 calories.
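The full fit-then-predict procedure can be sketched as follows. The data here are hypothetical, not the dataset used in the text, so the prediction will differ from the 4.49 kg result above:

```python
import numpy as np

# Hypothetical calories/weight-gain data (illustrative; not the dataset
# from the text, so the prediction differs from 4.49 kg).
calories = np.array([1500, 1800, 2000, 2200, 2600, 3000], dtype=float)
weight_gain = np.array([1.1, 1.9, 2.4, 3.0, 3.9, 4.8])

# Least-squares fit of weight_gain = b0 + b1 * calories.
b1, b0 = np.polyfit(calories, weight_gain, deg=1)

# Predict weight gain at 2500 calories by substituting into the line.
predicted = b0 + b1 * 2500
```

`np.polyfit` with `deg=1` returns the slope first and the intercept second; substituting x = 2500 into the fitted line is exactly the "extend the line and read off y" step described above.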
The independent variables can be either continuous (like age and height) or categorical (like gender and occupation). It's important to note that if an independent variable is categorical, you should dummy code it before running the analysis.
Let k represent the number of variables denoted by x1, x2, x3, ……, xk.
For this method, we assume that we have k independent variables x1, . . . , xk that we can set,
then they probabilistically determine an outcome Y.
• β0 is the y-intercept: when xi1 and xi2 are both zero, y equals β0.
• The regression coefficients β1 and β2 represent the change in y resulting from one-unit changes in xi1 and xi2, respectively.
The model is Y = β0 + β1x1 + β2x2 + · · · + βkxk + ε, where ε is the random error term. This is just like simple linear regression, except that k does not have to be 1.
For the i-th observation, we set the independent variables to the values xi1, xi2, . . . , xik and measure a value yi of the random variable Yi.
Here the errors εi are independent random variables, each with mean 0 and the same unknown variance σ2.
Altogether the model for multiple linear regression has k + 2 unknown parameters: β0, β1, . . . , βk, and the error variance σ2.
When k was equal to 1, we found the least squares line ŷ = β̂0 + β̂1x. In general, we seek fitted values
ŷi = β̂0 + β̂1xi1 + β̂2xi2 + · · · + β̂kxik for i = 1, . . . , n
that are close to the actual values yi.
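The least-squares estimates for all k + 1 coefficients can be computed in one step with a design matrix. This is a sketch on synthetic data generated from known coefficients, so we can see that the estimates recover them:

```python
import numpy as np

# Least-squares estimates for y = b0 + b1*x1 + b2*x2 + error (k = 2).
# Data are synthetic, generated from known coefficients for illustration.
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 0.5, n)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1, b2]
```

The column of ones in the design matrix is what turns the intercept β0 into just another coefficient; with 200 observations and small noise, the estimates land close to the true values 3.0, 2.0, and -1.5.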
For your model to be reliable and valid, there are some essential requirements: a linear relationship between the dependent and independent variables, independent errors with constant variance (homoscedasticity), approximately normally distributed errors, and no severe multicollinearity among the predictors.
A simple linear regression can accurately capture a straightforward relationship between two variables. On the other hand, multiple linear regression can capture more complex interactions that require more thought.
A multiple regression model uses more than one independent variable. It does not suffer from the
same limitations as the simple regression equation, and it is thus able to fit curved and non-linear
relationships. The following are the uses of multiple linear regression:
1. Estimating relationships between variables.
2. Prediction or forecasting.
Estimating relationships between variables can be exciting and useful. As with all other
regression models, the multiple regression model assesses relationships among variables in terms
of their ability to predict the value of the dependent variable.
The p-value from the regression table tells us whether or not this regression coefficient is actually
statistically significant. We can see that the p-value for Hours studied is 0.009, which is
statistically significant at an alpha level of 0.05.
Note: The alpha level should be chosen before the regression analysis is conducted – common
choices for the alpha level are 0.01, 0.05, and 0.10.
Related post: An Explanation of P-Values and Statistical Significance
Interpreting the Coefficient of a Categorical Predictor Variable
For a categorical predictor variable, the regression coefficient represents the difference in the predicted value of the response variable between the category for which the predictor variable = 1 and the category for which the predictor variable = 0, holding the other predictors constant.
In this example, Tutor is a categorical predictor variable that can take on two different values:
• 1 = the student used a tutor to prepare for the exam
• 0 = the student did not use a tutor to prepare for the exam
From the regression output, we can see that the regression coefficient for Tutor is 8.34. This means that, on average, a student who used a tutor scored 8.34 points higher on the exam compared to a student who did not use a tutor, assuming the predictor variable Hours studied is held constant.
For example, consider student A who studies for 10 hours and uses a tutor. Also consider student
B who studies for 10 hours and does not use a tutor. According to our regression output, student
A is expected to receive an exam score that is 8.34 points higher than student B.
The p-value from the regression table tells us whether or not this regression coefficient is actually
statistically significant. We can see that the p-value for Tutor is 0.138, which is not statistically
significant at an alpha level of 0.05. This indicates that although students who used a tutor scored
higher on the exam, this difference could have been due to random chance.
Interpreting All of the Coefficients At Once
We can use all of the coefficients in the regression table to create the following estimated
regression equation:
Expected exam score = 48.56 + 2.03*(Hours studied) + 8.34*(Tutor)
Note: Keep in mind that the predictor variable “Tutor” was not statistically significant at alpha
level 0.05, so you may choose to remove this predictor from the model and not use it in the final
estimated regression equation.
Using this estimated regression equation, we can predict the final exam score of a student based
on their total hours studied and whether or not they used a tutor.
For example, a student who studied for 10 hours and used a tutor is expected to receive an exam
score of:
Expected exam score = 48.56 + 2.03*(10) + 8.34*(1) = 77.2
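The estimated equation above translates directly into a small prediction function using the coefficients from the text:

```python
# Estimated regression equation from the text:
# score = 48.56 + 2.03*(hours studied) + 8.34*(tutor), tutor in {0, 1}.
def expected_exam_score(hours_studied, used_tutor):
    return 48.56 + 2.03 * hours_studied + 8.34 * (1 if used_tutor else 0)

# Student who studied 10 hours and used a tutor.
score = expected_exam_score(10, True)  # 77.2
```

A student who studied the same 10 hours without a tutor would be predicted 8.34 points lower, which is exactly the Tutor coefficient's interpretation.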
Considering Correlation When Interpreting Regression Coefficients
It’s important to keep in mind that predictor variables can influence each other in a regression
model. For example, most predictor variables will be at least somewhat related to one another (e.g.
perhaps a student who studies more is also more likely to use a tutor).
This means that regression coefficients will change when different predictor variables are added to or removed from the model.
One good way to see whether or not the correlation between predictor variables is severe enough
to influence the regression model in a serious way is to check the VIF between the predictor
variables. This will tell you whether or not the correlation between predictor variables is a problem
that should be addressed before you decide to interpret the regression coefficients.
If you are running a simple linear regression model with only one predictor, then correlated
predictor variables will not be a problem.
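The VIF check mentioned above can be implemented from first principles: each predictor's VIF is 1/(1 − R²), where R² comes from regressing that predictor on all the others. This sketch uses synthetic data in which two predictors are deliberately correlated:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns (with an intercept).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        ss_res = resid @ resid
        ss_tot = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic predictors: x2 is strongly correlated with x1, so both get a
# high VIF (a common rule of thumb flags VIF > 5 or > 10 as problematic).
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)
x3 = rng.normal(size=300)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here the correlated pair x1, x2 receives very large VIF values while the independent x3 stays near 1, matching the intuition that only the correlated predictors are a multicollinearity concern.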
Project managers frequently use Gantt charts and
waterfall charts to illustrate workflows. Data modeling also uses abstraction to represent and better
understand data flow within an enterprise’s information system, making it easier for developers,
business analysts, data architects, and others to understand the relationships in a database or data
warehouse.
Visual discovery
Visual discovery and everyday data viz are more closely aligned with data teams. While visual discovery helps data analysts, data scientists, and other data professionals identify patterns and trends within a dataset, everyday data viz supports the subsequent storytelling after a new insight has been found.
Data visualization
Data visualization is a critical step in the data science process, helping teams and individuals
convey data more effectively to colleagues and decision makers. Teams that manage reporting
systems typically leverage defined template views to monitor performance. However, data
visualization isn’t limited to performance dashboards. For example, while text mining, an analyst
may use a word cloud to capture key concepts, trends, and hidden relationships within this
unstructured data. Alternatively, they may utilize a graph structure to illustrate relationships
between entities in a knowledge graph. There are a number of ways to represent different types of
data, and it’s important to remember that it is a skillset that should extend beyond your core
analytics team.
Types of data visualizations
The earliest forms of data visualization can be traced back to the Egyptians before the 17th century, largely used to assist in navigation. As time progressed, people leveraged data visualizations for broader applications, such as the economic, social, and health disciplines. Perhaps most notably, Edward
Tufte published The Visual Display of Quantitative Information, which
illustrated that individuals could utilize data visualization to present data in a more effective
manner. His book continues to stand the test of time, especially as companies turn to dashboards
to report their performance metrics in real-time. Dashboards are effective data visualization tools
for tracking and visualizing data from multiple data sources, providing visibility into the effects of
specific behaviors by a team or an adjacent one on performance. Dashboards include common
visualization techniques, such as:
• Tables: This consists of rows and columns used to compare variables. Tables can show a
great deal of information in a structured way, but they can also overwhelm users that are
simply looking for high-level trends.
• Pie charts and stacked bar charts: These graphs are divided into sections that represent
parts of a whole. They provide a simple way to organize data and compare the size of each
component to one another.
• Line charts and area charts: These visuals show change in one or more quantities by
plotting a series of data points over time and are frequently used within predictive analytics.
Line graphs utilize lines to demonstrate these changes while area charts connect data points
with line segments, stacking variables on top of one another and using color to distinguish
between variables.
• Histograms: This graph plots a distribution of numbers using a bar chart (with no spaces
between the bars), representing the quantity of data that falls within a particular range. This
visual makes it easy for an end user to identify outliers within a given dataset.
• Scatter plots: These visuals are beneficial in revealing the relationship between two
variables, and they are commonly used within regression data analysis. However, these can
sometimes be confused with bubble charts, which are used to visualize three variables via
the x-axis, the y-axis, and the size of the bubble.
• Heat maps: These graphical representations are helpful in visualizing behavioral
data by location. This can be a location on a map, or even a webpage.
• Tree maps: These display hierarchical data as a set of nested shapes, typically rectangles. Treemaps are great for comparing the proportions between categories via their area size.
Open source visualization tools
Access to data visualization tools has never been easier. Open source libraries, such as D3.js,
provide a way for analysts to present data in an interactive way, allowing them to engage a broader
audience with new data. Some of the most popular open source visualization libraries include:
• D3.js: It is a front-end JavaScript library for producing dynamic, interactive data
visualizations in web browsers. D3.js uses HTML, CSS, and
SVG to create visual representations of data that can be viewed on any browser. It also
provides features for interactions and animations.
• ECharts: A powerful charting and visualization library that offers an easy way to add
intuitive, interactive, and highly customizable charts to products, research papers,
presentations, etc. ECharts is based on JavaScript and ZRender,
a lightweight canvas library.
• Vega: Vega defines itself as a “visualization grammar,”
providing support to customize visualizations across large datasets which are accessible
from the web.
• deck.gl: It is part of Uber's open source visualization framework suite. deck.gl is a
framework used for exploratory data analysis on big data. It
helps build high-performance GPU-powered visualization on the web.
Data visualization best practices
With so many data visualization tools readily available, there has also been a rise in ineffective
information visualization. Visual communication should be simple and deliberate to ensure that
your data visualization helps your target audience arrive at your intended insight or conclusion.
The following best practices can help ensure your data visualization is useful and clear:
Set the context: It’s important to provide general background information to ground the audience
around why this particular data point is important. For example, if e-mail open rates were
underperforming, we may want to illustrate how a company’s open rate compares to the overall
industry, demonstrating that the company has a problem within this marketing channel. To drive
an action, the audience needs to understand how current performance compares to something
tangible, like a goal, benchmark, or other key performance indicators (KPIs).
Know your audience(s): Think about who your visualization is designed for and then make sure
your data visualization fits their needs. What is that person trying to accomplish? What kind of
questions do they care about? Does your visualization address their concerns? You’ll want the data
that you provide to motivate people to act within their scope of their role. If you’re unsure if the
visualization is clear, present it to one or two people within your target audience to get feedback,
allowing you to make additional edits prior to a large presentation.
Choose an effective visual: Specific visuals are designed for specific types of datasets. For
instance, scatter plots display the relationship between two variables well, while line graphs
display time series data well. Ensure that the visual actually assists the audience in understanding
your main takeaway. Misalignment of charts and data can result in the opposite, confusing your
audience further versus providing clarity.
Keep it simple: Data visualization tools can make it easy to add all sorts of information to your
visual. However, just because you can, it doesn’t mean that you should! In data visualization, you
want to be very deliberate about the additional information that you add to focus user attention.
For example, do you need data labels on every bar in your bar chart? Perhaps you only need one
or two to help illustrate your point. Do you need a variety of colors to communicate your idea?
Are you using colors that are accessible to a wide range of audiences (e.g. accounting for color
blind audiences)? Design your data visualization for maximum impact by eliminating information
that may distract your target audience.
Data visualization is a graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way to see and
understand trends, outliers, and patterns in data. This blog on data visualization techniques will
help you understand detailed techniques and benefits.
In the world of Big Data, data visualization in Python tools and technologies are essential to
analyze massive amounts of information and make data-driven decisions.
Benefits of good data visualization
Our eyes are drawn to colours and patterns. We can quickly identify red from blue, and a square
from a circle. Our culture is visual, including everything from art and advertisements to TV and
movies.
Data visualization is another form of visual art that grabs our interest and keeps our eyes on the
message. When we see a chart, we quickly see trends and outliers. If we can see something, we
internalize it quickly. It’s storytelling with a purpose. If you’ve ever stared at a massive spreadsheet
of data and couldn’t see a trend, you know how much more effective a visualization can be. The
uses of Data Visualization as follows.
• A powerful way to explore data with presentable results.
• Its primary use is in the pre-processing portion of the data mining process.
• It supports the data cleaning process by finding incorrect and missing values.
• It is used for variable derivation and selection, i.e., to determine which variables to include in or discard from the analysis.
• It also plays a role in combining categories as part of the data reduction process.
Data Visualization Techniques
• Box plots
• Histograms
• Heat maps
• Charts
• Tree maps
• Word Cloud/Network diagram
Box Plots
A box plot (boxplot) is a standardized way of displaying the distribution of
data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3),
and “maximum”). It can tell you about your outliers and what their values are. It can also tell you
if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
A box plot is a graph that gives you a good indication of how the values in the data are spread out.
Although box plots may seem primitive in comparison to a histogram or density plot, they have
the advantage of taking up less space, which is useful when comparing distributions between many
groups or datasets. For some distributions/datasets, you will find that you need more information
than the measures of central tendency (median, mean, and mode). You need to have information
on the variability or dispersion of the data.
List of Methods to Visualize Data
• Column Chart: It is also called a vertical bar chart where each category is represented by
a rectangle. The height of the rectangle is proportional to the values that are plotted.
• Bar Graph: It has rectangular bars in which the lengths are proportional to the values
which are represented.
• Stacked Bar Graph: It is a bar style graph that has various components stacked together
so that apart from the bar, the components can also be compared to each other.
• Stacked Column Chart: It is similar to a stacked bar graph; however, the columns are vertical, so the components are stacked vertically rather than horizontally.
• Area Chart: It combines features of the line chart and the bar chart to show how the numeric values of one or more groups change over time, with the area below each line filled in.
• Dual Axis Chart: It combines a column chart and a line chart and then compares the two
variables.
• Line Graph: The data points are connected by straight lines, creating a representation of the changing trend.
• Mekko Chart: It can be called a two-dimensional stacked chart with varying column
widths.
• Pie Chart: It is a chart where various components of a data set are presented in the form
of a pie which represents their proportion in the entire data set.
• Waterfall Chart: With the help of this chart, the increasing effect of sequentially
introduced positive or negative values can be understood.
• Bubble Chart: It is a multi-variable graph that is a hybrid of Scatter Plot and a
Proportional Area Chart.
• Scatter Plot Chart: It is also called a scatter chart or scatter graph. Dots are used to denote
values for two different numeric variables.
• Bullet Graph: It is a variation of a bar graph. A bullet graph is used to swap dashboard
gauges and meters.
• Funnel Chart: The chart determines the flow of users with the help of a business or sales
process.
• Heat Map: It is a technique of data visualization that shows the level of instances as color
in two dimensions.
Five Number Summary of a Box Plot
• Minimum: Q1 − 1.5*IQR (the lower whisker limit)
• First quartile (Q1, 25th percentile): the middle number between the smallest number (not the “minimum”) and the median of the dataset
• Median (50th percentile): the middle value of the dataset
• Third quartile (Q3, 75th percentile): the middle value between the median and the highest value (not the “maximum”) of the dataset
• Maximum: Q3 + 1.5*IQR (the upper whisker limit)
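The five-number summary with Tukey whisker limits can be computed directly from percentiles. This sketch uses a small made-up dataset:

```python
import numpy as np

def five_number_summary(x):
    """Box-plot five-number summary with Tukey whiskers.

    'Minimum' and 'Maximum' here are the whisker limits Q1 - 1.5*IQR and
    Q3 + 1.5*IQR, not the smallest/largest raw values.
    """
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q1, median, q3, q3 + 1.5 * iqr

# Small illustrative dataset.
data = [7, 15, 36, 39, 40, 41]
lo, q1, med, q3, hi = five_number_summary(data)
```

Any raw observation falling outside the `lo`/`hi` whisker limits would be drawn as an outlier point on the box plot, which is how box plots "tell you about your outliers and what their values are."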
Histograms
A histogram is a graphical display of data using bars of different heights. In a histogram, each bar
groups numbers into ranges. Taller bars show that more data falls in that range. A histogram
displays the shape and spread of continuous sample data.
It is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set
of continuous data. This allows the inspection of the data for its underlying distribution (e.g.,
normal distribution), outliers, skewness, etc. It is an accurate representation of the distribution of numerical data, and it relates to only one variable. It uses bins (or buckets): the entire range of values is divided into a series of intervals, and the number of values falling into each interval is counted.
Bins are consecutive, non-overlapping intervals of a variable. As adjacent bins leave no gaps, the rectangles of a histogram touch each other to indicate that the original variable is continuous.
Histograms are based on area, not height of bars
In a histogram, the height of the bar does not necessarily indicate how many occurrences of scores
there were within each bin. It is the product of height multiplied by the width of the bin that
indicates the frequency of occurrences within that bin. One of the reasons that the height of the
bars is often incorrectly assessed as indicating the frequency and not the area of the bar is because
a lot of histograms often have equally spaced bars (bins), and under these circumstances, the height
of the bin does reflect the frequency.
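The area-versus-height distinction shows up directly in how histograms are computed. In this sketch the bins deliberately have unequal widths, so bar heights (densities) differ from raw counts, but the bar areas still sum to 1:

```python
import numpy as np

# With density=True, bar *areas* (height * bin width) sum to 1, which is
# what matters when bins have unequal widths. Data are illustrative.
data = np.array([1, 2, 2, 3, 3, 3, 8, 9, 15, 20], dtype=float)
bins = [0, 5, 10, 20]            # deliberately unequal bin widths
heights, edges = np.histogram(data, bins=bins, density=True)
widths = np.diff(edges)
total_area = (heights * widths).sum()   # 1.0
```

The wide [10, 20] bin holds the same count as the narrow [5, 10) bin, yet its bar is half as tall: reading height alone would understate its frequency, while reading area gives the right answer.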
Histogram Vs Bar Chart
The major difference is that a histogram is only used to plot the frequency of score occurrences in
a continuous data set that has been divided into classes, called bins. Bar charts, on the other hand,
can be used for a lot of other types of variables including ordinal and nominal data sets.
Heat Maps
A heat map is a data visualization technique that uses colour the way a bar graph uses height and width.
If you’re looking at a web page and you want to know which areas get the most attention, a heat
map shows you in a visual way that’s easy to assimilate and make decisions from. It is a graphical
representation of data where the individual values contained in a matrix are represented as colours.
Useful for two purposes: for visualizing correlation tables and for visualizing missing values in
the data. In both cases, the information is conveyed in a two-dimensional table.
Note that heat maps are useful when examining a large number of values, but they are not a
replacement for more precise graphical displays, such as bar charts, because colour differences
cannot be perceived accurately.
Charts
Line Chart
The simplest technique, a line plot is used to plot the relationship or dependence of one variable
on another. To plot the relationship between the two variables, we can simply call the plot function.
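"Simply calling the plot function" looks like this with matplotlib (assumed available; the monthly sales figures are invented for the example):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Line chart: illustrative monthly sales plotted against month.
months = np.arange(1, 13)
sales = np.array([12, 14, 13, 17, 19, 22, 21, 25, 24, 27, 30, 33])

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")   # the plot call from the text
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
fig.savefig("line_chart.png")
```

One `plot` call draws the connected line; additional calls on the same axes would overlay further series for comparison.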
Bar Charts
Bar charts are used for comparing the quantities of different categories or groups. Values of a
category are represented with the help of bars and they can be configured with vertical or horizontal
bars, with the length or height of each bar representing the value.
Pie Chart
It is a circular statistical graph which is divided into slices to illustrate numerical proportion. Here the arc length of each slice is proportional to the quantity it represents. As a rule, they are used to compare
the parts of a whole and are most effective when there are limited components and when text and
percentages are included to describe the content. However, they can be difficult to interpret
because the human eye has a hard time estimating areas and comparing visual angles.
Scatter Charts
Another common visualization technique is a scatter plot that is a two-dimensional plot
representing the joint variation of two data items. Each marker (symbols such as dots, squares and
plus signs) represents an observation. The marker position indicates the value for each observation.
When you assign more than two measures, a scatter plot matrix is produced: a series of scatter plots displaying every possible pairing of the measures that are assigned to the visualization. Scatter
plots are used for examining the relationship, or correlations, between X and Y variables.
Bubble Charts
It is a variation of scatter chart in which the data points are replaced with bubbles, and an additional
dimension of data is represented in the size of the bubbles.
Timeline Charts
Timeline charts illustrate events, in chronological order — for example the progress of a project,
advertising campaign, acquisition process — in whatever unit of time the data was recorded — for
example week, month, year, quarter. It shows the chronological sequence of past or future events
on a timescale.
Tree Maps
A treemap is a visualization that displays hierarchically organized data as a set of nested rectangles,
parent elements being tiled with their child elements. The sizes and colours of rectangles are
proportional to the values of the data points they represent. A leaf node rectangle has an area
proportional to the specified dimension of the data. Depending on the choice, the leaf node is
coloured, sized, or both according to chosen attributes. Tree maps make efficient use of space and can thus display thousands of items on the screen simultaneously.
Word Clouds and Network Diagrams for Unstructured Data
The variety of big data brings challenges because semi-structured, and unstructured data require
new visualization techniques. A word cloud visual represents the frequency of a word within a
body of text with its relative size in the cloud. This technique is used on unstructured data as a way
to display high- or low-frequency words.
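The sizing in a word cloud is driven by word frequencies; the frequency counting itself is straightforward (a layout library such as the `wordcloud` package, assumed here, would handle the drawing). The sample text is invented:

```python
from collections import Counter
import re

# Illustrative text; a real word cloud would use a full document corpus.
text = """Big data brings big challenges and big opportunities,
and visualization turns data into insight."""

# Tokenize, lowercase, and drop a few common stopwords before counting.
words = re.findall(r"[a-z']+", text.lower())
stopwords = {"and", "the", "a", "into"}
freq = Counter(w for w in words if w not in stopwords)

top = freq.most_common(2)
```

The most frequent surviving words would be rendered largest in the cloud, directly implementing "frequency of a word … with its relative size."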
Another visualization technique that can be used for semi-structured or unstructured data is the
network diagram. Network diagrams represent relationships as nodes (individual actors within the
network) and ties (relationships between the individuals). They are used in many applications, for
example for analysis of social networks or mapping product sales across geographic areas.
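The nodes-and-ties structure behind a network diagram is just an adjacency list. This sketch builds one from a made-up list of relationships and computes each node's degree (its number of ties), a quick way to spot central actors:

```python
from collections import defaultdict

# Ties are relationships between actors; the names are invented.
ties = [("ana", "ben"), ("ana", "cam"), ("ben", "cam"), ("cam", "dee")]

# Undirected adjacency list: each tie is recorded in both directions.
adjacency = defaultdict(set)
for a, b in ties:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree (number of ties) per node.
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
```

Graph libraries such as networkx build on exactly this structure and add the force-directed layouts typically used to draw social-network diagrams.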
FAQs Related to Data Visualization
• What are the techniques of Visualization?
A: The visualization techniques include Pie and Donut Charts, Histogram Plot, Scatter Plot, Kernel
Density Estimation for Non-Parametric Data, Box and Whisker Plot for Large Data, Word Clouds
and Network Diagrams for Unstructured Data, and Correlation Matrices.
• What are the types of visualization?
A: The various types of visualization include Column Chart, Line Graph, Bar Graph, Stacked Bar
Graph, Dual-Axis Chart, Pie Chart, Mekko Chart, Bubble Chart, Scatter Chart, and Bullet Graph.
• What are the various visualization techniques used in data analysis?
A: Various visualization techniques are used in data analysis, including the Box and Whisker Plot
for Large Data, the Histogram Plot, and Word Clouds and Network Diagrams for Unstructured
Data.
• How do I start visualizing?
A: You need a basic understanding of data and should present it without misleading the audience.
Once you understand the fundamentals, you can take up an online course or tutorial.
• What are the two basic types of data visualization?
A: The two very basic types of data visualization are exploration and explanation.
• Which is the best visualization tool?
A: Some of the best visualization tools include Visme, Tableau, Infogram, Whatagraph, Sisense,
DataBox, ChartBlocks, DataWrapper, etc.
These are some of the visualization techniques used to represent data effectively for better
understanding and interpretation.
What is Interactive Data Visualization?
Interactive data visualization refers to the use of modern data analysis software that enables users
to directly manipulate and explore graphical representations of data. Data visualization uses visual
aids to help analysts efficiently and effectively understand the significance of data. Interactive data
visualization software improves upon this concept by incorporating interaction tools that facilitate
the modification of the parameters of a data visualization, enabling the user to see more detail,
create new insights, generate compelling questions, and capture the full value of the data.
5.6 Interactive Data Visualization Techniques
Deciding what the best interactive data visualization will be for your project depends on your end
goal and the data available. Some common data visualization interactions that will help users
explore their data visualizations include:
• Brushing: Brushing is an interaction in which the mouse controls a paintbrush that directly
changes the color of a plot, either by drawing an outline around points or by using the brush
itself as a pointer. Brushing scatterplots can either be persistent, in which the new
appearance is retained once the brush has been removed, or transient, in which changes
only remain visible while the active plot is enclosed or intersected by the brush. Brushing
is typically used when multiple plots are visible and a linking mechanism exists between
the plots.
• Painting: Painting refers to the use of persistent brushing, followed by subsequent
operations such as touring to compare the groups.
• Identification: Identification, also known as label brushing or mouseover, refers to the
automatic appearance of an identifying label when the cursor hovers over a particular plot
element.
• Scaling: Scaling can be used to change a plot’s aspect ratio, revealing different data
features. Scaling is also commonly used to zoom in on dense regions of a scatter plot.
• Linking: Linking connects selected elements on different plots. One-to-one linking entails
the projection of data on two different plots, in which a point in one plot corresponds to
exactly one point in the other. Elements may also be categorical variables, in which all data
values corresponding to that category are highlighted in all the visible plots. Brushing an
area in one plot will brush all cases in the corresponding category on another plot.
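The brushing and linking interactions above can be sketched in plain Python, independent of any plotting library. The data, the rectangular brush, and the category-based link below are all hypothetical; a real toolkit wires these selections to live plot redraws.

```python
# Hypothetical data: each case has (x, y) coordinates and a category.
cases = [
    {"x": 1.0, "y": 2.0, "cat": "A"},
    {"x": 2.5, "y": 1.5, "cat": "B"},
    {"x": 3.0, "y": 3.5, "cat": "A"},
    {"x": 4.0, "y": 0.5, "cat": "B"},
]

def brush(cases, x0, x1, y0, y1):
    """Return indices of cases enclosed by the brush rectangle."""
    return [i for i, c in enumerate(cases)
            if x0 <= c["x"] <= x1 and y0 <= c["y"] <= y1]

def link_by_category(cases, selected):
    """Linking: highlight every case sharing a category with the selection."""
    cats = {cases[i]["cat"] for i in selected}
    return [i for i, c in enumerate(cases) if c["cat"] in cats]

selected = brush(cases, 0.0, 3.5, 1.0, 4.0)   # indices 0, 1, 2
linked = link_by_category(cases, selected)    # categories A and B: all cases
print(selected, linked)
```

Persistent brushing would store `selected` so the highlight survives after the brush moves on; transient brushing recomputes it on every mouse move.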
How to Create Interactive Data Visualizations
Creating various interactive widgets, bar charts, and plots for data visualization should start with
the three basic attributes of a successful data visualization interaction design - available, accessible,
and actionable. Is there sufficient source data available to meet your data visualization goals? Can
you present this data in an accessible manner so that it is intuitive and comprehensible? Do your
data visualization interactions provide meaningful, actionable insights?
The general framework for an interactive data structure visualization project typically follows
these steps: identify your desired goals, understand the challenges presented by data constraints,
and design a conceptual model in which data can be quickly iterated and reviewed.
With a rough, conceptual model in place, data modeling is leveraged to thoroughly document every
piece of data and related meta-data. This is followed by the design of a user interface and the
development of your design’s core technology, which can be accomplished with a variety of
interactive data visualization tools.
Next it’s time to user test in order to refine compatibility, functionality, security, the user interface,
and performance. Now you are ready to launch to your target audience. Methods for rapid updates
should be built in so that your team can stay up to date with your interactive data visualization.
Some popular libraries for creating your own interactive data visualizations include: Altair, Bokeh,
Celluloid, Matplotlib, nbinteract, Plotly, Pygal, and Seaborn. Libraries are available for Python,
Jupyter, JavaScript, and R interactive data visualizations. Scott Murray’s Interactive Data
Visualization for the Web is one of the most popular educational resources for learning how to
create interactive data visualizations.
Benefits of Interactive Data Visualizations
An interactive data visualization allows users to engage with data in ways not possible with static
graphs. Interactivity is especially well suited to large amounts of data with complex data stories,
providing the ability to identify, isolate, and visualize information in detail. Some major benefits
of interactive data visualizations
include:
• Identify Trends Faster - Most human communication is visual, and the human brain
processes graphics orders of magnitude faster than text. Direct manipulation of
analyzed data via familiar metaphors and digestible imagery makes it easy to understand
and act on valuable information.
• Identify Relationships More Effectively - The ability to narrowly focus on specific metrics
enables users to identify otherwise overlooked cause-and-effect relationships throughout
definable timeframes. This is especially useful in identifying how daily operations affect
an organization’s goals.
• Useful Data Storytelling - Humans best understand a data story when its development over
time is presented in a clear, linear fashion. A visual data story in which users can zoom in
and out, highlight relevant information, filter, and change the parameters promotes better
understanding of the data by presenting multiple viewpoints of the data.
• Simplify Complex Data - A large data set with a complex data story may present itself
visually as a chaotic, intertwined hairball. Incorporating filtering and zooming controls can
help untangle and make these messes of data more manageable, and can help users glean
better insights.
Static vs Interactive Data Visualization
A static data visualization is one that does not incorporate any interaction capabilities and does not
change with time, such as an infographic focused on a specific data story from a single viewpoint.
As static visualizations offer no tools to adjust the final result, such as the filtering and zooming
tools of interactive designs, it is essential to give careful consideration to what data is being
displayed.
A static visualization is more suited for less complex data stories, building relationships between
concepts, and conveying a predetermined view than encouraging exploration and increasing user
autonomy. Static designs are also significantly less expensive to build than interactive designs.
Deciding whether to build a static or interactive data visualization depends on customer preference,
data story complexity, and ROI.
The graphical depiction of information and data is known as data visualisation. Data visualisation
tools make it easy to view and comprehend trends, outliers, and patterns in data by utilising visual
components like charts, graphs, and maps.
It provides insights on one or more pages or screens to help you keep track of events or activities
at a glance. Unlike an infographic, which displays a static graphical representation, a dashboard
displays real-time data by extracting complicated data points from massive data sets.
An interactive dashboard allows you to quickly sort, filter, and dive into many sorts of data. Data
science approaches may be used to quickly understand what is occurring, why it is occurring, and
what will occur next.
Different applications of data visualisation
1. Healthcare Industries
A dashboard that visualises a patient's history can help a current or new doctor understand the
patient's health, and in an emergency it can point to faster care facilities based on the illness.
Instead of sifting through hundreds of pages of information, data visualisation can help in
finding trends.
Health care is a time-consuming procedure, and the majority of that time is spent evaluating prior
reports. Data visualisation provides a superior selling point by boosting response time: it gives
metrics that make analysis easier, resulting in a faster reaction time.
2. Business intelligence
When compared to local options, cloud connection can provide the cost-effective “heavy lifting”
of processor-intensive analytics, allowing users to see bigger volumes of data from numerous
sources to help speed up decision-making.
Because such systems can be diverse, comprised of multiple components, and may use their own
data storage and interfaces for access to stored data, additional integrated tools, such as those
geared toward business intelligence (BI), help provide a cohesive view of an organization's entire
data system (e.g., web services, databases, historians, etc.).
Multiple datasets can be correlated using analytics/BI tools, which allow for searches using a
common set of filters and/or parameters. The acquired data may then be displayed in a standardised
manner using these technologies, giving logical "shaping" and better comparison grounds for end
users.
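The common-filter idea can be sketched in plain Python: one set of filter parameters applied uniformly across otherwise heterogeneous datasets. The records and field names below are made up for illustration.

```python
# Hypothetical records from two separate systems.
web_events = [
    {"region": "EU", "month": "2024-01", "visits": 120},
    {"region": "US", "month": "2024-01", "visits": 300},
    {"region": "EU", "month": "2024-02", "visits": 150},
]
sales = [
    {"region": "EU", "month": "2024-01", "revenue": 900},
    {"region": "US", "month": "2024-02", "revenue": 1100},
]

def apply_filters(records, **params):
    """Apply one common set of filter parameters to any dataset."""
    return [r for r in records
            if all(r.get(k) == v for k, v in params.items())]

# The same filter shapes both datasets for side-by-side comparison.
common = {"region": "EU", "month": "2024-01"}
print(apply_filters(web_events, **common))
print(apply_filters(sales, **common))
```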
3. Military
For the military it is a matter of life and death: taking the appropriate action requires clarity of
data from which actionable insights can be drawn.
The adversary is present in the field today, as well as posing a danger through digital warfare
and cybersecurity. It is critical to collect data from a variety of sources, both structured and
unstructured. The volume of data is enormous, and data visualisation technologies are essential for
rapid delivery of accurate information in the most condensed form feasible. A greater grasp of past
data allows for more accurate forecasting.
Dynamic Data Visualization aids in a better knowledge of geography and climate, resulting in a
more effective approach. The cost of military equipment and tools is extremely significant; with
bar and pie charts, analysing current inventories and making purchases as needed is simple.
4. Finance Industries
Data visualisation tools are becoming a requirement for the financial sector: they help in exploring
and explaining customer data, understanding consumer behaviour, maintaining a clear flow of
information, and improving the efficiency of decision making.
For associated organisations and businesses, data visualisation helps reveal patterns, which
supports better investment strategies. It also highlights the most recent trends, improving business
prospects.
5. Data science
Data scientists generally create visualisations for their personal use or to communicate
information to a small group of people. Visualization libraries for the specified programming
languages and tools are used to create the visual representations.
Open source programming languages, such as Python, and proprietary tools built for
complicated data analysis are commonly used by data scientists and academics. These data
scientists and researchers use data visualisation to better comprehend data sets and spot patterns
and trends that might otherwise go undiscovered.
6. Marketing
In marketing analytics, data visualisation is a boon. Visuals and reports can be used to analyse
patterns and trends, such as sales analysis, market research analysis, customer analysis, defect
analysis, cost analysis, and forecasting. These analyses serve as a foundation for marketing and
sales.
Visual aids can help your audience grasp your main message by engaging them visually. The
major advantage of visualising data is that it can communicate a point faster than a dense
spreadsheet.
In B2B firms, data-driven yearly reports and presentations often fail to meet the needs of the
people viewing them because they do not engage the audience in a meaningful or memorable
manner. If you present your facts as visual statistics, your audience will be more interested in
them and more inclined to act on your findings.
7. Food delivery apps
When you place an order for food on your phone, it is given to the nearest delivery person. There
is a lot of math involved here, such as the distance between the delivery executive's present position
and the restaurant, as well as the time it takes to get to the customer's location.
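The distance math mentioned above can be sketched with the haversine formula, which gives the great-circle ("as the crow flies") distance between two coordinates. The coordinates below are hypothetical, and a real delivery app would use road-network distances and travel times instead.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in kilometres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical coordinates: delivery executive, restaurant, customer.
rider, restaurant, customer = (12.97, 77.59), (12.98, 77.60), (12.95, 77.62)
pickup = haversine_km(*rider, *restaurant)
dropoff = haversine_km(*restaurant, *customer)
print(round(pickup + dropoff, 2), "km to ride in total")
```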
Customer orders, delivery location, GPS service, tweets, social media messages, verbal
comments, pictures, videos, reviews, comparative analyses, blogs, and updates have all become
common ways of data transmission.
With the help of this data, users can obtain figures on average wait times, delivery experiences,
customer service, meal taste, menu options, loyalty and reward point programmes, and product
stock and inventory.
8. Real estate business
Brokers and agents seldom have the time to undertake in-depth research and analysis on their
own. Common data visualisation applications include showing a buyer or seller comparable
home prices in their neighbourhood on a map, illustrating average time on the market, creating a
sense of urgency among prospective buyers, managing sellers' expectations, and attracting
viewers to your social media sites.