Ls 5 - IMP
Visualization can be thought of as the “front end” of big data. The following are common
data visualization myths [4]:
• All data must be visualized: It is important not to rely too heavily on visualization; some
data does not need visualization methods to uncover its messages.
• Only good data should be visualized: A simple and quick visualization can highlight
something wrong with the data just as it helps uncover interesting trends.
• Visualization will always manifest the right decision or action: Visualization cannot
replace critical thinking.
• Visualization will lead to certainty: The fact that data is visualized does not mean it shows
an accurate picture of what is important. Visualization can be manipulated with different
effects.
Visualization approaches are used to create tables, diagrams, images, and other intuitive
ways of displaying data. Visualizing big data is not as easy as visualizing traditional small
data sets. Extensions of traditional visualization approaches have already emerged, but they
are far from enough. In large-scale data visualization, many researchers use feature
extraction and geometric modeling to greatly reduce data size before actual data rendering.
Choosing a proper data representation is also very important when visualizing big data [5].
The goal of this chapter is to present new methods and advances in Big Data visualization
by introducing conventional visualization methods and the extension of some of them to
handling big data, discussing the challenges of big data visualization, and analyzing
technology progress in big data visualization.
In this study, the authors first searched the university library system for papers related to
data visualization that were published in recent years. At this stage, the authors mainly
summarized traditional data visualization methods and new progress in this area. Next, the
authors searched for papers related to big data visualization. Most of these papers were
published in the past three years because big data is a newer area. At this stage, the authors
found that most conventional data visualization methods do not apply to big data, and that
the extensions of some conventional visualization approaches to big data are far from
sufficient in function. The authors focused on big data visualization challenges as well as
new methods, technology progress, and developed tools for big data visualization.
Scalability and dynamics are two major challenges in visual analytics. Table 2 shows the
research status for static data and dynamic data according to the data size. For big dynamic
data, solutions that work for type A problems or for type B problems alone often do not work
when A and B problems occur together [9].
Table 2. The research status and challenge of visual analytics
Visualization-based methods take the challenges presented by the “four Vs” of big data
and turn them into the following opportunities [2]:
• Volume: The methods are developed to work with an immense number of datasets and
make it possible to derive meaning from large volumes of data.
• Variety: The methods are developed to combine as many data sources as needed.
• Velocity: With the methods, businesses can replace batch processing with real-time stream
processing.
• Value: The methods not only enable users to create attractive infographics and heatmaps,
but also create business value by gaining insights from big data.
Visualizing big data with its diversity and heterogeneity (structured, semi-structured, and
unstructured) is a big problem. Speed is a desired factor in big data analysis. Designing a
new visualization tool with efficient indexing is not easy for big data. Cloud computing and
advanced graphical user interfaces can be combined with big data for better management of
big data scalability [3].
Visualization systems must contend with unstructured data forms such as graphs, tables,
text, trees, and other metadata. Big data often has unstructured formats. Due to bandwidth
limitations and power requirements, visualization should move closer to the data to extract
meaningful information efficiently. Visualization software should be run in an in situ
manner. Because of the big data size, the need for massive parallelization is a challenge in
visualization. The challenge in parallel visualization algorithms is decomposing a problem
into independent tasks that can be run concurrently [10].
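The decomposition idea can be illustrated with a small sketch. It assumes the visualization is a histogram, so each chunk of data can be binned independently and the partial counts summed afterward. The function names (`bin_chunk`, `parallel_histogram`), the chunk size, and the use of a thread pool are illustrative choices, not taken from [10].

```python
from concurrent.futures import ThreadPoolExecutor


def bin_chunk(chunk, lo, hi, n_bins):
    """Count how many values of one chunk fall into each equal-width bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for value in chunk:
        # Clamp so that value == hi lands in the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


def parallel_histogram(values, lo, hi, n_bins, chunk_size=1000, workers=4):
    """Split values into independent chunks, bin them concurrently, merge."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: bin_chunk(c, lo, hi, n_bins), chunks))
    # Merging is a plain element-wise sum because the tasks share no state.
    return [sum(column) for column in zip(*partials)]


data = [i % 100 for i in range(10_000)]   # synthetic values in [0, 99]
histogram = parallel_histogram(data, lo=0, hi=100, n_bins=10)
print(histogram)
```

Threads are used here only to show the structure of the decomposition; for CPU-bound work in Python, a process pool or a distributed framework would likely be needed to see a real speed-up.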
Effective data visualization is a key part of the discovery process in the era of big data. For
the challenges of high complexity and high dimensionality in big data, there are different
dimensionality reduction methods. However, they may not always be applicable. The more
dimensions that are visualized effectively, the higher the chances of recognizing potentially
interesting patterns, correlations, or outliers [11].
Big data visualization also faces the following problems [12]:
• Visual noise: Most of the objects in a dataset are too closely related to each other, so users
cannot distinguish them as separate objects on the screen.
• Information loss: The visible data set can be reduced, but this leads to information loss.
• Large image perception: Data visualization methods are limited not only by the aspect ratio
and resolution of the device, but also by physical perception limits.
• High rate of image change: Users observing the data cannot react to the number of data
changes or their intensity on the display.
• High performance requirements: This problem is hardly noticeable in static visualization,
where the visualization speed requirements are lower.
Perceptual and interactive scalability are also challenges of big data visualization.
Visualizing every data point can lead to over-plotting and may overwhelm users’ perceptual
and cognitive capacities; reducing the data through sampling or filtering can elide
interesting structures or outliers. Querying large data stores can result in high latency,
disrupting fluent interaction [13].
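One way to bound the number of drawn marks without sampling is to aggregate points into a fixed grid and draw one mark per cell. The sketch below assumes 2D points inside known bounds. The function name `bin_points`, the 4-by-4 grid, and the synthetic data are illustrative assumptions and are not taken from [13].

```python
from collections import Counter


def bin_points(points, x_range, y_range, nx, ny):
    """Aggregate (x, y) points into an nx-by-ny grid of counts."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Map each coordinate to a cell index; clamp the upper edge.
        col = min(int((x - x_lo) / (x_hi - x_lo) * nx), nx - 1)
        row = min(int((y - y_lo) / (y_hi - y_lo) * ny), ny - 1)
        counts[(row, col)] += 1
    return counts


# 10,000 synthetic points collapse into at most 4 * 4 = 16 drawable cells.
points = [((i * 7) % 100, (i * 13) % 100) for i in range(10_000)]
cells = bin_points(points, (0, 100), (0, 100), nx=4, ny=4)
print(len(points), "points ->", len(cells), "cells")
```

Because every point contributes to a count, an isolated point still shows up as a non-empty cell, although its exact position inside the cell is lost.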
In Big Data applications, it is difficult to conduct data visualization because of the large
size and high dimensionality of big data. Most current Big Data visualization tools perform
poorly in scalability, functionality, and response time. Uncertainty can arise during a visual
analytics process and poses a great challenge to effective uncertainty-aware visualization
[5].
Potential solutions to some of the challenges or problems of visualization and big data
have been presented [14]:
1. Meeting the need for speed: One possible solution is hardware: increased memory and
powerful parallel processing can be used. Another method is putting data in memory while
using a grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of
data governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or
create a separate chart for the outliers.
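For item 5, one possible way to decide which points go on the separate outlier chart is the 1.5 × IQR rule. This rule is a common convention and an assumption here, not something prescribed by [14]; the function name `split_outliers` and the sample values are made up for illustration.

```python
import statistics


def split_outliers(values, k=1.5):
    """Separate values into (inliers, outliers) using the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


response_ms = [12, 14, 15, 15, 16, 17, 18, 19, 20, 250]   # made-up timings
inliers, outliers = split_outliers(response_ms)
print("main chart:", inliers)
print("outlier chart:", outliers)
```

Plotting the two lists on separate charts keeps the main chart's axis from being stretched by the single extreme value.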
Big Data Visualization Tools.
2. Kibana
Kibana is an open-source data visualization and exploration tool used for log analysis, time
series analysis, application monitoring, and operational intelligence use cases. It provides
powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and
built-in geospatial support. In addition, it integrates tightly with Elasticsearch, a popular
analytics and search engine, which makes Kibana the main choice for visualizing data stored
in Elasticsearch.
Kibana was designed to work with Elasticsearch to make large and complex data streams
quicker and easier to understand through visual representation. Elasticsearch analytics
provide both data and advanced aggregations with mathematical transformations. The
application produces flexible, dynamic dashboards with PDF reports on demand or on a
schedule. The generated reports can present data as bar, line, scatter plot, and pie charts
with customizable colors and highlighted search results. Kibana also includes tools for
sharing visualized data.
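To make the role of Elasticsearch aggregations more concrete, the sketch below builds the JSON body of a `date_histogram` aggregation, the kind of bucketed count that a time-based histogram needs. The field name `@timestamp`, the aggregation name `events_per_hour`, and the one-hour interval are assumptions for illustration. The snippet only builds and prints the request body; it does not contact a cluster.

```python
import json

# Request body for an Elasticsearch search that returns only bucket counts.
query = {
    "size": 0,  # skip individual documents; only aggregations are needed
    "aggs": {
        "events_per_hour": {            # hypothetical aggregation name
            "date_histogram": {
                "field": "@timestamp",  # assumed timestamp field
                "fixed_interval": "1h",
            }
        }
    },
}

body = json.dumps(query, indent=2)
print(body)
```

Sent as the body of a search request against an index (`POST /<index>/_search`), a query of this shape returns one bucket per hour with a document count, which a dashboard can then draw as bars.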
3. Grafana
Grafana is an open-source metrics analytics and visualization suite. It is most commonly used
for visualizing time series data for infrastructure and application analytics, but many use
it in other areas, including agricultural equipment, home automation, weather, and process
control.
Grafana is a tool for visualizing time series data. From a large amount of collected data, a
graphical overview of the state of a business or organization can be obtained. How is it
used in practice? Wikidata, an extensive collaboratively edited database that increasingly
feeds articles in Wikipedia, uses grafana.wikimedia.org to show publicly (in our case we do so
on a regular basis) the edits carried out by contributors and bots, and the pages or data
sheets created and edited in a given time span.
4. Tableau
Tableau is a powerful and rapidly growing data visualization tool used in the business
intelligence industry. It simplifies raw data into an easily understandable format.
Data analysis with Tableau is very fast, and the visualizations take the form of dashboards
and worksheets. The data produced using Tableau can be understood by professionals at every
level of an organization. It even enables a non-technical user to create a customized
dashboard.
The best features of Tableau are:
• Data blending
• Real-time analysis
• Collaboration on data
Tableau software is attractive because it does not require any technical or programming
skills to operate. The tool has attracted people from all sectors, such as business, research,
and various industries.
Types of Data Visualization. (Learn any 8 minimum)
The various types of visualization include Column Chart, Line Graph, Bar Graph, Stacked
Bar Graph, Dual-Axis Chart, Pie Chart, Mekko Chart, Bubble Chart, Scatter Chart, and Bullet
Graph.
1. Column Chart
A column chart is used to show a comparison among different items, or it can show a
comparison of items over time. You could use this format to see the revenue per landing page
or customers by close date.
2. Bar Graph
A bar graph, basically a horizontal column chart, should be used to avoid clutter when one
data label is long or if you have more than 10 items to compare. This type of visualization
can also be used to display negative numbers.
3. Line Graph
A line graph reveals trends or progress over time and can be used to show many different
categories of data. You should use it when you chart a continuous data set.
4. Dual Axis Chart
A dual axis chart allows you to plot data using two y-axes and a shared x-axis. It's used with
three data sets, one of which is based on a continuous set of data and another which is better
suited to being grouped by category. This should be used to visualize a correlation or the lack
thereof between these three data sets.
5. Area Chart
An area chart is basically a line chart, but the space between the x-axis and the line is filled
with a color or pattern. It is useful for showing part-to-whole relations, such as showing
individual sales reps' contribution to total sales for a year. It helps you analyze both overall
and individual trend information.
6. Stacked Bar Graph
A stacked bar graph should be used to compare many different items and show the composition
of each item being compared.
7. Mekko Chart
Also known as a marimekko chart, this type of graph can compare values, measure each one's
composition, and show how your data is distributed across each one.
It's similar to a stacked bar, except that the mekko's x-axis is used to capture another
dimension of your values rather than time progression, as column charts often do. For example,
the x-axis can compare cities to one another.
8. Pie Chart
A pie chart shows a static number and how categories represent part of a whole -- the
composition of something. A pie chart represents numbers in percentages, and the total sum
of all segments needs to equal 100%.
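The part-to-whole arithmetic behind a pie chart can be sketched as follows. The function name `pie_shares` and the traffic figures are made up for illustration.

```python
def pie_shares(values):
    """Convert raw category totals into percentage shares of the whole."""
    total = sum(values.values())
    return {name: 100 * amount / total for name, amount in values.items()}


traffic = {"organic": 50, "paid": 30, "referral": 20}   # made-up numbers
shares = pie_shares(traffic)
print(shares)
```

Each share is the category's value divided by the total, so the segments add up to 100% by construction.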
9. Scatter Plot Chart
A scatter plot or scattergram chart will show the relationship between two different variables
or it can reveal the distribution trends. It should be used when there are many different data
points, and you want to highlight similarities in the data set. This is useful when looking for
outliers or for understanding the distribution of your data.
10. Bubble Chart
A bubble chart is similar to a scatter plot in that it can show distribution or relationship.
There is a third data set, which is indicated by the size of the bubble or circle.
11. Waterfall Chart
A waterfall chart should be used to show how an initial value is affected by intermediate
values, either positive or negative, and results in a final value. This should be used to
reveal the composition of a number. An example of this would be to showcase how overall
company revenue is influenced by different departments and leads to a specific profit
number.
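The running-total arithmetic behind a waterfall chart can be sketched as follows. Each intermediate value becomes a floating bar that starts where the previous one ended. The function name `waterfall_steps` and all figures are invented for illustration.

```python
from itertools import accumulate


def waterfall_steps(initial, changes):
    """Return (label, start, end) for each floating bar of a waterfall chart."""
    labels = [label for label, _ in changes]
    deltas = [delta for _, delta in changes]
    # Running totals: the value reached after each intermediate change.
    totals = list(accumulate(deltas, initial=initial))
    return [(label, totals[i], totals[i + 1]) for i, label in enumerate(labels)]


# Made-up figures: revenue reduced by costs, increased by other income.
changes = [("cost of goods", -400), ("salaries", -300), ("other income", 50)]
steps = waterfall_steps(1000, changes)
final_value = steps[-1][2]
print(steps)
print("final:", final_value)
```

Here the initial value of 1000 ends at 350 after the three changes; a charting library would draw each `(start, end)` pair as one bar.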
12. Funnel Chart
A funnel chart shows a series of steps and the completion rate for each step. This can be used
to track the sales process or the conversion rate across a series of pages or steps.
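The completion rates that a funnel chart displays can be computed as in the sketch below. It reports each step's rate relative to the previous step and relative to the funnel entry. The function name `funnel_rates` and the page-view counts are assumptions for illustration.

```python
def funnel_rates(steps):
    """Return (name, count, step rate %, overall rate %) for each funnel step."""
    first = steps[0][1]
    rows = []
    previous = first
    for name, count in steps:
        step_rate = 100 * count / previous      # completion vs. previous step
        overall_rate = 100 * count / first      # completion vs. funnel entry
        rows.append((name, count, step_rate, overall_rate))
        previous = count
    return rows


# Made-up page-view counts for a three-step sign-up flow.
visits = [("landing page", 2000), ("sign-up form", 500), ("confirmation", 100)]
rows = funnel_rates(visits)
for row in rows:
    print(row)
```

In this example 25% of visitors reach the form and 20% of those confirm, which is 5% of the original visitors.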
13. Bullet Graph
A bullet graph reveals progress toward a goal, compares this to another measure, and
provides context in the form of a rating or performance.
14. Heat Map
A heat map shows the relationship between two items and provides rating information, such
as high to low or poor to excellent. The rating information is displayed using varying colors
or saturation.
Analytical techniques in Big Data Visualization
1. Word Clouds
Word clouds work in a simple way: the more often a particular word appears in a source of
text data (such as a speech, blog post, or database), the larger and bolder it appears in the
word cloud.
Here is an example from USA Today using U.S. President Barack Obama's 2012 State of the
Union address:
As you can see, words like “American,” “jobs,” “energy” and “every” stand out since they
were used more frequently in the original text.
Now, compare that to the 2014 State of the Union address:
You can easily see the similarities and differences between the two speeches at a glance.
“America” and “Americans” are still major words, but “help,” “work,” and “new” are more
prominent than in 2012.
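The frequency-to-size mapping behind a word cloud can be sketched as follows. The code counts words and scales the counts linearly to a font size. The function name `word_sizes`, the size range, and the toy text are assumptions; the text is not taken from either speech, and a real word cloud would usually also drop common stop words.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=40, top=5):
    """Map the most frequent words to font sizes between min_size and max_size.

    Assumes text contains at least one word.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top)
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)   # avoid division by zero
    return {
        word: min_size + (max_size - min_size) * (count - lowest) / span
        for word, count in counts
    }


sample = "jobs jobs jobs energy energy america"   # toy text, not a real speech
sizes = word_sizes(sample)
print(sizes)
```

The most frequent word receives the largest size and the least frequent of the selected words the smallest, which is what makes dominant terms stand out at a glance.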
2. Symbol Maps
Symbol maps are simply maps with marks displayed at a given longitude and latitude. Using the
“Marks” card in Tableau, you can quickly build a powerful visual that informs users about
their location data. You can also use the data to control the shape of the mark on the map,
using pie charts or shapes for a different level of detail.
These maps can be as simple or as complex as you need them to be.
3. Line charts
Also known as a line graph, a line chart is a graphical representation of data using a
series of lines. Line charts display lines running horizontally across the chart, with the
values axis on the left side of the chart. An example is a line chart showing unique Computer
Hope visitors per year.
In such a chart, you can easily see the increases and decreases from year to year.
4. Pie charts
A pie chart is a circular chart divided into wedge-like sectors, each of which shows a
proportion. The total value of the pie is 100%, and each wedge is a proportional part of the
whole.
The size of each portion can be understood easily at a glance. Pie charts are commonly used to
show the proportion of expenditure, population segments, or survey responses across a
large number of categories.
5. Bar Charts
A bar graph is a visual tool that uses bars to compare data among categories. A bar graph
is also called a bar chart or bar diagram. A bar chart can run horizontally or vertically.
The important thing to understand is that the longer the bar, the greater its value.
Bar graphs consist of two axes. On a vertical bar graph, the horizontal axis (or x-axis)
shows the data categories, for example years. The vertical axis is the scale. The colored
bars are the data series.
Bar charts have three main attributes:
• A bar chart allows a simple comparison of data sets among distinct groups.
• The graph shows categories on one axis and a discrete value on the other. The objective is
to show the relationship between the two axes.
• Bar charts can also show large changes in data over time.
6. Heat Maps
A heat map represents data displayed two-dimensionally through color values. A simple heat
map provides an immediate visual summary of the data.
There are many ways to display heat maps, but they all share one thing in common: they use
color to communicate relationships between data values that would be much harder to
understand if presented numerically in a table.
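The value-to-color step of a heat map can be sketched as a simple linear scaling of each cell to a 0-255 intensity. The function name `to_intensity`, the linear scale, and the sample grid are assumptions for illustration; real tools typically map the scaled value onto a color ramp.

```python
def to_intensity(grid):
    """Scale every cell of a 2D grid linearly to a 0-255 intensity."""
    flat = [value for row in grid for value in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1           # a constant grid maps to intensity 0
    return [[round(255 * (value - lo) / span) for value in row] for row in grid]


# Made-up counts: rows could be weekdays, columns times of day.
visits = [
    [0, 34, 17],
    [10, 51, 20],
]
shades = to_intensity(visits)
print(shades)
```

The smallest value maps to 0 and the largest to 255, so the relative ordering of the cells can be read from color alone.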
Techniques for Visual Data Representation.
1. Pie Chart
Pie charts are one of the most common and basic data visualization techniques, used across a
wide range of applications. Pie charts are ideal for illustrating proportions, or part-to-whole
comparisons.
Because pie charts are relatively simple and easy to read, they’re best suited for audiences
who might be unfamiliar with the information or are only interested in the key takeaways. For
viewers who require a more thorough explanation of the data, pie charts fall short in their
ability to display complex information.
2. Bar Chart
The classic bar chart, or bar graph, is another common and easy-to-use method of data
visualization. In this type of visualization, one axis of the chart shows the categories being
compared, and the other, a measured value. The length of the bar indicates how each group
measures according to the value.
One drawback is that labeling and clarity can become problematic when there are too many
categories included. Like pie charts, they can also be too simple for more complex data sets.
3. Line Chart
The simplest technique, a line plot, is used to show the relationship or dependence of one
variable on another. To plot the relationship between the two variables, we can simply
call the plot function.
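The text does not name a plotting library. The sketch below assumes matplotlib (a third-party package), whose `plot` method draws a line through paired x and y values. The helper names `line_series` and `save_line_plot`, the sample function, and the output file name are hypothetical.

```python
def line_series(f, xs):
    """Evaluate f at each x to get the dependent values for a line plot."""
    return [f(x) for x in xs]


def save_line_plot(xs, ys, path="line_plot.png"):
    """Draw ys against xs with matplotlib and save the figure to path."""
    import matplotlib
    matplotlib.use("Agg")              # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")        # the plot call draws the line
    ax.set_xlabel("x")
    ax.set_ylabel("y = x squared")
    fig.savefig(path)
    plt.close(fig)


xs = list(range(6))
ys = line_series(lambda x: x * x, xs)
print(list(zip(xs, ys)))
# save_line_plot(xs, ys)   # uncomment when matplotlib is installed
```

The data preparation is kept separate from the drawing so that the dependence of `ys` on `xs` can be inspected even without a plotting backend.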
4. Histogram
Unlike bar charts, histograms illustrate the distribution of data over a continuous interval or
defined period. These visualizations are helpful in identifying where values are concentrated,
as well as where there are gaps or unusual values.
Histograms are especially useful for showing the frequency of a particular occurrence. For
instance, if you’d like to show how many clicks your website received each day over the last
week, you can use a histogram. From this visualization, you can quickly determine which
days your website saw the greatest and fewest number of clicks.
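The binning step that produces a histogram over a continuous interval can be sketched as follows. Equal-width bins are one common choice and an assumption here; the function name `histogram` and the load-time values are made up.

```python
def histogram(values, n_bins):
    """Count values in n_bins equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1     # guard against all-equal values
    counts = [0] * n_bins
    for value in values:
        # Clamp so that the maximum value lands in the last bin.
        index = min(int((value - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up page load times in seconds.
load_times = [0.0, 0.5, 1.0, 1.5, 1.5, 2.0, 2.5, 4.0]
edges, counts = histogram(load_times, n_bins=4)
print(edges)
print(counts)
```

The counts show where values are concentrated (between 1 and 2 seconds here) and where they are sparse, which is the information the bars of a histogram convey.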
5. Scatter Plot
Another technique commonly used to display data is a scatter plot. A scatter plot displays
data for two variables as represented by points plotted against the horizontal and vertical axis.
This type of data visualization is useful in illustrating the relationships that exist between
variables and can be used to identify trends or correlations in data.
Scatter plots are most effective for fairly large data sets, since it’s often easier to identify
trends when there are more data points present. Additionally, the closer the data points are
grouped together, the stronger the correlation or trend tends to be.
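One common way to put a number on how tightly the points of a scatter plot follow a straight-line trend is the Pearson correlation coefficient. Choosing this particular measure is an assumption made for the sketch, and the data sets are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: tight points lie on a line, loose points scatter around it.
xs = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]
loose = [2, 5, 3, 9, 6]
r_tight = pearson_r(xs, tight)
r_loose = pearson_r(xs, loose)
print(r_tight, r_loose)
```

The points that lie exactly on a line give a coefficient of 1, while the more scattered set gives a smaller positive value, matching the visual impression of a weaker trend.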