
DA Unit 4

Uploaded by

Ayan Shaikh
Copyright
© All Rights Reserved

Unit 4

Data Visualization
Graphical representation of data
• Graphical representation of data turns numbers and facts into pictures and diagrams. Instead of staring at long lists of numbers, we use charts, graphs, and other visuals to understand information better. In this unit on data visualization, we’ll learn about different kinds of graphs, charts, and pictures that help us see patterns and stories hidden in data.
• Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting, and presenting numerical data, including in visual form, so that the data is easy to understand and compare.
• The field is broad and has many real-life applications, such as business analytics, demography, and astrostatistics.
• Graphical representation is a way of presenting data in pictorial form. It helps a reader understand a large set of data easily because it makes patterns in the data visible.
• There are two ways of representing data,
1. Table
2. Pictorial Representation through graphs.
• They say, “A picture is worth a thousand words”. It is often better to represent data in a graphical format: studies and surveys suggest that people retain and understand information better when it is presented visually.
• Does it increase the ability 2 times or 3 times? A widely repeated answer is that visuals are processed 60,000 times faster than text, but this figure is usually quoted without a source, so treat it as a popular claim rather than an established fact.
Characteristics and charts for effective
graphical display
• Clear purpose: The display should have a clear purpose, such as description,
exploration, tabulation, or decoration.
• Data integrity: The display should avoid distorting the data.
• Data presentation: The display should present data in a way that encourages
the viewer to think about the data rather than the design or technology.
• Data comparison: The display should encourage the viewer to compare
different pieces of data.
• Data detail: The display should reveal the data at multiple levels of detail.
• Data integration: The display should be closely integrated with the statistical
and verbal descriptions of the data.
Chart Types
• Single Variable
• Two Variable
• More than two variables
Dot Plot
• A dot plot, or dot chart, is a simple but highly effective graphic for displaying and analyzing data. It places the data on a simple scale, using dots that each represent one or more data points.
• This type of chart is useful for small to medium-sized data sets because it is easy for a reader to see patterns, groups, gaps in the data, and outliers. Dot plots are mainly used for studying data distribution in statistics, education, business, and so forth.
• Because dot plots focus on individual data points in terms of frequency and spread, they are very helpful for making sense of data in the preliminary stages of analysis and for communicating results.
• The dot plot below shows the score each student received on an in-class essay in Mr. Jhonson’s class. Each dot represents a different student. What is the lowest essay score a student achieved, and which score was achieved by the largest number of students?

• The dot plot displays the number of students who received each essay score on a 6-point scale.
• The lowest score attained on the essay is 2.
• Four students scored 3, the largest group, while one student scored 2.
• Thus, the lowest essay score is 2, and 3 is the score earned by the largest number of students.
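The reading of a dot plot described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (they are not the data from the slide's figure), and `text_dot_plot` is a helper name made up for this sketch. It draws one `o` per observation, so gaps and the tallest stack are visible at a glance.

```python
from collections import Counter

def text_dot_plot(values):
    """Return one text line per scale value, with one 'o' per observation."""
    counts = Counter(values)
    lines = []
    # Walk the whole scale so that values with no observations show up as gaps.
    for value in range(min(counts), max(counts) + 1):
        lines.append(f"{value} | {'o' * counts[value]}".rstrip())
    return lines

# Hypothetical essay scores, chosen only for illustration.
scores = [2, 3, 3, 3, 3, 4, 4, 6]

for line in text_dot_plot(scores):
    print(line)

lowest = min(scores)
most_common_score, frequency = Counter(scores).most_common(1)[0]
print("lowest score:", lowest)
print("most frequent score:", most_common_score, "with", frequency, "students")
```

With these made-up scores the lowest score is 2 and the tallest stack sits at 3, and the empty row at 5 shows how a gap in the data appears.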
Jitter Plot
A jitter plot is a data visualization technique that displays the distribution of data
points by plotting them as dots along an axis.
Purpose: Jitter plots are used to visualize the relationship between a categorical
variable and a measurement variable. They are particularly useful for small datasets
and for showing the distribution of values when data points are clustered together.
How it works: Jitter plots are similar to scatter plots, but the dots are randomly
shifted along the other axis to avoid overlap. This allows the viewer to see more
data points without losing clarity.
Features: Jitter plots can be displayed horizontally or vertically. The shape, size,
and color of the dots can be customized. To distinguish categories, the dots can be
color coded or have their opacity reduced.
Comparison: Several jitter plots can be placed side by side to compare the
distribution of data points across different categories or ranges.
• A jitter plot is a variant of the strip plot with a better view of overlapping data
points, used to visualise the distribution of many individual one-dimensional
values. The values are plotted as dots along one axis, and the dots are then shifted
randomly along the other axis, which has no meaning in itself data-wise, allowing
the dots not to overlap. Typically, several jitter plots are placed side by side to
compare the distributions of data points among several values, categories or
ranges.
• Another way to avoid overlap in a strip plot is to make a counts plot, and other
ways to visualise similar data include violin plots and boxplots.
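The random shift described above can be sketched as follows. This is a minimal illustration, not a plotting routine: `jitter_points`, its `width` and `seed` parameters, and the sample measurements are all assumptions made for this example. Only the coordinate on the meaningless axis is changed; the measured value is left untouched.

```python
import random

def jitter_points(values, position=0.0, width=0.2, seed=0):
    """Pair each value with a randomly shifted coordinate on the other axis.

    position: where the category sits on the non-measurement axis
    width:    maximum shift in either direction (hypothetical default)
    seed:     fixed seed so the layout is reproducible
    """
    rng = random.Random(seed)
    return [(position + rng.uniform(-width, width), v) for v in values]

# Hypothetical measurements with several identical values that would overlap.
measurements = [5.0, 5.0, 5.0, 5.1, 5.1, 6.3]
points = jitter_points(measurements, position=1.0, width=0.2)

for x, y in points:
    print(f"x={x:.3f}  y={y}")
```

Keeping `width` well below the spacing between categories stops neighbouring jitter plots from bleeding into each other when several are placed side by side.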
Error Bar Graph
• Error bars are a graphical enhancement that visualizes the variability of the plotted data on a Cartesian graph. They can be applied to graphs to provide an additional layer of detail on the presented data, and can be added to many types of charts, including bar, column, line, xy (scatter), and bubble charts.
• Error bars indicate estimated error or uncertainty, giving a general sense of how precise a measurement is. This is done with markers drawn over the original graph and its data points.
• Error bars are drawn as lines that extend from the center of the plotted data point (or from the edge of the bar in a bar chart). The length of an error bar reveals the uncertainty of a data point, as shown in the graph below.
• A short error bar shows that the values are concentrated, signalling that the plotted average is more reliable, while a long error bar indicates that the values are more spread out and the average is less reliable.
• Depending on the type of data, the two error bars of a pair are usually of equal length on both sides. If the data is skewed, however, the lengths on each side are unequal.
• Error bars always run parallel to a quantitative scale axis, so they can be displayed vertically or horizontally depending on whether the quantitative scale is on the y-axis or the x-axis. If there are two quantitative scales, two pairs of error bars can be used, one for each axis.
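The slides do not say which quantity the error bar length stands for. As one common choice, the sketch below assumes symmetric bars of one standard error of the mean on each side; `error_bar` and both samples are hypothetical names and data used only for illustration.

```python
import math
import statistics

def error_bar(sample):
    """Return (mean, lower end, upper end) for a symmetric error bar.

    Assumption: the bar extends one standard error of the mean on each side.
    """
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation
    se = sd / math.sqrt(len(sample))       # standard error of the mean
    return mean, mean - se, mean + se

# Two hypothetical samples with the same mean but different spread.
tight = [4.9, 5.0, 5.0, 5.1]
spread = [2, 4, 4, 4, 5, 5, 7, 9]

for name, sample in (("tight", tight), ("spread", spread)):
    mean, lower, upper = error_bar(sample)
    print(f"{name}: mean={mean:.2f}, bar from {lower:.2f} to {upper:.2f}")
```

The concentrated sample gets a short bar and the scattered one a long bar, which is the pattern described in the bullets above.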
Box and Whisker Plot
• A Box and Whisker Plot is a visual representation of the five-point summary. It is also called a Box Plot. It consists of a rectangular “box” and two “whiskers”, and contains the following parts:
• Box: The box in the plot spans from the first quartile (Q1) to the third quartile (Q3). This box contains the middle 50% of the data and represents the interquartile range (IQR). The length of the box provides insights into the data’s spread.
• Whiskers: The whiskers extend from the minimum value to Q1 and from Q3 to the
maximum value. They signify the range of the data, excluding potential outliers. The
whiskers can vary in length, indicating the data’s skewness or symmetry.
• Median Line: A line within the box represents the median (Q2). It divides the data
into two halves, revealing the central tendency.
• Outliers: Individual data points lying beyond the whiskers are considered outliers
and are often plotted as individual points.
What is a Five-Point Summary?
The five-point summary (also called the five-number summary) comprises five key measurements: the minimum value, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum value. These measurements divide a dataset into four parts, each containing roughly a quarter of the observations, revealing important information about the dataset’s central tendency, spread, and skewness.
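The five values can be computed with Python's standard library, as sketched below. Two assumptions go beyond the slides: quartiles are computed with the "inclusive" method of `statistics.quantiles` (other conventions give slightly different Q1 and Q3), and outliers are flagged with the common 1.5 × IQR rule, which the slides do not specify. The function names and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    # Assumption: 'inclusive' quartile convention; others exist.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

def find_outliers(data, k=1.5):
    """Flag points beyond k * IQR from the box (a common rule, assumed here)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical data
print(five_number_summary(values))
print(find_outliers(values + [30]))
```

The box would be drawn from Q1 to Q3 with a line at the median, and any flagged values would be plotted as individual points beyond the whiskers.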
Use of Box Plot:
1. Visualizing Data Distribution
2. Comparing Distributions
3. Identifying Skewness
4. Data Exploration
5. Statistical Analysis
6. Quality Control
7. Decision Making
8. Risk Assessment
9. Public Health and Epidemiology
10. Environmental Science
When to Use Box and Whisker Plot
1. Comparing Scores
2. Analyzing Employee Salaries
3. Evaluating Product Quality
4. Detecting Anomalies in Financial Data
5. Comparing Patient Recovery Times
6. Assessing Marketing Campaigns
7. Monitoring Air Quality
8. Assessing Investment Portfolios
9. Comparing Housing Prices
10. Analyzing Crime Rates
Histogram Plot
• A histogram is a graphical representation used in statistics to show the distribution of numerical data. It looks somewhat like a bar chart, but with key differences that make it suitable for showing how data is distributed across continuous intervals, called “bins”. Unlike bar graphs, which are used for categorical data, histograms are designed for continuous data, grouping it into logical ranges.
• A histogram is similar to a bar graph. The basic
difference between the two is that bar charts
correlate a value with a single category or discrete
variable, whereas histograms visualize
frequencies for continuous variables.
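The binning step that turns continuous values into histogram bars can be sketched like this. The function name, the equal-width half-open bins, and the exam scores are all assumptions made for illustration; real tools offer other binning rules.

```python
def histogram_counts(values, start, bin_width, num_bins):
    """Count values in equal-width bins [start, start + bin_width), and so on.

    Values outside the covered range are ignored in this sketch.
    """
    counts = [0] * num_bins
    for v in values:
        index = int((v - start) // bin_width)
        if 0 <= index < num_bins:
            counts[index] += 1
    return counts

# Hypothetical exam scores.
exam_scores = [35, 42, 48, 51, 55, 58, 63, 67, 70, 88]
counts = histogram_counts(exam_scores, start=30, bin_width=20, num_bins=4)

for i, count in enumerate(counts):
    low = 30 + 20 * i
    print(f"[{low}, {low + 20}): {'#' * count}")
```

Each bar's height is the count for its interval, and the empty last bin shows up as a gap rather than being dropped, which is what distinguishes binned continuous data from separate categories.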
When to Use Histogram?
Histograms are used in various scenarios, including:
• When you have numerical data.
• To understand how your data is distributed, especially whether it follows a typical pattern.
• To determine whether a process satisfies consumer needs.
• To analyze the results of a supplier’s process.
• To compare changes in a process over time.
• To compare the results of several processes.
• When you want to quickly and clearly show people how your data is distributed.
Difference between Bar Graph And Histogram

Feature | Bar Graph | Histogram
Purpose | Used to show comparisons among discrete categories. | Used to show the distribution of continuous data over intervals.
Data Type | Categorical or discrete. | Continuous, but binned into discrete intervals.
Orientation | Bars can be oriented horizontally or vertically. | Bars are typically vertical.
Spacing Between Bars | Spaces between bars to indicate that categories are distinct. | No space between bars (except for gaps indicating no data for a bin) to signify continuous data range.
Order of Bars | Can be arranged in any order, often sorted by frequency. | Arranged in ascending order of the variable.
X-axis | Represents different categories. | Represents the intervals or “bins” of the continuous data.
Y-axis | Represents the value (count, percentage, etc.) for each category. | Represents the frequency or count of data points within each bin.
Use Cases | Comparing population sizes in different cities, showing sales by product category. | Showing the distribution of exam scores, ages of participants in a study.
Bar Chart
• Bar Graph in Maths: A bar chart displays categorical data using rectangular bars
whose heights or lengths correspond to the values they represent. These bars can
be arranged vertically or horizontally. When plotted vertically, the bar chart is
often referred to as a column chart.
• A bar graph is a visual representation of data using rectangular bars. The bars can
be vertical or horizontal, and their lengths are proportional to the data they
represent. Bar graphs are also known as bar charts or bar diagrams. Bar graphs
can compare items or show how something changes over time.
• A bar graph is a visual representation of data in statistics that uses bars to compare different categories or groups. Each bar represents a category or group, and the length or height of the bar corresponds to the value or frequency of that category.
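The rule that bar length is proportional to the value can be illustrated with a small sketch. The helper `bar_lengths`, the 20-character maximum, and the sales figures are hypothetical and chosen only to make the proportions easy to check.

```python
def bar_lengths(values_by_category, max_length=20):
    """Scale each value so the largest category gets a bar of max_length."""
    largest = max(values_by_category.values())
    return {
        category: round(value / largest * max_length)
        for category, value in values_by_category.items()
    }

# Hypothetical sales by product category.
sales = {"Books": 120, "Games": 60, "Music": 30}
lengths = bar_lengths(sales)

for category, length in lengths.items():
    print(f"{category:<6} {'#' * length} {sales[category]}")
```

Because the categories are distinct, the bars can be listed in any order, for example sorted by value.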
Uses of Bar Graph / Applications of Bar Graphs in Real-Life
Some of the most important applications of Bar Graph are:
• In education, they help students visualize and understand numerical data.
• Bar graphs are handy for businesses. They help with financial analysis, market
research, and presenting data like quarterly sales, customer demographics, or
product comparisons.
• In science, bar graphs are used to display and compare
data from experiments or research studies.
• They effectively present survey results, including
responses to different questions or options.
• They are used to display performance metrics
in various fields, including sports, education, and business.
• Governments and organizations use bar graphs
to report on data like population statistics, environmental data, etc.
Scatter Plot
• The scatter plot is one of the most important data visualization techniques and is considered one of the Seven Basic Tools of Quality. A scatter plot shows the relationship between two variables on a two-dimensional graph, known in mathematics as the Cartesian plane.
• It is generally used to plot the relationship between one independent variable and one dependent variable, where the independent variable is plotted on the x-axis and the dependent variable is plotted on the y-axis so that you can visualize the effect of the independent variable on the dependent variable. These plots are also known as scatter graphs or scatter diagrams.
Applications of Scatter Plot
• As already mentioned, a scatter plot is a very useful data visualization technique.
A few applications of Scatter Plots are listed below.
• Correlation Analysis: Scatter plot is useful in the investigation of the correlation
between two different variables. It can be used to find out whether two variables
have a positive correlation, negative correlation or no correlation.
• Outlier Detection: Outliers are data points that differ markedly from the rest of the data set. A scatter plot makes these outliers visible.
• Cluster Identification: In some cases, scatter plots can help identify clusters or
groups within the data.
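The correlation that a scatter plot shows visually is often summarized with the Pearson correlation coefficient, which the slides do not name; it is used here as an assumption. The sketch computes it by hand so that each step is visible. `pearson_r` and the three small data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # points on an upward line
falling = [10, 8, 6, 4, 2]    # points on a downward line
flat = [2, 1, 3, 1, 2]        # no linear trend

for name, y in (("rising", rising), ("falling", falling), ("flat", flat)):
    print(name, round(pearson_r(x, y), 3))
```

Values near +1 correspond to a positive correlation, values near -1 to a negative correlation, and values near 0 to no linear correlation.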
Line Plot
• A line graph represents the relationship between two or more variables as lines or curves, making the data easier to visualize and understand. It typically displays data that changes continuously over time. In a line graph, data points are represented by point markers and consecutive points are connected by line segments.
Log Log Plot
• Log–log plots are often used for visualizing log-log linear regression models with (roughly) log-normal, or log-logistic, errors. In such models, after log-transforming the dependent and independent variables, a simple linear regression model can be fitted, with the errors becoming homoscedastic.
• This model is useful for data that follows a power law, y = a·x^k, which appears as a straight line with slope k on log–log axes, and whose errors continue to grow as the independent value grows.
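Fitting a straight line after log-transforming both variables can be sketched as follows. A straight line on log-log axes corresponds to a power law y = a·x^k, so the fitted slope estimates k and the exponential of the intercept estimates a. The function name and the noise-free data are hypothetical, and ordinary least squares is assumed as the fitting method.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**k by least squares on (log x, log y); return (a, k)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = (
        sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
        / sum((u - mean_x) ** 2 for u in log_x)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope

# Hypothetical, noise-free data generated from y = 3 * x**2.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]

a, k = fit_power_law(xs, ys)
print(f"a = {a:.3f}, k = {k:.3f}")
```

Both variables must be strictly positive, since the logarithm is undefined otherwise.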
Stacked Plot
• A stacked plot is a type of graph that shows multiple variables stacked vertically
on top of each other, with a common x-axis. Stacked plots can be used to show
how variables change over time, or to visualize the composition of a whole over
time.
• How to create a stacked plot: In MATLAB, you can use the stackedplot function
to plot variables from a table or timetable in a stacked plot. In Python, you can use
the stackplot() function from matplotlib to create a stacked area plot.
• How to customize a stacked plot: You can customize the properties of the lines
and axes in a stacked plot. For example, you can set the same line and axis
properties for all plots, or set different properties for individual plots.
• When to use a stacked plot: Stacked plots can be useful for highlighting changes
in contribution over time. For example, you can use a stacked plot to show how
the relative contribution of different methods changes over time.
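What a stacked area plot draws can be sketched without any plotting library: each layer's upper boundary is the running total of the series beneath it. The helper `stack_layers` and the three series are hypothetical; a plotting function such as matplotlib's `stackplot()` mentioned above performs an equivalent accumulation before drawing.

```python
from itertools import accumulate

def stack_layers(series):
    """Return the upper boundary of each layer at every x position.

    series: list of equal-length lists, one per variable, bottom layer first.
    """
    # Accumulate down each column (one x position at a time).
    stacked_columns = [list(accumulate(column)) for column in zip(*series)]
    # Transpose back so that each result row is one layer.
    return [list(layer) for layer in zip(*stacked_columns)]

# Hypothetical contributions of three methods over three time steps.
method_a = [1, 2, 3]
method_b = [2, 2, 2]
method_c = [3, 1, 0]

layers = stack_layers([method_a, method_b, method_c])
for name, layer in zip(("a", "a+b", "a+b+c"), layers):
    print(name, layer)
```

The top boundary is the total at each time step, so dividing each series by it would give the relative contribution over time.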
Parallel Coordinate Plot
• A parallel coordinates plot is a graphical method for visualizing multivariate data by plotting data points as lines across a series of parallel axes.
• Purpose: Visualizes relationships between multiple variables in high-dimensional
datasets
• How it works: Each variable has its own axis, and data points are plotted as lines
across each axis
• What it shows: Trends, variations, and relationships that might be hidden in raw
data
• When it's used: In academic and scientific communities, and for data analytics
• The primary advantage of a parallel coordinate plot lies in its capacity to handle
the visualization of multivariate data. It may seem complicated, but this is because
each observation possesses several attributes.
• Parallel coordinate plots find applications in various domains, including data analysis, engineering, scientific research, and business intelligence. They are particularly useful when exploring the relationships between data points with multiple dimensions.
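Since each variable has its own axis, a typical preparation step is to rescale every variable to a common range so the axes can sit side by side. The sketch below assumes min-max scaling to [0, 1], which is one common choice and is not prescribed by the slides; `normalize_columns` and the records are hypothetical.

```python
def normalize_columns(rows):
    """Min-max scale each column to [0, 1]; constant columns map to 0.5."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.5
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]

# Hypothetical observations with three attributes on very different scales.
records = [
    [10, 200, 1.0],
    [20, 100, 3.0],
    [30, 150, 2.0],
]

for polyline in normalize_columns(records):
    # Each row is one line: its height on axis 1, axis 2, axis 3.
    print(polyline)
```

Each output row gives the heights at which one observation's line crosses the successive parallel axes.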
