PA-NOTE-6 Data Visualization (different types of chart)
a)
Figure a represents a bell-shaped distribution, which has a single peak and tapers
off to both the left and the right of the peak. The shape appears to be symmetric
about the center of the histogram. The single peak indicates that the distribution
is unimodal. The highest peak of the histogram represents the location of the mode
of the data set. The mode is the data value that occurs most often in a data set.
For a symmetric, unimodal histogram like this one, the mean, median, and mode are
all the same and are all located at the center of the distribution.
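As a quick check that the three measures coincide for a symmetric, unimodal distribution, here is a minimal sketch using Python's standard `statistics` module. The sample is hypothetical, chosen only so that it has one peak at 5 with values mirrored on either side.

```python
from statistics import mean, median, mode

def central_tendency(data):
    """Return the mean, median, and mode of a list of numbers."""
    return mean(data), median(data), mode(data)

# Hypothetical sample: one peak at 5, values mirrored on either side of it
symmetric = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
m, med, mo = central_tendency(symmetric)
print(f"mean={m}, median={med}, mode={mo}")
```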
b)
Figure b represents a distribution that is approximately uniform and forms a
rectangular, flat shape. The frequency of each class is approximately the same.
c)
Figure c represents a right-skewed distribution, which has a peak to the left of the
distribution and data values that taper off to the right. This distribution has a single
peak and is also unimodal. For a histogram that is skewed to the right, the mean is
located toward the right of the distribution and is the largest of the measures of
central tendency. The mean has the largest value because it is strongly affected by
the extreme values in the right tail, which pull the mean to the right. The mode is the
smallest value, and it is located toward the left of the distribution, since the mode
always occurs at the highest point of the peak. The median is located between the mode
and the mean.
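The ordering mode < median < mean can be checked on a small, hypothetical right-skewed sample (most values small, a few large ones in the tail). This is a sketch for illustration; on real data the ordering is a common tendency rather than a guarantee.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, a few are large
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 15]

mo, med, m = mode(right_skewed), median(right_skewed), mean(right_skewed)
print(f"mode={mo}, median={med}, mean={m}")

# For this sample the ordering matches the description: mode < median < mean
assert mo < med < m
```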
d)
Figure d represents a left-skewed distribution, which has a peak to the right of the
distribution and data values that taper off to the left. This distribution has a single
peak and is also unimodal. For a histogram that is skewed to the left, the mean is
located toward the left of the distribution and is the smallest of the measures of
central tendency. The mean has the smallest value because it is strongly affected by
the extreme values in the left tail, which pull the mean to the left. The mode is the
largest value, and it is located at the peak on the right. The median is located
between the mode and the mean.
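For the left-skewed case the ordering reverses to mean < median < mode. The sketch below uses a hypothetical sample with a long tail of small values; as with any single sample, it illustrates the typical pattern rather than proving it.

```python
from statistics import mean, median, mode

# Hypothetical left-skewed sample: most values are large, a few are small
left_skewed = [1, 8, 11, 12, 13, 13, 14, 14, 14, 15]

m, med, mo = mean(left_skewed), median(left_skewed), mode(left_skewed)
print(f"mean={m}, median={med}, mode={mo}")

# The small values in the left tail pull the mean below the median
assert m < med < mo
```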
e)
Figure e does not match any of the shapes above. Its only defining characteristic
is that it has 2 peaks of the same height, which means that the distribution is
bimodal.
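On raw values, Python's `statistics.multimode` returns every value tied for the highest frequency, so two equally tall peaks show up as two modes. A minimal sketch with a hypothetical sample follows; note that it only detects peaks of exactly equal height, and the peaks in a histogram come from binned counts rather than individual values.

```python
from statistics import multimode

def modality(data):
    """Label a sample by how many values tie for the highest frequency."""
    modes = multimode(data)
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

# Hypothetical sample with two equally tall peaks, at 2 and at 6
bimodal = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7]
print(modality(bimodal))
```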
While there are similarities between a bar graph and a histogram, such as each bar
being the same width, a histogram has no spaces between the bars. The quantitative
data is grouped according to a chosen bin size, or interval. The bin size refers to
the width of each bar, and each data value is placed in the bin that contains it.
The bins, or groups of data, are plotted on the x-axis, and the frequencies of the
bins are plotted on the y-axis. A grouped frequency distribution is constructed for
the numerical data, and this table is used to create the histogram. In most cases, the
grouped frequency distribution is designed so there are no gaps between the intervals:
the value written as the end of one bin is actually the first value counted in the next
bin. For example, with a bin size of 10, the bins would be written [0-10), [10-20),
[20-30), and so on. Read literally, each bin appears to contain 11 values (0 through 10),
which is 1 more than the desired bin size of 10. The square bracket means the endpoint is
included and the parenthesis means it is excluded, so the last value written in each bin
is counted as the first value of the following bin.
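The half-open rule can be expressed with floor division: a value belongs to bin number `(value - start) // width`, so a boundary value such as 10 lands in [10-20), not [0-10). This is a minimal sketch; the function name `bin_for` and its default `width`/`start` are hypothetical choices for illustration.

```python
def bin_for(value, width=10, start=0):
    """Return (lower, upper) of the half-open bin [lower, upper) containing value."""
    index = (value - start) // width  # 0 for the first bin, 1 for the next, ...
    lower = start + index * width
    return lower, lower + width

# Boundary values are counted in the bin they start, not the one they end
for v in (0, 9, 10, 19, 20):
    lower, upper = bin_for(v)
    print(f"{v:>2} -> [{lower}-{upper})")
```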
For whole-number data, the first bin includes the values 0 through 9, and the next bin
includes the values 10 through 19. This makes the bins the proper size. Bins are written
in this manner to simplify the process of grouping the data. The first bin can begin with
the smallest value in the data set and end at that value plus the bin width, or it can
begin with a convenient value that is smaller than the smallest data value.
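Both choices of starting point can be sketched as one function that builds the grouped frequency distribution, including empty bins so the histogram has no gaps. This is a hypothetical helper (`grouped_frequency` is not from any library), and the scores are made up. Each printed row of `*` stands in for one histogram bar.

```python
from collections import Counter

def grouped_frequency(data, width, start=None):
    """Return (lower, upper, count) rows for half-open bins [lower, upper).

    If start is None, the first bin begins at the smallest data value;
    otherwise it begins at start, which must not exceed the smallest value.
    """
    if start is None:
        start = min(data)
    if start > min(data):
        raise ValueError("start must be <= the smallest data value")
    # Bin index for each value: 0 for the first bin, 1 for the next, ...
    counts = Counter(int((x - start) // width) for x in data)
    rows = []
    for i in range(max(counts) + 1):  # keep empty bins so there are no gaps
        lower = start + i * width
        rows.append((lower, lower + width, counts.get(i, 0)))
    return rows

# Hypothetical scores; start=0 is a round value below the minimum (3)
scores = [3, 7, 12, 15, 18, 21, 22, 27, 29, 34]
for lower, upper, count in grouped_frequency(scores, width=10, start=0):
    print(f"[{lower}-{upper})  {'*' * count}  {count}")
```

Calling `grouped_frequency(scores, width=10)` without `start` instead begins the first bin at 3, the smallest score.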
Different Types of Graphs and Charts for Presenting Data
To better understand each chart and how it can be used, here is an overview of
each type of chart.
1. Column Chart
A column chart is used to show a comparison among different items, or it can show a
comparison of items over time. You could use this format to see the revenue per
landing page or customers by close date.
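Before drawing a column chart like "revenue per landing page", the raw records are usually aggregated so that each category gets one total, which becomes that column's height. A minimal sketch, assuming hypothetical `(landing page, revenue)` records:

```python
from collections import defaultdict

def totals_by_category(records):
    """Sum the amounts for each category; each total becomes one column's height."""
    totals = defaultdict(int)
    for category, amount in records:
        totals[category] += amount
    return dict(totals)

# Hypothetical closed deals: (landing page, revenue)
deals = [("/pricing", 1200), ("/demo", 800), ("/pricing", 300),
         ("/blog", 150), ("/demo", 450)]
for page, revenue in totals_by_category(deals).items():
    print(page, revenue)
```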
2. Bar Graph
A bar graph is essentially a horizontal column chart. Use it to avoid clutter when
data labels are long or when you have more than 10 items to compare. This type
of visualization can also be used to display negative numbers.
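The rule of thumb for choosing between a column chart and a bar graph can be written as a small check. In this sketch the function name is hypothetical, the 10-item limit comes from the notes, and the 15-character cutoff for a "long" label is an assumption, not a standard.

```python
def choose_orientation(labels, max_items=10, max_label_len=15):
    """Suggest 'bar' (horizontal) or 'column' (vertical) for category labels.

    Switches to a horizontal bar graph when there are more than max_items
    items or any label is long. max_label_len is an assumed cutoff.
    """
    too_many = len(labels) > max_items
    too_long = any(len(label) > max_label_len for label in labels)
    return "bar" if too_many or too_long else "column"

print(choose_orientation(["Q1", "Q2", "Q3", "Q4"]))
print(choose_orientation(["Enterprise customers (EMEA)", "SMB customers"]))
```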
3. Line Graph
A line graph reveals trends or progress over time and can be used to show many
different categories of data. You should use it when you chart a continuous data set.
4. Dual Axis Chart
A dual axis chart allows you to plot data using two y-axes and a shared x-axis. It is
used with three data sets, one of which is based on a continuous set of data and
another that is better suited to being grouped by category. Use it to visualize a
correlation, or the lack of one, between these three data sets.
5. Area Chart
An area chart is basically a line chart, but the space between the x-axis and the line
is filled with a color or pattern. It is useful for showing part-to-whole relations, such as
showing individual sales reps' contribution to total sales for a year. It helps you
analyze both overall and individual trend information.
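In a stacked area chart, each rep's band sits on top of the running total of the reps below it, so the top boundary equals total sales for each period. A minimal sketch of computing those boundaries, assuming hypothetical quarterly figures and a hypothetical helper name `stack_layers`:

```python
def stack_layers(series):
    """Return the cumulative upper boundary of each layer in a stacked area chart.

    Layers are stacked in the order the series are given; the last layer's
    boundary is the overall total for each period.
    """
    layers = {}
    running = None
    for name, values in series.items():
        if running is None:
            running = list(values)
        else:
            running = [r + v for r, v in zip(running, values)]
        layers[name] = running
    return layers

# Hypothetical quarterly sales for three reps
sales = {"Ana": [10, 12, 15, 20], "Ben": [5, 8, 6, 10], "Cara": [5, 5, 9, 10]}
for name, boundary in stack_layers(sales).items():
    print(name, boundary)
```

The band between one boundary and the one below it is that rep's individual contribution.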
6. Pie Chart
A pie chart shows a static number and how categories represent part of a whole: the
composition of something. A pie chart represents numbers as percentages, and the
segments must add up to 100%.
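Rounding each share independently can leave pie labels summing to 99% or 101%. One standard fix is the largest remainder method, sketched below with a hypothetical helper name and made-up counts; it assumes non-negative counts.

```python
def to_percentages(counts):
    """Convert category counts to whole-number percentages that sum to exactly 100.

    Largest remainder method: round every share down, then give the leftover
    percentage points to the categories with the largest fractional parts.
    Assumes all counts are non-negative.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must have a positive total")
    exact = {k: v * 100 / total for k, v in counts.items()}
    rounded = {k: int(p) for k, p in exact.items()}  # int() floors non-negative values
    leftover = 100 - sum(rounded.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - rounded[k], reverse=True)
    for k in by_remainder[:leftover]:
        rounded[k] += 1
    return rounded

# Hypothetical traffic sources: naive rounding would give 33 + 33 + 33 = 99
print(to_percentages({"Email": 1, "Social": 1, "Search": 1}))
```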
7. Scatter Plot
A scatter plot (or scattergram) shows the relationship between two different
variables, or it can reveal distribution trends. Use it when there are many different
data points and you want to highlight similarities in the data set. It is also useful
when looking for outliers or for understanding the distribution of your data.
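A scatter plot is often paired with the Pearson correlation coefficient, which summarizes how closely the points follow a straight line: values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. A minimal sketch with hypothetical data; a single outlier can change r noticeably, which is one reason to look at the plot itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant

# Hypothetical pairs, e.g. ad spend vs. sign-ups
ad_spend = [1, 2, 3, 4, 5]
signups = [2, 4, 6, 8, 10]
print(round(pearson_r(ad_spend, signups), 3))
```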