DMA 2 Dot&ScatterPlots, Exponential&FreqDistributionGraphs
The Cleveland dot plot was introduced by William S. Cleveland. It is similar to the usual bar chart and is a good
alternative as a statistical tool, especially when you have more than a few items. This type of dot plot stays easy
to read even when many more values are plotted in the same amount of space. It differs from the bar chart in that
it encodes each value by the position of a dot rather than by the length of a bar: the position of each dot along
the axis represents the value for that item. The Cleveland dot plot is also useful when comparing multiple
variables, and because it does not require the axis to start at zero, it allows the use of a logarithmic axis.
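As a rough illustration of encoding by position, the sketch below draws a text-only Cleveland dot plot. The function name `cleveland_dot_plot`, the region labels, and the scores are hypothetical and invented for this example. The axis runs from 60 to 85 rather than from zero.

```python
def cleveland_dot_plot(values, lo, hi, width=40):
    """Return one text row per item; the column of the 'o' encodes the value."""
    label_width = max(len(name) for name in values)
    rows = []
    # Sorting by value makes the ranking of the items easy to read.
    for name, value in sorted(values.items(), key=lambda kv: kv[1]):
        # Map the value onto a column between 0 and width - 1.
        col = round((value - lo) / (hi - lo) * (width - 1))
        # The '.' characters act as a guide line leading up to the dot.
        rows.append(f"{name:<{label_width}} |" + "." * col + "o")
    return rows

# Hypothetical data, invented for illustration only.
scores = {"North": 72, "South": 65, "East": 80, "West": 69}

for row in cleveland_dot_plot(scores, lo=60, hi=85):
    print(row)
```

Only the horizontal position of each `o` carries information, which is why the axis can be cropped to the range of the data.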
Dot Plot vs Scatter plot
Dot Plots vs Histogram
Dot plots are often preferred to histograms, particularly for small datasets. A histogram, by definition, groups the
data into classes and shows the frequency of each class, so the identity of each observation is lost once the data
are grouped and plotted. A dot plot, on the other hand, represents each observation on a number line, so the
identity of individual observations is retained.
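The difference can be sketched with a few lines of Python. The dataset and the class width of 2 are hypothetical choices made for this example. The dot plot keeps one dot per observation at its exact value, while the histogram keeps only a count per class.

```python
from collections import Counter

# Hypothetical small dataset, invented for illustration only.
data = [2, 3, 3, 4, 4, 4, 5, 7]

# Dot plot: one dot per observation, stacked at its exact value.
dot_stacks = Counter(data)
for value in range(min(data), max(data) + 1):
    print(f"{value:>2} " + "o" * dot_stacks[value])

# Histogram: observations are grouped into classes of width 2,
# so only the class counts survive (assumed class width).
bin_width = 2
histogram = Counter((x // bin_width) * bin_width for x in data)
for start in sorted(histogram):
    print(f"[{start}, {start + bin_width}) " + "#" * histogram[start])
```

From the histogram alone it is no longer possible to tell whether the class starting at 4 holds 4s or 5s; the dot plot still shows this.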
Advantages and Disadvantages of Dot Plots
Dot plots are especially useful for small sets of data. They offer a visual means of comparison and do not require
the frequencies to be tabulated first. However, a disadvantage of the dot plot is that you need to count the dots in
each stack to read off the frequencies, and a dot plot becomes difficult to construct and interpret when the data
set has many points.
Use dot plots to do the following:
Imagine that a government agency is testing an education program that aims to increase calcium intake in children. To
assess the effectiveness of their program, the researchers graph the calcium intake of subjects that they randomly
assigned to either the control group (no education) or the education group.
Dot Plots
Dot plots typically contain the following elements:
For the calcium intake data, the dot plot shows that the distributions for the two groups appear to be different. The control
group centers on approximately 800 milligrams of average daily calcium intake. There also seem to be several outliers
with extremely low values that the analysts should investigate. The education group centers on a higher value and has a
tighter distribution than the control group. Hypothesis testing is required to determine the statistical significance of these
differences.
Interpreting Dot Plots and Assessing the Distribution of your Data
Dot plots display the distribution of your data. Look at the central tendency, variation, and overall shape of the distribution.
You might create a dot plot before or in conjunction with an analysis to help confirm assumptions and guide further study.
The tallest stacks of dots represent the most common values in your dataset. This region is where most values tend to fall.
It’s the central tendency of your dataset. The width of the distribution indicates the amount of variability. Broader
distributions signify greater variability.
In the dot plot below, the center is near 50. Most values are close to 50, and values further away are rarer.
Develop an idea of the data’s variability by looking at the distance between the minimum and maximum bins. In the dot plot below,
the spread of values differs notably between the two groups. The values for group A mostly fall between 40 and 60, while for group B that range is 20 to 90.
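The same reading of center and spread can be sketched numerically. The helper `summarize` and the two groups below are hypothetical and are not the data from the figure. The tallest stack stands in for the center, and the distance between the minimum and maximum stands in for the spread.

```python
from collections import Counter

def summarize(values):
    """Report the tallest stack of dots and the min-to-max range."""
    stacks = Counter(values)
    tallest_value, tallest_count = stacks.most_common(1)[0]
    return {
        "tallest_stack": tallest_value,     # most common value
        "count": tallest_count,             # height of that stack
        "range": max(values) - min(values)  # width of the distribution
    }

# Hypothetical groups, invented for illustration only.
group_a = [42, 47, 50, 50, 50, 53, 58]
group_b = [20, 35, 50, 50, 65, 80, 90]

print("A:", summarize(group_a))
print("B:", summarize(group_b))
```

Both groups have their tallest stack at 50, but group B covers a much wider range, which corresponds to the broader distribution in a dot plot.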
Skewed Distributions
Determine whether your data tapers off symmetrically from the center or if it is skewed. The height data below follow a roughly
symmetric distribution.
In right-skewed distributions, most values fall on the left side of the distribution, and a long tail stretches to the right, as shown below.
Most of the body fat percentages are relatively low, with a few that are unusually high.
Conversely, for left-skewed distributions, most values fall on the right side of the distribution, and a long tail reaches to the left.
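Skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, which is one common measure and an assumption of this example rather than something the notes prescribe. The datasets are hypothetical. A positive value suggests a right skew and a negative value a left skew.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical datasets, invented for illustration only.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9]       # long tail to the right
left_skewed = [10 - x for x in right_skewed]      # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(name, round(skewness(sample), 2),
          "mean:", statistics.mean(sample),
          "median:", statistics.median(sample))
```

In the right-skewed sample the long tail pulls the mean above the median; in the left-skewed sample the mean falls below it.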
Outliers
Dot plots are a simple but effective way to identify outliers. Just look for values that stand out from the others! After you find them, you’ll need to decide what to do with them.
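Visual inspection can be backed up with a simple rule. The sketch below flags values more than 1.5 interquartile ranges outside the quartiles. This rule is an assumption of the example, one common convention among several, and the values are hypothetical. It requires Python 3.8 or later for `statistics.quantiles`.

```python
import statistics

def find_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical daily intake values in milligrams, invented for illustration.
intake = [780, 790, 800, 805, 810, 820, 830, 310]

print(find_outliers(intake))
```

A flagged value is only a candidate for investigation; whether to keep, correct, or remove it remains a judgment call.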
Multimodal Distributions
Multimodal distributions have more than one peak. This type of distribution stands out in a dot plot, but it can be easy to miss if you
focus on summary statistics, such as the mean and standard deviation.
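A minimal sketch of why the mean can hide multiple peaks: the two hypothetical samples below share the same mean, yet one has a single peak and the other has two. `statistics.multimode` (Python 3.8 or later) returns every value tied for most frequent, which is only a rough stand-in for the peaks visible in a dot plot.

```python
import statistics

# Hypothetical samples, invented for illustration only.
unimodal = [3, 4, 5, 5, 5, 6, 7]   # one peak at 5
bimodal = [2, 2, 2, 5, 8, 8, 8]    # peaks at 2 and 8

for name, sample in [("unimodal", unimodal), ("bimodal", bimodal)]:
    print(name,
          "mean:", statistics.mean(sample),
          "stdev:", round(statistics.stdev(sample), 2),
          "modes:", statistics.multimode(sample))
```

The mean alone cannot tell these samples apart, and the standard deviation only shows that one is more spread out, while the list of modes reveals the second peak.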
Scatter Plots
Scatter plots are graphs that present the relationship between two variables in a dataset. They represent data points on
a two-dimensional plane, or Cartesian system. The independent variable or attribute is plotted on the X-axis, while
the dependent variable is plotted on the Y-axis.
A scatter plot is also called a scatter chart, scattergram, or XY graph.
Scatter plots can display a large volume of data at a glance. They are beneficial in the following situations –
The line drawn in a scatter plot that lies close to almost all the points in the plot is known as the “line of best fit” or “trend
line”. See the graph below for an example.
Sample scatterplot with a trendline
Types of correlation
A scatter plot shows the correlation between two attributes or variables, that is, how closely the two variables are
connected. There are three possible situations for the relation between the two variables –
1. Positive Correlation
2. Negative Correlation
3. No Correlation
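The three situations can be illustrated with the Pearson correlation coefficient, which ranges from -1 to 1. Using this particular coefficient is an assumption of the example, and the data are hypothetical. Values near 1 indicate positive correlation, values near -1 negative correlation, and values near 0 no linear correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y_positive = [2, 4, 6, 8, 10]   # rises with x
y_negative = [10, 8, 6, 4, 2]   # falls as x rises
y_none = [2, 1, 3, 1, 2]        # no linear trend

for name, y in [("positive", y_positive),
                ("negative", y_negative),
                ("none", y_none)]:
    print(name, round(pearson_r(x, y), 3))
```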
The line of best fit
While each point in a scatterplot represents a specific observation, the line of best fit describes the
general trend based on all of the points.
For a given data point, we expect to see a difference between its value and the value predicted by the line
of best fit.
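A minimal sketch of fitting such a line, assuming ordinary least squares as the fitting method (the notes do not name one) and using hypothetical data. The residuals are the differences between each observed value and the value the line predicts.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
# Residual = observed value minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("residuals:", [round(r, 3) for r in residuals])
```

With a least-squares fit that includes an intercept, the residuals sum to zero, so points above the line balance points below it.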
We can also interpret the slope and intercept of the line of best fit the same way we interpret line graphs:
given time period. They can be used to show a pattern or trend in the data and are useful for making
The horizontal axis always shows the time period, and the vertical axis represents the variable being
A q-q plot is a plot of the quantiles of the first data set against
the quantiles of the second data set.
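The pairing of quantiles can be sketched as follows. The function name `qq_pairs`, the choice of quartiles, and the data are hypothetical. `statistics.quantiles` (Python 3.8 or later) computes the cut points for each data set, and each pair would be one point on the q-q plot.

```python
import statistics

def qq_pairs(first, second, n=4):
    """Pair the quantiles of the first data set with those of the second."""
    q_first = statistics.quantiles(first, n=n)
    q_second = statistics.quantiles(second, n=n)
    return list(zip(q_first, q_second))

# Hypothetical data sets, invented for illustration only.
first = list(range(1, 21))
second = [x + 10 for x in first]   # same shape, shifted by 10

for q1, q2 in qq_pairs(first, second):
    print(q1, q2)
```

Because the second data set is the first one shifted by a constant, every pair differs by that constant, so the points would fall on a straight line parallel to y = x.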