Test 2
In the previous lesson, you saw that frequency histograms often do a poor job
displaying a continuous variable's distribution, since the frequency with which values
fall within particular intervals is very sensitive to the intervals' bounds. We need to find
another approach, one that takes the interval length into account. What if we display
frequency not as the height of a column, but as its area?
To find the area of a rectangle, we multiply its width by its height. So the frequency for
an interval is the interval length multiplied by the column height. The height of the
column, that is, the frequency divided by the interval length, is called the frequency density.
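To make the arithmetic concrete, here is a minimal sketch in Python with NumPy. The sample values and the unequal bin edges are made up for illustration and do not come from the lesson's data.

```python
import numpy as np

# Hypothetical sample and unequal bin edges, chosen only for illustration.
values = np.array([1.2, 1.9, 2.5, 3.1, 3.3, 4.8, 5.0, 6.7, 7.2, 9.5])
edges = np.array([0.0, 2.0, 4.0, 10.0])

counts, _ = np.histogram(values, bins=edges)  # frequency in each interval
widths = np.diff(edges)                       # interval lengths
density = counts / widths                     # column heights (frequency density)

# The area of each column (width * height) gives back the frequency.
areas = widths * density

print(counts)   # frequencies per interval
print(density)  # frequency densities per interval
print(areas)    # equal to the frequencies
```

Note that the widest interval holds the most values here, yet its column is the lowest, because its frequency is spread over a longer interval.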
A histogram built in this way is called a density histogram. It differs from the
frequency histograms we've been plotting up to this point in that its vertical axis shows
the frequency density, instead of just frequency.
With a density histogram, you can estimate how many values fall into any given
interval, not just the bins selected to build the histogram. Take two values anywhere on
the horizontal axis, even if they're several bins apart and fall in the middle of intervals,
rather than on their boundaries, and then find the area of that part of the density
histogram that lies between them. The result is an estimate of the number of values
within this interval, assuming the values are spread evenly within each bin.
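The procedure can be sketched as a small function. The name estimate_count, the bin edges, and the densities below are hypothetical, and the result is only an estimate, since it treats values as evenly spread inside each bin.

```python
import numpy as np

def estimate_count(edges, density, a, b):
    """Area of a density histogram between a and b (hypothetical helper)."""
    edges = np.asarray(edges, dtype=float)
    density = np.asarray(density, dtype=float)
    lo, hi = edges[:-1], edges[1:]
    # Length of the part of each bin that lies inside [a, b]; zero if none does.
    overlap = np.clip(np.minimum(b, hi) - np.maximum(a, lo), 0.0, None)
    # Sum of rectangle areas: overlapping width times column height.
    return float(np.sum(overlap * density))

# Hypothetical density histogram: bins [0, 2), [2, 4), [4, 10].
edges = [0.0, 2.0, 4.0, 10.0]
density = [1.0, 1.5, 0.5]

# Both endpoints fall in the middle of bins, several bins apart.
estimate = estimate_count(edges, density, 1.0, 5.0)
print(estimate)  # 1*1.0 + 2*1.5 + 1*0.5 = 4.5
```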
We can also use the area under a density curve to find the frequency. This works the
same way: the area under the curve between two values corresponds to the frequency
of the values located in the interval.
For example, on the graph of the normal distribution, you can see that the majority of
values are found between the dotted lines:
Just as with the density histogram, the values are on the horizontal axis, and the area
under the curve gives the frequency of values in the interval.
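As a rough illustration, the area under a normal curve between two points can be computed with the standard library alone. The lesson does not say where the dotted lines sit, so this sketch assumes they are one standard deviation either side of the mean. The sample size of 1000 is also hypothetical. The curve used here is scaled so that its total area is 1, which means the area is a share of all values, and multiplying by the number of values turns it into a frequency.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def share_between(a, b, mu=0.0, sigma=1.0):
    """Area under the normal curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

n_values = 1000                        # hypothetical number of observations
share = share_between(-1.0, 1.0)       # assumed positions of the dotted lines
expected_count = share * n_values      # share of values converted to a frequency

print(round(share, 4))         # about 0.68 of all values
print(round(expected_count))   # expected number of values in the interval
```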
We'll come back to the idea that the area under the curve corresponds to the frequency
for the interval later in the course. It will come in handy when you're testing statistical
hypotheses.