Histogram
We tally the counts of the values falling in each bin and then make the plot by drawing
rectangles whose bases are the bin intervals and whose heights are the counts.
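The tally-and-draw procedure described above can be sketched directly. This is a minimal illustration rather than the book's own code: the array `values` is made-up data standing in for a feature such as age, `np.histogram` does the tallying, and `plt.bar` draws one rectangle per bin.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for a quantitative feature such as age.
values = np.array([65, 67, 68, 70, 71, 71, 72, 74, 75, 75, 76, 78, 80, 83, 90])

# Step 1: tally how many values fall in each of 5 equal-width bins.
counts, edges = np.histogram(values, bins=5)

# Step 2: draw one rectangle per bin; the base is the bin interval
# [edges[i], edges[i+1]] and the height is the count for that bin.
plt.bar(edges[:-1], counts, width=np.diff(edges), align='edge',
        facecolor='cyan', edgecolor='black')
plt.xlabel('value')
plt.ylabel('count')
```

Calling `plt.show()` afterwards displays the figure. Each value lands in exactly one bin, so the counts add up to the number of observations.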
In Python we can use the function plt.hist. For example, Figure 1.3 shows a histogram of the
226 ages in nutri, constructed via the following Python code.

    weights = np.ones_like(nutri.age)/nutri.age.count()
    plt.hist(nutri.age, bins=9, weights=weights, facecolor='cyan',
             edgecolor='black', linewidth=1)
    plt.xlabel('age')
    plt.ylabel('Proportion of Total')
    plt.show()

Here 9 bins were used. Rather than using raw counts (the default), the vertical axis here gives the
proportion in each class, defined by count/total. This is achieved by choosing the “weights” parameter
to be equal to the vector with entries 1/226, with length 226.
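To see why weights of 1/n turn counts into proportions, the following sketch may help. It assumes a hypothetical stand-in array `age` of 226 made-up values (the nutri data are not reproduced here) and uses `np.histogram` to check that the weighted bin heights equal count/total and sum to 1.

```python
import numpy as np

# Hypothetical stand-in for nutri.age: 226 made-up ages between 65 and 90.
rng = np.random.default_rng(0)
age = rng.integers(65, 91, size=226)

# Each observation contributes weight 1/n, so a bin holding c values has height c/n.
n = len(age)
weights = np.ones(n) / n

counts, edges = np.histogram(age, bins=9)
props, _ = np.histogram(age, bins=9, weights=weights)

print(np.allclose(props, counts / n))  # weighted heights equal count/total
print(props.sum())                     # proportions sum to 1 (up to rounding)
```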
Various plotting parameters have also been changed.

[Figure 1.3: Histogram of 'age'. Horizontal axis: age (65 to 90); vertical axis: Proportion of Total (0.00 to 0.20).]

Histograms can also be used for discrete features, although it may be
necessary to explicitly specify the bins and placement of the ticks on the axes.

1.5.2.3 Empirical Cumulative Distribution Function

The empirical cumulative distribution function, denoted by $F_n$, is a step function which jumps an
amount $k/n$ at observation values, where $k$ is the number of tied observations at that value. For
observations $x_1, \ldots, x_n$, $F_n(x)$ is the fraction of observations less than or equal to $x$, i.e.,

$$F_n(x) = \frac{\text{number of } x_i \leqslant x}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{x_i \leqslant x\}, \tag{1.2}$$

where $\mathbb{1}$ denotes the indicator function; that is, $\mathbb{1}\{x_i \leqslant x\}$ is equal to 1 when
$x_i \leqslant x$ and 0 otherwise.

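Definition (1.2) translates almost word for word into code. The sketch below is an illustration under assumptions: the function name `ecdf` and the observations `obs` are hypothetical, chosen so that the tied value 3 shows the jump of size k/n.

```python
import numpy as np

def ecdf(obs, x):
    """Return F_n(x): the fraction of observations less than or equal to x."""
    obs = np.asarray(obs)
    # (obs <= x) is the indicator 1{x_i <= x}; its mean is (1/n) * sum of indicators.
    return np.mean(obs <= x)

# Hypothetical observations with a tie at 3 (k = 2 tied values, n = 5).
obs = [1, 3, 3, 4, 7]

print(ecdf(obs, 0))    # 0.0
print(ecdf(obs, 2.9))  # 0.2
print(ecdf(obs, 3))    # 0.6: the function jumps by k/n = 2/5 at the tied value
print(ecdf(obs, 7))    # 1.0
```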
To produce a plot of the empirical cumulative distribution function we can use the plt.step
function. The result for the age data is shown in Figure 1.4. The empirical cumulative distribution
function for a discrete quantitative variable is obtained in the same way.

    x = np.sort(nutri.age)
    y = np.linspace(0, 1, len(nutri.age))
    plt.xlabel('age')
    plt.ylabel('Fn(x)')
    plt.step(x, y)
    plt.xlim(x.min(), x.max())
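The listing above spaces the heights evenly from 0 to 1, which for large n is visually close to (1.2). As an alternative sketch (not the book's code), the heights can be set to exactly i/n and drawn with `where='post'`, so that the curve rises at each observation. The array `age` here is a small hypothetical stand-in for nutri.age.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for nutri.age, including tied values.
age = np.array([65, 68, 68, 70, 74, 74, 74, 79, 85, 90])

x = np.sort(age)
n = len(x)
# Heights 1/n, 2/n, ..., 1 (tied values stack into a single jump of size k/n).
y = np.arange(1, n + 1) / n

# where='post' holds the value y[i] on [x[i], x[i+1]), so the curve
# jumps at each observation, as the step function defined by (1.2) does.
plt.step(x, y, where='post')
plt.xlabel('age')
plt.ylabel('Fn(x)')
plt.xlim(x.min(), x.max())
```

Calling `plt.show()` afterwards displays the figure. For data sets of a few hundred points the two versions are hard to tell apart by eye; the difference matters mainly for small samples.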