We tally the counts of the values falling in each bin and then make the plot by drawing
rectangles whose bases are the bin intervals and whose heights are the counts.
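This construction can be carried out by hand (a minimal sketch, assuming the nutri DataFrame loaded earlier in the chapter): tally the counts with np.histogram and draw the rectangles with plt.bar.

import numpy as np
import matplotlib.pyplot as plt

# nutri is assumed to be the DataFrame loaded earlier (not defined here).
# Bases are the bin intervals, heights are the counts.
counts, edges = np.histogram(nutri.age, bins=9)
plt.bar(edges[:-1], counts, width=np.diff(edges), align='edge',
        facecolor='cyan', edgecolor='black')
plt.show()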
In Python we can use the function plt.hist. For example, Figure 1.3 shows a histogram of the 226 ages in nutri, constructed via the following Python code.

weights = np.ones_like(nutri.age) / nutri.age.count()
plt.hist(nutri.age, bins=9, weights=weights, facecolor='cyan',
         edgecolor='black', linewidth=1)
plt.xlabel('age')
plt.ylabel('Proportion of Total')
plt.show()
Here 9 bins were used. Rather than using raw counts (the default), the vertical axis gives the proportion of observations in each class, defined as count/total. This is achieved by setting the weights parameter to a vector of length 226 with all entries equal to 1/226.
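A related Matplotlib option (assuming version 2.1 or later) is the density argument; note that it is not the same thing: density=True scales the bars so that their total area is 1 (a probability density), whereas the weights trick above makes the bar heights themselves sum to 1.

# density=True: bar areas sum to 1; heights are proportion divided by bin width
plt.hist(nutri.age, bins=9, density=True)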
Various plotting parameters have also been changed.

Figure 1.3: Histogram of 'age'.

Histograms can also be used for discrete features, although it may be necessary to explicitly specify the bins and the placement of the ticks on the axes.

1.5.2.3 Empirical Cumulative Distribution Function
The empirical cumulative distribution function, denoted by $F_n$, is a step function which jumps an amount $k/n$ at observation values, where $k$ is the number of tied observations at that value. For observations $x_1, \ldots, x_n$, $F_n(x)$ is the fraction of observations less than or equal to $x$, i.e.,

$$F_n(x) = \frac{\text{number of } x_i \leq x}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{x_i \leq x\}, \qquad (1.2)$$

where $\mathbb{1}$ denotes the indicator function; that is, $\mathbb{1}\{x_i \leq x\}$ is equal to 1 when $x_i \leq x$ and 0 otherwise.
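To make (1.2) concrete, here is a minimal sketch (not the book's code) of $F_n$ as a Python function; np.searchsorted with side='right' counts how many sorted observations are less than or equal to x.

import numpy as np

def ecdf(data):
    """Return F_n from (1.2) as a vectorized callable."""
    data = np.sort(np.asarray(data))
    n = len(data)
    # searchsorted(..., side='right') = number of x_i <= x
    return lambda x: np.searchsorted(data, x, side='right') / n

Fn = ecdf(nutri.age)
Fn(70)   # fraction of ages less than or equal to 70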
To produce a plot of the empirical cumulative distribution function we can use the plt.step function. The result for the age data is shown in Figure 1.4. The empirical cumulative distribution function for a discrete quantitative variable is obtained in the same way.

x = np.sort(nutri.age)
y = np.linspace(0, 1, len(nutri.age))
plt.xlabel('age')
plt.ylabel('Fn(x)')
plt.step(x, y)
plt.xlim(x.min(), x.max())
plt.show()
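Strictly speaking, np.linspace(0, 1, n) places the step heights at 0, 1/(n-1), ..., 1, which only approximates the jump heights k/n in (1.2), although closely for n as large as 226. A variant that matches (1.2) exactly (a sketch, assuming the same nutri data):

import numpy as np
import matplotlib.pyplot as plt

x = np.sort(nutri.age)
n = len(x)
y = np.arange(1, n + 1) / n       # exact jump heights k/n from (1.2)
plt.step(x, y, where='post')      # F_n is right-continuous: jump at each observation
plt.xlabel('age')
plt.ylabel('Fn(x)')
plt.xlim(x.min(), x.max())
plt.show()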
STATISTICAL LEARNING

The purpose of this chapter is to introduce the reader to some
common concepts and themes in statistical learning. We discuss the difference between supervised
and unsupervised learning, and how we can assess the predictive performance of supervised
learning. We also examine the central role that the linear and Gaussian properties play in the
modeling of data. We conclude with a section on Bayesian learning. The required probability and
statistics background is given in Appendix C.

2.1 Introduction

Although structuring and visualizing
data are important aspects of data science, the main challenge lies in the mathematical analysis of
the data. When the goal is to interpret the model and quantify the uncertainty in the data, this
analysis is usually referred to as statistical learning. In contrast, when the emphasis is on making
predictions using large-scale statistical data, then it is common to speak about machine learning or
data mining.

There are two major goals for modeling data: 1)
to accurately predict some future quantity of interest, given some observed data, and 2) to discover
unusual or interesting patterns in the data. To achieve these goals, one must rely on knowledge from
three important pillars of the mathematical sciences.

Function approximation. Building a mathematical model for data usually means understanding how one data variable depends on
mathematical model for data usually means understanding how one data variable depends on
another data variable. The most natural way to represent the relationship between variables is via a
mathematical function or map. We usually assume that this mathematical function is not completely
known, but can be approximated well given enough computing power and data. Thus, data scientists
have to understand how best to approximate and represent functions using the least amount of
computer processing and memory.

Optimization. Given a class of mathematical models, we wish to
find the best possible model in that class. This requires some kind of efficient search or optimization
procedure. The optimization step can be viewed as a process of fitting or calibrating a function to
observed data. This step usually requires knowledge of optimization algorithms and efficient
computer coding or programming.

Probability and
Statistics. In general, the data used to fit the model is viewed as a realization of a random process or
numerical vector, whose probability law determines the accuracy with which we can predict future
observations. Thus, in order to quantify the uncertainty inherent in making predictions about the
future, and the sources of error in the model, data scientists need a firm grasp of probability theory
and statistical inference.
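As a toy illustration of how the three pillars fit together (a sketch with simulated data, not an example from the book): we choose a class of functions (straight lines), calibrate it to the data by least-squares optimization, and use the residual spread to quantify predictive uncertainty.

import numpy as np

rng = np.random.default_rng(0)

# Function approximation: assume y depends on x via an (unknown) linear map.
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=50)   # simulated noisy data

# Optimization: find the best line (least-squares fit) within the model class.
b1, b0 = np.polyfit(x, y, deg=1)

# Probability and statistics: the residual spread quantifies predictive accuracy.
residuals = y - (b0 + b1 * x)
print(f"slope={b1:.2f}, intercept={b0:.2f}, residual std={residuals.std(ddof=2):.2f}")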