
Statistical Deep Learning

with Python and R


By Kaiser FAN and Phillip YAM

Department of Statistics, Faculty of Science, Chinese University of Hong Kong

Credit to the photo editor https://www5.lunapic.com/.


Photo taken by the authors over the River Cam, Cambridge.
Chapter 1

A JOURNEY FROM MACHINE LEARNING TO DEEP LEARNING

CONTENTS
1.1 A brief history on Artificial Intelligence
1.2 Machine Learning - Learn from data
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.3 Machine Learning Algorithms
1.3.1 Parameters vs. Hyperparameters
1.3.2 Classification vs. Regression
1.3.3 Model-Based vs. Instance-Based Learnings
1.3.4 Shallow vs. Deep Learnings
1.4 How to use this book?

“We know the past but cannot control it. We control the future but cannot know it.” by Claude Shannon^1

1.1 A brief history on Artificial Intelligence


Artificial Intelligence (AI) took off in the 1950s, and its development can be divided into three phases.
In the first phase, it mainly focused on logical mechanisms via fuzzy logic. In light of the vast amount of
data available from the internet and contemporary clinical studies, AI moved into the stream of statistical
and machine learning over the previous two decades. In the third phase, within just the past few years,
we have entered the paradigm of deep learning.

The main challenge in AI research is to teach a computer to carry out tasks that are apparently trivial for
humans but cannot be tackled simply by a preassigned algorithm or routine logical instructions. For
instance, distinguishing a cat from a dog is obvious to a human, but writing an algorithm for this task
that takes all aspects into account at once would be very complicated. Machine learning has been
proposed precisely to handle problems of this kind; indeed, with the availability
of hand-held electronic devices, such as smart phones and smart watches, collecting huge amounts of data
on human behavior is far easier nowadays, and these data can help to train machines to mimic the way
we solve different problems. Most primitive models in statistical learning are
composed of only a few layers of complexity, and therefore they lack the ability to pick up the more subtle
latent information embedded deeply in the ocean of data. To overcome this bottleneck, and with
the latest advances in computational power and the availability of labeled data, scholars have turned to
strengthening the network approach, which leads us to the most recently popular topic: Deep Learning.

^1 Shannon, C. E. (1959). Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec., 4, 142-163.

Figure 1.1.1: Relations among AI, Machine Learning and Deep Learning

First Wave (1950-1975): Mechanical logical reasoning


In 1950, Alan Turing proposed the influential yet controversial Turing test in his paper Computing Machinery
and Intelligence [11]. In the test, a human examiner communicates through text messages with
a second human and a computer, both of which are kept out of the examiner’s sight.
The computer is considered to possess artificial intelligence if the examiner is unable to
distinguish the responses of the human from those of the computer.

In 1951, Marvin Minsky built the first Stochastic Neural Analog Reinforcement Calculator (SNARC). The
machine, essentially a neural network consisting of 40 neurons, enabled humans to simulate the transmission
of neural signals for the first time. In recognition of his contributions, Minsky received the Turing Award,
the most prestigious prize in computer science, in 1969.

In 1955, Allen Newell, Herbert Simon, and Cliff Shaw [6] wrote a computer program called the Logic Theorist
to mimic the problem-solving skills of humans. This program successfully proved 38 out of 52 theorems
from Principia Mathematica by Whitehead and Russell (1927).

As for the formal origin of AI, the first workshop on Artificial Intelligence, held at Dartmouth in the summer
of 1956, is commonly regarded as the birth of AI. It was attended by leading scholars
in information science and intelligence such as John McCarthy, Marvin Minsky, and Claude Shannon. The
workshop covered topics including neural networks, natural language processing, abstraction, and creativity.
Since this series of talks, scientists and engineers have been constantly dreaming of a hypothetical machine
that can exhibit behavior at least as skillful and flexible as that of humans, can reason, and can possess the
human soul and mind; researchers often refer to this collective vision and research program as General
(Strong) Artificial Intelligence.

With its seemingly unlimited potential, AI started to flourish. During this time, some contemporaries
optimistically foresaw that a machine completely driven by AI would be born within 20 years. In
1963, MIT launched the Project on Mathematics and Computation (Project MAC), with Minsky and
McCarthy joining at a later time, which promoted a series of research topics on image and speech
recognition. From 1964 to 1966, Joseph Weizenbaum built the world’s first natural language processing
computer program; meanwhile, on the other side of the globe, Waseda University in Japan announced the
invention of the first biped walking robot.

However, the hunger of scientists had yet to be satisfied. Criticism of AI began to rise in the
1970s; indeed, the rapidly growing demand for computational power could not be fulfilled at that time.
In addition, the variety and complexity of demanding problems in image and natural language processing
created severe hurdles given the contemporary technological conditions. At this bottleneck,
public awareness and grant funding started to decline rapidly in the mid-1970s, and AI development fell
into decay.

All rights reserved. Do not distribute without permission from the authors, Kaiser Fan and Phillip Yam.

Second Wave (1980-1987): Rise and fall of the expert system


Stepping into the 1980s, breakthroughs in expert systems and artificial neural networks drew the public’s
attention back to AI. Expert systems date back to the 1960s, when they were introduced in a project led
by Edward Feigenbaum [2], who is often called the “father of expert systems”. An expert system is a
computer program that simulates the judgment and behavior of a human expert in a particular field,
using previously collected expertise encoded as a set of prescribed rules. In the 1970s, researchers at
Stanford University developed a system called MYCIN, which analyzed a patient’s blood to identify bacteria
causing infections such as bacteremia and meningitis, and recommended an appropriate dose of antibiotics,
based on around 600 manually assigned rules. In the 1980s, Carnegie Mellon University developed an expert
system called XCON (eXpert CONfigurer) [1] for the Digital Equipment Corporation, which could automatically
select combinations of computer components according to a customer’s needs; XCON helped the corporation
save over 40 million US dollars annually at that time.

With the success of expert systems, the purpose of developing AI started to deviate from its original goal
of achieving general intelligence; instead, the interest shifted to developing tailor-made systems to solve
targeted practical problems in specific areas. In 1982, John Hopfield proposed a new network model, which
was later called the Hopfield network, a kind of recurrent artificial neural network [4] that incorporates
the mechanism of associative memory (the ability to learn and remember the relationship between unrelated
items). In 1986, David Rumelhart, Geoffrey Hinton and Ronald Williams jointly published the paper Learning
representations by back-propagating errors [7], in which they showed empirically that the method of backward
propagation can help train a multi-layer neural network so that it learns appropriate internal
representations of an arbitrary mapping from input to output.

During this new wave of passion for AI, Japan’s Ministry of International Trade and Industry launched
a project to build a “fifth-generation computer” in 1982 [8]. It aimed to create a machine with
supercomputer-like performance through large-scale, simultaneous, synchronized calculations, in order to
provide a platform for future developments in AI. However, after spending over 50 billion Japanese yen
over 10 years, the project still could not meet its planned target. In the late 1980s, negative impressions
of AI started to grow in the industry, as it had failed to meet the expectations raised by the tremendous
investments that had been made, and AI once again faded out of people’s minds.

Third Wave (2011-now): Deep learning


After the previous two waves, researchers gave up their idealistic ambitions, and AI emerged as a way
of solving concrete practical problems on a case-by-case basis rather than as a general approach. The
extensive use of mathematics has opened up interdisciplinary collaboration between AI researchers and
scholars from other disciplines, and new sophisticated models and more effective algorithms have
subsequently been developed, for instance, statistical learning theory, the support vector machine,
probabilistic graphical models, and the now more prevalent deep neural networks, to name a few.

Stepping into the 21st century, rapid globalization and the development of the internet have significantly
boosted the volume of available digital information. At the same time, the computing capability of the
Graphics Processing Unit (GPU), which first appeared in the 1990s and gained popularity over the next two
decades, has grown rapidly; for instance, the calculation speed of an NVIDIA^2,3 Tesla V100 GPU^4
exceeds 10 trillion FLOPS (floating-point operations per second), surpassing the world’s fastest
supercomputer of 2001.

With the rapid development of effective big data collection and computing technology, AI has achieved
major breakthroughs. The multi-layer neural network AlexNet^5, invented by researchers at the University of
Toronto, won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). AlexNet, whose algorithm
was based on convolutional neural networks, outperformed the first runner-up in the challenge by a
large margin. Since then, deep learning based on multi-layer neural networks has been
applied to various areas. For instance, with advances in deep reinforcement learning, AlphaGo,
developed by Google, has defeated several Go world champions [9]. All of these have drawn public attention
to the potential of deep learning and marked the comeback of AI.

Figure 1.1.2: The robot “R2-D2” in the Star Wars series.

Figure 1.1.3: T-800 “Model 101” in the movie The Terminator.

Figure 1.1.4: Which of these robots would you prefer to see?

1.2 Machine Learning - Learn from data


Machine learning, also known as statistical machine learning, is devoted to building statistical models, based
on data, for analysis and prediction. Instead of executing manually assigned commands, machine
learning solves problems by utilizing the inherent insight and structure within the input data. The main
purpose of machine learning is to generalize, that is, to learn rules from the hidden patterns embedded
in the collected data, and then apply these newly acquired “laws” to new scenarios for making decisions or
predictions.

Machine learning originated in the early stages of artificial intelligence; it evolved gradually and
brought new inspiration to different sub-branches of pattern recognition and computational learning theory.
It is an interdisciplinary subject that involves statistics, linear algebra, optimization and numerical
analysis. According to differences in purpose and methodology, machine learning is classified into
supervised learning, unsupervised learning and reinforcement learning. Assume that
1. there are $N$ samples in the dataset, and
2. $x_m$ represents an out-of-sample feature vector, i.e. one not in the training dataset, for $m > N$.

^2 Originally, the founders first thought of “NV” standing for “Next Version”, then they added “invidia”, referring to the
Latin word for envy.
^3 We here again would like to express our gratitude to NVIDIA for supporting the joint-institute with CUHK.
^4 The product NVIDIA Tesla V100 GPU can be found at https://www.nvidia.com/en-gb/data-center/tesla-v100/.
^5 AlexNet is the name of a convolutional neural network (CNN), designed by Alex Krizhevsky in collaboration with Ilya
Sutskever and Geoffrey Hinton, who was Krizhevsky’s Ph.D. advisor.

1.2.1 Supervised Learning


In supervised learning, models are trained using labeled data. Each datapoint in the dataset consists of a
feature vector (input) and its respective label (output). The learning scheme is called classification
if the output variable is discrete-valued, and regression if it is continuous.

We start the learning procedure by choosing a suitable model. Common supervised learning models include
logistic regression, generalized linear models, classification and regression trees, support vector machines
(SVM), K-nearest neighbors (KNN), naive Bayes classifiers, and many common Deep Neural Networks.
The model is tested by comparing the predicted values against the actual labels, so that the model can be
adjusted accordingly. The training process is repeated until sufficient accuracy is obtained. The learning is
supervised by the feedback obtained from the values of the actual labels. Once the training is finished, new
data can be input into the model for predictions.
1. Dataset: A collection of labeled examples $\{x_n, y_n\}_{n=1}^{N}$, where

(a) $x_n$ (Input): A $D$-dimensional feature vector, i.e.

$$x_n = \left(x_n^{(1)}, x_n^{(2)}, \cdots, x_n^{(D)}\right).$$

(b) $y_n$ (Label): Anything, e.g. an element belonging to a finite set of classes $\{1, \cdots, C\}$, a real
number, a vector, a matrix, a tree, or a graph.
2. Goal: Produce a model that allows “correctly” guessing the label $y_m$ from the new feature vector $x_m$.
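
The train, compare and adjust cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the one-dimensional toy dataset, the choice of logistic regression fitted by plain gradient descent, and the names `train`, `predict`, `lr`, `tol` and `max_steps` are all hypothetical and are not taken from the book.

```python
import math

# Toy 1-D dataset (hypothetical): feature x_n and binary label y_n.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Model: estimated probability that the label of x equals 1."""
    return sigmoid(w * x + b)

def log_loss(w, b):
    """Average disagreement between the predictions and the actual labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(x, w, b)
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(xs)

def train(lr=0.5, tol=0.1, max_steps=5000):
    w, b = 0.0, 0.0                       # parameters, adjusted during training
    for _ in range(max_steps):
        if log_loss(w, b) < tol:          # stop once the fit is judged sufficient
            break
        # Feedback from the actual labels: gradient of the average log-loss.
        grad_w = sum((predict_proba(x, w, b) - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(predict_proba(x, w, b) - y for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train()

def predict(x):
    """Guess the label of a new feature value with the trained model."""
    return 1 if predict_proba(x, w, b) >= 0.5 else 0
```

Once `train` has returned, `predict(3.0)` plays the role of guessing the label $y_m$ of a new feature vector $x_m$ that was not in the training data.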

Example 1.2.1. Spam Detection: Suppose that we have 10,000 email messages, each labeled as “spam” or
“not spam”. These email messages cannot be used directly in the model, because the labels and the text
of the emails are not numbers. Hence, each email message has to be converted into a feature vector. One
common way is called bag of words: say the bag (dictionary) contains 20,000 alphabetically sorted
words, then
1. the first feature has a value of 1 if the email message contains the word “a”; 0 otherwise;
2. the second feature has a value of 1 if the email message contains the word “aaron”; 0 otherwise;
3. $\vdots$
4. the 20,000th feature has a value of 1 if the email message contains the word “zulu”; 0 otherwise.

That is,

$$x_n^{(1)} = \begin{cases} 1, & \text{if the $n$th message contains “a”} \\ 0, & \text{otherwise} \end{cases}, \quad \cdots, \quad x_n^{(20,000)} = \begin{cases} 1, & \text{if the $n$th message contains “zulu”} \\ 0, & \text{otherwise} \end{cases}.$$

Similarly, the output labels have to be converted into numbers. For example,

$$y_n = \mathbb{1}\{\text{the $n$th message is spam}\} = \begin{cases} 1, & \text{if the $n$th message is spam} \\ 0, & \text{otherwise} \end{cases},$$

where $\mathbb{1}\{\cdot\}$ is the indicator function. This example will be further discussed in the context of
support vector machines, random forests, naive Bayes classifiers, and CIBer.
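
The bag-of-words encoding of Example 1.2.1 can be sketched as follows. The six-word vocabulary, the two sample messages and the helper names `featurize` and `encode_label` are hypothetical stand-ins chosen for illustration; the example in the text uses a 20,000-word dictionary and 10,000 messages.

```python
import re

# Hypothetical miniature dictionary, sorted alphabetically as in the example.
VOCABULARY = sorted(["zulu", "money", "a", "meeting", "aaron", "free"])

def featurize(message, vocabulary=VOCABULARY):
    """Return the binary bag-of-words feature vector of one email message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return [1 if word in words else 0 for word in vocabulary]

def encode_label(label):
    """Indicator encoding of the label: 1 for "spam", 0 otherwise."""
    return 1 if label == "spam" else 0

emails = [
    ("Free money for Aaron", "spam"),
    ("A meeting at noon", "not spam"),
]
X = [featurize(text) for text, _ in emails]
y = [encode_label(label) for _, label in emails]
```

Note that the first message activates the feature for “aaron” but not the one for “a”, because features are matched on whole words rather than on substrings.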

1.2.2 Unsupervised Learning


In practice, labeled data are not always available, which leads us to the setting of unsupervised learning.
The only available information is the feature vectors of the datapoints, which are analyzed in order to find
the hidden patterns (inner structures) or clusters (organizations) within the data source. Typical approaches
to unsupervised learning include principal component analysis, recommender systems, K-means clustering,
dimension reduction, and feature extraction.

The performance of traditional unsupervised learning in feature extraction for complex data structures
may not be very appealing; in contrast, deep learning has shown strong unsupervised learning abilities,
especially in the field of computer vision, or when there is some natural ordering, metric, or algebraic
structure among the feature variables of the datapoints. In particular, this is achieved through the
convolutional layers in a Convolutional Neural Network (CNN) and the feedback mechanism via backpropagation
in the Deep NN (latter) part of the CNN. Recently, some research has also suggested semi-supervised learning,
which falls in between supervised and unsupervised learning. It makes use of unlabeled data together with a
small amount of labeled data, striking a balance between the learning performance and the cost of obtaining
labeled data.
1. Dataset: A collection of unlabeled examples $\{x_n\}_{n=1}^{N}$.
2. Goal: Produce a model that transforms the feature vector $x_n$ into a real-valued output $y_n$ or a
vector output $\mathbf{y}_n$. For example, in the following cases, the model returns:
(a) Clustering: The identity of the cluster to which each feature vector in the dataset belongs, i.e.
$$y_n \in \{1, \cdots, C\},$$
where $C$ is the total number of clusters. K-means clustering is one such method; it partitions the data
into $K$ different subgroups, where $K$ is a hyperparameter; see Section ??.
(b) Dimension Reduction: A new feature matrix $Y \in \mathbb{R}^{N_Y \times D_Y}$ that has a smaller dimension than
the input feature matrix $X \in \mathbb{R}^{N \times D}$. Principal component analysis (PCA) reduces the dimension
of the input feature matrix by looking for the dominant eigenvalues of the covariance matrix of
the feature vectors; see Section ??.
(c) Outlier Detection: A real-valued number $y_\ell$ that indicates how different $x_\ell$ is from a “typical”
example in the dataset $\{x_n\}_{n=1}^{N}$. The Mahalanobis distance [3] $D_n$ for the independent and
identically distributed (iid) datum $x_n$ is such that
$$D_n^2 = (x_n - \bar{x}_N)^\top S^{-1} (x_n - \bar{x}_N) \sim \chi^2_D, \quad n = 1, 2, \cdots, N,$$
approximately, where $\bar{x}_N$ and $S$ are respectively the mean and the covariance matrix of $x_1, \cdots, x_N$,
and $\chi^2_D$ is the Chi-squared distribution with $D$ degrees of freedom; especially if $N$ is large enough,
these $D_n^2$’s, for $n = 1, \cdots, N$, are also approximately independent of each other.
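
As a rough illustration of the outlier score in item (c), the squared Mahalanobis distances can be computed directly from the formula. The sketch below is restricted to $D = 2$ so that $S^{-1}$ can be written in closed form with the standard library only; it assumes that $S$ is the sample covariance matrix with divisor $N - 1$, and the six data points and the function name `mahalanobis_sq` are hypothetical.

```python
def mahalanobis_sq(points):
    """Squared Mahalanobis distance D_n^2 of each 2-D point from the sample mean."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    dx = [p[0] - mean_x for p in points]
    dy = [p[1] - mean_y for p in points]
    # Entries of the sample covariance matrix S (divisor N - 1, an assumption).
    s_xx = sum(a * a for a in dx) / (n - 1)
    s_yy = sum(b * b for b in dy) / (n - 1)
    s_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2
    # Quadratic form with the closed-form inverse of the 2x2 matrix S.
    return [(s_yy * a * a - 2.0 * s_xy * a * b + s_xx * b * b) / det
            for a, b in zip(dx, dy)]

# Five points close to a line plus one point far away from it (hypothetical data).
points = [(1.0, 2.0), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (3.0, 9.0)]
d2 = mahalanobis_sq(points)
most_atypical = max(range(len(points)), key=lambda i: d2[i])
```

Here the last point receives the largest score. In practice each $D_n^2$ would be compared with a quantile of the $\chi^2_D$ distribution; with only six points that approximation is far too crude, so the sketch only ranks the points.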

1.2.2.1 Reinforcement Learning


Reinforcement learning (RL) finds extensive applications. Consider a two-layer pendulum, which is a classical
problem in non-linear control. In control theory, the first step in tackling the problem is to build a precise
mathematical model to describe the pendulum system. Then, based on the model and theories of non-linear
systems, the control strategy is designed. However, building a model and designing a control can be very
complicated. Tackling the problem using methods in reinforcement learning, on the other hand, requires
neither a mathematical model nor a hand-designed control. All we need is a learning algorithm that lets the
simulated pendulum system learn on its own.

A major use of RL is to mimic human behaviour, and with rapidly developing research and algorithms,
RL agents can often outdo humans. For instance, a robot can learn to get up by itself after falling in a
simulated environment. A more eye-catching example is the Go match in 2017, in which the computer program
AlphaGo, developed by Google, defeated Ke Jie, the world champion in Go. Other applications include machine
translation (MT) and predictive text.

There are three historical moments in the development of reinforcement learning. First, Sutton et al. (1998)
published the text Reinforcement Learning: An Introduction. The book summarizes the development of
different algorithms in reinforcement learning up to 1998. At that time, much emphasis was placed on
Q-learning with tables. Concurrently, algorithms such as direct policy search had already been proposed.
For instance, the algorithm REINFORCE proposed in Williams (1992) directly updates the policy weights
by evaluating the policy gradient. The second moment was in 2013, when the Deep Q Network was first
proposed for game playing by DeepMind; see also Mnih et al. (2013). The Deep Q Network integrates
reinforcement learning and deep neural networks to form deep reinforcement learning. During 1998-2013,
various policy-based algorithms were also developed. The third moment, and also the most compelling
breakthrough in RL, has to be the development of AlphaGo by Google [9]. The RL-trained computer program
earned 2 consecutive wins in Go matches over world champions during 2016-2017.
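
To make the idea of Q-learning with tables a little more concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the five-state chain environment, the reward of 1 for reaching the last state, the uniformly random exploration, and the values of `alpha`, `gamma` and `episodes`. It is not the Deep Q Network or the AlphaGo method mentioned above.

```python
import random

N_STATES = 5        # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # the Q-table: one row per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)         # explore uniformly at random
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The agent is never told how the chain works; it only observes states and rewards, yet the greedy policy read off the learnt table moves right in every non-terminal state.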

1.3 Machine Learning Algorithms


1.3.1 Parameters vs. Hyperparameters
1. Parameters: Variables that define the model, whose values are to be learnt by some learning
algorithm. Parameters are directly estimated by the learning algorithm based on the training dataset.
The goal of machine learning is to find values of the parameters, based on some dataset, that optimize
the model in a certain sense.
2. Hyperparameters: Variables that control the learning process, e.g. the speed of convergence to the
optimal solution and the accuracy of the estimates. Hyperparameters are not learnt by the algorithm
itself from the training dataset; they have to be set a priori by the data analyst before running the
algorithm; see Section 4.5.
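
The distinction above can be illustrated with a small sketch, assuming a one-parameter ridge regression without intercept: the slope `w` is a parameter estimated from the training data, while the penalty `lam` is a hyperparameter fixed by the analyst before fitting and compared here on a held-out validation set. The data, the candidate values and the function names are hypothetical.

```python
def fit_ridge(xs, ys, lam):
    """Learn the slope w minimising sum (y - w*x)^2 + lam * w^2 (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared prediction error of the fitted slope w."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training and validation data.
x_train, y_train = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
x_valid, y_valid = [4.0, 5.0], [7.7, 9.4]

candidates = [0.0, 1.0, 10.0]    # hyperparameter values set a priori by the analyst
slopes = {lam: fit_ridge(x_train, y_train, lam) for lam in candidates}   # learnt parameters
best_lam = min(candidates, key=lambda lam: mse(x_valid, y_valid, slopes[lam]))
```

For each fixed `lam` the algorithm returns a different learnt slope; choosing among the candidates is a separate step that sits outside the fitting itself.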

1.3.2 Classification vs. Regression


1. Classification: Assigning a binary or multiclass predicted label yn to an unlabeled observation xn .
2. Regression: Predicting a real-valued label yn for an unlabeled observation xn .

1.3.3 Model-Based vs. Instance-Based Learnings


1. Model-Based Learning: Uses the training dataset to create a model whose parameters are learnt
from the dataset. Most supervised learning algorithms are model-based, e.g. SVM, logistic regression,
random forest, DNNs. Once the model is built (i.e. the optimal parameters are found), the training
dataset can be discarded.
2. Instance-Based Learning: Uses the dataset itself, possibly growing over time, as the model. This learning
approach is similar to online learning in that the model is used to predict new instances and is then
updated; however, in online learning the data arrive in sequential order, while instance-based learning has
no such restriction. One example of an instance-based learning algorithm is K-Nearest Neighbors
(KNN), which looks at the K closest neighbours of an input in the dataset and makes a label prediction
through a majority vote among these K neighbours.
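
A minimal sketch of the K-Nearest Neighbors idea, under stated assumptions: Euclidean distance, K = 3, and a hypothetical two-class toy dataset. The function name `knn_predict` is illustrative only.

```python
import math
from collections import Counter

# The stored, labeled dataset is itself the "model" (hypothetical toy data).
dataset = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
]

def knn_predict(query, data, k=3):
    """Predict a label by majority vote among the k nearest stored points."""
    neighbours = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

prediction = knn_predict((2.0, 2.0), dataset)

# Updating an instance-based model amounts to storing one more labeled instance.
dataset.append(((5.0, 5.0), "B"))
```

No parameters are estimated here, so the dataset cannot be discarded after "training"; every prediction scans the stored instances.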

1.3.4 Shallow vs. Deep Learnings


1. Shallow Learning: Learns the parameters directly from the feature vectors of the training dataset.
Most supervised learning algorithms are shallow, e.g. the Artificial Neural Network (ANN).
2. Deep Learning: Learns the parameters from the outputs of the preceding layers, usually with
more than one hidden layer in between the input and the output layers, e.g. the Deep Neural Network (DNN).
Those preceding layers are normally used to filter out the hidden patterns or inner organisations
from the data points.
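
The role of a preceding layer can be illustrated with a tiny forward pass. In this sketch the weights are hand-picked rather than learnt, the threshold activation is a simplification, and there is only a single hidden layer, whereas a deep network stacks several; all names are hypothetical. The output unit never sees the raw features: it works only on the hidden layer's outputs, which is what lets the network represent XOR, something a single threshold unit applied to the raw inputs cannot do.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: each unit sees all outputs of the previous layer."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked (not learnt) weights, purely for illustration.
HIDDEN_W, HIDDEN_B = [[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]   # units compute OR and AND
OUTPUT_W, OUTPUT_B = [[1.0, -1.0]], [-0.5]                    # fires for OR but not AND

def xor_net(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)   # works on the raw feature vector
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]    # works on the hidden layer's outputs

outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```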

1.4 How to use this book?


The main purpose of this book is to deliver the fundamental mathematical and statistical principles of
common machine learners and deep learners, with the aid of examples and problems arising in the business
sector, implemented in R and Python. Although the code involved may not be the most effective in reaching the
optimal buyout, it can surely motivate and inspire readers to construct a sufficiently efficient model.


Unfortunately, in this book you will not find material about deep thinking, in the sense of the book
(see Figure 1.4.1) by Professor Kawakami of design studies.

Figure 1.4.1: Kyoto University’s Deep Thinking Method (Japanese) by Hiroshi Kawakami.

“Deep Thinking” by Kawakami is a book that aims to deepen one’s thinking ability and cultivate the
skill of analyzing problems and then proposing solution strategies; it is more about scientific methods and
training in philosophical argument, which will not be covered in the present book. Instead, we introduce
various practically useful mathematical and statistical models behind a wide range of machine learners. In
1997, Garry Kimovich Kasparov, who was once a world chess champion, lost a match to the
IBM supercomputer “Deep Blue” under a limited time constraint. Twenty years later, in 2017, he published a
book entitled “Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins”
(see Figure 1.4.2).

Figure 1.4.2: Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins by Garry
Kasparov.


In his book, Kasparov shares his experience and strategies from playing against Deep Blue. Although there
was plenty of criticism of artificial intelligence at that time, Kasparov believed that artificial
intelligence could bring humans to new heights, and he predicted the future development of artificial
intelligence. In a similar spirit, we hope that our book can help our readers understand the common existing
machine and deep learners, and foster the future development of artificial intelligence.

In addition, we will not introduce any material related to deep diving (see Figure 1.4.3); instead, we will
say more about self-driving/autonomous driving (“deep” driving; see Figure 1.4.4) in due course.

Figure 1.4.3: Deep diving. Photo by http://divemagazine.co.uk/travel/7529-fresh-wrecks.

Figure 1.4.4: Deep driving. Photo by Alex Kendall https://www.youtube.com/watch?v=CxanE_W46ts.

In particular, Tesla Inc., a U.S.-based company that builds electric cars, uses deep learning to develop
an autopilot system. This autopilot system has already been equipped in the Tesla Model 3 (see Figure
1.4.5). However, this autopilot technology can only perform a limited set of functions, such as
accelerating, braking, and steering. Tesla drivers still need to take control of the car. The U.S. National
Highway Traffic Safety Administration defines a Level 5 self-driving car as follows:

“An automated driving system (ADS) on the vehicle can do all the driving in all circumstances.
The human occupants are just passengers and need never be involved in driving.” 6

While a Level 2 self-driving car is defined as:

“An advanced driver assistance system (ADAS) on the vehicle can itself actually control both
steering and braking/accelerating simultaneously under some circumstances. The human driver
must continue to pay full attention (monitor the driving environment) at all times and perform
the rest of the driving task.” 6

Therefore, Tesla’s current autopilot system only meets the Level 2 requirement. There is still a long
way for Tesla to go in improving its system; indeed, in March 2016, an incident was reported on Twitter in
which Tesla’s autopilot system mistook salt lines, which had been laid in advance of a massive
snowstorm, for the normal broken white lane markings on the highway; see Figure 1.4.6.

^6 Retrieved from https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety


Figure 1.4.5: White Tesla Model 3. Photo by https://en.wikipedia.org/wiki/Tesla_Model_3.

Figure 1.4.6: Original picture on Twitter: Salt lines confuse Tesla’s autopilot system. Photo by
https://twitter.com/amywebb/status/841292068488118273.


Bibliography
[1] Bachant, J. and Soloway, E. (1989). The engineering of XCON. Communications of the ACM, 32(3):311–319.

[2] Buchanan, B. G. and Feigenbaum, E. A. (1980). The Stanford heuristic programming project: Goals
and activities. AI Magazine, 1(1):25–25.

[3] Mahalanobis, P. C. (1936). On the generalised distance in statistics. In Proceedings of the National Institute
of Sciences of India, volume 2, pages 49–55.

[4] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational
abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558.

[5] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

[6] Newell, A. and Simon, H. (1956). The logic theory machine–a complex information processing system.
IRE Transactions on information theory, 2(3):61–79.

[7] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by
back-propagating errors. Nature, 323(6088):533–536.

[8] Shapiro, E. Y. (1983). The fifth generation project—a trip report. Communications of the ACM,
26(9):637–641.

[9] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J.,
Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep
neural networks and tree search. Nature, 529(7587):484–489.

[10] Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

[11] Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236):433–460.

[12] Whitehead, A. and Russell, B. (1927). Principia Mathematica. Number 1 in Cambridge mathematical
library. Cambridge University Press.

[13] Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement
learning. Machine learning, 8(3):229–256.
