Machine Learning: A Bayesian and Optimization Perspective
2nd Edition
Sergios Theodoridis
Department of Informatics and Telecommunications
National and Kapodistrian University of Athens
Athens, Greece
Shenzhen Research Institute of Big Data
The Chinese University of Hong Kong
Shenzhen, China
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2020 Elsevier Ltd. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and retrieval system, without permission in writing from the
publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found
at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may
be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our
understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any
injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-818803-3
Problems
MATLAB® Exercises
References
CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications
10.1 Introduction
10.2 Sparsity Promoting Algorithms
    10.2.1 Greedy Algorithms
    10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms
    10.2.3 Which Algorithm? Some Practical Hints
10.3 Variations on the Sparsity-Aware Theme
10.4 Online Sparsity Promoting Algorithms
    10.4.1 LASSO: Asymptotic Performance
    10.4.2 The Adaptive Norm-Weighted LASSO
    10.4.3 Adaptive CoSaMP Algorithm
    10.4.4 Sparse-Adaptive Projection Subgradient Method
10.5 Learning Sparse Analysis Models
    10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries
    10.5.2 Cosparsity
10.6 A Case Study: Time-Frequency Analysis
    Gabor Transform and Frames
    Time-Frequency Resolution
    Gabor Frames
    Time-Frequency Analysis of Echolocation Signals Emitted by Bats
Problems
MATLAB® Exercises
References
CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces
11.1 Introduction
11.2 Generalized Linear Models
11.3 Volterra, Wiener, and Hammerstein Models
11.4 Cover’s Theorem: Capacity of a Space in Linear Dichotomies
11.5 Reproducing Kernel Hilbert Spaces
    11.5.1 Some Properties and Theoretical Highlights
    11.5.2 Examples of Kernel Functions
11.6 Representer Theorem
    11.6.1 Semiparametric Representer Theorem
    11.6.2 Nonparametric Modeling: a Discussion
11.7 Kernel Ridge Regression
11.8 Support Vector Regression
    11.8.1 The Linear ε-Insensitive Optimal Regression
11.9 Kernel Ridge Regression Revisited
11.10 Optimal Margin Classification: Support Vector Machines
Preface
Machine learning is a name that is gaining popularity as an umbrella term for, and an evolution of, methods that have been studied and developed for many decades in different scientific communities and under different names, such as statistical learning, statistical signal processing, pattern recognition, adaptive signal processing, image processing and analysis, system identification and control, data mining and information retrieval, computer vision, and computational learning. The name “machine learning” indicates what all these disciplines have in common, that is, to learn from data and then make predictions. What one tries to learn from data is their underlying structure and regularities, via the development of a model, which can then be used to provide predictions.
To this end, a number of diverse approaches have been developed, ranging from the optimization of cost functions, whose goal is to minimize the deviation between what one observes from the data and what the model predicts, to probabilistic models that attempt to model the statistical properties of the observed data.
The goal of this book is to approach the machine learning discipline in a unifying context, by presenting the major paths and approaches that have been followed over the years, without giving preference to a specific one. It is the author’s belief that all of them are valuable to the newcomer who wants to learn the secrets of this topic, from the application as well as the pedagogic point of view. As the
title of the book indicates, the emphasis is on the processing and analysis front of machine learning and
not on topics concerning the theory of learning itself and related performance bounds. In other words,
the focus is on methods and algorithms closer to the application level.
The book is the outgrowth of more than three decades of the author’s experience in research and
teaching various related courses. The book is written in such a way that individual (or pairs of) chapters
are as self-contained as possible. So, one can select and combine chapters according to the focus he/she
wants to give to the course he/she teaches, or to the topics he/she wants to grasp in a first reading. Some
guidelines on how one can use the book for different courses are provided in the introductory chapter.
Each chapter grows by starting from the basics and evolving to embrace more recent advances.
Some of the topics had to be split into two chapters, such as sparsity-aware learning, Bayesian learning,
probabilistic graphical models, and Monte Carlo methods. The book addresses the needs of advanced
graduate, postgraduate, and research students as well as of practicing scientists and engineers whose
interests lie beyond black-box approaches. Also, the book can serve the needs of short courses on specific topics, e.g., sparse modeling, Bayesian learning, probabilistic graphical models, neural networks
and deep learning.
Second Edition
The first edition of the book, published in 2015, covered advances in the machine learning area up to 2013–2014. These years coincide with the start of a real boom in research activity in the field of deep learning, which reshaped our related knowledge and revolutionized the field of machine learning. The main emphasis of the current edition has been, basically, to rewrite Chapter 18. The chapter now covers a review of the field, starting from the early days of the perceptron and the perceptron rule and reaching up to the most recent advances, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), adversarial examples, generative adversarial networks (GANs), and capsule networks.
Also, the second edition covers in a more extended and detailed way nonparametric Bayesian methods, such as Chinese restaurant processes (CRPs) and Indian buffet processes (IBPs). It is the author’s
belief that Bayesian methods will gain in importance in the years to come. Of course, only time can
tell whether this will happen or not. However, the author’s feeling is that uncertainty is going to be
a major part of the future models and Bayesian techniques can be, at least in principle, a reasonable
start. Concerning the other chapters, besides the (omnipresent!) typos that have been corrected, changes
have been included here and there to make the text easier to read, thanks to suggestions by students,
colleagues, and reviewers; I am deeply indebted to all of them.
Most of the chapters include MATLAB® exercises, and the related code is freely available from the book’s companion website. Furthermore, in the second edition, all the computer exercises are also given in Python, together with the corresponding code, which is also freely available via the website of the book. Finally, some of the computer exercises in Chapter 18 that are related to deep learning, and which are closer to practical applications, are given in TensorFlow.
The solutions manual and lecture slides are available from the book’s website for instructors.
In the second edition, all appendices have been moved to the website associated with the book, and
they are freely downloadable. This was done in an effort to save space in a book that is already more
than 1100 pages. Also, some sections that were present in various chapters of the first edition, dedicated to methods which I felt do not constitute basic knowledge or current mainstream research topics (although they were new and “fashionable” in 2015), have been moved as well, and they can be downloaded from the companion website of the book.
http://textbooks.elsevier.com/web/Manuals.aspx?isbn=9780128188033
Acknowledgments
Writing a book is an effort on top of everything else that must keep running in parallel. Thus, writing
is basically an early morning, after five, and over the weekends and holidays activity. It is a big effort
that requires dedication and persistence. This would not be possible without the support of a number of
people—people who helped in the simulations, in the making of the figures, in reading chapters, and
in discussing various issues concerning all aspects, from proofs to the structure and the layout of the
book.
First, I would like to express my gratitude to my mentor, friend, and colleague Nicholas Kalouptsidis, for this long-lasting and fruitful collaboration.
The cooperation with Kostas Slavakis over the more recent years has been a major source of inspiration and learning and has played a decisive role for me in writing this book.
I am indebted to the members of my group, and in particular to Yannis Kopsinis, Pantelis Bouboulis,
Simos Chouvardas, Kostas Themelis, George Papageorgiou, Charis Georgiou, Christos Chatzichristos,
and Emanuel Morante. They were next to me the whole time, especially during the difficult final stages
of the completion of the manuscript. My colleagues Aggelos Pikrakis, Kostas Koutroumbas, Dimitris
Kosmopoulos, George Giannakopoulos, and Spyros Evaggelatos gave a lot of their time for discussions,
helping in the simulations and reading chapters.
Without my two sabbaticals during the spring semesters of 2011 and 2012, I doubt I would have
ever finished this book. Special thanks go to all my colleagues in the Department of Informatics and
Telecommunications of the National and Kapodistrian University of Athens.
During my sabbatical in 2011, I was honored to be a holder of an Excellence Chair in Carlos III
University of Madrid and spent the time with the group of Anibal Figuieras-Vidal. I am indebted to
Anibal for his invitation and all the fruitful discussions and the bottles of excellent red Spanish wine we
had together. Special thanks go to Jerónimo Arenas-García and Antonio Artés-Rodríguez, who have
also introduced me to aspects of traditional Spanish culture.
During my sabbatical in 2012, I was honored to be an Otto Mønsted Guest Professor at the Technical
University of Denmark with the group of Lars Kai Hansen. I am indebted to him for the invitation and
our enjoyable and insightful discussions, as well as his constructive comments on chapters of the book
and the visits to the Danish museums on weekends. Also, special thanks go to Morten Mørup and the
late Jan Larsen for the fruitful discussions.
The excellent research environment of the Shenzhen Research Institute of Big Data of the Chinese
University of Hong Kong ignited the spark and gave me the time to complete the second edition of the
book. I am deeply indebted to Tom Luo, who offered me this opportunity and also introduced me to the secrets of Chinese cooking.
A number of colleagues were kind enough to read and review chapters and parts of the book
and come back with valuable comments and criticism. My sincere thanks go to Tulay Adali, Kostas
Berberidis, Jim Bezdek, Soterios Chatzis, Gustavo Camps-Valls, Rama Chellappa, Taylan Cemgil
and his students, Petar Djuric, Paulo Diniz, Yannis Emiris, Mario Figuieredo, Georgios Giannakis,
Mark Girolami, Dimitris Gunopoulos, Alexandros Katsioris, Evaggelos Karkaletsis, Dimitris Katselis,
Athanasios Liavas, Eleftherios Kofidis, Elias Koutsoupias, Alexandros Makris, Dimitris Manatakis,
Elias Manolakos, Petros Maragos, Francisco Palmieri, Jean-Christophe Pesquet, Bhaskar Rao, George
Retsinas, Ali Sayed, Nicolas Sidiropoulos, Paris Smaragdis, Isao Yamada, Feng Yin, and Zhilin Zhang.
Finally, I would like to thank Tim Pitts, the Editor at Academic Press, for all his help.
Notation
I have made an effort to keep a consistent mathematical notation throughout the book. Although every
symbol is defined in the text prior to its use, it may be convenient for the reader to have the list of major
symbols summarized together. The list is presented below:
• Vectors are denoted with boldface letters, such as x.
• Matrices are denoted with capital letters, such as A.
• The determinant of a matrix is denoted as det{A}, and sometimes as |A|.
• A diagonal matrix with elements a1 , a2 , . . . , al in its diagonal is denoted as A = diag{a1 , a2 , . . . , al }.
• The identity matrix is denoted as I .
• The trace of a matrix is denoted as trace{A}.
• Random variables are denoted with roman fonts, such as x, and their corresponding values with math-mode (italic) letters, such as x.
• Similarly, random vectors are denoted with roman boldface, such as x, and the corresponding values
as x. The same is true for random matrices, denoted as X and their values as X.
• Probability values for discrete random variables are denoted by capital P , and probability density
functions (PDFs), for continuous random variables, are denoted by lower case p.
• The vectors are assumed to be column-vectors. In other words,
$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_l \end{bmatrix}, \quad \text{or} \quad
\mathbf{x} = \begin{bmatrix} x(1) \\ x(2) \\ \vdots \\ x(l) \end{bmatrix}.
$$
That is, the ith element of a vector can be represented either with a subscript, xi , or as x(i).
• Matrices are written as
$$
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1l} \\ \vdots & \vdots & \ddots & \vdots \\ x_{l1} & x_{l2} & \cdots & x_{ll} \end{bmatrix}, \quad \text{or} \quad
X = \begin{bmatrix} X(1,1) & X(1,2) & \cdots & X(1,l) \\ \vdots & \vdots & \ddots & \vdots \\ X(l,1) & X(l,2) & \cdots & X(l,l) \end{bmatrix}.
$$
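For readers who plan to work through the Python versions of the computer exercises, the following minimal NumPy sketch (an illustrative aside, not part of the book's notation) shows how the above conventions translate into code; keep in mind that NumPy indexing is zero-based, so the element written x_i or x(i) in the text corresponds to index i-1 in code.

import numpy as np

# A column vector x with elements x_1, ..., x_l (here l = 3).
x = np.array([[1.0], [2.0], [3.0]])      # shape (3, 1): explicitly a column

# The element denoted x_2 or x(2) in the text (indices are zero-based in code).
x_2 = x[1, 0]

# An l x l matrix X; the element X(i, j) of the text is X[i - 1, j - 1] here.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

print(np.trace(X))                 # trace{X}
print(np.linalg.det(X))            # det{X}
print(np.diag([1.0, 2.0, 3.0]))    # diag{a_1, a_2, a_3}
print(np.eye(3))                   # the identity matrix I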
CHAPTER 1
INTRODUCTION
CONTENTS
1.1 The Historical Context
1.2 Artificial Intelligence and Machine Learning
1.3 Algorithms Can Learn What Is Hidden in the Data
1.4 Typical Applications of Machine Learning
    Speech Recognition
    Computer Vision
    Multimodal Data
    Natural Language Processing
    Robotics
    Autonomous Cars
    Challenges for the Future
1.5 Machine Learning: Major Directions
    1.5.1 Supervised Learning
        Classification
        Regression
1.6 Unsupervised and Semisupervised Learning
1.7 Structure and a Road Map of the Book
References
computers and communications (internet), and it is characterized by the convergence of the physical,
digital, and biological spheres.
The terms artificial intelligence (AI) and machine learning are used and spread more and more to
denote the type of automation technology that is used in the production (industry), in the distribution of
goods (commerce), in the service sector, and in our economic transactions (e.g., banking). Moreover,
these technologies affect and shape the way we socialize and interact as humans via social networks,
and the way we entertain ourselves, involving games and cultural products such as music and movies.
A distinct qualitative difference of the fourth, compared to the previous industrial revolutions, is
that, before, it was the manual skills of humans that were gradually replaced by “machines.” In the
one that we are currently experiencing, mental skills are also being replaced by “machines.” We now have automatic answering software that runs on computers, fewer people serve us in banks, and many jobs in the service sector have been taken over by computers and related software platforms. Soon, we
are going to have cars without drivers and drones for deliveries. At the same time, new jobs, needs,
and opportunities appear. The labor market is changing fast, and new competences and skills are, and will continue to be, required in the future (see, e.g., [22,23]).
At the center of this historical happening, as one of the key enabling technologies, lies a discipline
that deals with data and whose goal is to extract information and related knowledge that is hidden in them, in order to make predictions and, subsequently, to take decisions. That is, the goal of this discipline is
to learn from data. This is analogous to what humans do in order to reach decisions. Learning through
the senses, personal experience, and the knowledge that propagates from generation to generation is
at the heart of human intelligence. Also, at the center of any scientific field lies the development of
models (often called theories) in order to explain the available experimental evidence. In other words,
data comprise a major source of learning.
els. In turn, more and more applications adopted such algorithmic techniques. “Learning from data”
became the new trend and the term machine learning prevailed as an umbrella for such techniques.
Moreover, the big difference was made with the use and “rediscovery” of what is today known
as deep neural networks. These models offered impressive predictive accuracies that had never been
achieved by previous models. In turn, these successes paved the way for the adoption of such models in
a wide range of applications and also ignited intense research, and new versions and models have been
proposed. These days, another term that is catching up is “data science,” indicating the emphasis on
how one can develop robust machine learning and computational techniques that deal efficiently with
large-scale data.
However, the main rationale, which runs through the spine of all the methods that come under the machine learning umbrella, remains the same and has been around for many decades. The main concept is to estimate a set of parameters that describe the model, using the available data, and, in the sequel, to make predictions based on low-level information and signals. One may easily argue that there is not much intelligence built into such approaches. No doubt, deep neural networks involve much more
“intelligence” than their predecessors. They have the potential to optimize the representation of their
low-level input information to the computer.
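As a minimal sketch of this rationale (the data and the model below are hypothetical and serve only as an illustration), one can estimate the parameters of a simple linear model from noisy observations by least squares and then use the estimated parameters for prediction:

import numpy as np

# Hypothetical noisy observations of y = 2 + 0.5 x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.5 * x + 0.1 * rng.standard_normal(x.shape)

# Estimate the parameter vector theta = [theta_0, theta_1] by least squares.
Phi = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Use the estimated parameters to predict at previously unseen inputs.
x_new = np.array([12.0, 15.0])
y_pred = theta[0] + theta[1] * x_new
print(theta, y_pred)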
The term “representation” refers to the way in which related information that is hidden in the input
data is quantified/coded so that it can be subsequently processed by a computer. In the more technical
jargon, each piece of such information is known as a feature (see also Section 1.5.1). As discussed in
detail in Chapter 18, where neural networks (NNs) are defined and presented in detail, what makes these
models distinctly different from other data learning methods is their multilayer structure. This allows
for the “building” up of a hierarchy of representations of the input information at various abstraction
levels. Every layer builds upon the previous one, and the higher in the hierarchy, the more abstract the obtained representation is. This structure offers neural networks a significant performance advantage over alternative models, which restrict themselves to a single representation layer. Furthermore, this
single-level representation was rather hand-crafted and designed by the users, in contrast to the deep
networks that “learn” the representation layers from the input data via the use of optimality criteria.
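The following toy sketch (purely illustrative; the layer sizes and the randomly drawn weights are arbitrary assumptions, not a trained network) makes the idea of stacked representation layers concrete: each layer applies an affine map followed by a nonlinearity to the output of the previous layer, so the raw input is successively re-represented at higher abstraction levels. In a real network, the weights would of course be learned from data by optimizing a cost function rather than drawn at random.

import numpy as np

rng = np.random.default_rng(1)

def layer(z, W, b):
    # One representation layer: affine map followed by a ReLU nonlinearity.
    return np.maximum(0.0, W @ z + b)

# A raw, low-level input vector (e.g., pixel intensities), chosen arbitrarily here.
z = rng.standard_normal(16)

# Three stacked layers: each builds its representation on top of the previous one.
sizes = [16, 8, 4, 2]
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    W = 0.1 * rng.standard_normal((d_out, d_in))   # untrained, random weights
    b = np.zeros(d_out)
    z = layer(z, W, b)                              # representation at the next level

print(z)   # the final, most abstract representation of the input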
Yet, in spite of the previously stated successes, I share the view that we are still very far from what
an intelligent machine should be. For example, once trained (estimating the parameters) on one data
set, which has been developed for a specific task, it is not easy for such models to generalize to other
tasks. Although, as we are going to see in Chapter 18, advances in this direction have been made, we are
still very far from what human intelligence can achieve. When a child sees one cat, it readily recognizes another one, even if this other cat has a different color or if it turns around. Current machine learning
systems need thousands of images with cats, in order to be trained to “recognize” one in an image. If a
human learns to ride a bike, it is very easy to transfer this knowledge and learn to ride a motorbike or
even to drive a car. Humans can easily transfer knowledge from one task to another, without forgetting
the previous one. In contrast, current machine learning systems lack such a generalization power and
tend to forget the previous task once they are trained to learn a new one. This is also an open field of
research, where advances have also been reported.
Furthermore, machine learning systems that employ deep networks can even achieve superhuman
prediction accuracies on data similar to those with which they have been trained. This is a significant
achievement, not to be underestimated, since such techniques can efficiently be used for dedicated
jobs; for example, to recognize faces, to recognize the presence of various objects in photographs, and
also to annotate images and produce text that is related to the content of the image. They can recognize
4 CHAPTER 1 INTRODUCTION
speech, translate text from one language to another, detect which music piece is currently playing in the
bar, and whether the piece belongs to the jazz or to the rock musical genre. At the same time, they can
be fooled by carefully constructed examples, known as adversarial examples, into producing wrong predictions in a way that would fool no human (see Chapter 18).
Concerning AI, the term “artificial intelligence” was first coined by John McCarthy in 1956 when
he organized the first dedicated conference (see, e.g., [20] for a short history). The concept at that time,
which still remains a goal, was whether one can build an intelligent machine, realized on software and
hardware, that can possess human-like intelligence. In contrast to the field of machine learning, the
concept for AI was not to focus on low-level information processing with emphasis on predictions, but
on the high-level cognitive capabilities of humans to reason and think. No doubt, we are still very far
from this original goal. Predictions are, indeed, part of intelligence. Yet, intelligence is much more than
that. Predictions are associated with what we call inductive reasoning. Yet what really differentiates human from animal intelligence is the power of the human mind to form concepts and create conjectures for explaining data and, more generally, the world in which we live. Explanations comprise a high-level facet of our intelligence and constitute the basis for scientific theories and the creation of our civilization. They are assertions concerning the “why”s and the “how”s related to a task, e.g., [5,6,11].
To talk about AI, at least as it was conceived by pioneers such as Alan Turing [16], systems should
have built-in capabilities for reasoning and giving meaning, e.g., in language processing, to be able
to infer causality, to model efficient representations of uncertainty, and, also, to pursue long-term
goals [8]. Possibly, towards achieving these challenging goals, we may have to understand and implement notions from the theory of mind, and also build machines that implement self-awareness. The
former psychological term refers to the understanding that others have their own beliefs and intentions
that justify their decisions. The latter refers to what we call consciousness. As a last point, recall that
human intelligence is closely related to feelings and emotions. As a matter of fact, the latter seem
to play an important part in the creative mental power of humans (e.g., [3,4,17]). Thus, in this more
theoretical perspective AI still remains a vision for the future.
The previous discussion should not be taken as an attempt to get involved with philosophical theories concerning the nature of human intelligence and AI. These topics have comprised a field in their own right for more than 60 years, one that is much beyond the scope of this book. My aim was to make the newcomer
in the field aware of some views and concerns that are currently being discussed.
On the more practical front, in the early years, the term AI was used to refer to techniques built
around knowledge-based systems that sought to hard-code knowledge in terms of formal languages,
e.g., [13]. Computer “reasoning” was implemented via a set of logical inference rules. In spite of the
early successes, such methods seem to have reached a limit; see, e.g., [7]. It was the alternative path of machine learning, via learning from data, that gave a real push to the field. These days, the term AI
is used as an umbrella to cover all methods and algorithmic approaches that are related to the machine
intelligence discipline, with machine learning and knowledge-based techniques being parts of it.