40 Algorithms Every Data Scientist Should Know Jurgen Weichenberger Huw Kwon

The document outlines a comprehensive guide to 40 essential AI and ML algorithms for data scientists, covering both foundational and advanced concepts in supervised, unsupervised, semi-supervised, and reinforcement learning. It aims to provide practical support for both new and experienced data scientists, detailing algorithm features, mathematical foundations, and best practices for building AI solutions. The book also explores applications in natural language processing, computer vision, and large-scale algorithms, while looking ahead to the future of quantum machine learning.


40 Algorithms Every Data Scientist Should Know
Navigating through essential AI and ML algorithms

Jürgen Weichenberger
Huw Kwon

www.bpbonline.com

First Edition 2025

Copyright © BPB Publications, India

ISBN: 978-93-55519-832

All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in
any form or by any means or stored in a database or retrieval system, without the prior written
permission of the publisher, with the exception of the program listings, which may be entered,
stored and executed in a computer system, but cannot be reproduced by means of
publication, photocopy, recording, or by any electronic or mechanical means.

LIMITS OF LIABILITY AND DISCLAIMER OF WARRANTY


The information contained in this book is true and correct to the best of the author’s and publisher’s
knowledge. The author has made every effort to ensure the accuracy of this publication, but the
publisher cannot be held responsible for any loss or damage arising from any information in
this book.

All trademarks referred to in the book are acknowledged as properties of their respective
owners but BPB Publications cannot guarantee the accuracy of this information.


Dedicated to
My beloved wife: Li and
My Daughter Sophia
– Jürgen

Ross McDonald, Matthew Jones, Edward Huang, Neil Graham,
Raj Dash, Mark Somers, Philip Treleaven, Mark Rowland, Matt
O’Kane, Andy Huang, Huw Tindall, Luigi Masoero, Michael Glinski,
Gerard Crispie, Rob Handcock, George Marcotte
and the three professors named Alan at the University
– Huw

About the Authors

• Jürgen Weichenberger has worked in Artificial Intelligence and Machine
Learning development for more than 20 years, playing central roles in numerous
projects as a technical leader and chief data scientist. He has delivered projects using
many different algorithms and has developed entirely new algorithms for large
companies, including successful projects in Europe, Asia, and North America.
Currently, he is Head of AI Strategy and Innovation at Schneider Electric and
a Senior Advisor at P.E.T. Consulting. He is also an accomplished postgraduate
completing a degree in Executive General Management and holds three master’s
degrees focused on Applied Computer Science, Bioinformatics, and Cybernetics.
He holds many certifications in various Artificial Intelligence technologies.
He also speaks at international AI conferences
and writes technical articles on AI technologies, algorithms, and related topics.
For his contributions to AI communities worldwide, he has been awarded over
20 AI patents.

• Huw Kwon is a well-known authority in AI and ML within the management
consulting world, bringing over two decades of academic rigor and truly extensive
practical experience to the field as a first-hand practitioner. With a strong foundation
in management science and advanced computational algorithms, his work is unified
by a singular focus: leveraging AI to create tangible, sustained, and lasting real-
world impact.
Huw’s career spans senior roles in banking and consulting at some of the world’s
most prestigious firms, including Ernst & Young, Accenture, and McKinsey. In these
roles, he has been pivotal in transforming businesses by developing AI capabilities
and leading complex, large-scale transformation programs across industries such as
financial services, automotive, and high-performance manufacturing.
As a member of several advisory boards and a recognized thought leader, Huw
frequently contributes to industry forums and publications, shaping the discourse
on the future of AI. Beyond consulting, he is a driving force in AI innovation,
working closely with academic institutions and startups to push the boundaries of
what AI and ML can achieve in practical applications.

About the Reviewer

Dr. Zachary Elewitz is a data scientist with over a decade of experience, currently serving
as the Head of AI at Fortune Brands Innovations. He holds two AI-related patents, sits on
Texas A&M Commerceʼs Venture College Board, and serves in several AI-related groups,
including the National Institute of Standards and Technology Generative AI Public
Working Group. In his spare time, he is pursuing a Masters in Viking studies and enjoys
snowboarding, indoor bouldering, and playing the guitar.

Acknowledgements

 Jürgen: I want to express my deepest gratitude to my family and friends for their
unwavering support and encouragement throughout this book’s writing, especially
my wife Li and my daughter Sophia.

I am also grateful to BPB Publications for their guidance and expertise in bringing
this book to fruition. It was a long journey of revising this book, with valuable
participation and collaboration of reviewers, technical experts, and editors.

I would also like to acknowledge the valuable contributions of my colleagues and
co-workers during many years working in the AI industry, who have taught me so
much and provided valuable feedback on my work.

Finally, I would like to thank all the readers who have taken an interest in this
book and for their support in making it a reality. Your encouragement has been
invaluable.

 Huw: Your guidance and support have been invaluable on my journey to solve real-
world problems with Data and AI. Each of you has been statistically significant in
this pursuit of mine, and for that, I am deeply grateful.

Preface

Building Artificial Intelligence and Machine Learning solutions is a complex task that
requires a comprehensive understanding of the latest technologies and algorithms
available to us. Artificial Intelligence has become an increasingly powerful tool over the
last couple of years, and the number of algorithms available to us has exploded.

This book is designed to provide a comprehensive guide through the world of Artificial
Intelligence algorithms and to be a practical, hands-on support for new and
experienced data scientists alike. It covers a wide range of topics, including
the basic definition of Artificial Intelligence and Machine Learning, basic data concepts,
and basic and advanced algorithms for supervised, unsupervised, semi-supervised, and
reinforcement learning.

Throughout the book, you will learn about the key features of each algorithm, its
mathematical foundation, and how to use it to build Artificial Intelligence solutions
that are efficient, reliable, and easy to maintain. You will also learn about best practices and
design patterns for Artificial Intelligence solutions and will be provided with numerous
practical examples to help you understand the algorithms.

This book is intended for new data scientists who want to learn which algorithms are
available and how to build Artificial Intelligence solutions with them. It is also helpful for
experienced data scientists who want to expand their knowledge of these algorithms and
improve their skills in building robust and reliable Artificial Intelligence solutions.

With this book, you will gain the knowledge and skills to become a proficient data scientist
and be able to build Artificial Intelligence solutions. We hope you will find this book
informative and helpful.

Chapter 1: Fundamentals – An introduction to the world of AI and ML algorithms, including
a short historical overview of the origins of AI and how it has developed into what we know
today. The chapter describes the basic structure that modern AI/ML algorithms follow so that
the training process converges and inference delivers a reasonable result.
It also covers the process of retraining an algorithm to refit its parameters and
hyperparameters.

Chapter 2: Typical Data Structures – An AI/ML algorithm can neither be trained nor run
an inference without being fed with the right data structure. The process of preparing the
data is known as feature engineering and requires the right school of thought.

Chapter 3: 40 AI/ML Algorithms Overview – Introduction to 40 AI/ML algorithms,
including the classification and structure used for the 40 algorithms in the following chapters.

Chapter 4: Basic Supervised Learning Algorithms – Chapters 4-11 will cover the 40
algorithms which comprise the core of the book.

This chapter covers essential supervised learning algorithms in machine learning. It
starts with Linear Regression, a widely used method for modeling the linear relationship
between input features and continuous target variables. It then introduces Logistic
Regression, commonly used for binary classification by estimating the probability of class
membership. The chapter also explores Decision Trees, which partition data based on
feature values and are applicable to both regression and classification tasks, often forming
the basis of Random Forests—an ensemble model combining multiple decision trees for
more accurate predictions. Lastly, Naive Bayes is discussed, a probabilistic algorithm
based on Bayes’ theorem, known for its efficiency in tasks like text classification and spam
filtering.
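
As a minimal sketch of the first algorithm named above, linear regression with a single input feature can be fitted in closed form with ordinary least squares. The function names and the toy data below are hypothetical and chosen only for illustration; they are not taken from the book's code bundle.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = predict(4.0, slope, intercept)
```

On this noise-free data the fit recovers the generating line; with real data the same formulas return the best-fitting line in the least-squares sense.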

Chapter 5: Advanced Supervised Learning Algorithms – This chapter explores advanced
supervised learning algorithms, starting with k-Nearest Neighbors (k-NN), which predicts
based on the majority vote of nearest neighbors and is effective for non-linear boundaries.
It covers Support Vector Machines (SVMs), which find an optimal hyperplane to separate
classes or predict values by maximizing class margins. The chapter also examines Gradient
Boosting Machines (GBM), an ensemble method that combines weak models to build
a strong predictor by focusing on errors from previous models. XGBoost, an optimized
gradient boosting method known for its performance and scalability, is also discussed.
Finally, it introduces Neural Networks, complex models inspired by the brain that learn
intricate patterns through layers of artificial neurons, driving advancements in areas like
image recognition and natural language processing.
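
The majority-vote idea behind k-NN can be sketched in a few lines. This is an illustrative version that assumes Euclidean distance and a small in-memory training set; the function name `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    top_k_labels = [label for _, label in distances[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


# Hypothetical two-class data: class "a" near the origin, class "b" near (5, 5).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
pred_near_origin = knn_predict(points, labels, (0.5, 0.5))
pred_far = knn_predict(points, labels, (5.5, 5.5))
```

Because every prediction scans the whole training set, this naive form costs time proportional to the number of training points per query.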

Chapter 6: Basic Unsupervised Learning Algorithms – This chapter explores basic
unsupervised learning algorithms, beginning with K-means Clustering, which groups
data into clusters based on proximity. It also covers Hierarchical Clustering, which builds
a tree-like structure of clusters by merging or splitting based on similarity. Principal
Component Analysis (PCA) is introduced as a technique for reducing dimensionality
while preserving key features by identifying principal components. t-Distributed
Stochastic Neighbor Embedding (t-SNE) is discussed for visualizing high-dimensional
data in lower dimensions, emphasizing local structures. Finally, Association Rule Mining
with the Apriori algorithm is examined for discovering relationships between items in a
dataset by identifying frequent item sets and generating association rules.
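
To make the proximity-based grouping of K-means concrete, here is a minimal sketch of the usual alternating procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). The starting centroids are passed in explicitly so the run is deterministic; that choice, the function name, and the data are assumptions made for this illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of K-means iterations from the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # A centroid whose cluster is empty is left where it is.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    # `clusters` holds the assignment made in the last iteration.
    return centroids, clusters


# Hypothetical data with two well-separated groups.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
final_centroids, final_clusters = kmeans(data, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In practice the starting centroids are usually chosen at random or with a seeding heuristic, and the result can depend on that choice.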

Chapter 7: Advanced Unsupervised Learning Algorithms – This chapter covers
advanced unsupervised learning algorithms, including Density-Based Spatial Clustering
of Applications with Noise (DBSCAN), which identifies clusters of varying shapes
and handles noise. It discusses Gaussian Mixture Models (GMM), which model data
as a mixture of Gaussian distributions to uncover subpopulations. Autoencoders are
introduced for unsupervised representation learning and dimensionality reduction. The
chapter also explores Anomaly Detection, which identifies rare or unusual instances
using various techniques. Finally, Latent Dirichlet Allocation (LDA) is covered for topic
modeling, discovering hidden topics in documents, and assigning topic distributions.
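
Anomaly detection covers many techniques. One of the simplest, shown here purely as an illustration and not necessarily the method the chapter uses, flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The threshold of 2.0 and the sample readings are assumptions.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.1, 9.9, 25.0]
anomalies = zscore_anomalies(readings)
```

A single extreme value inflates both the mean and the standard deviation, so in small samples this rule is fairly insensitive; more robust statistics such as the median are a common alternative.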

Chapter 8: Basic Reinforcement Learning Algorithms – This chapter covers basic
reinforcement learning algorithms, starting with Q-Learning, which estimates optimal
action values for state-action pairs through iterative updates. It then introduces Deep
Q-Networks (DQN), which use deep neural networks to handle high-dimensional
state spaces. Policy Gradient Methods optimize policy parameters directly to maximize
rewards with algorithms like REINFORCE and Proximal Policy Optimization (PPO).
The chapter also explores Advantage Actor-Critic (A2C), which combines policy gradient
and value-based methods for stable learning. Finally, Trust Region Policy Optimization
(TRPO) improves policies iteratively while staying close to the original policy using trust
regions.
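
The iterative update at the heart of Q-Learning moves each estimate Q(s, a) toward the target r + γ · max Q(s', ·). The sketch below applies that update on a hypothetical four-state chain. To keep the run deterministic it sweeps over every state-action pair instead of sampling transitions from an agent's experience, which is a simplification made for this example; the environment, constants, and names are all assumptions.

```python
GAMMA = 0.9   # discount factor
ALPHA = 0.5   # learning rate
N_STATES = 4  # states 0..3; state 3 is terminal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Hypothetical deterministic chain: reward 1.0 only when the last state is reached."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


# One row of action values per state, initialized to zero.
q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):
    for state in range(N_STATES - 1):  # the terminal state needs no update
        for action in (LEFT, RIGHT):
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            td_target = reward + GAMMA * best_next
            # Q-learning update: move the estimate toward the TD target.
            q[state][action] += ALPHA * (td_target - q[state][action])

# Greedy policy: pick the action with the highest estimated value in each state.
greedy_policy = [
    max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)
]
```

After the sweeps the greedy policy moves right in every state, and the value of moving right from state 0 approaches 0.9 squared, that is, the final reward discounted over the two further steps.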

Chapter 9: Advanced Reinforcement Learning Algorithms – This chapter covers advanced
reinforcement learning algorithms, starting with Asynchronous Advantage Actor-Critic
(A3C), which uses parallel agents to improve sample efficiency and learning speed.
Proximal Policy Optimization (PPO) is discussed next, using a clipped objective that
approximates a trust region for stable policy updates. Deep Deterministic Policy Gradient (DDPG) combines deep
Q-networks with actor-critic methods for continuous action spaces, while Twin Delayed
Deep Deterministic Policy Gradient (TD3) enhances DDPG by addressing overestimation
with multiple critics and delayed updates. Finally, Soft Actor-Critic (SAC) is introduced,
optimizing both reward and exploration using the maximum entropy framework.
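
The way TD3 counters overestimation can be illustrated with its learning target, which bootstraps from the smaller of two critic estimates. The helper below is a hypothetical sketch of that one step only; it leaves out the other parts of TD3, such as delayed actor updates and target policy smoothing.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two critic estimates."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * min(q1_next, q2_next)


# Hypothetical numbers: the critics disagree, and the lower estimate is used.
target = td3_target(reward=1.0, done=False, q1_next=10.0, q2_next=8.0, gamma=0.5)
terminal_target = td3_target(reward=1.0, done=True, q1_next=10.0, q2_next=8.0)
```

Taking the minimum biases the target downward, which offsets the tendency of a single learned critic to overestimate action values.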

Chapter 10: Basic Semi-Supervised Learning Algorithms – This chapter covers basic semi-
supervised learning algorithms, including Self-training, where a model iteratively adds
high-confidence predictions from unlabeled data to its training set. Co-training involves
multiple models training on different data views and refining each other’s predictions.
Multi-view Learning enhances learning by using various data representations to ensure
prediction agreement. Expectation-Maximization (EM) estimates parameters and missing
labels in probabilistic models. Finally, Graph-based Methods propagate labels from labeled
to unlabeled data using the data’s structure, with techniques like Label Propagation and
Manifold Regularization.
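
Self-training, as described above, can be sketched with a very simple base model. The example assumes a one-dimensional nearest-centroid classifier and defines confidence as the gap between the distances to the two closest class centroids; the model, the threshold, and the data are hypothetical choices made for illustration.

```python
def class_centroids(labeled):
    """Mean feature value per class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_with_confidence(x, cents):
    """Return (label, confidence); confidence is the margin over the runner-up class."""
    ranked = sorted(cents, key=lambda label: abs(x - cents[label]))
    best, runner_up = ranked[0], ranked[1]
    confidence = abs(x - cents[runner_up]) - abs(x - cents[best])
    return best, confidence


def self_train(labeled, unlabeled, threshold=2.0, max_rounds=10):
    """Iteratively move high-confidence predictions into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = class_centroids(labeled)
        confident, remaining = [], []
        for x in unlabeled:
            label, confidence = predict_with_confidence(x, cents)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to add, so stop early
        labeled += confident
        unlabeled = remaining
    return class_centroids(labeled), unlabeled


# Hypothetical data: two labeled seeds and a pool of unlabeled values.
seed_labeled = [(0.0, "low"), (10.0, "high")]
pool = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 5.0]
final_centroids, still_unlabeled = self_train(seed_labeled, pool)
```

The ambiguous value 5.0 never clears the confidence threshold, so it stays unlabeled instead of being forced into a class.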

Chapter 11: Advanced Semi-Supervised Learning Algorithms – This chapter covers
advanced semi-supervised learning algorithms, including Transductive Support Vector
Machines (TSVM), which use both labeled and unlabeled data to learn decision
boundaries. Co-regularization combines different regularization strategies to maintain
consistency and reduce sensitivity to noisy labels. Deep Generative Models, such as
VAEs and GANs, learn from both labeled and unlabeled data to generate new samples
and representations. Virtual Adversarial Training (VAT) adds robustness to models by
addressing adversarial perturbations from both data types. Tri-training trains three models
on different labeled feature subsets, using their consistent predictions on unlabeled data to
expand the labeled set.

Chapter 12: Natural Language Processing – Natural Language Processing (NLP) is a
subfield of computer science, artificial intelligence, and computational linguistics that
focuses on the interaction between computers and humans in natural language. NLP
enables computers to process, understand, and generate natural language, which is the
language used by humans to communicate with each other.

NLP involves developing algorithms and computational models that can analyze,
interpret, and generate human language, including tasks such as language translation,
sentiment analysis, text summarization, speech recognition, and language generation.

The goal of NLP is to enable computers to understand and respond to natural language
in the same way that humans do, allowing for more natural and intuitive communication
between humans and machines. NLP has applications in a wide range of fields, including
machine translation, information retrieval, customer service, healthcare, and education.

Chapter 13: Computer Vision – Computer vision is a field of artificial intelligence and
computer science that focuses on enabling computers to interpret and understand the
visual world around them, similar to how humans perceive and process visual information.

Computer vision involves developing algorithms and computational models that can
analyze and interpret images and videos. This includes tasks such as object detection,
image classification, facial recognition, scene understanding, and image segmentation.

The field of computer vision has made significant progress in recent years, with the
development of deep learning algorithms and convolutional neural networks, which have
led to breakthroughs in tasks such as image recognition and object detection.

Computer vision has many applications in various industries, including healthcare,
transportation, retail, and entertainment. For example, it can be used for medical image
analysis, self-driving cars, visual search in e-commerce, and augmented reality in gaming
and entertainment.

Chapter 14: Large-Scale Algorithms – Large-scale algorithms are computational methods
designed to handle massive amounts of data, such as those generated by modern digital
technologies. These algorithms typically involve processing large datasets in parallel
or distributed systems and require specialized hardware and software architectures to
achieve high performance.

The development of large-scale algorithms has become increasingly important in recent
years due to the exponential growth of data generated by various sources such as social
media, scientific simulations, and internet of things (IoT) devices. Large-scale algorithms
are needed to handle these large and complex datasets efficiently and effectively.

Examples of tools that implement large-scale algorithms include distributed machine learning
frameworks such as Spark MLlib and TensorFlow, graph processing systems such as Apache Giraph and
GraphX, and parallel processing engines such as Hadoop MapReduce and Apache
Flink. These tools are widely used in various industries, such as finance, healthcare,
and social media to analyze large datasets and make data-driven decisions.
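
The map-and-reduce pattern behind engines such as Hadoop MapReduce can be illustrated on a single machine. The sketch below counts words across data partitions: each partition is mapped to a partial count, and the partial counts are then merged. It runs in one process, uses only the Python standard library, and does not use any Hadoop or Spark API; the function names and sample text are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words within one partition of the data."""
    return Counter(word.lower() for line in lines for word in line.split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Hypothetical partitions; in a real cluster each would live on a different worker.
partitions = [
    ["big data needs big tools"],
    ["data beats opinions", "tools help"],
]
partial_counts = [map_chunk(p) for p in partitions]
word_counts = reduce(merge_counts, partial_counts, Counter())
```

Because merging counts is associative, the partial results can be combined in any order, which is what allows the work to be spread across machines.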

Chapter 15: Outlook into the Future: Quantum Machine Learning – Quantum machine
learning is an emerging field that combines quantum computing and machine learning.
Quantum computing uses the principles of quantum mechanics to perform certain
computations much faster than classical computing. Machine learning, on the other hand,
involves developing algorithms that can learn patterns and insights from data.

Quantum machine learning aims to leverage the power of quantum computing to develop
more efficient algorithms for machine learning tasks, such as classification, clustering, and
regression. These algorithms could potentially provide significant speedup and better
accuracy compared to classical machine learning algorithms.

There are various approaches to quantum machine learning, including quantum-inspired
classical algorithms, quantum-enhanced classical algorithms, and fully quantum
algorithms. Some of the challenges in quantum machine learning include designing
quantum algorithms that can take advantage of the unique properties of quantum
computing, such as superposition and entanglement, and developing hardware and
software infrastructure for quantum computing that can support large-scale machine
learning tasks.
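
Superposition, one of the properties mentioned above, can be illustrated by simulating a single qubit classically. The sketch represents the state as a pair of amplitudes and applies a Hadamard gate; it is a toy simulation with hypothetical function names, not code for real quantum hardware or for any quantum SDK.

```python
import math


def apply_hadamard(state):
    """Apply the Hadamard gate to a single-qubit state (amp0, amp1)."""
    amp0, amp1 = state
    s = 1 / math.sqrt(2)
    return (s * (amp0 + amp1), s * (amp0 - amp1))


def measurement_probabilities(state):
    """Probability of measuring 0 or 1: the squared magnitude of each amplitude."""
    return tuple(abs(amplitude) ** 2 for amplitude in state)


zero = (1.0, 0.0)                      # the basis state |0>
superposed = apply_hadamard(zero)      # equal superposition of |0> and |1>
probs = measurement_probabilities(superposed)
restored = apply_hadamard(superposed)  # applying the gate twice undoes it
```

Measuring the superposed state yields 0 or 1 with equal probability, while a second Hadamard returns the qubit to its starting state.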

Quantum machine learning has the potential to revolutionize many industries, including
finance, healthcare, and cybersecurity, by providing faster and more accurate predictions
and insights from large datasets.

Code Bundle and Coloured Images

Please follow the link to download the
Code Bundle and the Coloured Images of the book:

https://rebrand.ly/8qenuj3
The code bundle for the book is also hosted on GitHub at
https://github.com/bpbpublications/40-Algorithms-Every-Data-Scientist-Should-Know.
In case there’s an update to the code, it will be updated on the existing GitHub repository.

We have code bundles from our rich catalogue of books and videos available at
https://github.com/bpbpublications. Check them out!

Errata
We take immense pride in our work at BPB Publications and follow best practices to ensure
the accuracy of our content and to provide an engaging reading experience to our
subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve
upon human errors, if any, that may have occurred during the publishing processes involved.
To let us maintain the quality and help us reach out to any readers who might be
having difficulties due to any unforeseen errors, please write to us at:
[email protected]

Your support, suggestions and feedback are highly appreciated by the BPB Publications’
Family.

Did you know that BPB offers eBook versions of every book published, with PDF
and ePub files available? You can upgrade to the eBook version at www.bpbonline.com
and as a print book customer, you are entitled to a discount on the eBook copy.
Get in touch with us at:
[email protected] for more details.

At www.bpbonline.com, you can also read a collection of free technical articles,
sign up for a range of free newsletters, and receive exclusive discounts and offers
on BPB books and eBooks.

Piracy
If you come across any illegal copies of our works in any form on the internet,
we would be grateful if you would provide us with the location address or
website name. Please contact us at [email protected] with a link to
the material.

If you are interested in becoming an author
If there is a topic that you have expertise in, and you are interested in either
writing or contributing to a book, please visit www.bpbonline.com. We have
worked with thousands of developers and tech professionals, just like you, to
help them share their insights with the global tech community. You can make
a general application, apply for a specific hot topic that we are recruiting an
author for, or submit your own idea.

Reviews
Please leave a review. Once you have read and used this book, why not leave
a review on the site that you purchased it from? Potential readers can then see
and use your unbiased opinion to make purchase decisions. We at BPB can
understand what you think about our products, and our authors can see your
feedback on their book. Thank you!

For more information about BPB, please visit www.bpbonline.com.

Join our book's Discord space
Join the book's Discord Workspace for the latest updates, offers, tech happenings around the
world, new releases, and sessions with the authors:

https://discord.bpbonline.com

Table of Contents

1. Fundamentals.......................................................................................................................... 1
Introduction............................................................................................................................. 1
Structure .................................................................................................................................. 1
Objectives................................................................................................................................. 2
Fundamentals of AI and ML................................................................................................. 2
Defining AI and ML................................................................................................................ 3
Artificial Intelligence........................................................................................................... 3
Machine learning................................................................................................................ 4
History of AI and ML............................................................................................................. 4
Classic examples of AI and ML........................................................................................... 6
AI and ML algorithms............................................................................................................ 9
Examples of AI and ML algorithms.................................................................................. 10
Structure of a typical AI and ML algorithm..................................................................... 15
Conclusion............................................................................................................................. 16
Points to remember............................................................................................................... 16

2. Typical Data Structures....................................................................................................... 19


Introduction .......................................................................................................................... 19
Structure................................................................................................................................. 19
Objectives............................................................................................................................... 19
Introducing data structures................................................................................................. 20
Examples of typical data structures.................................................................................. 21
Arrays....................................................................................................................... 21
Matrices.................................................................................................................... 22
Tensors...................................................................................................................... 24
Linked lists................................................................................................................ 26
Graphs....................................................................................................................... 28
Hash tables................................................................................................................ 30
 xv

Queues...................................................................................................................... 32
Trees.......................................................................................................................... 34
Knowledge graph....................................................................................................... 36
Conclusion............................................................................................................................. 39
Points to remember............................................................................................................... 40
Exercises................................................................................................................................. 40
Exercise 1: Data preparation for sentiment analysis......................................................... 40
Exercise 2: Data preparation for image classification....................................................... 41

3. 40 AI/ML Algorithms Overview........................................................................................ 43


Introduction........................................................................................................................... 43
Structure................................................................................................................................. 43
Objectives............................................................................................................................... 44
Classification of AI and ML algorithms............................................................................. 44
Supervised learning algorithms........................................................................................ 44
Unsupervised learning algorithms.................................................................................... 46
Reinforcement learning algorithms................................................................................... 47
Semi-supervised learning algorithms................................................................................ 48
Overview of the 40 AI/ML algorithms.............................................................................. 49
Supervised learning algorithms........................................................................................ 49
Unsupervised learning algorithms.................................................................................... 50
Reinforcement learning algorithms................................................................................... 51
Semi-supervised learning algorithms................................................................................ 52
Conclusion............................................................................................................................. 53
Points to remember............................................................................................................... 53

4. Basic Supervised Learning Algorithms........................................................................... 55


Introduction........................................................................................................................... 55
Structure................................................................................................................................. 55
Objectives............................................................................................................................... 55
Introduction to supervised learning................................................................................... 56
Linear regression............................................................................................................... 57
Mathematical foundation.......................................................................................... 59
Advantages of linear regression................................................................................ 60
Disadvantages of linear regression........................................................................... 61
Real-world applications............................................................................................ 61
Real-world coding example....................................................................................... 62
Logistic regression............................................................................................................. 64
Mathematical foundation.......................................................................................... 66
Advantages of logistic regression.............................................................................. 67
Disadvantages of logistic regression......................................................................... 68
Real-world applications............................................................................................ 69
Real-world coding example....................................................................................... 69
Decision trees.................................................................................................................... 71
Mathematical foundation.......................................................................................... 73
Advantages................................................................................................................ 74
Disadvantages........................................................................................................... 75
Real-world applications............................................................................................ 75
Real-world coding example....................................................................................... 76
Random forests.................................................................................................................. 77
Mathematical foundation.......................................................................................... 79
Advantages................................................................................................................ 81
Disadvantages........................................................................................................... 81
Real-world applications............................................................................................ 82
Real-world coding example....................................................................................... 83
Support Vector Machines.................................................................................................. 84
Mathematical foundation.......................................................................................... 87
Advantages of SVM algorithms............................................................................... 88
Disadvantages of SVM algorithms........................................................................... 89
Real-world applications............................................................................................ 90
Real-world coding example....................................................................................... 91
Conclusion............................................................................................................................. 92
Points to remember............................................................................................................... 92
Exercises................................................................................................................................. 93
Linear regression exercise: Predict house prices................................................................ 93
Logistic regression exercise: Predicting customer churn.................................................. 95
Decision tree exercise: Diagnosing plant diseases............................................................. 97
Random forest exercise: Predicting wine quality.............................................................. 98
SVM exercise: Classifying handwritten digits............................................................... 101

5. Advanced Supervised Learning Algorithms................................................................ 103


Introduction......................................................................................................................... 103
Structure............................................................................................................................... 103
Objectives............................................................................................................................. 103
Introduction to advanced supervised learning........................................................... 104
Naive Bayes.......................................................................................................................... 104
Mathematical foundation................................................................................................ 106
Bayesʼ theorem........................................................................................................ 106
Application to classification.................................................................................... 107
Independence assumption....................................................................................... 107
Types of Naive Bayes....................................................................................................... 107
Parameter estimation...................................................................................................... 108
Advantages and disadvantages....................................................................................... 108
Advantages of Naive Bayes algorithms.................................................................. 108
Disadvantages of Naive Bayes algorithms.............................................................. 109
Real-world applications................................................................................................... 109
Real-world coding example............................................................................................. 110
k-Nearest Neighbors........................................................................................................... 112
Mathematical foundation................................................................................................ 114
Basic idea................................................................................................................. 114
Distance metrics..................................................................................................... 114
Making decisions.................................................................................................... 114
Choosing k............................................................................................................... 116
Complexity of the algorithm................................................................................... 116
Advantages and disadvantages....................................................................................... 116
Advantages of k-NN algorithms............................................................................. 116
Disadvantages of k-NN algorithms........................................................................ 117
Real-world applications................................................................................................... 117
Real-world coding example............................................................................................. 118
Neural networks.................................................................................................................. 120
Neural network architecture........................................................................................... 120
Forward propagation............................................................................................... 121
Backpropagation...................................................................................................... 121
Training process.............................................................................................................. 121
Mathematical foundation................................................................................................ 123
Basic neuron model................................................................................................. 123
Activation functions............................................................................................... 124
Feedforward mechanism.......................................................................................... 124
Cost function........................................................................................................... 124
Backpropagation...................................................................................................... 125
Optimization........................................................................................................... 125
Regularization......................................................................................................... 125
Advantages and disadvantages ...................................................................................... 125
Advantages of neural network algorithms............................................................. 125
Disadvantages of neural network algorithms......................................................... 126
Real-world applications................................................................................................... 127
Real-world coding example............................................................................................. 127
Gradient Boosting Machines............................................................................................. 130
Mathematical foundation................................................................................................ 132
Boosting.................................................................................................................. 132
Objective function................................................................................................... 132
Loss function........................................................................................................... 133
Regularization......................................................................................................... 133
Gradient boosting.................................................................................................... 133
Shrinkage................................................................................................................ 133
Stopping criteria..................................................................................................... 134
Advantages and disadvantages of GBM algorithms....................................................... 134
Advantages of GBM algorithms............................................................................. 134
Disadvantages of GBM algorithms......................................................................... 135
Real-world applications for GBM algorithms................................................................. 135
Real-world GBM coding example................................................................................... 137
XGBoost................................................................................................................................ 138
Mathematical foundation................................................................................................ 141
Objective function................................................................................................... 141
Regularization......................................................................................................... 141
Taylor expansion for approximation....................................................................... 142
Optimal leaf weights............................................................................................... 142
Pruning................................................................................................................... 142
Handling missing values........................................................................................ 142
Column block and parallelization........................................................................... 143
Advantages and disadvantages of XGBoost algorithms ................................................ 143
Advantages of XGBoost algorithms........................................................................ 143
Disadvantages of XGBoost algorithms................................................................... 144
Real-world applications for XGBoost algorithms........................................................... 144
Real-world XGBoost coding example.............................................................................. 145
Conclusion........................................................................................................................... 147
Points to remember............................................................................................................. 147
Exercises and solutions...................................................................................................... 148
Naive Bayes exercise: Classifying email messages as spam or not spam........................ 148
k-NN exercise: Classifying types of flowers based on measurements............................. 150
Neural network exercise: Handwritten digit classification with the MNIST dataset... 152
GBM exercise: Predicting house prices with the Boston Housing dataset..................... 154
XGBoost exercise: Predicting diabetes outcomes with the Pima Indians Diabetes dataset................................................................................... 156

6. Basic Unsupervised Learning Algorithms.................................................................... 159


Introduction......................................................................................................................... 159
Structure............................................................................................................................... 159
Objectives............................................................................................................................. 159
Introduction to unsupervised learning............................................................................ 160
Key concepts in unsupervised learning........................................................................... 160
Popular basic unsupervised learning algorithms............................................................ 160
K-means clustering............................................................................................................. 161
Mathematical foundation................................................................................................ 163
Properties and considerations......................................................................................... 164
Advantages and disadvantages....................................................................................... 164
Real-world applications................................................................................................... 166
Real-world coding example.............................................................................................. 166
Hierarchical clustering....................................................................................................... 168
Mathematical foundation................................................................................................ 170
Properties and considerations......................................................................................... 171
Advantages and disadvantages....................................................................................... 172
Real-world applications................................................................................................... 173
Real-world coding example.............................................................................................. 174
Principal Component Analysis......................................................................................... 176
Mathematical foundation ............................................................................................... 178
Properties of PCA............................................................................................................ 178
Limitations...................................................................................................................... 179
Advantages and disadvantages....................................................................................... 179
Real-world applications................................................................................................... 180
Real-world PCA coding example.............................................................................. 181
t-Distributed Stochastic Neighbor Embedding.............................................................. 183
Mathematical foundation................................................................................................ 185
Conditional probabilities in high-dimensional space.............................................. 185
Conditional probabilities in low-dimensional space............................................... 185
Symmetrized version............................................................................................... 185
Cost function........................................................................................................... 186
Gradient descent..................................................................................................... 186
t-Distribution.......................................................................................................... 186
Key points........................................................................................................................ 186
Limitations and considerations....................................................................................... 186
Advantages and disadvantages....................................................................................... 187
Advantages of t-SNE algorithms............................................................................ 187
Disadvantages of t-SNE algorithms....................................................................... 187
Real-world applications................................................................................................... 188
Real-world coding example............................................................................................. 189
Association Rule Mining: Apriori algorithm................................................................ 191
Mathematical foundation................................................................................................ 193
Basic terminologies................................................................................................. 193
Apriori principle.................................................................................................... 193
Advantages and disadvantages....................................................................................... 194
Advantages.............................................................................................................. 195
Disadvantages......................................................................................................... 195
Real-world applications................................................................................................... 196
Real-world coding example............................................................................................. 196
Conclusion........................................................................................................................... 198
Points to remember............................................................................................................. 198
Exercises and solutions...................................................................................................... 199
Using K-means clustering to segment mall customers based on spending habits.......... 199
Categorizing iris flowers based on features using hierarchical clustering................. 201
Dimensionality reduction for breast cancer data visualization using PCA................... 203
Visualizing MNIST dataset in 2D using t-SNE............................................................ 205
Analyzing retail transactions using Association Rule Mining...................................... 207

7. Advanced Unsupervised Learning Algorithms............................................................ 209


Introduction......................................................................................................................... 209
Structure .............................................................................................................................. 209
Objectives............................................................................................................................. 209
Introduction to unsupervised learning............................................................................ 210
Density-Based Spatial Clustering of Applications with Noise..................................... 211
Mathematical foundations............................................................................................... 212
Main concepts of DBSCAN.................................................................................... 212
DBSCAN algorithm............................................................................................... 213
Advantages and disadvantages....................................................................................... 213
Advantages of DBSCAN........................................................................................ 214
Disadvantages of DBSCAN.................................................................................... 214
Real-world applications................................................................................................... 215
Real-world coding example............................................................................................. 215
Gaussian Mixture Models.................................................................................................. 217
Expectation-Maximization for GMM............................................................................. 218
Mathematical foundations............................................................................................... 219
Gaussian distribution............................................................................................. 220
Mixture of Gaussians.............................................................................................. 220
Expectation-Maximization for GMMs................................................................... 220
Advantages and disadvantages....................................................................................... 221
Advantages.............................................................................................................. 221
Disadvantages......................................................................................................... 222
Real-world applications................................................................................................... 222
Real-world coding example............................................................................................. 223
Autoencoders....................................................................................................................... 224
Defining an autoencoder......................................................................................... 224
Loss function........................................................................................................... 225
Applications............................................................................................................ 225
Simple Autoencoder with TensorFlow and Keras................................................... 225
Mathematical foundations............................................................................................... 227
Objective function................................................................................................... 227
Encoder and decoder functions............................................................................... 228
Variational Autoencoders....................................................................................... 228
Bottleneck and sparsity........................................................................................... 228
Regularization and noise........................................................................................ 229
Importance of Autoencoders in Generative AI................................................................ 229
Advantages and disadvantages....................................................................................... 230
Advantages.............................................................................................................. 230
Disadvantages......................................................................................................... 231
Real-world applications................................................................................................... 232
Real-world coding example............................................................................................. 232
Using Autoencoders for image abstraction............................................................. 232
Anomaly detection: Outlier detection.............................................................................. 234
Coding example using Scikit-learn: Isolation forest....................................................... 235
Mathematical foundation................................................................................................ 236
Advantages and disadvantages....................................................................................... 238
Advantages.............................................................................................................. 238
Disadvantages......................................................................................................... 238
Real-world applications................................................................................................... 239
Real-world coding example............................................................................................. 240
Latent Dirichlet Allocation................................................................................................ 242
Mathematical foundation................................................................................................ 244
Basics....................................................................................................................... 244
LDA generative process.......................................................................................... 244
Variable Definitions................................................................................................ 244
Goal......................................................................................................................... 244
Gibbs sampling for LDA......................................................................................... 245
Application.............................................................................................................. 245
Advantages and disadvantages....................................................................................... 245
Advantages of LDA................................................................................................. 245
Disadvantages of LDA............................................................................................ 246
Real-world applications................................................................................................... 247
Real-world coding example............................................................................................. 247
Conclusion........................................................................................................................... 249
Points to remember............................................................................................................. 249
Exercises and solutions...................................................................................................... 250
DBSCAN exercise: Clustering geographical data.......................................................... 250
GMM exercise: Clustering customer spending data...................................................... 252
Autoencoder exercise: Image denoising........................................................................... 254
Anomaly detection exercise: Detecting fraudulent transactions.................................... 256
LDA exercise: Topic modelling on news articles............................................................. 258

8. Basic Reinforcement Learning Algorithms................................................................... 261


Introduction......................................................................................................................... 261
Structure............................................................................................................................... 261
Objectives............................................................................................................................. 262
Introduction to reinforcement learning........................................................................... 262
Reinforcement learning process....................................................................................... 262
Key elements.................................................................................................................... 263
Reinforcement learning algorithms................................................................................. 263
Q-learning............................................................................................................................ 264
Mathematical foundation................................................................................................ 267
Q-learning update rule........................................................................................... 267
Intuition.................................................................................................................. 268
Convergence............................................................................................................ 268
Advantages and disadvantages....................................................................................... 268
Advantages.............................................................................................................. 268
Disadvantages......................................................................................................... 269
Real-world applications................................................................................................... 270
Real-world coding example............................................................................................. 270
Deep Q-Networks............................................................................................................... 273
Mathematical foundation................................................................................................ 276
Mathematical relationship with Q-learning........................................................... 277
Challenges and extensions...................................................................................... 277
Advantages and disadvantages....................................................................................... 277
Advantages of DQNs.............................................................................................. 278
Disadvantages of DQNs......................................................................................... 278
Real-world applications................................................................................................... 279
Real-world coding example............................................................................................. 279
Policy Gradient Methods................................................................................................... 282
REINFORCE algorithm......................................................................................................... 282
Mathematical foundation................................................................................................ 284
Intuition.................................................................................................................. 285
Objective function................................................................................................... 285
Policy Gradient theorem......................................................................................... 285
REINFORCE.................................................................................................................. 285
Challenges and extensions...................................................................................... 286
Advantages and disadvantages....................................................................................... 286
Advantages of PGMs.............................................................................................. 286
Disadvantages of PGMs......................................................................................... 287
Real-world applications................................................................................................... 287
Real-world coding example.............................................................................................. 288
Advantage Actor-Critic...................................................................................................... 290
Mathematical foundation................................................................................................ 294
Intuition and advantages........................................................................................ 294
Advantages and disadvantages....................................................................................... 295
Real-world applications................................................................................................... 296
Real-world coding example............................................................................................. 296
Trust Region Policy Optimization ................................................................................... 299
Mathematical foundation................................................................................................ 302
Objective function................................................................................................... 302
Surrogate objective function................................................................................... 302
Trust region constraint........................................................................................... 303
Optimization........................................................................................................... 303
Intuition.................................................................................................................. 303
Advantages and disadvantages....................................................................................... 303
Real-world applications.......................................................................................... 304
Real-world coding example..................................................................................... 305
Conclusion........................................................................................................................... 306
Points to remember............................................................................................................. 307
Exercises and solutions...................................................................................................... 308
Q-learning exercise: Navigate a grid world.................................................................... 308
DQN exercise: CartPole balancing with DQN............................................................... 310
Policy gradient exercise: Solve the LunarLander environment...................................... 312
A2C exercise: MountainCar continuous control............................................................ 314
TRPO exercise: Robot locomotion using TRPO............................................................. 315

9. Advanced Reinforcement Learning Algorithms.......................................................... 319


Introduction......................................................................................................................... 319
Structure............................................................................................................................... 319
Objectives............................................................................................................................. 320
Introduction to reinforcement learning....................................................................... 320
Advanced Reinforcement Learning algorithms............................................................. 320
Asynchronous Advantage Actor-Critic.......................................................................... 321
Mathematical foundation........................................................................................ 324
Advantages and disadvantages............................................................................... 325
Real-world applications........................................................................................... 326
Real-world coding example..................................................................................... 326
Proximal Policy Optimization........................................................................................ 328
Mathematical foundation........................................................................................ 330
Advantages and disadvantages............................................................................... 331
Real-world applications.......................................................................................... 332
Real-world coding example..................................................................................... 332
Deep Deterministic Policy Gradient............................................................................... 334
Mathematical foundation........................................................................................ 335
Advantages and disadvantages............................................................................... 336
Real-world applications.......................................................................................... 337
Real-world coding example..................................................................................... 338
Twin Delayed Deep Deterministic Policy Gradient........................................................ 339
Mathematical foundation........................................................................................ 340
Advantages and disadvantages............................................................................... 341
Real-world applications.......................................................................................... 342
Real-world coding example..................................................................................... 343
Soft Actor-Critic.............................................................................................................. 344
Mathematical foundation........................................................................................ 345
Advantages and disadvantages............................................................................... 347
Real-world applications.......................................................................................... 348
Real-world coding example..................................................................................... 348
Conclusion........................................................................................................................... 349
Points to remember............................................................................................................. 350
Exercises and solutions...................................................................................................... 351
A3C exercise: Implementing asynchronous training for the CartPole game.................. 351
PPO exercise: Balancing the Lunar Lander with PPO................................................... 353
DDPG exercise: Navigating a Pendulum using DDPG................................................ 355
TD3 exercise: Controlling a Bipedal Robot with TD3.................................................... 357
Soft Actor-Critic exercise: Training an Agent to Balance a Pendulum.......................... 359

10. Basic Semi-Supervised Learning Algorithms............................................................... 361


Introduction......................................................................................................................... 361
Structure............................................................................................................................... 362
Objectives............................................................................................................................. 362
Introduction to semi-supervised learning....................................................................... 362
Need for semi-supervised learning.................................................................................. 362
Techniques in semi-supervised learning.......................................................................... 363
Challenges....................................................................................................................... 363
Self-training.......................................................................................................................... 364
Mathematical foundation................................................................................................ 366
Intuition.................................................................................................................. 366
Risks........................................................................................................................ 367
Advantages and disadvantages....................................................................................... 367
Real-world applications................................................................................................... 368
Real-world coding example............................................................................................. 369
Scenario................................................................................................................... 369
Co-training........................................................................................................................... 371
Mathematical foundation................................................................................................ 373
Assumptions........................................................................................................... 373
The Algorithm......................................................................................................... 374
Mathematical justification...................................................................................... 374
Advantages and disadvantages....................................................................................... 375
Real-world applications................................................................................................... 376
Real-world coding example............................................................................................. 376
Multi-view learning............................................................................................................ 378
Coding example using co-training.................................................................................. 379
Mathematical foundation................................................................................................ 380
Co-training.............................................................................................................. 380
Multiple kernel learning......................................................................................... 380
Canonical correlation analysis based methods........................................................ 381
Shared and individual feature learning................................................................. 381
Joint and individual feature learning...................................................................... 381
Advantages and disadvantages....................................................................................... 381
Real-world applications................................................................................................... 383
Real-world coding example............................................................................................. 383
Scenario................................................................................................................... 383
Expectation-Maximization................................................................................................. 385
Mathematical foundation........................................................................................ 387
Advantages and disadvantages.............................................................................. 388
Real-world applications.......................................................................................... 389
Real-world coding example..................................................................................... 390
Graph-based methods........................................................................................................ 391
Label propagation as a graph-based semi-supervised learning example......................... 392
Mathematical foundation................................................................................................ 393
Objective function................................................................................................... 393
Laplacian matrix..................................................................................................... 394
Label propagation.................................................................................................... 394
Algorithm steps....................................................................................................... 394
Regularization and modifications........................................................................... 394
Advantages and disadvantages....................................................................................... 394
Real-world applications................................................................................................... 396
Real-world coding example.............................................................................................. 396
Conclusion........................................................................................................................... 398
Points to remember............................................................................................................. 399
Exercises and solutions...................................................................................................... 400
Exercise: Implementing a self-training algorithm for text classification........................ 400
Exercise: Implementing a co-training algorithm for sentiment analysis......................... 401
Exercise: Implementing a multi-view learning algorithm for image and text classification...................................................................................... 403
Exercise: Implementing the Expectation-Maximization algorithm for Gaussian Mixture Models.................................................................................... 404
Exercise: Implementing a graph-based semi-supervised learning algorithm.................. 406

11. Advanced Semi-Supervised Learning Algorithms...................................................... 409


Introduction......................................................................................................................... 409
Structure .............................................................................................................................. 409
Objectives............................................................................................................................. 410
Introduction to semi-supervised learning....................................................................... 410
Transductive Support Vector Machines........................................................................... 410
Mathematical foundation................................................................................................ 412
Formulation............................................................................................................ 412
Transductive inference............................................................................................ 412
Mathematical properties......................................................................................... 413
Advantages and disadvantages....................................................................................... 413
Real-world applications................................................................................................... 414
Real-world coding example............................................................................................. 415
Co-regularization: Label propagation.............................................................................. 416
Mathematical foundation................................................................................................ 418
Objective function................................................................................................... 419
Advantages and disadvantages....................................................................................... 420
Real-world applications................................................................................................... 421
Real-world coding example............................................................................................. 422
Deep generative models..................................................................................................... 423
Mathematical foundation................................................................................................ 426
Variational Autoencoders in semi-supervised learning......................................... 426
Generative Adversarial Networks in semi-supervised learning............................. 426
Unified objective for semi-supervised deep generative models............................... 426
Advantages and disadvantages....................................................................................... 427
Real-world applications................................................................................................... 428
Real-world coding example............................................................................................. 429
Virtual Adversarial Training.............................................................................................. 431
Mathematical foundation................................................................................................ 434
Mathematical notation............................................................................................ 434
Objective function................................................................................................... 434
Loss function........................................................................................................... 434
Algorithm................................................................................................................ 434
Advantages and disadvantages............................................................................... 435
Real-world applications................................................................................................... 436
Real-world coding example.............................................................................................. 437
Tri-training........................................................................................................................... 439
Mathematical foundation................................................................................................ 441
Ensemble learning................................................................................................... 441
Bootstrapping.......................................................................................................... 442
Voting mechanism................................................................................................... 442
Iterative refinement................................................................................................. 442
Consensus............................................................................................................... 442
Error rate................................................................................................................. 442
Advantages and disadvantages....................................................................................... 442
Real-world applications................................................................................................... 444
Real-world coding example............................................................................................. 444
Conclusion........................................................................................................................... 446
Points to remember............................................................................................................. 446
Exercises............................................................................................................................... 447
Exercise on Transductive Support Vector Machines....................................................... 447
Exercise on co-regularization for semi-supervised learning........................................... 449
Exercise on deep generative models................................................................................. 450
Exercise on Virtual Adversarial Training for semi-supervised learning........................ 452
Exercise on tri-training for semi-supervised learning.................................................... 453

12. Natural Language Processing........................................................................................... 455


Introduction ........................................................................................................................ 455
Structure............................................................................................................................... 455
Objectives............................................................................................................................. 455
Natural Language Processing........................................................................................... 456
Natural Language Understanding.................................................................................. 457
Python coding example for NLU ........................................................................... 458
Natural Language Generation........................................................................................ 459
Python coding example for NLG............................................................................ 460
Large Language Models.................................................................................................. 461
History of LLMs...................................................................................................... 461
What are LLMs?..................................................................................................... 462
Python coding example for LLMs........................................................................... 463
Generative AI.................................................................................................................. 464
Python coding example for Generative AI.............................................................. 465
Mathematical foundations of Natural Language Processing.......................................... 466
Mathematical foundation of NLU.......................................................................... 466
Mathematical foundation of Natural Language Generation.................................. 469
Advantages and disadvantages........................................................................................ 470
Real-world applications..................................................................................................... 472
Coding example.................................................................................................................. 473
Conclusion........................................................................................................................... 474
Points to remember............................................................................................................. 475
Exercise................................................................................................................................. 476
Exercise: Sentiment analysis on movie reviews.............................................................. 476

13. Computer Vision................................................................................................................ 479


Introduction ........................................................................................................................ 479
Structure .............................................................................................................................. 479
Objectives............................................................................................................................. 479
Computer vision.................................................................................................................. 480
Mathematical foundation................................................................................................ 482
Convolutional Neural Networks..................................................................................... 483
Spiking Neural Networks: Outlook into the future of CV.............................................. 485
Advantages and disadvantages of computer vision...................................................... 486
Real-world applications for computer vision................................................................. 487
Coding example for computer vision.............................................................................. 488
Example for Convolutional Neural Networks................................................................. 490
Example for Spiking Neural Networks .......................................................................... 492
Conclusion........................................................................................................................... 494
Points to remember............................................................................................................. 494
Exercise................................................................................................................................. 495
Exercise: Image classification with Computer Vision..................................................... 495

14. Large-Scale Algorithms..................................................................................................... 497


Introduction......................................................................................................................... 497
Structure............................................................................................................................... 497
Objectives............................................................................................................................. 497
Large-scale algorithms....................................................................................................... 498
MapReduce...................................................................................................................... 499
What is MapReduce?.............................................................................................. 501
Distributed machine learning......................................................................................... 502
Graph processing algorithms........................................................................................... 504
Large-scale optimization.................................................................................................. 505
Mathematical foundation of large-scale algorithms........................................................ 507
Advantages and disadvantages........................................................................................ 508
Real-world applications..................................................................................................... 510
Coding example.................................................................................................................. 511
Conclusion........................................................................................................................... 513
Points to remember............................................................................................................. 513
Exercise................................................................................................................................. 514
Exercise: Word count using MapReduce........................................................................ 514

15. Outlook into the Future: Quantum Machine Learning.............................................. 517


Introduction ........................................................................................................................ 517
Structure .............................................................................................................................. 517
Objectives............................................................................................................................. 518
A Short Introduction to Quantum and Quantum Computing..................................... 518
Quantum Machine Learning............................................................................................. 519
Challenges and limitations.............................................................................................. 520
Mathematical foundation................................................................................................ 521
Quantum Machine Learning algorithms......................................................................... 522
Quantum Support Vector Machine ............................................................................... 522
Quantum Neural Networks............................................................................................ 524
Quantum Principal Component Analysis...................................................................... 525
Quantum k-Means Clustering........................................................................................ 527
Quantum Boltzmann Machine....................................................................................... 528
Quantum Genetic Algorithms........................................................................................ 530
Advantages of Quantum Machine Learning.................................................................. 531
Disadvantages of Quantum Machine Learning............................................................. 532
Real-world applications..................................................................................................... 533


Coding example.................................................................................................................. 533
Conclusion........................................................................................................................... 535
Points to remember............................................................................................................. 535
Exercise................................................................................................................................. 536
Exercise: Implementing a Simple Quantum Circuit for Quantum Machine Learning..... 536

Index..............................................................................................................................539-553
Chapter 1
Fundamentals

Introduction
This chapter covers the fundamentals of artificial intelligence (AI) and machine
learning (ML). We start by laying out the fundamentals and their definitions to create
a common understanding of the field. We then define AI and ML and discuss their impact,
both within the field and beyond it. We also introduce the critical concepts and the kinds
of industry problems that AI and ML can solve. We close the chapter with simple examples
that distinguish an AI/ML application from an AI/ML algorithm.

Structure
The chapter covers the following topics:
• Fundamentals of AI and ML
• Defining AI and ML
• History of AI and ML
o Classic examples of AI and ML
• AI and ML algorithms
o Examples of AI and ML algorithms
o Structure of a typical AI and ML algorithm

Objectives
By the end of this chapter, you will gain a comprehensive understanding of AI and ML as
general concepts and their underlying fundamentals. Additionally, you will learn about
the origins of AI and ML and be exposed to some basic examples. Furthermore, you will
grasp the concept of basic data structures associated with these fields.

Fundamentals of AI and ML
The fundamentals of AI and ML encompass a wide range of concepts and techniques.
Here are some key fundamentals of AI and ML:
• Data: High-quality data is essential for AI and ML. It serves as the foundation for
training and evaluating models. Understanding the data, its quality, structure, and
representation is crucial for successful AI and ML applications.
• Algorithms: Algorithms are mathematical and computational procedures used
to solve specific problems or perform tasks. In AI and ML, algorithms are used
to train models, make predictions, and make decisions based on data. Examples
include decision trees, neural networks, Support Vector Machines (SVM), and
clustering algorithms.
• Feature engineering: Feature engineering involves selecting, transforming, and
creating relevant features from raw data to improve the performance of ML
models. This process helps extract meaningful information and patterns from the
data, making it easier for models to learn and make accurate predictions.
• Model training: Model training is the process of feeding labeled data into an
algorithm or model to learn patterns and relationships. During training, the model
adjusts its internal parameters to minimize the difference between predicted and
actual outputs. This process often involves optimization techniques, such as
gradient descent, to find the best parameter values.
• Model evaluation: Evaluating the performance of ML models is crucial to ensure
their effectiveness and generalization. Various metrics, such as accuracy, precision,
recall, and F1-score, are used to assess the modelʼs predictive capabilities. Cross-
validation techniques, such as k-fold cross-validation, help estimate the modelʼs
performance on unseen data.
• Generalization and overfitting: Generalization refers to a modelʼs ability to
perform well on unseen data. Overfitting occurs when a model becomes overly
complex and performs well on the training data but fails to generalize to new data.
Techniques such as regularization and early stopping are employed to prevent
overfitting and promote better generalization.
• Model deployment: Deploying ML models involves making them available for
use in real-world applications. This includes optimizing the model for efficiency,
scalability, and compatibility with the target environment. Model deployment
also involves monitoring the modelʼs performance and retraining or updating it
when necessary.
• Ethics and bias: As AI and ML systems have a societal impact, understanding the
ethical implications and addressing potential biases is crucial. Ensuring fairness,
transparency, and accountability in AI systems is an essential consideration.
Ethical considerations involve issues such as data privacy, algorithmic bias, and
the potential impact of AI on various stakeholders.
These fundamentals provide a solid foundation for understanding and developing AI
and ML applications. Mastering these concepts allows practitioners to build robust and
effective AI systems. This book focuses on the algorithms that form the core of
every modern AI and ML application.
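
The training, regularization, and evaluation steps described in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production recipe: the toy dataset (label 1 when x1 + x2 > 1), the function names, and the hyperparameter values are all assumptions chosen for this example. It trains a logistic regression model with batch gradient descent and a small L2 penalty, then reports accuracy, precision, recall, and F1-score on held-out data.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, rng):
    # Hypothetical toy data: the label is 1 when x1 + x2 > 1, else 0.
    rows = []
    for _ in range(n):
        x = [rng.uniform(0, 1), rng.uniform(0, 1)]
        rows.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return rows


def train(rows, lr=1.0, epochs=2000, l2=0.001):
    # Batch gradient descent on the L2-regularized log loss.
    w, b = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        # The l2 * w term shrinks the weights, which discourages overfitting.
        w = [w[j] - lr * (grad_w[j] / n + l2 * w[j]) for j in range(2)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


def evaluate(w, b, rows):
    # Count the confusion-matrix cells, then derive the usual metrics.
    tp = fp = fn = tn = 0
    for x, y in rows:
        p = predict(w, b, x)
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(rows), "precision": precision,
            "recall": recall, "f1": f1}


rng = random.Random(0)
train_rows = make_data(200, rng)   # data the model learns from
test_rows = make_data(100, rng)    # unseen data used only for evaluation
w, b = train(train_rows)
metrics = evaluate(w, b, test_rows)
print(metrics)
```

Evaluating on `test_rows`, which the model never saw during training, is what lets the metrics say something about generalization rather than memorization.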

Defining AI and ML
Let us now discuss AI and ML in detail.

Artificial Intelligence
Artificial Intelligence is a broad field in computer science that focuses on the creation
of systems capable of performing tasks that would typically require human intelligence.
This includes tasks like understanding natural language, recognizing patterns, solving
problems, learning from experience, and making decisions.
Examples of AI in use today by data scientists include:
• Natural Language Processing (NLP): NLP algorithms are used to create systems
like Siri, Google Assistant, and ChatGPT that can understand and generate
human language.
• Computer vision: Algorithms in this domain are designed to interpret and
understand the visual world. For instance, Facebook has used computer vision to
recognize and tag faces in images.
• Recommendation systems: Websites like Amazon and Netflix use AI to recommend
products or movies based on a userʼs past behavior and the behavior of similar
users.
• Predictive analytics: Many industries use AI to predict future outcomes, like
predicting stock prices in finance or disease outbreaks in healthcare.
• Autonomous vehicles: Companies like Tesla use AI to enable cars to navigate and
understand the world around them.
These examples only scratch the surface of AIʼs potential. Its reach is continually expanding,
making it a crucial tool in a modern data scientistʼs arsenal.
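
To make the recommendation-system example above concrete, the following sketch shows one simple way to recommend items "based on the behavior of similar users": user-based collaborative filtering with cosine similarity. The users, items, ratings, and function names are hypothetical, and real systems at companies such as Amazon or Netflix are far more elaborate; this only illustrates the basic idea.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana":  {"book_a": 5, "book_b": 4, "book_c": 1},
    "ben":  {"book_a": 4, "book_b": 5, "book_d": 5},
    "cara": {"book_c": 5, "book_e": 4},
}


def cosine(u, v):
    # Cosine similarity between two users' rating dictionaries.
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=2):
    # Score each unseen item by the similarity-weighted ratings of other users.
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))
```

Here `ben` rates the shared books much as `ana` does, so his highly rated `book_d` ranks above `book_e`, which comes from the less similar `cara`.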

Machine learning
Machine learning is a subset of AI that gives computers the ability to learn from data and
make decisions or predictions without being explicitly programmed to do so. This process
involves the development of algorithms that can process large amounts of data, learn
patterns within that data, and use this learned information to predict future outcomes or
behavior. This learning is accomplished by improving the performance of the system over
time as it is exposed to more data.
There are three main types of ML: supervised learning, unsupervised learning, and
reinforcement learning. Each is discussed below:
• Supervised learning involves training a model on a labeled dataset, that is,
a dataset where the outcome or target variable is known. The model learns the
relationship between the features and the target and can then predict the outcome
for new, unseen data. For example, a bank might use supervised learning to
predict whether a loan applicant will default based on their previous loan history
and financial profile.
• Unsupervised learning involves training a model on an unlabeled dataset, that
is, a dataset where the outcome or target variable is not known. The goal is to
discover hidden patterns or intrinsic structures within the data. Common uses
include clustering and dimensionality reduction. For example, a retail company
might use unsupervised learning to segment its customers into different groups
based on their buying behavior.
• Reinforcement learning involves training a model to make a series of decisions
by rewarding or punishing the model (the agent) based on the actions it takes in an
environment to reach a goal. The model learns to perform actions that maximize
some reward over time. This is often used in robotics, gaming, and navigation.
For example, reinforcement learning has been used to train AI to play and win
complex games like Go and chess.
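
The difference between supervised and unsupervised learning can be sketched with two small routines. The data below is invented for illustration (loan records with a debt ratio and a count of missed payments, and customers described by monthly visits and average basket size), and the function names are assumptions of this example. The first routine uses labels (a 1-nearest-neighbor classifier); the second receives no labels and groups the points itself (a basic k-means loop).

```python
import math


def nearest_neighbor_predict(labeled_rows, query):
    # Supervised: copy the label of the closest labeled example.
    _, label = min(labeled_rows, key=lambda row: math.dist(row[0], query))
    return label


def k_means(points, centroids, iterations=10):
    # Unsupervised: alternate between assigning points and moving centroids.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


# Supervised: (debt ratio, missed payments) -> known outcome.
loans = [
    ((0.1, 0), "repaid"), ((0.2, 1), "repaid"),
    ((0.8, 4), "default"), ((0.9, 5), "default"),
]
print(nearest_neighbor_predict(loans, (0.85, 4)))

# Unsupervised: (visits per month, average basket) with no labels at all.
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
centroids, clusters = k_means(customers, [customers[0], customers[-1]])
print(centroids)
```

Seeding k-means with the first and last customer is a simplification for this sketch; practical implementations usually choose the initial centroids more carefully and scale the features first.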
Modern data scientists need to understand these concepts and techniques to build and
deploy effective ML models. Moreover, they often need to use different types of machine
learning in concert, depending on the task at hand. They should also be aware of new
trends in ML, such as deep learning, transfer learning, and active learning, which have led
to significant advancements in fields like computer vision, natural language processing,
and recommender systems.
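
The reward-driven loop of reinforcement learning described earlier in this section can also be illustrated with a very small example. The sketch below uses tabular Q-learning, one common reinforcement learning method, on a made-up environment: a corridor of five positions where the agent starts at the left end and is rewarded for reaching the right end. The environment, reward values, and hyperparameters are all assumptions chosen for this illustration.

```python
import random

N_STATES = 5        # positions 0..4; position 4 is the goal
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    # Hypothetical environment: small penalty per move, reward 1 at the goal.
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else -0.01
    return nxt, reward, done


def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore sometimes, otherwise act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


q = q_learning()
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

After training, the learned policy for the non-goal positions should favor moving right, because that is the action sequence that maximizes the cumulative reward.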

History of AI and ML
The development of AI and ML has been an incremental journey spanning several decades.
The evolution of these fields has been influenced by various domains like mathematics,
statistics, computer science, cognitive psychology, and neuroscience, as discussed in the
following points:
• The 1950s - Birth of AI and ML: The birth of AI as a distinct field happened
during a summer conference at Dartmouth College in 1956, which was attended
by pioneers like John McCarthy, Marvin Minsky, Allen Newell, and Herbert Simon.
Here, they proposed that ‘every aspect of learning or any other feature of intelligence
can in principle be so precisely described that a machine can be made to simulate it.ʼ
Even before this, in 1950, Alan Turing introduced the concept of machine intelligence
with the Turing Test, a measure of a machineʼs ability to exhibit intelligent behavior
equivalent to, or indistinguishable from, that of a human.
In 1959, Arthur Samuel developed a program that could play checkers and learn
from its mistakes, marking one of the first self-learning programs and a seminal
moment in ML.
• The 1960s - Growth and consolidation: In the 1960s, AI research focused on
problem-solving and symbolic methods. AI programs like DENDRAL and ELIZA
were developed during this time.
ML saw a significant development in 1967 with the creation of the Nearest
Neighbor algorithm, which made basic pattern recognition possible.
• The 1970s - AI Winter and rule-based systems: The mid-1970s marked the
beginning of the first AI Winter, a period of disappointment resulting from the
overhyping of AI capabilities and subsequent cuts in funding. The focus shifted
towards expert systems – rule-based systems that tried to mimic the decision-
making of human experts.
• The 1980s - Revival and ML expansion: In the 1980s, AI saw a revival with the
rise of ML. The popularization of the backpropagation algorithm in 1986 enabled
more efficient training of multilayer neural networks.
• The 1990s - AI and ML maturity: The 1990s saw ML mature into a field of its own,
with the growth of decision tree algorithms, reinforcement learning, and Bayesian
networks. The advent of SVMs in the early to mid-1990s led to significant
progress in ML. AI and ML began to be used in practical applications, from data
mining to industrial robotics.
• The 2000s - The data boom and rise of deep learning: The explosion of data in the
2000s, due to the rise of the internet and, later, social media, alongside advancements
in computational power and storage, created the perfect conditions for AI and ML
to flourish. Deep learning, a subset of ML, started to become feasible, driven by the
development of new neural network architectures.
• The 2010s - AI and ML breakthroughs: This decade witnessed rapid progress
in AI and ML. The development of advanced neural network architectures, like
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs), led to breakthroughs in image and speech recognition and natural
language processing.
