Beginning Anomaly Detection Using Python-Based Deep Learning
Implement Anomaly Detection Applications with Keras and PyTorch
Second Edition

Suman Kalyan Adari
Tampa, FL, USA

Sridhar Alla
Delran, NJ, USA

ISBN-13 (pbk): 979-8-8688-0007-8 ISBN-13 (electronic): 979-8-8688-0008-5

https://doi.org/10.1007/979-8-8688-0008-5

Copyright © 2024 by Suman Kalyan Adari, Sridhar Alla

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with
every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an
editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the
trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not
identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to
proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication,
neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or
omissions that may be made. The publisher makes no warranty, express or implied, with respect to the
material contained herein.
Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Celestin Suresh John
Development Editor: James Markham
Coordinating Editor: Gryffin Winkler
Cover designed by eStudioCalamar
Cover image by Tony [email protected]
Distributed to the book trade worldwide by Apress Media, LLC, 1 New York Plaza, New York, NY 10004,
U.S.A. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit
www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer
Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
For information on translations, please e-mail [email protected]; for reprint,
paperback, or audio rights, please e-mail [email protected].
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and
licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales
web page at http://www.apress.com/bulk-sales.
Any source code or other supplementary material referenced by the author in this book is available to
readers on GitHub (https://github.com/Apress). For more detailed information, please visit
https://www.apress.com/gp/services/source-code.
Paper in this product is recyclable
Table of Contents
About the Authors

About the Technical Reviewers

Acknowledgments

Introduction

Chapter 1: Introduction to Anomaly Detection

What Is an Anomaly?
Anomalous Swans
Anomalies as Data Points
Anomalies in a Time Series
Categories of Anomalies
Data Point–Based Anomalies
Context-Based Anomalies
Pattern-Based Anomalies
Anomaly Detection
Outlier Detection
Noise Removal
Novelty Detection
Event Detection
Change Point Detection
Anomaly Score Calculation
The Three Styles of Anomaly Detection
Where Is Anomaly Detection Used?
Data Breaches
Identity Theft
Manufacturing
Networking
Medicine
Video Surveillance
Environment
Summary

Chapter 2: Introduction to Data Science

Data Science
Dataset
Pandas, Scikit-Learn, and Matplotlib
Data I/O
Data Manipulation
Data Analysis
Visualization
Data Processing
Feature Engineering and Selection
Summary

Chapter 3: Introduction to Machine Learning

Machine Learning
Introduction to Machine Learning
Data Splitting
Modeling and Evaluation
Overfitting and Bias-Variance Tradeoff
Hyperparameter Tuning
Validation
Summary

Chapter 4: Traditional Machine Learning Algorithms

Traditional Machine Learning Algorithms
Isolation Forest
One-Class Support Vector Machine
Summary

Chapter 5: Introduction to Deep Learning

Introduction to Deep Learning
What Is Deep Learning?
The Neuron
Activation Functions
Neural Networks
Loss Functions
Gradient Descent and Backpropagation
Loss Curve
Regularization
Optimizers
Multilayer Perceptron Supervised Anomaly Detection
Simple Neural Network: Keras
Simple Neural Network: PyTorch
Summary

Chapter 6: Autoencoders

What Are Autoencoders?
Simple Autoencoders
Sparse Autoencoders
Deep Autoencoders
Convolutional Autoencoders
Denoising Autoencoders
Variational Autoencoders
Summary

Chapter 7: Generative Adversarial Networks

What Is a Generative Adversarial Network?
Generative Adversarial Network Architecture
Wasserstein GAN
WGAN-GP
Anomaly Detection with a GAN
Summary

Chapter 8: Long Short-Term Memory Models

Sequences and Time Series Analysis
What Is an RNN?
What Is an LSTM?
LSTM for Anomaly Detection
Examples of Time Series
art_daily_no_noise.csv
art_daily_nojump.csv
art_daily_jumpsdown.csv
art_daily_perfect_square_wave.csv
art_load_balancer_spikes.csv
ambient_temperature_system_failure.csv
ec2_cpu_utilization.csv
rds_cpu_utilization.csv
Summary

Chapter 9: Temporal Convolutional Networks

What Is a Temporal Convolutional Network?
Dilated Temporal Convolutional Network
Anomaly Detection with the Dilated TCN
Encoder-Decoder Temporal Convolutional Network
Anomaly Detection with the ED-TCN
Summary

Chapter 10: Transformers

What Is a Transformer?
Transformer Architecture
Transformer Encoder
Transformer Decoder
Transformer Inference
Anomaly Detection with the Transformer
Summary

Chapter 11: Practical Use Cases and Future Trends of Anomaly Detection
Anomaly Detection
Real-World Use Cases of Anomaly Detection
Telecom
Banking
Environmental
Health Care
Transportation
Social Media
Finance and Insurance
Cybersecurity
Video Surveillance
Manufacturing
Smart Home
Retail
Implementation of Deep Learning–Based Anomaly Detection
Future Trends
Summary

Index
About the Authors
Suman Kalyan Adari is currently a machine learning research engineer. He obtained a
B.S. in computer science at the University of Florida and an M.S. in computer science,
specializing in machine learning, at Columbia University. He has been conducting
deep learning research in adversarial machine learning since his freshman year at the
University of Florida and has presented at the IEEE Dependable Systems and Networks
workshop on Dependable and Secure Machine Learning held in Portland, Oregon,
USA in June 2019. Currently, he works on various anomaly detection tasks spanning
behavioral tracking and geospatial trajectory modeling.
He is quite passionate about deep learning, and specializes in various fields ranging
from video processing to generative modeling, object tracking, time-series modeling,
and more.

Sridhar Alla is the co-founder and CTO of Bluewhale, which helps organizations big
and small in building AI-driven big data solutions and analytics, as well as SAS2PY, a
powerful tool to automate migration of SAS workloads to Python-based environments
using Pandas or PySpark. He is a published author of books and an avid presenter
at numerous Strata, Hadoop World, Spark Summit, and other conferences. He also
has several patents filed with the US PTO on large-scale computing and distributed
systems. He has extensive hands-on experience in several technologies, including Spark,
Flink, Hadoop, AWS, Azure, TensorFlow, Cassandra, and others. He spoke on anomaly
detection using deep learning at Strata SFO in March 2019 and at Strata London in
October 2019. He was born in Hyderabad, India, and now lives in New Jersey with his
wife, Rosie, his daughters, Evelyn and Madelyn, and his son, Jayson. When he is not busy
writing code, he loves to spend time with his family and also training, coaching, and
organizing meetups.

About the Technical Reviewers
Puneet Sinha has accumulated more than 12 years of work
experience in developing and deploying end-to-end models
in credit risk, multiple marketing optimization, A/B testing,
demand forecasting and brand evaluation, profit and price
analyses, anomaly and fraud detection, propensity modeling,
recommender systems, upsell/cross-sell models, modeling
response to incentives, price optimization, natural language
processing, and OCR using ML/deep learning algorithms.

Shubho Mohanty is a product thinker and creator, bringing
two decades of experience in the “concept-to-market”
life cycle of some of the unique, innovative, and highly
successful industry-first products and platforms in the data
and security spaces.
Shubho holds 12+ US patents in data, analytics, and
cloud security. He has also been awarded IDG CIO100, 2020
for strategizing and developing a technology innovation
ecosystem.
He currently serves as the Chief Product Officer at Calibo, where he leads the
product vision, strategy, innovation, and development of Calibo’s enterprise PaaS. Prior
to Calibo, Shubho was the Global VP of Product & Engineering at CDK Global (formerly,
ADP Inc). He has also served in various product leadership roles in organizations like
Symantec and Microsoft. He also co-founded Ganos, a B2B data start-up.
He received his B.Tech. in Electrical Engineering from National Institute of
Technology (NIT), India. He is a mentor to many high-repute start-up programs where
he guides young entrepreneurs to solve some of the most pressing challenges. He is also
an influential speaker at leading technology and industry forums.

Acknowledgments
Suman Kalyan Adari
I would like to thank my parents, Krishna and Jyothi, my sister, Niha, and my loving dog,
Pinky, for supporting me throughout the entire process of writing this book as well as my
various other endeavors.

Sridhar Alla
I would like to thank my wonderful, loving wife, Rosie Sarkaria, and my beautiful,
loving children, Evelyn, Madelyn, and Jayson, for all their love and patience during the
many months I spent writing this book. I would also like to thank my parents, Ravi and
Lakshmi Alla, for their blessings and all the support and encouragement they continue
to bestow upon me.

Introduction
Congratulations on your decision to explore the exciting world of anomaly detection
using deep learning!
Anomaly detection involves finding patterns that do not adhere to what is
considered normal or expected behavior. Businesses could lose millions of dollars
due to abnormal events. Consumers could also lose millions of dollars. In fact, there are
many situations every day where people’s lives are at risk and where their property is at
risk. If your bank account gets cleaned out, that’s a problem. If your water line breaks,
flooding your basement, that’s a problem. If all flights at an airport get delayed due to a
technical glitch in the traffic control system, that’s a problem. If you have a health issue
that is misdiagnosed or not diagnosed, that’s a very big problem that directly impacts
your well-being.
In this book, you will learn how anomaly detection can be used to solve business
problems. You will explore how anomaly detection techniques can be used to address
practical use cases and address real-life problems in the business landscape. Every
business and use case is different, so while we cannot copy and paste code and build a
successful model to detect anomalies in any dataset, this book will cover many use cases
with hands-on coding exercises to give you an idea of the possibilities and concepts
behind the thought process. All the code examples in the book are presented in Python
3.8. We choose Python because it is truly the best language for data science, with a
plethora of packages and integrations with scikit-learn, deep learning libraries, etc.
We will start by introducing anomaly detection, and then we will look at legacy
methods of detecting anomalies that have been used for decades. Then we will look
at deep learning to get a taste of it. Then we will explore autoencoders and variational
autoencoders, which are paving the way for the next generation of generative models.
Following that, we will explore generative adversarial networks (GANs) as a way to detect
anomalies, delving directly into generative AI.
Then we’ll look at long short-term memory (LSTM) models to see how temporal data
can be processed. We will cover temporal convolutional networks (TCNs), which are
excellent for temporal data anomaly detection. We will also touch upon the transformer
architecture, which has revolutionized the field of natural language processing, as
another means for temporal anomaly detection. Finally, we will look at several examples
of anomaly detection in various business use cases.
In addition, all coding examples will be provided in TensorFlow 2/Keras, with
accompanying PyTorch equivalents, on the GitHub repository for this book. You will
combine all this extensive knowledge with hands-on coding using Jupyter notebook-
based exercises to experience the knowledge firsthand and see where you can use these
algorithms and frameworks. Best of luck, and welcome to the world of deep learning!

CHAPTER 1

Introduction to Anomaly
Detection
In this chapter, you will learn about anomalies in general, the categories of anomalies,
and anomaly detection. You will also learn why anomaly detection is important, how
anomalies can be detected, and the use case for such a mechanism.
In a nutshell, this chapter covers the following topics:

• What is an anomaly?

• Categories of different anomalies

• What is anomaly detection?

• Where is anomaly detection used?

What Is an Anomaly?
Before you get started with learning about anomaly detection, you must first understand
what exactly you are targeting. Generally, an anomaly is an outcome or value that
deviates from what is expected, but the exact criteria for what determines an anomaly
can vary from situation to situation.

Anomalous Swans
To get a better understanding of what an anomaly is, let’s take a look at some swans
sitting by a lake (Figure 1-1).

© Suman Kalyan Adari, Sridhar Alla 2024
S. K. Adari and S. Alla, Beginning Anomaly Detection Using Python-Based Deep Learning,
https://doi.org/10.1007/979-8-8688-0008-5_1
Chapter 1 Introduction to Anomaly Detection

Figure 1-1. A couple of swans by a lake

Let’s say that we want to observe these swans and make assumptions about the
color of the swans at this particular lake. Our goal is to determine what the normal
color of swans is and to see if there are any swans that are of a different color than this
(Figure 1-2).

Figure 1-2. More swans show up, all of which are white


We continue to observe swans for a few years and all of them have been white. Given
these observations, we can reasonably conclude that every swan at this lake should be
white. The very next day, we are observing swans at the lake again. But wait! What’s this?
A black swan has just flown in (Figure 1-3).

Figure 1-3. A black swan appears

Considering our previous observations, we thought that we had seen enough swans
to assume that the next swan would also be white. However, the black swan defies that
assumption entirely, making it an anomaly. It’s not really an outlier, which would be, for
example, a really big white swan or a really small white swan; it’s a swan that’s entirely
a different color, making it an anomaly. In our scenario, the overwhelming majority of
swans are white, making the black swan extremely rare.
In other words, given a swan by the lake, the probability of it being black is very
small. We can explain our reasoning for labeling the black swan as an anomaly with one
of two approaches (though we aren’t limited to only these two approaches).
First, given that a vast majority of swans observed at this particular lake are white, we
can assume that, through a process similar to inductive reasoning, the normal color for a
swan here is white. Naturally, we would label the black swan as an anomaly purely based
on our prior assumptions that all swans are white, considering that we’d only seen white
swans before the black swan arrived.


Another way to look at why the black swan is an anomaly is through probability.
Now assume that there is a total of 1000 swans at this lake and only two are black swans;
the probability of a swan being black is 2 / 1000, or 0.002. Depending on the probability
threshold, meaning the lowest probability for an outcome or event that will be accepted
as normal, the black swan could be labeled as anomalous or normal. In our case, we will
consider it an anomaly because of its extreme rarity at this lake.
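The probability argument above can be sketched in a few lines of Python. This is only an illustration: the counts (2 black swans out of 1000) come from the example, while the function name `is_anomalous` and both threshold values are assumptions chosen to show how the label depends on the threshold.

```python
def is_anomalous(count, total, threshold):
    """Return True when the observed frequency of an outcome is below the threshold."""
    probability = count / total
    return probability < threshold


black_swans, all_swans = 2, 1000
p_black = black_swans / all_swans  # 2 / 1000 = 0.002, as in the text

# Both thresholds are illustrative assumptions, not values from the text.
strict = is_anomalous(black_swans, all_swans, threshold=0.01)
lenient = is_anomalous(black_swans, all_swans, threshold=0.001)

print(p_black, strict, lenient)
```

With the higher threshold the black swan is labeled an anomaly; with the lower one it is accepted as normal, which is why the choice of threshold matters.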

Anomalies as Data Points

We can extend this same concept to a real-world application. In the following example,
we will take a look at a factory that produces screws and attempt to determine what an
anomaly could be in this context. Individual screws are sampled from each batch
and tested to ensure that a certain level of quality is maintained. For each sampled screw,
assume that the density and tensile strength (how resistant the screw is to breaking
under stress) are measured.
Figure 1-4 is an example graph of various sampled screws with the dotted lines
representing the range of densities and tensile strengths allowed. The solid lines form a
bounding box where any value of tensile strength and density inside it is considered good.

Figure 1-4. Density and tensile strength in a batch of screw samples

The intersections of the dotted lines have created several different regions containing
data points. Of interest is the bounding box (solid lines) created from the intersection of
both sets of dotted lines since it contains the data points for samples deemed acceptable
(Figure 1-5). Any data point outside of that specific box will be considered anomalous.

Figure 1-5. Data points are identified as “good” or “anomaly” based on their
location
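The bounding-box rule can be expressed as a simple range check on both measurements. The sketch below is a minimal illustration under stated assumptions: the text gives no numeric limits, so the bounds, the units, and the names `Bounds`, `DENSITY`, `TENSILE`, and `label_screw` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Closed interval of acceptable values for one measurement."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical limits and units; the text does not give numeric values.
DENSITY = Bounds(7.7, 8.0)       # g/cm^3
TENSILE = Bounds(400.0, 600.0)   # MPa


def label_screw(density: float, tensile_strength: float) -> str:
    """Label a sample "good" only if both measurements fall inside the box."""
    inside = DENSITY.contains(density) and TENSILE.contains(tensile_strength)
    return "good" if inside else "anomaly"


# Three made-up samples: inside the box, weak for its density, too dense.
samples = [(7.85, 520.0), (7.90, 250.0), (8.40, 510.0)]
labels = [label_screw(d, t) for d, t in samples]
print(labels)
```

A sample is acceptable only when it lies inside both ranges at once, which is exactly the region enclosed by the solid lines in the figure.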

Now that we know which points are and aren’t acceptable, let’s pick out a sample from
a new batch of screws and check its data to see where it falls on the graph (Figure 1-6).

Figure 1-6. A new data point representing the new sample screw is generated,
with the data falling within the bounding box

The data for this sample screw falls within the acceptable range. That means that this
batch of screws is good to use since its density as well as tensile strength is appropriate
for use by the consumer. Now let’s look at a sample from the next batch of screws and
check its data (Figure 1-7).


Figure 1-7. A new data point is generated for another sample, but this falls
outside the bounding box

The data falls far outside the acceptable range. For its density, the screw has abysmal
tensile strength and is unfit for use. Since it has been flagged as an anomaly, the
factory can investigate why this batch of screws turned out to be brittle. For a factory of
considerable size, it is important to hold a high standard of quality as well as maintain
a high volume of steady output to keep up with consumer demands. For a monumental
task like that, automation to detect any anomalies to avoid sending out faulty screws is
essential and has the benefit of being extremely scalable.
So far, we have explored anomalies as data points that are either out of place, in the
case of the black swan, or unwanted, in the case of faulty screws. So what happens when
we introduce time as a new variable?

Anomalies in a Time Series

With the introduction of time as a variable, we are now dealing with a notion of
temporality associated with the data sets. What this means is that certain patterns are
dependent on time. For example, daily, monthly, or yearly occurrences are time-series
patterns because they recur at regular time intervals.
To better understand time series–based anomalies, let’s look at a few examples.


Personal Spending Pattern

Figure 1-8 depicts a random person’s spending habits over the course of a month.

Figure 1-8. Spending habits of a person over the course of a month

Assume the initial spike in expenditures at the start of the month is due to the
payment of bills such as rent and insurance. During the weekdays, our example person
occasionally eats out, and on the weekends goes shopping for groceries, clothes, and
various other items. Also assume that this month does not include any major holidays.
These expenditures can vary from month to month, especially in months with major
holidays. Assume that our person lives in the United States, in which the holiday of
Thanksgiving falls on the fourth Thursday of the month of November. Many U.S. employers
also include the Friday after Thanksgiving as a day off for employees. U.S. retailers have
leveraged that fact to entice people to begin their Christmas shopping by offering special
deals on what has colloquially become known as “Black Friday.” With that in mind, let’s
take a look at our person’s spending pattern in November (Figure 1-9). As expected, a
massive spike in purchases occurred on Black Friday, some of them quite expensive.


Figure 1-9. Spending habits for the same person during the month of November

Now assume that, unfortunately, our person has had their credit card information
stolen, and the criminals responsible for it have decided to purchase various items of
interest to them. Using the same month as in the first example (Figure 1-8; no major
holidays), the graph in Figure 1-10 depicts what could happen.

Figure 1-10. Purchases in the person’s name during the same month as in
Figure 1-8

Let’s assume we have a record of purchases for this user going back many years.
Thanks to this established prior history, this sudden influx of purchases would be
flagged as anomalous. Such a cluster of purchases might be normal for Black Friday or
in the weeks before Christmas, but in any other month without a major holiday, it looks
out of place. In this case, our person might be contacted by the credit card company to
confirm whether or not they made the purchases.
Some companies might even flag purchases that follow normal societal trends. What
if that TV wasn’t really bought by our person on Black Friday? In that case, the credit
card company’s software can ask the client directly through a phone app, for example,
whether or not they actually bought the item in question, allowing for some additional
protection against fraudulent purchases.
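One simple way to use prior history is to compare a month's total spending with past totals for the same calendar month. The following sketch assumes a made-up history and a rule of "mean plus three standard deviations"; the figures, the `history` dictionary, and the function `is_unusual` are illustrative assumptions, not how any particular credit card company works.

```python
from statistics import mean, stdev

# Hypothetical monthly card totals (USD) from past years, keyed by calendar month.
history = {
    "August": [1850, 1920, 1790, 1880, 1910],
    "November": [3100, 3350, 2980, 3240, 3180],
}


def is_unusual(month, total, k=3.0):
    """Flag a total that exceeds the historical mean for that month by k standard deviations."""
    past = history[month]
    limit = mean(past) + k * stdev(past)
    return total > limit


# The same total is ordinary in November (holiday shopping) but not in August.
print(is_unusual("November", 3300))
print(is_unusual("August", 3300))
```

Because the limit is computed per calendar month, a total that is unremarkable around Black Friday is flagged when it appears in a month without a major holiday.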

Taxi Cabs
As another example of anomalies in a time series, let’s look at some sample data for taxi
cab pickups and drop-offs over time for a random city and an arbitrary taxi company and
see if we can detect any anomalies.
On an average day, the total number of pickups can look somewhat like the pattern
shown in Figure 1-11.

Figure 1-11. Number of pickups for a taxi company throughout the day

From the graph, we see that there’s a bit of post-midnight activity that drops off
to near zero during the late-night hours. However, customer traffic picks up suddenly
around morning rush hour and remains high until the early evening, when it peaks
during evening rush hour. This is essentially what an average day looks like.


Let’s expand the scope out a bit more to gain some perspective of passenger traffic
throughout the week (Figure 1-12).

Figure 1-12. Number of pickups for a taxi company throughout the week

As expected, most of the pickups occur on weekdays, when commuters
must get to and from work. On the weekends, a fair number of people still go out to get
groceries or just go out somewhere for the weekend.
On a small scale like this, causes for anomalies would be anything that prevents
taxis from operating or incentivizes customers not to use a taxi. For example, say that a
terrible thunderstorm hits on Friday. Figure 1-13 shows that graph.

Figure 1-13. Number of pickups for a taxi company throughout the week, with a
heavy thunderstorm on Friday


The thunderstorm likely influenced some people to stay indoors, resulting in a lower
number of pickups than usual for a weekday. However, these sorts of anomalies are
usually too small-scale to have any noticeable effect on the overall pattern.
Let’s take a look at the data over the entire year, as shown in Figure 1-14.

Figure 1-14. Number of pickups for a taxi company throughout the year

The largest dips occur during the winter months when snowstorms are expected.
These are regular patterns that can be observed at similar times every year, so they
are not an anomaly. But what happens to customer traffic levels when a relatively rare
polar vortex descends on the city in early April and unleashes several intense blizzards?
Figure 1-15 shows the graph.


Figure 1-15. Number of pickups for a taxi company throughout the year, with a
polar vortex descending on the city in April

As you can see in Figure 1-15, the intense blizzards severely slowed down all traffic
in the first week of April and burdened the city in the following two weeks. Comparing
this graph to the graph shown in Figure 1-14, there’s a clearly defined anomaly in April
caused by the polar vortex. Since this pattern is extremely rare for the month of April, it
would be flagged as an anomaly.
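The comparison between the two yearly graphs can be sketched as a check of each week against a seasonal baseline for that same week. All numbers below are invented for illustration, and the 25 percent tolerance and the name `flag_weeks` are assumptions; the point is only that a regular winter dip is already part of the baseline, while the April drop is not.

```python
# Hypothetical weekly pickup counts; none of these numbers come from the text.
# "baseline" stands for the typical count for that week in previous years.
baseline = {"Jan wk2": 5200, "Apr wk1": 9100, "Apr wk2": 9000, "Jul wk1": 9400}
observed = {"Jan wk2": 5050, "Apr wk1": 2300, "Apr wk2": 6100, "Jul wk1": 9550}


def flag_weeks(baseline, observed, tolerance=0.25):
    """Return the weeks whose pickups deviate from the baseline by more than the tolerance."""
    flagged = []
    for week, expected in baseline.items():
        relative_change = (observed[week] - expected) / expected
        if abs(relative_change) > tolerance:
            flagged.append(week)
    return flagged


flagged = flag_weeks(baseline, observed)
print(flagged)
```

The January week has low traffic but is close to its own baseline, so it is not flagged; the two April weeks fall far below what is normal for April.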

Categories of Anomalies
Now that you have a better sense of what anomalies can be in various situations, you
can see that they generally fall into these broad categories:

• Data point–based anomalies

• Context-based anomalies

• Pattern-based anomalies

Data Point–Based Anomalies

Data point–based anomalies may seem comparable to outliers in a set of data points.
However, as previously mentioned, anomalies and outliers are not the same thing,
though these terms are sometimes used interchangeably. Outliers are data points that
are expected to be present in the data set and can be caused by unavoidable random
errors or by systematic errors relating to how the data was sampled. Anomalies would
be outliers or other values that one doesn’t expect to exist. These data anomalies might
be present wherever a data set of values exists.
As an example of a data set in which data point–based anomalies may exist,
consider a data set of thyroid diagnostic values, where the majority of the data points are
indicative of normal thyroid functionality. In this case, anomalous values represent sick
thyroids. While they are not necessarily outliers, they have a low probability of existing
when taking into account all the normal data.
We can also detect individual purchases totaling excessive amounts and label
them as anomalies since, by definition, they are not expected to occur or have a very
low probability of occurrence. In this case, they are labeled as potentially fraudulent
transactions, and the card holder is contacted to ensure the validity of the purchase.
Basically, we can say this about the difference between anomalies and outliers:
we should expect a data set to include outliers, but we should not expect it to include
anomalies. Though the terms “anomaly” and “outlier” are sometimes interchanged,
anomalies are not always outliers, and not all outliers are anomalies.
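
The purchase example can be sketched in a few lines. This is only an illustration: the purchase history, the function name is_point_anomaly, and the cutoff of three standard deviations are all assumptions made here, not values taken from a real fraud system.

```python
import numpy as np

# Hypothetical purchase history in dollars (illustrative values only)
history = np.array([23.5, 41.0, 18.2, 35.9, 27.4, 30.1, 22.8, 39.6])

def is_point_anomaly(amount, history, z_cutoff=3.0):
    """Flag a purchase whose amount is far from the historical mean."""
    mean = history.mean()
    std = history.std(ddof=1)  # sample standard deviation
    z_score = abs(amount - mean) / std
    return z_score > z_cutoff

print(is_point_anomaly(33.0, history))    # a typical amount
print(is_point_anomaly(2500.0, history))  # an excessive amount
```

A production system would likely rely on a richer model of spending, but the idea is the same: the purchase is judged on its own value against what is normally observed.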

Context-Based Anomalies
Context-based anomalies consist of data points that might seem normal at first but
are considered anomalies in their respective contexts. Returning to our earlier personal
spending example, we might expect a sudden surge in purchases near certain holidays,
but these purchases could seem unusual in the middle of August. The person’s high
volume of purchases on Black Friday was not flagged because it is typical spending
behavior for people on Black Friday. However, if the purchases were made in a month
where it is out of place given previous purchase history, it would be flagged as an
anomaly. This might seem similar to the example presented for data point–based
anomalies, but the distinction for context-based anomalies is that the individual purchase
does not have to be expensive. If a person never buys gasoline because they own an
electric car, sudden purchases of gasoline would be out of place given the context. Buying
gasoline is normal behavior for many people, but in this context, it is an anomaly.
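
The gasoline example could be sketched as follows. The category names, the history, and the helper is_context_anomaly are hypothetical; the point is only that the check looks at the card holder's own history rather than at the size of the purchase.

```python
from collections import Counter

# Hypothetical purchase history for a card holder who owns an electric car
past_categories = ["groceries", "electricity", "groceries", "pet food",
                   "electricity", "groceries", "restaurants"]

def is_context_anomaly(category, past_categories, min_count=1):
    """Flag a purchase whose category has rarely or never appeared before."""
    counts = Counter(past_categories)
    return counts[category] < min_count

print(is_context_anomaly("groceries", past_categories))  # seen before
print(is_context_anomaly("gasoline", past_categories))   # never seen before
```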

Pattern-Based Anomalies
Pattern-based anomalies are patterns and trends that deviate from their historical
counterparts, and they often occur in time-series or other sequence-based data. In the
earlier taxi cab company example, the customer pickup counts for the month of April
were pretty consistent with the rest of the year. However, once the polar vortex hit, the
numbers tanked visibly, resulting in a huge drop in the graph, labeled as an anomaly.
Similarly, when monitoring network traffic in the workplace, expected patterns of
network traffic are formed from constant monitoring of data over several months or
even years for some companies. If an employee attempts to download or upload large
volumes of data, it generates a certain pattern in the overall network traffic flow that
could be considered anomalous if it deviates from the employee’s usual behavior.
As another example of pattern-based anomalies, if an external hacker decided to hit
a company’s website with a distributed denial-of-service (DDoS) attack, which tries
to overwhelm the server that handles network flow to a certain website in order
to bring the entire website down or stop its functionality, every single attempt would
register as an unusual spike in network traffic. All of these spikes clearly deviate
from normal traffic patterns and would be considered anomalous.

Anomaly Detection
Now that you have a better understanding of the different types of anomalies, we can
proceed to discuss approaches to creating models to detect anomalies. This section presents
a few approaches we can take, but keep in mind we are not limited to just these methods.
Recall our reasoning for labeling the black swan as an anomaly. One reason was
that since all the swans we have seen thus far were white, the single black swan is an
obvious anomaly. A statistical way to explain this reasoning is that as of the most recent
set of observations, we have one black swan and tens of thousands of white swans.
Thus, the probability of occurrence of the black swan is one out of tens of thousands
of all observed swans. Since this probability is so low, it would make the black swan an
anomaly just because we do not expect to see it at all.
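
As a rough sketch of this statistical reasoning, the snippet below estimates the probability of each swan color from a made-up set of observations and flags any color that falls under an assumed cutoff. The counts and the threshold of 0.001 are illustrative choices only.

```python
from collections import Counter

# Hypothetical observations: many white swans and a single black swan
observations = ["white"] * 20_000 + ["black"]

counts = Counter(observations)
total = sum(counts.values())
probabilities = {color: n / total for color, n in counts.items()}

threshold = 0.001  # assumed cutoff for "too rare to be expected"
anomalous_colors = [c for c, p in probabilities.items() if p < threshold]
print(anomalous_colors)
```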
The anomaly detection models we will explore in this book follow this kind of reasoning,
whether they train on unlabeled data, on normal (nonanomalous) data only, or
on data labeled as normal or anomalous. In the labeled case, in the context of identifying
swans, we would be told which swans are normal and which swans are anomalies.

So, what is anomaly detection? Quite simply, anomaly detection is the process in
which an advanced algorithm identifies certain data or data patterns to be anomalous.
Falling under anomaly detection are the tasks of outlier detection, noise removal, novelty
detection, event detection, change point detection, and anomaly score calculation.
In this book, we will explore all of these as they are all basically anomaly detection
methods. The following tasks of anomaly detection are not exhaustive, but are some of
the more common anomaly detection tasks today.

Outlier Detection
Outlier detection is a technique that aims to detect anomalous outliers within a given
data set. As previously discussed, three methods that can be applied to this situation are
to train a model only on normal data to identify anomalies (by a high reconstruction error,
described next), to model a probability distribution in which anomalies would be labeled
based on their association with really low probabilities, or to train a model to recognize
anomalies by teaching it what an anomaly looks like and what a normal point looks like.
Regarding the high reconstruction error, think of it this way: the model trains on a
set of normal data and learns the patterns corresponding to normal data. When exposed
to an anomalous data point, the patterns do not line up with what the model learned to
associate with normal data. The reconstruction error can be thought of as the deviation in
learned patterns between the anomalous point and the normal points the model trained
on. We will formally go over reconstruction error in Chapter 6. Going back to the example
of the swans, the black swan is different based on the patterns that we learned by observing
swans at the lake, and was thus anomalous because it did not follow the color pattern.
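
To make the idea of reconstruction error more concrete before Chapter 6, here is a minimal sketch. It does not use a neural network: as a stand-in, it learns the single dominant direction of some made-up "normal" data (a one-component PCA computed with NumPy) and measures how poorly a point is reconstructed from that direction. The data, the helper reconstruction_error, and the test points are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" data: 2-D points lying close to the line y = 2x
x = rng.normal(size=200)
normal = np.column_stack([x, 2 * x + rng.normal(scale=0.05, size=200)])

# Learn the dominant direction of the normal data (a one-component PCA)
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
component = vt[0]

def reconstruction_error(point):
    """Project onto the learned direction, reconstruct, and measure the gap."""
    centered = point - mean
    reconstructed = (centered @ component) * component
    return float(np.linalg.norm(centered - reconstructed))

normal_error = reconstruction_error(np.array([1.0, 2.0]))    # follows the pattern
anomaly_error = reconstruction_error(np.array([1.0, -2.0]))  # breaks the pattern
print(normal_error, anomaly_error)
```

The point that follows the learned pattern is reconstructed almost perfectly, while the point that breaks it leaves a large gap, which is what a detector would threshold on.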

Noise Removal
Noise removal involves filtering out any constant background noise in the data set.
Imagine that you are at a party and you are talking with a friend. There is a lot of
background noise, but your brain focuses on your friend’s voice and isolates it because
that’s what you want to hear. Similarly, a model learns to efficiently encode the original
sound to represent only the essential information, such as the pitch of
your friend’s voice, the tempo, and the vocal inflections. Then, it is able to reconstruct the
original sound without the anomalous interference noise.

This can also be a case where an image has been altered in some form, such as by
having perturbations, loss of detail, fog, etc. The model learns an accurate representation
of the original image and outputs a reconstruction without any of the anomalous
elements in the image.

Novelty Detection
Novelty detection is very similar to outlier detection. In this case, a novelty is a data
point from outside the training set the model was exposed to, which is shown to the model
to determine whether or not it is an anomaly. The key difference between novelty detection and
outlier detection is that in outlier detection, the job of the model is to determine what is
an anomaly within the training data set. In novelty detection, the model learns what is a
normal data point and what isn’t and tries to classify anomalies in a new data set that it
has never seen before.
Examples can include quality assurance in factories to make sure new batches of
created products are up to par, such as with the example of screws from earlier. Another
case is network security. Incoming traffic data can be monitored to ensure there is no
anomalous behavior going on. Both of these situations involve a constant stream of
novelties (new data) to predict on.

Event Detection
Event detection involves the detection of points in a time-series dataset that deviate
anomalously from the norm. For example, in the taxi cab company example earlier in the
chapter, the polar vortex reduced the customer pickup counts for April. These deviations
from the norm for April were all associated with an anomalous event that occurred
in the dataset, which an event detector algorithm would identify. Another example of
event detection would be the tracking of sea-ice levels at the poles over time, forming a
time series. An event detector algorithm could flag exactly when sea-ice levels deviate
anomalously from the usual norm, such as occurred recently in July 2023, when the sea-
ice levels were detected to be six standard deviations below the established average.
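
A very simple event detector along these lines could look like the sketch below. The pickup counts are synthetic, the five-day "blizzard" is an assumption, and the six-standard-deviation cutoff is borrowed from the sea-ice example purely for illustration.

```python
import numpy as np

# Hypothetical daily pickup counts: 60 days oscillating around 1,000 pickups
days = np.arange(60)
pickups = 1000 + 30 * np.sin(days)
pickups[40:45] = 400  # an assumed five-day blizzard

# Baseline statistics come from a period known to be ordinary
baseline = pickups[:30]
z_scores = (pickups - baseline.mean()) / baseline.std(ddof=1)

event_days = np.flatnonzero(np.abs(z_scores) > 6)  # six standard deviations
print(event_days)
```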

Change Point Detection

Change point detection involves the detection of points in the dataset where their
statistical properties start to consistently deviate from the established norm given by
the rest of the data. In other words, change point detection algorithms measure shifts
in statistical trends. A good example is global average temperatures over time. A change
point detection algorithm could identify periods of sustained warming where the
statistical properties start to shift over time, such as accelerated warming periods
that differ from the normal rate of warming.
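
One naive way to locate a change point is to compare the mean of the values just before each candidate index with the mean of the values just after it. The sketch below does this on a synthetic series with an abrupt shift, which is a simplification of gradual warming; the series, the window size, and the helper find_change_point are assumptions for illustration, and dedicated change point algorithms are considerably more robust.

```python
import numpy as np

# Hypothetical yearly temperature anomalies: flat, then a sustained shift
flat = np.zeros(30)
warming = 0.5 + np.zeros(30)
series = np.concatenate([flat, warming])
series = series + 0.05 * np.sin(np.arange(series.size))  # small deterministic wiggle

def find_change_point(series, window=10):
    """Return the index where the mean of the next `window` values differs
    most from the mean of the previous `window` values."""
    best_index, best_gap = None, 0.0
    for i in range(window, series.size - window + 1):
        gap = abs(series[i:i + window].mean() - series[i - window:i].mean())
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index

change_index = find_change_point(series)
print(change_index)
```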

Anomaly Score Calculation

Anomaly score calculation involves assigning a score of how anomalous a given data
sample is. Rather than an outright determination of whether or not something is an
anomaly, this is for when we want to assign some type of measure to how deviant from
the normal a data sample may be. In some cases, anomaly score calculation is a direct
prelude to anomaly detection, since you could assign a threshold that determines what
are anomalies and what aren’t by the anomaly score. In other cases, anomaly score
calculation can help with understanding a data set more deeply and aid with data
analysis by providing a more nuanced measure of the normalcy of a data point or a
measure of how much a data sample deviates from the norm.
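
As a sketch of score-then-threshold, the snippet below assigns each reading a score based on its distance from the median, scaled by the median absolute deviation, and only afterward applies a cutoff. The readings and the threshold of 10 are made-up values; many other scoring functions would work equally well.

```python
import numpy as np

# Hypothetical sensor readings; most sit near 50
readings = np.array([49.8, 50.1, 50.3, 49.9, 50.0, 57.5, 50.2, 49.7, 44.0, 50.1])

# Score: distance from the median, scaled by the median absolute deviation
median = np.median(readings)
mad = np.median(np.abs(readings - median))
scores = np.abs(readings - median) / mad

threshold = 10.0  # assumed cutoff; tune it for the data at hand
flags = scores > threshold
for value, score, flag in zip(readings, scores, flags):
    print(f"{value:5.1f}  score={score:6.2f}  anomaly={flag}")
```

Keeping the scores around, rather than only the flags, is what allows the more nuanced analysis described above.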

The Three Styles of Anomaly Detection

There are three overarching “styles” of anomaly detection:

• Supervised anomaly detection

• Semi-supervised anomaly detection

• Unsupervised anomaly detection

Supervised anomaly detection is a technique in which the training data has labels
for both anomalous data points and normal data points. Basically, we tell the model
during the training process if a data point is an anomaly or not. Unfortunately, this
isn’t the most practical method of training, especially because the entire data set needs
to be processed and each data point needs to be labeled. Since supervised anomaly
detection is basically a type of binary classification task, meaning the job of the model is
to categorize data under one of two labels, any classification model can be used for the
task, though not every model can attain a high level of performance. Chapter 9 provides
an example in the context of a temporal convolutional network.
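
To illustrate the supervised style as binary classification, here is a minimal sketch using scikit-learn. Logistic regression is only a convenient stand-in for "any classification model", and the two synthetic clusters and their labels are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical labeled data: label 0 = normal, label 1 = anomaly
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
anomalies = rng.normal(loc=6.0, scale=1.0, size=(20, 2))
X = np.vstack([normal, anomalies])
y = np.array([0] * 200 + [1] * 20)

# Any binary classifier could stand in here; logistic regression is one choice
model = LogisticRegression()
model.fit(X, y)

predictions = model.predict(np.array([[0.1, -0.2], [6.2, 5.8]]))
print(predictions)
```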

Semi-supervised anomaly detection involves partially labeling the training data set.
Exact implementations and definitions for what “semi-supervised” entails may differ, but
the gist of it is that you are working with partially labeled data. In the context of anomaly
detection, this can be a case where only the normal data is labeled. Ideally, the model will
learn what normal data points look like so that it can flag as anomalous data points that
differ from normal data points. Examples of models that can use semi-supervised learning
for anomaly detection include autoencoders, which you will learn about in Chapter 6.
Unsupervised anomaly detection, as the name implies, involves training the model
on unlabeled data. After the training process, the model is expected to know what data
points are normal and what points are anomalous within the data set. Isolation forest, a
model we will explore in Chapter 4, is one such model that can be used for unsupervised
anomaly detection.
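
As a small preview of the unsupervised style, the sketch below fits scikit-learn's IsolationForest on unlabeled synthetic data. The data and the contamination value (an assumed estimate of the fraction of anomalies) are illustrative; Chapter 4 covers the model properly.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical unlabeled data: a dense cluster plus two far-away points
cluster = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([cluster, far_points])

# contamination is an assumed estimate of the anomaly fraction
forest = IsolationForest(contamination=0.01, random_state=0)
labels = forest.fit_predict(X)  # 1 = normal, -1 = anomaly

print(np.flatnonzero(labels == -1))
```

No labels are passed to the model at any point; it decides which rows look unusual purely from the structure of the data.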

Where Is Anomaly Detection Used?

Whether we realize it or not, anomaly detection is being utilized in nearly every facet
of our lives today. Pretty much any task involving data collection of any sort could have
anomaly detection applied to it. Let’s look at some of the most prevalent fields and topics
to which anomaly detection can be applied.

Data Breaches
In today’s age of big data, where huge volumes of information are stored about users
in various companies, information security is vital. Any information breaches must be
reported and flagged immediately, but it is hard to do so manually at such a scale. Data
leaks can range from simple accidents, such as an employee losing a USB stick containing
sensitive company information that someone else then picks up and accesses,
to intentional actions, such as an employee intentionally sending data to an outside
party, or an attacker gaining access to a database via an intrusion attack. Several
high-profile data leaks have been widely reported in news media, including Facebook / Meta,
iCloud, and Google security breaches in which millions of passwords or photos were
leaked. All of those companies operate on an international scale, requiring automation
to monitor everything in order to ensure the fastest response time to any data breach.
The data breaches might not even need network access. For example, an employee
could email an outside party or another employee with connections to rival companies
about travel plans to meet up and exchange confidential information. Of course, these
emails would not be so obvious as to state such intentions directly. However, monitoring
these emails could be helpful in a post-breach analysis to identify anyone suspicious
within the company, or as part of real-time monitoring software that ensures, for example,
data confidentiality compliance across teams. Anomaly detection models can sift
through and process employee emails to flag any suspicious activity by employees. The
software can pick up key words and process them to understand the context and decide
whether or not to flag an employee’s email for review.
The following are a few more examples of how anomaly detection software can
detect an internal data breach:

• Employees may be assigned a specific connection to upload data
to, for example, a mass storage drive that employees should not
frequently access.

• An employee may regularly be accessing data as part of their work
responsibilities. However, if they suddenly start downloading a lot of
data that they shouldn’t, then the anomaly detector may flag this.

• The detector would be looking at many variables, including who
the user is, what data store is being accessed, what volume of data
was transferred, and how frequently the user interacted with this data
store, to find out if this recent interaction was a deviation from
established patterns.

In this case, something won’t add up, which the software will detect before flagging
the employee. It could turn out to be a one-off sanctioned event, in which case the
model did its job and nothing was wrong this time, or it could turn out that the employee
somehow accessed data they shouldn’t have, which would mean there was a data breach.
The key benefit to using anomaly detection in the workspace is how easy it is to scale
up. These models can be used for small companies as well as large-scale international
companies.

Identity Theft
Identity theft is another common problem in today’s society. Thanks to the development of
online services allowing for ease of access when purchasing items, the volume of credit
and debit card transactions that take place every day has grown immensely. However,
this development also makes it easier to steal credit and debit card information or bank
account information, allowing the criminals to purchase anything they want if the card
isn’t deactivated or if the account isn’t secured again. Because of the huge volume
of transactions, monitoring everything is difficult. However, this is where anomaly
detection can step in and help, since it is highly scalable and can help detect fraudulent
transactions the moment the request is sent.
As we discussed earlier, context matters. When a payment card transaction is made, the
payment card company’s anomaly detection software takes into account the card holder’s
previous history to determine if the new transaction should be flagged or not. Obviously, a
series of high value purchases made suddenly would raise alarms immediately, but what
if the criminals were smart enough to realize that and just make a series of purchases over
time that won’t put a noticeable hole in the card holder’s account? Again, depending on
the context, the software would pick up on these transactions and flag them again.
For example, let’s say that someone’s grandmother was recently introduced to
Amazon and to the concept of buying things online. One day, unfortunately, she
stumbles upon an Amazon lookalike website and enters her credit card information.
On the other side, some criminal takes it and starts buying random things, but not all
at once so as not to raise suspicion—or so he thought. The grandmother’s identity theft
insurance company starts noticing some recent purchases of batteries, hard drives, flash
drives, and other electronic items. While these purchases might not be that expensive,
they certainly stand out when all the prior purchases made by the grandmother
consisted of groceries, pet food, and various decorative items. Based on this previous
history, the detection software would flag the new purchases and the grandmother
would be contacted to verify these purchases. These transactions might even be flagged
as soon as an attempt to purchase is made. In this case, either the location of the
purchaser or the nature of the transactions could raise alarms and stop the transaction
from being successful.

Manufacturing
We have explored a use case of anomaly detection in manufacturing. Manufacturing
plants usually have a certain level of quality that they must ensure their products meet
before shipping them out. When factories are configured to produce massive quantities
of output at a near constant rate, it becomes necessary to automate the process of
checking the quality of various samples. Similar to the fictitious example of the screw
manufacturer, manufacturing plants in real life might test and use anomaly detection
software to ensure the quality of various metal parts, tools, engines, food, clothes, etc.

Networking
Perhaps one of the most important use cases for anomaly detection is in networking.
The Internet is host to a vast array of various websites located on servers all around the
world. Unfortunately, due to the ease of access to the Internet, many individuals access
the Internet for nefarious purposes. Similar to the data leaks that were discussed earlier
in the context of protecting company data, hackers can launch attacks on websites as
well to leak their information.
One such example would be hackers attempting to leak government secrets
through a network attack. With such sensitive information as well as the high volumes
of expected attacks every day, automation is a necessary tool to help cybersecurity
professionals deal with the attacks and preserve state secrets. On a smaller scale, hackers
might attempt to breach a cloud network or a local area network and try to leak data.
Even in smaller cases like this, anomaly detection can help detect network intrusion
attacks as they happen and notify the proper officials. An example data set for network
intrusion anomaly detection is the KDD Cup 1999 data set. This data set contains a
large number of entries that detail various types of network intrusion attacks as well as a
detailed list of variables for each attack that can help a model identify each type of attack.

Medicine
Anomaly detection also has a massive role to play in the field of medicine. For example,
models can detect subtle irregularities in a patient’s heartbeat in order to classify
diseases, or they can measure brainwave activity to help doctors diagnose certain
conditions. Beyond that, they can help analyze raw diagnostic data for a patient’s organ
and process it in order to quickly diagnose any possible problems, similarly to the
thyroid example discussed earlier.
Anomaly detection can even be used in medical imagery to determine whether a given
image contains anomalous objects. For example, suppose an anomaly detection model
was trained by exposing it only to MRI imagery of normal bones. When shown an image of
a broken bone, it would flag the new image as an anomaly. Similarly, anomaly detection
can even be extended to tumor detection, allowing for the model to analyze every image in
a full-body MRI scan and look for the presence of abnormal growth or patterns.

Video Surveillance
Anomaly detection also has uses in video surveillance. Anomaly detection software can
monitor video feeds and flag any videos that capture anomalous action. While this might
seem dystopian, it can certainly help catch criminals and maintain public safety on
busy streets and other transportation systems. For example, this type of software could
potentially identify a mugging in a street at night as an anomalous event and alert the
nearest police department. Additionally, this type of software can detect unusual events
at crossroads, such as an accident or some unusual obstruction, and immediately call
attention to the footage.

Environment
Anomaly detection can be used to monitor environmental conditions as well. For
example, anomaly detection systems are used to monitor heavy-metal levels in rivers
to pick up potential spills or leaks into the water supply. Another example is air quality
monitoring, which can detect anything from seasonal pollen to wildfire smoke coming in
from far away. Additionally, anomaly detection can be utilized to monitor soil health in
agricultural or environmental survey cases to gauge the health of an ecosystem. Any dips
in soil moisture or specific nutrient levels could indicate some kind of problem.

Summary
Generally, anomaly detection is utilized heavily in medicine, finance, cybersecurity,
banking, networking, transportation, and manufacturing, but it is not limited to those fields.
For nearly every case imaginable involving data collection, anomaly detection can be put to
use to help users automate the process of detecting anomalies and possibly removing them.
Many fields in science can benefit from anomaly detection because of the large volume of
raw data collection that goes on. Anomalies that would interfere with the interpretation
of results or otherwise introduce some sort of bias into the data could be detected and
removed, provided that the anomalies are caused by systematic or random errors.
In this chapter, we discussed what anomalies are and why detecting anomalies can
be very important to the data processing carried out at our organizations.
Next, Chapter 2 introduces you to core data science concepts that you need to know
to follow along for the rest of the book.

CHAPTER 2

Introduction to Data
Science
This chapter introduces you to the basic principles of the data science workflow. These
concepts will help you prepare your data as necessary to be fed into a machine learning
model as well as understand its underlying structure through analysis.
You will learn how to use the libraries Pandas, NumPy, Scikit-Learn, and Matplotlib
to load, manipulate, and analyze data. Furthermore, you will learn how to perform data
I/O; manipulate, transform, and process the data to your liking; analyze and plot the
data; and select/create relevant features for the machine learning modeling task.
In a nutshell, this chapter covers the following topics:

• Data I/O

• Data manipulation

• Data analysis

• Data visualization

• Data processing

• Feature engineering/selection

Note Code examples are provided in Python 3.8. Package versions of all
frameworks used are provided. You will need some type of program to run
Python code, so make sure to set this up beforehand. The code repository for this
book is available at https://github.com/apress/beginning-anomaly-detection-python-deep-learning-2e/tree/master.

© Suman Kalyan Adari, Sridhar Alla 2024
S. K. Adari and S. Alla, Beginning Anomaly Detection Using Python-Based Deep Learning,
https://doi.org/10.1007/979-8-8688-0008-5_2
Chapter 2 Introduction to Data Science

The repository also includes a requirements.txt file to check your packages and their
versions.
Code examples for this chapter are available at
https://github.com/apress/beginning-anomaly-detection-python-deep-learning-2e/blob/master/Chapter%202%20Introduction%20to%20Data%20Science/chapter2_datascience.ipynb.
Navigate to “Chapter 2 Introduction to Data Science” and then click
chapter2_datascience.ipynb. The code is provided as a .py file as well, though it is
the exported version of the notebook.
We will be using JupyterLab (https://jupyter.org) to present all of the code
examples.

Data Science
“Data science” is quite the popular term and buzzword nowadays, so what exactly is it?
In recent times, the term “data science” has come to represent a wide range of roles and
responsibilities. Depending on the company, a data scientist can be expected to perform
anything from data processing (often at scale, dipping into “big data” territory), to
statistical analysis and visualization, to training and deploying machine learning models.
In fact, data scientists often perform two or more of these roles at once.
This chapter focuses on concepts from all three roles, walking you through the
process of preparing and analyzing the dataset before you explore the modeling aspect
in Chapter 3.
Be advised that this will be a brief, high-level walkthrough over the most relevant
functionality that these various data science packages offer. Each package is so
comprehensive that a full book would be required to cover it in depth. Therefore, you are
encouraged to explore each package’s online documentation, as well as other guides and
walkthroughs, to learn as much as you can.

Dataset
A popular introductory dataset for budding data scientists is the Titanic Dataset,
available at Kaggle: https://www.kaggle.com/c/titanic. You can also find this dataset
hosted on this book’s repository at
https://github.com/apress/beginning-anomaly-detection-python-deep-learning-2e/blob/master/data/train.csv.

Kaggle is a great place to find datasets. It also hosts machine learning modeling
competitions, some of which even have prize money. Kaggle is an excellent resource
for practicing your skills, and you are encouraged to do so! If you would like to practice
the concepts that you learn in this book, search Kaggle for various anomaly detection
datasets.
To download the Titanic dataset from the Kaggle website, follow the instructions
provided next. If you prefer, Kaggle offers an API that you can install through pip,
available at https://github.com/Kaggle/kaggle-api.

1. Go to https://www.kaggle.com/c/titanic, where you are
greeted by something that looks like Figure 2-1.

Figure 2-1. Overview page for the Titanic dataset on Kaggle (as it looks as of
April, 2023)

2. Click the Data tab. You should see a brief description of the
dataset as well as a Data Explorer, as shown in Figure 2-2.

3. Click Download All. You will be prompted to sign in. Create an
account or log in with an alternate option. After logging in, you
will be returned to the Overview page.

Figure 2-2. Click Download All to download the data as a zip file

4. Return to the Data tab, scroll down, and click Download All. A zip
file will download.

5. Extract this zip file anywhere that you’d like and make a note of
the directory path.

After you have extracted the zip file, you are almost ready to start processing this
dataset in Python. First, make sure you have an IDE to develop Python code with. To
easily follow the examples in this book, we recommend that you use Jupyter Notebook
or JupyterLab.
Next, make sure you have the following libraries and versions installed, though you
might also want to check the requirements.txt file available on the GitHub repository or
use it to prepare your environment:

• pandas==2.0.0

• numpy==1.22.2

• scikit-learn==1.2.2

• matplotlib==3.7.1

It is not necessary to have the exact same versions, but keep in mind that older
versions may not contain features we explore later in the book. Newer versions should be
fine unless they are significantly more recent, which might result in reworked syntax or
features and thus introduce incompatibility.
You can easily check the version in Python. Figure 2-3 introduces code to import
these packages and print their versions.

import pandas
import numpy
import sklearn
import matplotlib

# Each package exposes its version string as __version__
print(f"Pandas version: {pandas.__version__}")
print(f"Numpy version: {numpy.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")

Figure 2-3. Code to import pandas, numpy, sklearn, and matplotlib and print
their versions

You should see output similar to that displayed in Figure 2-4.

Pandas version: 2.0.0
Numpy version: 1.22.2
Scikit-learn version: 1.2.2
Matplotlib version: 3.7.1

Figure 2-4. The text output of running the code in Figure 2-3

Pandas, Scikit-Learn, and Matplotlib

Pandas is a data science framework for Python that lets you freely manipulate and
analyze data to your liking. It is an expansive, comprehensive framework offering a lot of
functionality, and should suit your data processing needs. Until you start to scale into the
gigabyte realm with your datasets, Pandas can be quick and efficient.

Scikit-Learn is a machine learning library that covers a wide range of functionality
related to machine learning modeling, ranging from data processing functions to various
machine learning methods.
Matplotlib is a data visualization library that lets you build customized plots and
graphs. It integrates well with pandas and scikit-learn, and even lets you display images.
NumPy is a numerical computation library with highly efficient implementations for
various computational tasks. You can represent your data as numpy arrays and perform
quick, vectorized computations, linear algebra operations, and more. NumPy integrates
very well with many popular Python packages, including pandas, scikit-learn, and
matplotlib.
With all of these packages, you should be able to perform comprehensive data
analysis. However, keep in mind that there are other packages available for Python that
you may want to check out to complement your data analysis, such as statsmodels
(statistical modeling library), seaborn (another plotting library, which we will use in
subsequent chapters), plotly (yet another plotting library but for interactive graphs), and
scipy (scientific computing library, which lets you perform various statistical tests).
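
To give a quick feel for what "vectorized" means in NumPy, here is a tiny sketch. The two arrays hold made-up ages and fares chosen only for illustration.

```python
import numpy as np

ages = np.array([22.0, 38.0, 26.0, 35.0])   # hypothetical values
fares = np.array([7.25, 71.28, 7.92, 53.10])

# Vectorized arithmetic: no explicit Python loop needed
fare_per_year = fares / ages

# A small linear algebra operation: the dot product of two vectors
dot = ages @ fares
print(fare_per_year.round(3), dot)
```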

Data I/O
With our environment set up, let’s jump straight into the content. Before we conduct any
type of data analysis, we need to actually have data. There are a myriad of ways to load
data in Pandas, but we will keep it simple and load from a csv file.
For the sake of convenience, let’s reimport our libraries with aliases, as shown in
Figure 2-5.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Figure 2-5. Importing pandas, numpy, and matplotlib’s pyplot as aliases, for
convenience

Once you have executed this code, let’s move on to loading our dataset.


Data Loading
First, make sure that the path to your dataset is defined, like in Figure 2-6.

# Path to the training data
data_path = '../data/train.csv'

Figure 2-6. Path to the Titanic training dataset defined

You may optionally define this to be the full, absolute path, just to make sure pandas
will find this file if you are having problems. Here, the data folder resides in a directory
level above our notebook, as it contains data files common to every chapter.
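
If pandas cannot find the file, one way to debug it is to expand the relative path and check whether anything exists there. The sketch below uses the standard library's pathlib; the variable name absolute_path is an assumption introduced for this example.

```python
from pathlib import Path

# Same relative path as in Figure 2-6
data_path = '../data/train.csv'

# resolve() expands the relative path against the current working directory
absolute_path = Path(data_path).resolve()
print(absolute_path)

# exists() reports whether a file is actually present at that location
print(absolute_path.exists())
```

If exists() prints False, the notebook is probably running from a different working directory than expected, and passing the absolute path is the simplest fix.
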
To load the data, we will use pd.read_csv(data_path), as shown in Figure 2-7. The
function read_csv() reads a csv file and all the data contained within it. Pandas also
provides separate reader functions for other input formats, such as pd.read_json(),
pd.read_sas() (for SAS7BDAT files), and pd.read_excel(). You can find more details in
the Pandas documentation: https://pandas.pydata.org/docs/reference/index.html.

# pd.read_csv() returns a Pandas DataFrame object
df = pd.read_csv(data_path)

Figure 2-7. pd.read_csv() takes in the path as a parameter and returns a
pandas DataFrame
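
To experiment with the readers without the Titanic file on disk, the following sketch parses a tiny CSV and the same records as JSON from in-memory strings. The two-row data and the names csv_text, json_text, df_csv, and df_json are hypothetical. Note that JSON goes through its own function, pd.read_json(), rather than through read_csv().

```python
import io
import pandas as pd

# A two-row, hypothetical CSV held in memory so the example needs no file
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"

# read_csv() accepts a path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same records as JSON are read with a different function, read_json()
json_text = (
    '[{"PassengerId": 1, "Survived": 0, "Pclass": 3},'
    ' {"PassengerId": 2, "Survived": 1, "Pclass": 1}]'
)
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv.shape)
print(df_json.shape)
```
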

If this runs without producing any errors, you now have a pandas dataframe loaded.
An easy way to visualize this dataframe is to use df.head(N), where N is an optional
parameter to define how many rows you want to see. If you don’t pass N, the default is
five displayed rows. Simply run the line of code shown in Figure 2-8.

df.head(2)

Figure 2-8. Calling .head(2) on df to display two rows of df

You should see output that looks like Table 2-1.

Table 2-1. Output of Executing the Code in Figure 2-8

   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket     Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171  7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599   71.2833  C85    C


To get the table’s dimensions, run the following:

df.shape

This returns a tuple (M, N) with the table’s dimensions. It shows M rows and N
columns. For this example, you should see the following printed output:

(891, 12)
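
The behavior of head() and shape can be checked on any small dataframe. The sketch below assumes a made-up seven-row frame named small as a stand-in for the Titanic data.

```python
import pandas as pd

# Hypothetical seven-row frame standing in for the Titanic data
small = pd.DataFrame({
    'PassengerId': range(1, 8),
    'Pclass': [3, 1, 3, 1, 3, 3, 1],
})

default_preview = small.head()   # no argument: first five rows
two_rows = small.head(2)         # explicit N: first two rows

# shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = small.shape

print(len(default_preview), len(two_rows))
print(n_rows, n_cols)
```
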

Data Saving
To save your dataset, call df.to_csv(save_path), where save_path is a variable that
contains a string path to where you want to save the dataframe. (There are many other
output formats available besides csv.) As an example, run the code shown in Figure 2-9.

df2 = df.head(2)
df2.to_csv('two_rows.csv', index=False)

Figure 2-9. df.head(2) returns two rows of df, which is saved as df2. This is then
saved as a csv to two_rows.csv, with the parameter index=False passed. This
parameter tells pandas not to save the index as part of the csv, so an extra column
is not introduced into the data where there was none before

The parameter index=False tells pandas to not save the dataframe index to the csv.
Pandas creates an index when you load in data, which you can override with a custom
index if it is relevant. You can change this to index=True (which is the default) and see
how that changes the csv output.
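
To see the difference without opening any files, you can call to_csv() with no path, in which case pandas returns the csv text as a string. This is a minimal sketch on a hypothetical two-row frame; the column values are illustrative.

```python
import pandas as pd

# Hypothetical two-row frame; any dataframe behaves the same way
df2 = pd.DataFrame({'PassengerId': [1, 2], 'Survived': [0, 1]})

# With no path, to_csv() returns the CSV text instead of writing a file
without_index = df2.to_csv(index=False)
with_index = df2.to_csv(index=True)

print(without_index)
print(with_index)
```

With index=True, the header row starts with an empty field and every data row starts with its index value. That leading column is the extra column that index=False avoids.
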

DataFrame Creation
Besides loading data from a specific source, you can create a dataframe from
scratch given a list or a dictionary. This is very useful to do when you are conducting
experiments in an iterative manner over several different variable settings and you want
to save the data into a nice table.
The code shown in Figure 2-10 creates a dataframe of arbitrary metrics.


metric_rows = [ [0.9, 0.2, 0.3], [0.8, 0.3, 0.2] ]

metrics = pd.DataFrame(metric_rows, columns=['Model1', ↵
'Model2', 'Model3'])
metrics

Figure 2-10. Creating a dataframe from a list of lists (two rows, three columns)

Note The ↵ symbol in Figure 2-10 (and subsequent code displays) indicates
that the line of code has been wrapped to fit the page and that it's still the same
line. So 'Model1', 'Model2', 'Model3']) is the actual ending of this line.

The output should look like Table 2-2.

Table 2-2. Output of Executing the Code in Figure 2-10

   Model1  Model2  Model3
0  0.9     0.2     0.3
1  0.8     0.3     0.2

To create a dataframe from a dictionary, execute the code shown in Figure 2-11. In
this format, the keys of the dictionary are the columns themselves, and the values are
lists with the data corresponding to the keys.

metric_dict = {'Model1': [0.9, 0.8], 'Model2': [0.2, 0.3], ↵
'Model3': [0.3, 0.2]}
metrics = pd.DataFrame(metric_dict)
metrics

Figure 2-11. Creating a dataframe from a dictionary

After executing this code, you should once again see the dataframe in Table 2-2, as
this code creates the same output as the code shown in Figure 2-10.
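
The claim that both constructions give the same table can be checked directly with DataFrame.equals(). The sketch below also shows one possible way to collect results from an experiment loop into a table. The loop, the learning_rate setting, and the placeholder score are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Row-oriented input: each inner list is one row
metric_rows = [[0.9, 0.2, 0.3], [0.8, 0.3, 0.2]]
from_rows = pd.DataFrame(metric_rows, columns=['Model1', 'Model2', 'Model3'])

# Column-oriented input: each key is a column name
metric_dict = {'Model1': [0.9, 0.8], 'Model2': [0.2, 0.3], 'Model3': [0.3, 0.2]}
from_dict = pd.DataFrame(metric_dict)

# equals() checks that both frames have the same shape and the same elements
same = from_rows.equals(from_dict)
print(same)

# Hypothetical experiment loop: collect one dict per setting, then tabulate
records = []
for learning_rate in [0.1, 0.01]:
    score = 1.0 - learning_rate  # placeholder standing in for a real metric
    records.append({'learning_rate': learning_rate, 'score': score})
results = pd.DataFrame(records)
print(results)
```
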
Now that you know the very basics of data I/O in pandas, let’s move on to the many
ways we can manipulate this data to our liking.

No ratings yet
Agile_Scrum_MCQs
5 pages



Suman Kalyan Adari

ISBN-13 (pbk): 979-8-8688-0007-8 ISBN-13 (electronic): 979-8-8688-0008-5

Copyright © 2024 by Suman Kalyan Adari, Sridhar Alla

About the Technical Reviewers

Chapter 1: Introduction to Anomaly Detection

Chapter 2: Introduction to Data Science

Chapter 3: Introduction to Machine Learning

Chapter 4: Traditional Machine Learning Algorithms

Chapter 5: Introduction to Deep Learning

Chapter 6: Autoencoders

Chapter 7: Generative Adversarial Networks

Chapter 8: Long Short-Term Memory Models

Chapter 9: Temporal Convolutional Networks

Chapter 10: Transformers

Shubho Mohanty is a product thinker and creator, bringing

architecture, which has revolutionized the field of natural language processing as

• Categories of different anomalies

• What is anomaly detection?

• Where is anomaly detection used?

Figure 1-1. A couple of swans by a lake

Figure 1-3. A black swan appears

Anomalies as Data Points

Figure 1-4. Density and tensile strength in a batch of screw samples

Anomalies in a Time Series

Personal Spending Pattern

Figure 1-8. Spending habits of a person over the course of a month

• Data point–based anomalies

Data Point–Based Anomalies

Change Point Detection

Anomaly Score Calculation

The Three Styles of Anomaly Detection

• Supervised anomaly detection

• Semi-supervised anomaly detection

Where Is Anomaly Detection Used?

• Employees may be assigned a specific connection to upload data

• An employee may regularly be accessing data as part of their work

• The detector would be looking at many variables, including who

1. Go to https://www.kaggle.com/c/titanic, where you are

3. Click Download All. You will be prompted to sign in. Create an

print(f"Pandas version: {pandas.__version__}")

You should see output similar to that displayed in Figure 2-4.

Pandas version: 2.0.0

Pandas, Scikit-Learn, and Matplotlib

Scikit-Learn is a machine learning library that covers a wide range of functionality

# Path to the training data

Figure 2-6. Path to the Titanic training dataset defined

# pd.read_csv() returns a Pandas DataFrame object

Figure 2-7. pd.read_csv() takes in the path as a parameter and returns a

Figure 2-8. Calling .head(2) on df to display two rows of df

You should see output that looks like Table 2-1.

Table 2-1. Output of Executing the Code in Figure 2-8

0 1 0 3 Braund, Mr. Owen male 22.0 1 0 A/5 21171 7.2500 NaN S
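
The loading steps captioned in Figures 2-6 to 2-8 can be sketched as follows. This is a minimal sketch rather than the book's own listing: the CSV text is a made-up stand-in read from memory through `io.StringIO` instead of the Titanic file, and the names `csv_text`, `df`, and `first_two` are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in data; the real input is the Titanic training CSV from Kaggle.
csv_text = """id,name,age
1,Ann,34
2,Bob,51
3,Cy,27
"""

# pd.read_csv() accepts a path or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# .head(2) returns the first two rows as a new DataFrame.
first_two = df.head(2)
print(first_two)
```

With the real dataset, the `io.StringIO(...)` argument would be replaced by the path to the downloaded training file.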

To get the table’s dimensions, run the following:

metric_rows = [ [0.9, 0.2, 0.3], [0.8, 0.3, 0.2] ]

The output should look like Table 2-2.

Table 2-2. Output of Executing the Code in Figure 2-10

0 0.9 0.2 0.3

metric_dict = {'Model1': [0.9, 0.8], 'Model2': [0.2, 0.3], ↵

Figure 2-11. Creating a dataframe from dictionaries
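
The two construction routes named above, from a list of rows and from a dictionary of columns, can be sketched as below. This is a minimal sketch under stated assumptions: the column labels passed to the first call are hypothetical, and the dictionary uses only the two keys visible in the excerpt, since the rest of that line is cut off.

```python
import pandas as pd

# Row-oriented: each inner list is one row (values as in the excerpt).
metric_rows = [[0.9, 0.2, 0.3], [0.8, 0.3, 0.2]]
# Column labels here are hypothetical; without them pandas uses 0, 1, 2.
df_rows = pd.DataFrame(metric_rows, columns=["A", "B", "C"])

# Column-oriented: each key is a column name, each list holds that column's values.
# Only the two keys visible in the excerpt are used.
metric_dict = {"Model1": [0.9, 0.8], "Model2": [0.2, 0.3]}
df_dict = pd.DataFrame(metric_dict)

# .shape reports the dimensions as (rows, columns).
print(df_rows.shape)
print(df_dict.shape)
```

The row-oriented form suits data that arrives one record at a time, while the dictionary form is convenient when each column is already collected as its own list.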

