Anomaly detection with Keras, TensorFlow, and Deep Learning
by Adrian Rosebrock on March 2, 2020
In this tutorial, you will learn how to perform anomaly and outlier detection using autoencoders, Keras, and TensorFlow.

Back in January, I showed you how to use standard machine learning models to perform anomaly detection and outlier detection in image datasets.
Our approach worked well enough, but it begged the question:

"Could deep learning be used to improve the accuracy of our anomaly detector?"

To answer such a question would require us to dive further down the rabbit hole and answer questions such as:

What model architecture should we use?
Are some deep neural network architectures better than others for anomaly/outlier detection?
How do we handle the class imbalance problem?
What if we wanted to train an unsupervised anomaly detector?

This tutorial addresses all of these questions, and by the end of it, you'll be able to perform anomaly detection in your own image datasets using deep learning.

To learn how to perform anomaly detection with Keras, TensorFlow, and Deep Learning, just keep reading!
In the first part of this tutorial, we'll discuss anomaly detection, including:

What makes anomaly detection so challenging
Why traditional deep learning methods are not sufficient for anomaly/outlier detection
How autoencoders can be used for anomaly detection

From there, we'll implement an autoencoder architecture that can be used for anomaly detection using Keras and TensorFlow. We'll then train our autoencoder model in an unsupervised fashion.

Once the autoencoder is trained, I'll show you how you can use the autoencoder to identify outliers/anomalies in both your training/testing set as well as in new images that are not part of your dataset splits.
What is anomaly detection?
Figure 1: In this tutorial, we will detect anomalies with Keras, TensorFlow, and Deep Learning (image source).

To quote my intro to anomaly detection tutorial:

"Anomalies are defined as events that deviate from the standard, happen rarely, and don't follow the rest of the pattern."

Examples of anomalies include:

Large dips and spikes in the stock market due to world events
Defective items in a factory/on a conveyor belt
Contaminated samples in a lab
Depending on your exact use case and application, anomalies only typically occur 0.001-1% of the time — that's an incredibly small fraction of the time.

The problem is only compounded by the fact that there is a massive imbalance in our class labels. By definition, anomalies will rarely occur, so the majority of our data points will be of valid events.

To detect anomalies, machine learning researchers have created algorithms such as Isolation Forests, One-class SVMs, Elliptic Envelopes, and Local Outlier Factor to help detect such events; however, all of these methods are rooted in traditional machine learning.

What about deep learning?

Can deep learning be used for anomaly detection as well?

The answer is yes — but you need to frame the problem correctly.
How can deep learning and autoencoders be used for anomaly detection?

As I discussed in my intro to autoencoder tutorial, autoencoders are a type of unsupervised neural network that can:

1. Accept an input set of data
2. Internally compress the data into a latent-space representation
3. Reconstruct the input data from the latent representation

To accomplish this task, an autoencoder uses two components: an encoder and a decoder.

The encoder accepts the input data and compresses it into the latent-space representation. The decoder then attempts to reconstruct the input data from the latent space.

When trained in an end-to-end fashion, the hidden layers of the network learn filters that are robust and even capable of denoising the input data.
However, what makes autoencoders so special from an anomaly detection perspective is the reconstruction loss. When we train an autoencoder, we typically measure the mean-squared-error (MSE) between:

1. The input image
2. The reconstructed image from the autoencoder

The lower the loss, the better a job the autoencoder is doing at reconstructing the image.
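To make this concrete, here is a minimal NumPy sketch of that per-image MSE computation (the random arrays here are placeholders standing in for a real input image and its reconstruction):

import numpy as np

# stand-ins for a real input image and its reconstruction from the
# autoencoder, both scaled to the range [0, 1]
image = np.random.rand(28, 28).astype("float32")
recon = np.random.rand(28, 28).astype("float32")

# mean squared error: average of the squared pixel-wise differences
mse = np.mean((image - recon) ** 2)
print(mse)

We'll use exactly this computation later in this post when scoring reconstructions.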
Let's now suppose that we trained an autoencoder on the entirety of the MNIST dataset:

Figure 2: Samples from the MNIST handwritten digit benchmarking dataset. We will use MNIST to develop an unsupervised autoencoder with Keras, TensorFlow, and deep learning.

We then present the autoencoder with a digit and tell it to reconstruct it:

Figure 3: Reconstructing a digit from MNIST with autoencoders, Keras, TensorFlow, and deep learning.

We would expect the autoencoder to do a really good job at reconstructing the digit, as that is exactly what the autoencoder was trained to do — and if we were to look at the MSE between the input image and the reconstructed image, we would find that it's quite low.
Let's now suppose we presented our autoencoder with a photo of an elephant and asked it to reconstruct it:

Figure 4: When we attempt to reconstruct an image with an autoencoder, but the result has a high MSE, we have an outlier. In this tutorial, we will detect anomalies with autoencoders, Keras, and deep learning.

Since the autoencoder has never seen an elephant before, and more to the point, was never trained to reconstruct an elephant, our MSE will be very high.

If the MSE of the reconstruction is high, then we likely have an outlier.

Alon Agmon does a great job explaining this concept in more detail in this article.
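In code, the decision rule is nothing more than comparing that reconstruction error to a threshold (continuing the NumPy sketch from the previous section; the threshold here is a hand-picked placeholder):

# compare the reconstruction error to a threshold; this value is
# hand-picked for illustration only, whereas our find_anomalies.py
# script later derives it from a quantile of the error distribution
THRESHOLD = 0.03

if mse >= THRESHOLD:
    print("[INFO] possible anomaly/outlier detected")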
Configuring your development environment

To follow along with today's tutorial on anomaly detection, I recommend you use TensorFlow 2.0.

To configure your system and install TensorFlow 2.0, you can follow either my Ubuntu or macOS guide:

How to install TensorFlow 2.0 on Ubuntu (Ubuntu 18.04 OS; CPU and optional NVIDIA GPU)
How to install TensorFlow 2.0 on macOS (Catalina and Mojave OSes)

Please note: PyImageSearch does not support Windows — refer to our FAQ.
Project structure

Go ahead and grab the code from the "Downloads" section of this post. Once you've unzipped the project, you'll be presented with the following structure:
1. $ tree --dirsfirst
2. .
3. ├── output
4. │ ├── autoencoder.model
5. │ └── images.pickle
6. ├── pyimagesearch
7. │ ├── __init__.py
8. │ └── convautoencoder.py
9. ├── find_anomalies.py
10. ├── plot.png
11. ├── recon_vis.png
12. └── train_unsupervised_autoencoder.py
13.
14. 2 directories, 8 files
The convautoencoder.py file inside the pyimagesearch module contains the ConvAutoencoder class responsible for building a Keras/TensorFlow autoencoder implementation.

We will train an autoencoder with unlabeled data inside train_unsupervised_autoencoder.py , resulting in the following outputs:

autoencoder.model : The serialized, trained autoencoder model.
images.pickle : A serialized set of unlabeled images for us to find anomalies in.
plot.png : A plot consisting of our training loss curves.
recon_vis.png : A visualization figure that compares samples of ground-truth digit images versus each reconstructed image.

From there, we will develop an anomaly detector inside find_anomalies.py and apply our autoencoder to reconstruct data and find anomalies.
Implementing our autoencoder for anomaly detection with Keras and TensorFlow

The first step to anomaly detection with deep learning is to implement our autoencoder script.

Our convolutional autoencoder implementation is identical to the ones from our introduction to autoencoders post as well as our denoising autoencoders tutorial; however, we'll review it here as a matter of completeness — if you want additional details on autoencoders, be sure to refer to those posts.
1. # import the necessary packages
2. from tensorflow.keras.layers import BatchNormalization
3. from tensorflow.keras.layers import Conv2D
4. from tensorflow.keras.layers import Conv2DTranspose
5. from tensorflow.keras.layers import LeakyReLU
6. from tensorflow.keras.layers import Activation
7. from tensorflow.keras.layers import Flatten
8. from tensorflow.keras.layers import Dense
9. from tensorflow.keras.layers import Reshape
10. from tensorflow.keras.layers import Input
11. from tensorflow.keras.models import Model
12. from tensorflow.keras import backend as K
13. import numpy as np
14.
15. class ConvAutoencoder:
16. @staticmethod
17. def build(width, height, depth, filters=(32, 64), latentDim=16):
18. # initialize the input shape to be "channels last" along with
19. # the channels dimension itself
21. inputShape = (height, width, depth)
22. chanDim = -1
23.
24. # define the input to the encoder
25. inputs = Input(shape=inputShape)
26. x = inputs
27.
28. # loop over the number of filters
29. for f in filters:
30. # apply a CONV => RELU => BN operation
31. x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
32. x = LeakyReLU(alpha=0.2)(x)
33. x = BatchNormalization(axis=chanDim)(x)
34.
35. # flatten the network and then construct our latent vector
36. volumeSize = K.int_shape(x)
37. x = Flatten()(x)
38. latent = Dense(latentDim)(x)
39.
40. # build the encoder model
41. encoder = Model(inputs, latent, name="encoder")
Our ConvAutoencoder class contains a static build method that accepts five parameters:

1. width : Width of the input images.
2. height : Height of the input images.
3. depth : Number of channels in the images.
4. filters : Number of filters the encoder and decoder will learn, respectively.
5. latentDim : Dimensionality of the latent-space representation.

From there, we define the input to the encoder and loop over our filters, applying a series of CONV => LeakyReLU => BN layers.

We then flatten the network and construct our latent-space representation — this same representation will now be used to reconstruct the original input image:
43. # start building the decoder model which will accept the
44. # output of the encoder as its inputs
45. latentInputs = Input(shape=(latentDim,))
46. x = Dense(np.prod(volumeSize[1:]))(latentInputs)
47. x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)
48.
49. # loop over our number of filters again, but this time in
50. # reverse order
51. for f in filters[::-1]:
52. # apply a CONV_TRANSPOSE => RELU => BN operation
53. x = Conv2DTranspose(f, (3, 3), strides=2,
54. padding="same")(x)
55. x = LeakyReLU(alpha=0.2)(x)
56. x = BatchNormalization(axis=chanDim)(x)
57.
58. # apply a single CONV_TRANSPOSE layer used to recover the
59. # original depth of the image
60. x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
61. outputs = Activation("sigmoid")(x)
62.
63. # build the decoder model
64. decoder = Model(latentInputs, outputs, name="decoder")
65.
66. # our autoencoder is the encoder + decoder
67. autoencoder = Model(inputs, decoder(encoder(inputs)),
68. name="autoencoder")
69.
70. # return a 3-tuple of the encoder, decoder, and autoencoder
71. return (encoder, decoder, autoencoder)
Here, we take the latent input and use a fully-connected layer to reshape it into a 3D volume (i.e., the image data).

We loop over our filters once again, but in reverse order, applying a series of CONV_TRANSPOSE => RELU => BN layers. The CONV_TRANSPOSE layer's purpose is to increase the volume size back to the original image spatial dimensions.

Finally, we build the decoder model and construct the autoencoder. Recall that an autoencoder consists of both the encoder and decoder components. We then return a 3-tuple of the encoder, decoder, and autoencoder.

Again, if you need further details on the implementation of our autoencoder, be sure to review the aforementioned tutorials.
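As a quick sanity check, you can instantiate the class and print a summary of each network (a short sketch; it assumes the pyimagesearch module from the "Downloads" is importable):

from pyimagesearch.convautoencoder import ConvAutoencoder

# build the encoder, decoder, and full autoencoder for 28x28
# single-channel (grayscale) images such as MNIST digits
(encoder, decoder, autoencoder) = ConvAutoencoder.build(28, 28, 1)

# inspect each model's layers and output shapes
encoder.summary()
decoder.summary()
autoencoder.summary()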
Implementing the anomaly detection training script

With our autoencoder implemented, we are now ready to move on to our training script.

Open up the train_unsupervised_autoencoder.py file in your project directory, and insert the following code:
1. # set the matplotlib backend so figures can be saved in the background
2. import matplotlib
3. matplotlib.use("Agg")
4.
5. # import the necessary packages
6. from pyimagesearch.convautoencoder import ConvAutoencoder
7. from tensorflow.keras.optimizers import Adam
8. from tensorflow.keras.datasets import mnist
9. from sklearn.model_selection import train_test_split
10. import matplotlib.pyplot as plt
11. import numpy as np
12. import argparse
13. import random
14. import pickle
15. import cv2
Given that we're performing unsupervised learning, next we'll define a function to build an unsupervised dataset:
17. def build_unsupervised_dataset(data, labels, validLabel=1,
18. anomalyLabel=3, contam=0.01, seed=42):
19. # grab all indexes of the supplied class label that are *truly*
20. # that particular label, then grab the indexes of the image
21. # labels that will serve as our "anomalies"
22. validIdxs = np.where(labels == validLabel)[0]
23. anomalyIdxs = np.where(labels == anomalyLabel)[0]
24.
25. # randomly shuffle both sets of indexes
26. random.shuffle(validIdxs)
27. random.shuffle(anomalyIdxs)
28.
29. # compute the total number of anomaly data points to select
30. i = int(len(validIdxs) * contam)
31. anomalyIdxs = anomalyIdxs[:i]
32.
33. # use NumPy array indexing to extract both the valid images and
34. # "anomaly" images
35. validImages = data[validIdxs]
36. anomalyImages = data[anomalyIdxs]
37.
38. # stack the valid images and anomaly images together to form a
39. # single data matrix and then shuffle the rows
40. images = np.vstack([validImages, anomalyImages])
41. np.random.seed(seed)
42. np.random.shuffle(images)
43.
44. # return the set of images
45. return images
Our build_unsupervised_dataset function accepts a labeled dataset (i.e., for supervised learning) and turns it into an unlabeled dataset (i.e., for unsupervised learning).

Given that validLabel=1 by default, only MNIST numeral ones are selected; however, we'll also contaminate our dataset with a set of numeral three images ( anomalyLabel=3 ). The contam parameter controls what percentage of anomaly data points we sample.

From our set of labels (and using the valid label), we generate a list of validIdxs (Line 22). The exact same process is applied to grab anomalyIdxs (Line 23). We then proceed to randomly shuffle the indices (Lines 26 and 27).

Given our anomaly contamination percentage, we reduce our set of anomalyIdxs (Lines 30 and 31).

The valid images and anomaly images are then stacked into a single data matrix, shuffled, and returned (Lines 40-45). Notice that the labels have been intentionally discarded, effectively making our dataset ready for unsupervised learning.
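For example, here is roughly how we'll call this function on the MNIST training split later in this script (a sketch assuming the function definition above):

from tensorflow.keras.datasets import mnist

# load MNIST and sample an unlabeled dataset of 1's contaminated
# with roughly 1% 3's
((trainX, trainY), (testX, testY)) = mnist.load_data()
images = build_unsupervised_dataset(trainX, trainY, validLabel=1,
    anomalyLabel=3, contam=0.01)

# no labels are returned -- just a stack of images
print(images.shape)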
Our next function will help us visualize predictions made by our unsupervised autoencoder:
47. def visualize_predictions(decoded, gt, samples=10):
48. # initialize our list of output images
49. outputs = None
50.
51. # loop over our number of output samples
52. for i in range(0, samples):
53. # grab the original image and reconstructed image
54. original = (gt[i] * 255).astype("uint8")
55. recon = (decoded[i] * 255).astype("uint8")
56.
57. # stack the original and reconstructed image side-by-side
58. output = np.hstack([original, recon])
59.
60. # if the outputs array is empty, initialize it as the current
61. # side-by-side image display
62. if outputs is None:
63. outputs = output
64.
65. # otherwise, vertically stack the outputs
66. else:
67. outputs = np.vstack([outputs, output])
68.
69. # return the output images
70. return outputs
The visualize_predictions function is a helper used to display the input images to our autoencoder alongside their corresponding reconstructions, arranging each pair side-by-side and stacking up to samples rows vertically. This code should look familiar if you read either my introduction to autoencoders guide or denoising autoencoder tutorial.
Now that we've defined our imports and necessary functions, we'll go ahead and parse our command line arguments:
72. # construct the argument parse and parse the arguments
73. ap = argparse.ArgumentParser()
74. ap.add_argument("-d", "--dataset", type=str, required=True,
75. help="path to output dataset file")
76. ap.add_argument("-m", "--model", type=str, required=True,
77. help="path to output trained autoencoder")
78. ap.add_argument("-v", "--vis", type=str, default="recon_vis.png",
79. help="path to output reconstruction visualization file")
80. ap.add_argument("-p", "--plot", type=str, default="plot.png",
81. help="path to output plot file")
82. args = vars(ap.parse_args())
Our script accepts four command line arguments, all of which are output file paths:

--dataset : Defines the path to our output dataset file
--model : Specifies the path to our output trained autoencoder
--vis : An optional argument that specifies the output visualization file path. By default, I've named this file recon_vis.png ; however, you are welcome to override it with a different path and filename
--plot : Optionally indicates the path to our output training history plot. By default, the plot will be named plot.png and placed in the current working directory

We're now ready to prepare our data for training:
84. # initialize the number of epochs to train for, initial learning rate,
85. # and batch size
86. EPOCHS = 20
87. INIT_LR = 1e-3
88. BS = 32
89.
90. # load the MNIST dataset
91. print("[INFO] loading MNIST dataset...")
92. ((trainX, trainY), (testX, testY)) = mnist.load_data()
93.
94. # build our unsupervised dataset of images with a small amount of
95. # contamination (i.e., anomalies) added into it
96. print("[INFO] creating unsupervised dataset...")
97. images = build_unsupervised_dataset(trainX, trainY, validLabel=1,
98. anomalyLabel=3, contam=0.01)
99.
100. # add a channel dimension to every image in the dataset, then scale
101. # the pixel intensities to the range [0, 1]
102. images = np.expand_dims(images, axis=-1)
103. images = images.astype("float32") / 255.0
104.
105. # construct the training and testing split
106. (trainX, testX) = train_test_split(images, test_size=0.2,
107. random_state=42)
First, we initialize three hyperparameters: (1) the number of training epochs, (2) the initial learning rate, and (3) our batch size (Lines 86-88).

Line 92 loads MNIST, while Lines 97 and 98 build our unsupervised dataset with 1% contamination (i.e., anomalies) added into it.

From here forward, our dataset does not have labels, and our autoencoder will attempt to learn patterns without prior knowledge of what the data is.

Now that we've built our unsupervised dataset, it consists of 99% numeral ones and 1% numeral threes (i.e., anomalies/outliers).

From there, we preprocess our dataset by adding a channel dimension and scaling pixel intensities to the range [0, 1] (Lines 102 and 103).

Using scikit-learn's convenience function, we then split the data into 80% training and 20% testing sets (Lines 106 and 107).
Our data is ready to go, so let's build our autoencoder and train it:
109. # construct our convolutional autoencoder
110. print("[INFO] building autoencoder...")
111. (encoder, decoder, autoencoder) = ConvAutoencoder.build(28, 28, 1)
112. opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
113. autoencoder.compile(loss="mse", optimizer=opt)
114.
115. # train the convolutional autoencoder
116. H = autoencoder.fit(
117. trainX, trainX,
118. validation_data=(testX, testX),
119. epochs=EPOCHS,
120. batch_size=BS)
121.
122. # use the convolutional autoencoder to make predictions on the
123. # testing images, construct the visualization, and then save it
124. # to disk
125. print("[INFO] making predictions...")
126. decoded = autoencoder.predict(testX)
127. vis = visualize_predictions(decoded, testX)
128. cv2.imwrite(args["vis"], vis)
We construct our convolutional autoencoder with the ConvAutoencoder class's build method, then compile it with the Adam optimizer and mean-squared error loss (Lines 111-113).

Lines 116-120 launch the training procedure with TensorFlow/Keras. Our autoencoder will attempt to learn how to reconstruct the original input images. Images that cannot be easily reconstructed will have a large loss value.

Once training is complete, we'll need a way to evaluate and visually inspect our results. Luckily, we have our visualize_predictions convenience function in our back pocket. Lines 126-128 make predictions on the test set, build a visualization image from the results, and write the output image to disk.
From here, we'll wrap up:
130. # construct a plot that plots and saves the training history
131. N = np.arange(0, EPOCHS)
132. plt.style.use("ggplot")
133. plt.figure()
134. plt.plot(N, H.history["loss"], label="train_loss")
135. plt.plot(N, H.history["val_loss"], label="val_loss")
136. plt.title("Training Loss")
137. plt.xlabel("Epoch #")
138. plt.ylabel("Loss")
139. plt.legend(loc="lower left")
140. plt.savefig(args["plot"])
141.
142. # serialize the image data to disk
143. print("[INFO] saving image data...")
144. f = open(args["dataset"], "wb")
145. f.write(pickle.dumps(images))
146. f.close()
147.
148. # serialize the autoencoder model to disk
149. print("[INFO] saving autoencoder...")
150. autoencoder.save(args["model"], save_format="h5")
To close out, we:

Plot our training history loss curves and export the resulting plot to disk (Lines 131-140)
Serialize our unsupervised, sampled MNIST dataset to disk as a Python pickle (Lines 144-146)
Save our trained autoencoder (Line 150)

Fantastic job developing the unsupervised autoencoder training script!
Training our anomaly detector using Keras and TensorFlow

To train our anomaly detector, make sure you use the "Downloads" section of this tutorial to download the source code.

From there, fire up a terminal and execute the following command:
130. $ python train_unsupervised_autoencoder.py \
131. --dataset output/images.pickle \
132. --model output/autoencoder.model
133. [INFO] loading MNIST dataset...
134. [INFO] creating unsupervised dataset...
135. [INFO] building autoencoder...
136. Train on 5447 samples, validate on 1362 samples
137. Epoch 1/20
138. 5447/5447 [==============================] - 7s 1ms/sample - loss: 0.0421 - val_loss: 0.0405
139. Epoch 2/20
140. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0129 - val_loss: 0.0306
141. Epoch 3/20
142. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0045 - val_loss: 0.0088
143. Epoch 4/20
144. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0033 - val_loss: 0.0037
145. Epoch 5/20
146. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0029 - val_loss: 0.0027
147. ...
148. Epoch 16/20
149. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0018 - val_loss: 0.0020
150. Epoch 17/20
151. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0018 - val_loss: 0.0020
152. Epoch 18/20
153. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0017 - val_loss: 0.0021
154. Epoch 19/20
155. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0018 - val_loss: 0.0021
156. Epoch 20/20
157. 5447/5447 [==============================] - 6s 1ms/sample - loss: 0.0016 - val_loss: 0.0019
158. [INFO] making predictions...
159. [INFO] saving image data...
160. [INFO] saving autoencoder...
Figure 5: In this plot we have our loss curves from training an autoencoder with Keras, TensorFlow, and deep learning.

Training the entire model took ~2 minutes on my 3GHz Intel Xeon processor, and as our training history plot in Figure 5 shows, our training is quite stable.

Furthermore, we can look at our output recon_vis.png visualization file to see that our autoencoder has learned to correctly reconstruct the 1 digit from the MNIST dataset:

Figure 6: Reconstructing a handwritten digit using a deep learning autoencoder trained with Keras and TensorFlow.
Before proceeding to the next section, you should verify that both the autoencoder and image data files have been correctly saved to your output directory:
130. $ ls output/
131. autoencoder.model images.pickle
You'll be needing these files in the next section.
Implementing our script to find anomalies/outliers using the autoencoder

Our goal is to now:

1. Take our pre-trained autoencoder
2. Use it to make predictions (i.e., reconstruct the digits in our dataset)
3. Measure the MSE between the original input images and reconstructions
4. Compute quantiles for the MSEs, and use these quantiles to identify outliers and anomalies

Open up the find_anomalies.py file, and let's get started:
1. # import the necessary packages
2. from tensorflow.keras.models import load_model
3. import numpy as np
4. import argparse
5. import pickle
6. import cv2
7.
8. # construct the argument parse and parse the arguments
9. ap = argparse.ArgumentParser()
10. ap.add_argument("-d", "--dataset", type=str, required=True,
11. help="path to input image dataset file")
12. ap.add_argument("-m", "--model", type=str, required=True,
13. help="path to trained autoencoder")
14. ap.add_argument("-q", "--quantile", type=float, default=0.999,
15. help="q-th quantile used to identify outliers")
16. args = vars(ap.parse_args())
We'll begin with imports and command line arguments. The load_model import enables us to load our serialized autoencoder from disk. Our three command line arguments are:

--dataset : The path to our input dataset pickle file that was exported to disk as a result of our unsupervised training script
--model : Our trained autoencoder path
--quantile : The q-th quantile to identify outliers
From here, we'll (1) load our autoencoder and data, and (2) make predictions:
18. # load the model and image data from disk
19. print("[INFO] loading autoencoder and image data...")
20. autoencoder = load_model(args["model"])
21. images = pickle.loads(open(args["dataset"], "rb").read())
22.
23. # make predictions on our image data and initialize our list of
24. # reconstruction errors
25. decoded = autoencoder.predict(images)
26. errors = []
27.
28. # loop over all original images and their corresponding
29. # reconstructions
30. for (image, recon) in zip(images, decoded):
31. # compute the mean squared error between the ground-truth image
32. # and the reconstructed image, then add it to our list of errors
33. mse = np.mean((image - recon) ** 2)
34. errors.append(mse)
We load both the autoencoder and image data from disk and then pass the images through the autoencoder to attempt to reconstruct the inputs (Line 25).

Looping over the original and reconstructed images, Lines 30-34 compute the mean squared error between the ground-truth and reconstructed image, building a list of errors.
From here, we'll detect the anomalies:
36. # compute the q-th quantile of the errors which serves as our
37. # threshold to identify anomalies -- any data point that our model
38. # reconstructed with > threshold error will be marked as an outlier
39. thresh = np.quantile(errors, args["quantile"])
40. idxs = np.where(np.array(errors) >= thresh)[0]
41. print("[INFO] mse threshold: {}".format(thresh))
42. print("[INFO] {} outliers found".format(len(idxs)))
Line 39 computes the q-th quantile of the errors — this value will serve as our threshold to detect outliers.

Line 40 then grabs the indices of all data points whose error is greater than or equal to the threshold; each of these data points is marked as an outlier.
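If you're curious how sensitive the detector is to the --quantile value, you can sweep a few candidates over the same errors list (a quick sketch; the exact counts will depend on your trained model):

# compare how many data points each candidate quantile would flag
for q in (0.99, 0.995, 0.999):
    thresh = np.quantile(errors, q)
    idxs = np.where(np.array(errors) >= thresh)[0]
    print("[INFO] quantile: {} -> {} outliers".format(q, len(idxs)))

Lower quantiles yield a lower threshold, flagging more images as potential outliers (trading precision for recall).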
Next, we'll loop over the anomaly indices in our dataset:
44. # initialize the outputs array
45. outputs = None
46.
47. # loop over the indexes of images with a high mean squared error term
48. for i in idxs:
49. # grab the original image and reconstructed image
50. original = (images[i] * 255).astype("uint8")
51. recon = (decoded[i] * 255).astype("uint8")
52.
53. # stack the original and reconstructed image side-by-side
54. output = np.hstack([original, recon])
55.
56. # if the outputs array is empty, initialize it as the current
57. # side-by-side image display
58. if outputs is None:
59. outputs = output
60.
61. # otherwise, vertically stack the outputs
62. else:
63. outputs = np.vstack([outputs, output])
64.
65. # show the output visualization
66. cv2.imshow("Output", outputs)
67. cv2.waitKey(0)
Inside the loop, we arrange each original and recon image side-by-side, vertically stacking all results into a single outputs image. Lines 66 and 67 display the resulting image.
Anomaly detection with deep learning results

We are now ready to detect anomalies in our dataset using deep learning and our trained Keras/TensorFlow model.

Start by making sure you've used the "Downloads" section of this tutorial to download the source code — from there, you can execute the following command to detect anomalies in our dataset:
44. $ python find_anomalies.py --dataset output/images.pickle \
45. --model output/autoencoder.model
46. [INFO] loading autoencoder and image data...
47. [INFO] mse threshold: 0.02863757349550724
48. [INFO] 7 outliers found
With an MSE threshold of ~0.0286, which corresponds to the 99.9% quantile, our autoencoder was able to find seven outliers, five of which are correctly labeled as such:

Figure 7: Shown are anomalies that have been detected from reconstructing data with a Keras-based autoencoder.

Despite the fact that the autoencoder was only trained on a small sample of 3 digits from the MNIST dataset (67 total samples), it does a surprisingly good job at reconstructing them, given the limited data — but we can see that the MSE for these reconstructions was higher than the rest.

Furthermore, the 1 digits that were incorrectly flagged as outliers could be considered suspicious as well.

Deep learning practitioners can use autoencoders to spot outliers in their datasets even if the image was correctly labeled!

Images that are correctly labeled but demonstrate a problem for a deep neural network architecture should be indicative of a subclass of images that are worth exploring more — autoencoders can help you spot these outlier subclasses.
My autoencoder anomaly detection accuracy is not good enough. What should I do?

Figure 8: Anomaly detection with unsupervised deep learning models is an active area of research and is far from solved. (Image source: Figure 4 of Deep Learning for Anomaly Detection: A Survey, by Chalapathy and Chawla.)

Unsupervised learning, and specifically anomaly/outlier detection, is far from a solved area of machine learning, deep learning, and computer vision — there is no off-the-shelf solution for anomaly detection that is 100% correct.

I would recommend you read the 2019 survey paper, Deep Learning for Anomaly Detection: A Survey, by Chalapathy and Chawla, for more information on the current state-of-the-art on deep learning-based anomaly detection.

While promising, keep in mind that the field is rapidly evolving — but again, anomaly/outlier detection are far from solved problems.
Summary

In this tutorial, you learned how to perform anomaly and outlier detection using Keras, TensorFlow, and Deep Learning.

Traditional classification architectures are not sufficient for anomaly detection as:

They are not meant to be used in an unsupervised manner
They struggle to handle severe class imbalance
And therefore, they struggle to correctly recall the outliers

Autoencoders, on the other hand:

Are naturally suited for unsupervised problems
Learn to both encode and reconstruct input images
Can detect outliers by measuring the error between the encoded image and reconstructed image

We trained our autoencoder on the MNIST dataset in an unsupervised fashion by removing the class labels, grabbing all labels with a value of 1, and then using 1% of the 3 labels.

As our results demonstrated, our autoencoder was able to pick out many of the 3 digits that were used to "contaminate" our 1's.

If you enjoyed this tutorial on deep learning-based anomaly detection, be sure to let me know in the comments! Your feedback helps guide me on what tutorials to write in the future.
24 responses to "Anomaly detection with Keras, TensorFlow, and Deep Learning"
Darshan Iyer
March 2, 2020 at 3:17 pm:
Wonderful article Adrian, as always!

Adrian Rosebrock
March 2, 2020 at 4:10 pm:
Thank you for the kind words, Darshan! I'm glad you enjoyed it.

Victor
March 3, 2020 at 4:41 pm:
Hi Adrian! Could you explain to me why it is necessary to add a channel dimension if the images are grayscale already? Doesn't the MNIST dataset already have images with 1 channel? What is the purpose of adding an extra dimension at the end? Thanks, great tutorial!
"From there, we preprocess our dataset by adding a channel dimension and scaling pixel intensities to the range [0, 1] (Lines 102 and 103)."

Adrian Rosebrock
March 4, 2020 at 1:25 pm:
You are correct that the images are implied to be grayscale; however, consider the architecture of a CNN. We need to specify the number of channels, and our input data must have the shape HxWxD, where "H" is the height, "W" is the width, and "D" is the depth. We add in that "D" channel dimension (setting D=1) in order to make the dataset compatible with our architecture. If you used RGB images, then D=3.
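For example, here's a quick sketch of the shape change on a hypothetical batch of grayscale images:

import numpy as np

# a hypothetical batch of 100 grayscale 28x28 images: shape (100, 28, 28)
images = np.random.rand(100, 28, 28).astype("float32")

# append the "D" channel dimension so the shape becomes (100, 28, 28, 1)
images = np.expand_dims(images, axis=-1)
print(images.shape)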
areer ul amin
March 4, 2020 at 6:37 am:
Sir, please, using LSTM anomaly detection in surveillance videos — how do I detect anomalies using LSTM in surveillance videos?

Adrian Rosebrock
March 4, 2020 at 1:23 pm:
I don't have any tutorials on LSTM-based anomaly detection in videos. I may cover that in a future tutorial, but I cannot guarantee if/when that may be.

Anton Smith
March 4, 2020 at 12:34 pm:
Absolute greatness!!
I eagerly await more, and look forward to starting the full Guru's course later this year!
For the second time!

Adrian Rosebrock
March 4, 2020 at 1:23 pm:
Thanks Anton!

Nazia
March 5, 2020 at 12:12 am:
Hi Adrian,
Wonderful tutorial. Please make another tutorial based on LSTM anomaly detection. Looking forward to your tutorial.

Adrian Rosebrock
March 5, 2020 at 2:19 pm:
I'm sure this won't be my last tutorial on anomaly detection.

zhi zhou
March 5, 2020 at 5:00 am:
Thanks for your great blog, Adrian!
Just one question: why do we need 1% of 3 digits when training? Can we use 1 digits for training the model and test on the rest of the data, including 1 and 3? Do you simulate the real world where we cannot get 100% clean data for our training?

Adrian Rosebrock
March 11, 2020 at 4:35 pm:
Correct, I'm demonstrating that in the real world you may not have 100% perfectly labeled data without anomalies or outliers.

Kino
March 7, 2020 at 11:06 am:
Hello, thanks for the tutorial! I only have one question:
Why don't we just remove the anomalies from the dataset and train the autoencoder on our valid images only? That way, when we pass an anomaly to the model, we'll automatically get a high loss.

Adrian Rosebrock
March 11, 2020 at 4:34 pm:
That may or may not be possible for your use case.
What happens if you have a dataset of millions or billions of images? You wouldn't want to manually go through them all and determine which are versus are not anomalies. Anomaly/outlier detection methods can help you spot such data points.

Martin
March 15, 2020 at 12:43 am:
Nice article. I have a few questions:
– Why do you use a sigmoid activation in the decoder when you use 'mse' loss and not cross-entropy to train the network?
– The number of conv_transpose layers in the decoder seems to be greater by one than the number of conv layers in the encoder. Any reason for that?
– Did you try to train a classification network on top of the autoencoder? It might help with selecting which parts of the reconstruction error are actually useful to detect anomalies, in case you have at least some labeled anomalies.
Thx!

Mujeeb
March 16, 2020 at 4:16 am:
How can I run this code on videos? Additionally, I have noticed that you also have a post about doing the same thing with OpenCV and scikit-learn, which seems a faster alternative. Do you have a source code example for this?
Thanks.

pankaj
March 17, 2020 at 9:33 am:
Hello Adrian,
Kudos for bringing this series on Autoencoders!
Could you please put up some tutorials regarding Variational Autoencoders and their practical applications?
What are the different types of other Autoencoders (kind of a brief intro)?
That's gonna be really helpful!
Best Regards

Adrian Rosebrock
March 19, 2020 at 9:36 am:
Thanks for the suggestion, Pankaj. I cannot guarantee if/when I will cover it, but I'll certainly consider it.

KT
March 22, 2020 at 4:20 pm:
Thanks a lot. Best article on the subject I've seen. Looking forward to more great posts.

Adrian Rosebrock
March 25, 2020 at 1:26 pm:
Thank you for the kind words, KT!

Niv
April 4, 2020 at 3:12 am:
Great article. Are you going to have more practical usages for Anomaly Detection, like vibration analysis and general time series? Meanwhile, any good references?

Adrian Rosebrock
April 9, 2020 at 8:44 am:
I primarily cover computer vision here on the PyImageSearch blog. I appreciate the suggestion, but general time series data is likely something I won't be covering.

Walid
April 13, 2020 at 10:23 am:
Great article and wonderful illustration!
I am using the staying at home now to catch up on the backlog I have from your exceptional blogs.
While I was reading, I expected that you would use model.evaluate() to get the loss directly, but you did it another way, recomputing the difference between predicted and actual. Any specific reason you did not use model.evaluate()?
Thanks a lot, and I hope everyone is safe.

Ricardo
April 15, 2020 at 3:58 am:
Hi Adrian, nice article.
I'm wondering what the purpose is of having a validation set when training an autoencoder. In this case, an autoencoder isn't meant to overfit the training set, so it has a lower MSE compared with a different set of data?
Thanks