Automated Image Data Preprocessing With Deep Reinforcement Learning
Tran Ngoc Minh1, Mathieu Sinn2, Hoang Thanh Lam3, Martin Wistuba4
IBM Research, Dublin, Ireland
1,4 {m.n.tran, martin.wistuba}@ibm.com, 2,3 {mathsinn, t.l.hoang}@ie.ibm.com
Abstract
Data preparation, i.e. the process of transforming raw data into a format that can
be used for training effective machine learning models, is a tedious and time-
consuming task. For image data, preprocessing typically involves a sequence
of basic transformations such as cropping, filtering, rotating or flipping images.
Currently, data scientists decide manually based on their experience which trans-
formations to apply in which particular order to a given image data set. Besides
constituting a bottleneck in real-world data science projects, manual image data
preprocessing may yield suboptimal results as data scientists need to rely on in-
tuition or trial-and-error approaches when exploring the space of possible image
transformations and thus might not be able to discover the most effective ones. To
mitigate the inefficiency and potential ineffectiveness of manual data preprocessing,
this paper proposes a deep reinforcement learning framework to automatically
discover the optimal data preprocessing steps for training an image classifier. The
framework takes as input sets of labeled images and predefined preprocessing
transformations. It jointly learns the classifier and the optimal preprocessing trans-
formations for individual images. Experimental results show that the proposed
approach not only improves the accuracy of image classifiers, but also makes them
substantially more robust to noisy inputs at test time.
1 Introduction
Data preprocessing, i.e. the process of transforming raw data into a format that can be used for training
effective machine learning models, accounts for 50-80% of the time spent on typical data science
projects [3, 11]. Besides constituting a bottleneck, manual data preprocessing is also ineffective as it
only explores a small part of the space of possible transformations and thus might not discover the
most effective ones for removing noise and/or extracting meaningful features from a given set of
raw data. Unstructured data (see Footnote 1) are particularly challenging in this regard as their preparation requires
deep expertise in fields such as Computer Vision or Natural Language Processing; moreover,
because of the high complexity of machine learning models dealing with such data, the effect of data
preprocessing is particularly difficult to understand. Hence, automating data preprocessing is highly
desirable as it increases the productivity of data scientists and may lead to better performance of the
resulting machine learning models.
Despite its high potential value, the automation of data preprocessing has been mostly overlooked
by the machine learning community, with only a few prior works on this subject [3, 13]. Recently, Bilalli
et al. [3] suggested a method for automating data preprocessing via meta-learning. However, their
approach only focuses on structured data with a limited number of relatively simple preprocessing transformations.
Footnote 1: By unstructured data we mean images, text and time series, while we use structured data to refer to data in tabular format, e.g. as in relational databases.
2 Related Work
Generalization is the main challenge of image classifiers, particularly when trained on small and/or
noisy training data sets. Therefore, numerous approaches have been proposed to improve the gener-
alization, such as adding a regularization term on the norm of weights [4], using dropout [18, 19]
or batch normalization [20]. Data augmentation is another effective approach that helps increase
the generalization of image classifiers through applying simple transformations such as rotating and
flipping input images, and adding the transformed images to the training set [10]. The full set of
transformations used in [10] includes shifting, zooming in/out, rotating, flipping, distorting, shading
and styling. Data augmentation with more complicated transformations is investigated in [9], which
evaluates three concrete preprocessing techniques, namely Zero Component Analysis, Mean Normal-
ization and Standardization, on the performance of different convolutional neural networks. While the
approaches in [9, 10] preprocess images according to a preselected chain of transformations, Paulin et
al. [13] suggest that the transformation set should be chosen in a principled way instead of resorting to
(manual) trial-and-error, which is feasible only when the number of possible transformations is small.
Their proposed approach selects a set of transformations, possibly ordered, through a greedy search
strategy. Although this approach offers a more competitive set of transformations, it still has several
limitations: Firstly, the search process is inefficient because it involves retraining the classifier on
the whole augmented data set every time a candidate transformation in the search space is evaluated.
Secondly, the same preprocessing transformations are applied to all images, which has a number of
disadvantages as discussed in Section 1. Our approach uses a reinforcement learning framework to
address precisely those shortcomings.
Footnote 2: Throughout this paper, we use the words preprocessing and transformation interchangeably to indicate an operation applied to data instances such as flipping an image.
Footnote 3: The term chain of transformations is used to indicate an ordered set of transformations.
Footnote 4: An implementation of the method can be found at https://github.com/IBM/automation-of-image-data-preprocessing.
Reinforcement learning [14] and specifically deep reinforcement learning [7] have recently drawn
substantial attention from the machine learning research community. However, there are only a few
studies [1, 2, 8] applying deep reinforcement learning to visual recognition tasks such as edge
detection, segmentation, object detection, or active object localization. None of these
works considers automating the preprocessing of images or learning transformation sets. To the best
of our knowledge, our work is the first study utilizing deep reinforcement learning to search for
effective chains of preprocessing transformations for individual images.
[Figure 1: Overview of the framework. The agent consists of a decision maker and a deep neural network: given an image from the data source (e.g. images), the network produces action values, the decision maker applies the next action, and the environment returns the partially preprocessed image and a reward, until the decision maker finishes preprocessing and the fully transformed images are produced.]
In this section we define states and actions used in our reinforcement learning framework and
introduce an important property of preprocessing techniques that can be used.
output classes. In our study, we use a variant of the Deep Q-Network (DQN) [16, 17] to model the policy
network. The policy network implemented as a DQN, shown in Figure 2(b), resembles the CNN in
Figure 2(a), except that the output layer is extended to form an action space. The DQN output layer
containing Q-values consists of two parts that correspond to two groups of actions. The first
part is a vector playing the same role as the logit vector in the CNN, i.e. it represents the unnormalized
likelihood of the k classes. We denote each slot in this part as a stop action SAction_i. If the decision
maker selects one of the stop actions SAction_i as the next action, the preprocessing of an input
image stops with a prediction of class i for that image. The second part of the DQN output
layer is a vector representing a set of n transformation actions. If one of the transformation
actions TAction_j is selected as the next action, the current image continues to
be preprocessed with transformation j. The two sets of stop and transformation actions form
an action space with a total of k + n actions in the case of discrete transformations. Note that it is
straightforward to also support continuous actions. For example, we can model a continuous rotation
by defining two slots in the second part of the DQN output: one for the Q-value of the rotation action
and one for the value of the rotation angle. Likewise, we can also adapt the first part of the DQN
output in order to apply the framework to a regression problem, e.g. when the inputs are time series
and the task is to forecast future values.
[Figure 2: Output layers of the networks considered: (a) a CNN whose last fully connected layer outputs Class 1 ... Class k; (b) a DQN whose last fully connected layer outputs SAction 1 ... SAction k and TAction 1 ... TAction n; (c) a Dueling DQN in which a state-value stream and an advantage stream are summed to form the same SAction/TAction outputs.]
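As an illustration of this action-space construction, the following is a minimal sketch of a DQN output head with k stop actions and n transformation actions. It is written in PyTorch for concreteness and is not the authors' implementation; the class name, feature extractor and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DQNOutputHead(nn.Module):
    """Q-value head whose output concatenates k stop actions and n transformation actions."""

    def __init__(self, feature_dim: int, num_classes: int, num_transforms: int):
        super().__init__()
        self.num_classes = num_classes        # k stop actions, one per class
        self.fc = nn.Linear(feature_dim, num_classes + num_transforms)

    def forward(self, features: torch.Tensor):
        q_values = self.fc(features)                    # shape: (batch, k + n)
        q_stop = q_values[:, : self.num_classes]        # Q-values of SAction_1 ... SAction_k
        q_transform = q_values[:, self.num_classes :]   # Q-values of TAction_1 ... TAction_n
        return q_stop, q_transform
```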
The decision maker is where a reinforcement learning policy is deployed. It is responsible for
selecting the next action to be applied to the current state. The action and the state are then passed to
the environment component for further processing. In our study, we use the max policy to select an
appropriate action, given the DQN output layer. Furthermore, in order to enable exploration in
reinforcement learning, we allow the decision maker to select alternative next actions randomly with
some probability ε, which is known as the ε-greedy exploration strategy [14]. The probability ε starts at a
maximum of 1.0 and is annealed down to a minimum of 0.1 during training.
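The ε-greedy rule described above could be sketched as follows; the linear annealing schedule and the function name are assumptions, as the paper does not specify the exact decay.

```python
import random

def epsilon_greedy_action(q_values, step, total_steps, eps_max=1.0, eps_min=0.1):
    """Pick the max-Q action, or a random one with probability eps (annealed linearly)."""
    eps = max(eps_min, eps_max - (eps_max - eps_min) * step / total_steps)
    if random.random() < eps:
        return random.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit (max policy)
```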
Using a DQN as in Figure 2(b) is a simple starting point; performance gains can be achieved using
other variants of DQN. In our work, we implemented a variant of DQN, namely Dueling DQN
(DDQN) [21], as shown in Figure 2(c). The idea behind DDQNs is that the Q-values are a
combination of a value function and an advantage function. The value function specifies how good it
is to be in a given state while the advantage function indicates how much better selecting an action is
compared to the others. The benefit of separating the two functions is that the reinforcement learning
framework does not need to learn both value and advantage at the same time, and therefore a DDQN
is able to learn the state-value function efficiently. In order to update the deep neural network, we use
the Bellman equation Q(s, a) = r + γ max_{a'} Q(s', a'), where Q(s, a) is the DQN output value
of action a given input state s, r and s' are the reward and the next state returned by the environment
when action a is applied to state s, and γ is the discount factor. We refer the reader to
[21] for more details on DDQNs.
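To make the update concrete, the snippet below sketches the dueling aggregation and the one-step Bellman target in PyTorch. The mean-subtracted aggregation follows the standard formulation of [21]; the terminal-state masking and the default γ = 0.99 are assumptions rather than details given in the paper.

```python
import torch

def dueling_q_values(value, advantage):
    """Combine a state-value stream (batch, 1) and an advantage stream (batch, num_actions)."""
    return value + advantage - advantage.mean(dim=1, keepdim=True)

def bellman_target(reward, next_q_values, done, gamma=0.99):
    """One-step target r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states."""
    return reward + gamma * next_q_values.max(dim=1).values * (1.0 - done)
```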
3.4 Environment
The environment is where the actual transformations on images are performed. It is also
responsible for calculating the rewards returned during training. Upon receiving an image and an action
from the reinforcement learning agent, the environment behaves differently depending on the type
of the action. If it is a transformation action, the environment applies that transformation to the
image only if the length of the chain of transformations already applied to that particular image is smaller
than a configurable parameter max_len. Otherwise, the image is recovered
to its original state and the reinforcement learning framework must seek another transformation chain
for it. Note that this recovery mechanism is only used during training; at test time, we simply pick
the stop action with the largest Q-value as the prediction for the image. The recovery mechanism
also solves the memory problem described in Section 3.1.3. In either case, regardless of the length of the
current transformation chain, the environment returns a zero reward to the reinforcement learning agent
when it receives a transformation action.
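The transformation branch of the environment could look like the sketch below. The parameter max_len comes from the text, while the dictionary-based state bookkeeping and the function name are assumptions made for illustration.

```python
def apply_transformation_action(state, transform, max_len=10):
    """Transformation-action branch: extend the chain if allowed, otherwise recover the original image.

    `state` is assumed to hold the original image, the current (partially preprocessed) image,
    and the chain of transformations applied so far. The reward is always zero in this branch.
    """
    if len(state["chain"]) < max_len:
        state["image"] = transform(state["image"])
        state["chain"].append(transform)
    else:
        # Recovery: revert to the original image and start searching for another chain.
        state["image"] = state["original"]
        state["chain"] = []
    return state, 0.0  # zero reward for transformation actions
```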
If the environment receives a stop action SAction_i, it does not return a new image but only a reward,
and the image is classified as class i. The strategy for computing rewards during training plays
an important role for the convergence of the training. The environment uses the ground-truth label of
the original image to determine the reward. A simple strategy is to assign a reward of +1 if the label
equals i and −1 otherwise. However, this simple strategy does not work well when the number
of classes k is larger than 2, since it leads to imbalanced rewards. Hence, we suggest a more robust
scheme, which is to assign a reward of k − 1 if the label equals i and −1 otherwise.
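The stop-action reward scheme amounts to the following small function (a sketch; the function name is ours). With this scheme, a uniformly random guess has an expected reward of zero for any k.

```python
def stop_action_reward(predicted_class, true_label, num_classes):
    """Reward for stopping with prediction `predicted_class`: k - 1 if correct, -1 otherwise."""
    return num_classes - 1 if predicted_class == true_label else -1
```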
[Figure 3: The three models compared in our experiments: (a) a plain CNN; (b) the same CNN architecture used as the policy network of the RL framework, trained by the agent interacting with the environment; (c) a CNN initialized from the network trained by the RL framework and fine-tuned on the preprocessed images.]
4 Methodology
Our methodology for setting up the experiments is illustrated in Figure 3. In order to evaluate our auto-
preprocessing framework, we train three different models, namely NN, RL and CL, shown in Figures
3(a), 3(b) and 3(c), respectively. For all of them, we use the same neural network architecture. Figure
3(a) represents a CNN model with an arbitrary architecture. This same architecture will also be used
as the policy network in the reinforcement learning framework as shown in Figure 3(b). Since both
models use the same network architecture, any performance difference between the two models in
our experiments is caused by the reinforcement learning solution. In Figure 3(c), we also have a CNN
model with the same network architecture, but we do not train the network from scratch. Rather,
we continue to fine-tune the network obtained from the reinforcement learning framework. The N
original training images are preprocessed by the framework to produce N new training images which
are used as inputs to the fine-tuning process.
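A rough sketch of the CL training procedure is given below, under the assumption that the RL framework exposes a preprocessing routine and its trained policy-network weights; all interfaces shown here are hypothetical, not the authors' API.

```python
def train_cl_model(rl_framework, classifier, train_images, train_labels, epochs=10):
    """Fine-tune a CNN on images preprocessed by the trained RL framework (hypothetical interfaces).

    The classifier is not trained from scratch: it is warm-started from the network obtained
    by the RL framework, then fine-tuned on the N preprocessed training images.
    """
    preprocessed = [rl_framework.preprocess(img) for img in train_images]   # N new training images
    classifier.load_weights(rl_framework.policy_network_weights())          # continue from the RL-trained network
    classifier.fit(preprocessed, train_labels, epochs=epochs)               # standard supervised fine-tuning
    return classifier
```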
In our experiments, we implement three different CNN architectures, namely Arch1, Arch2 and
Arch3, as shown in Figures 4(a), 4(b) and 4(c), respectively. Hence, we have a total of nine models for
comparison. The architectures are selected according to their complexity, ranging from simple in
Figure 4(a) to complex in Figure 4(c). Note that the hyperparameters and architectures of the models
in Figure 4 are not designed “optimally” (e.g. using (hyper-)parameter tuning or auto-architecture
search), but chosen such that there is some level of complexity difference between them, the effect of
which is discussed in our evaluation below.
[Figure 4 (layer specifications recovered from the diagram):
(a) Arch1: Input; convolutions 5x5(64) SAME and 3x3(96) SAME; two 2x2 SAME max-pooling layers; ReLU activations; two LRN and two BN layers; two Dropout (0.3) layers; FC (512); FC (256).
(b) Arch2: Input; convolutions 3x3(64) SAME, 3x3(128) SAME, 3x3(256) SAME and 3x3(512) SAME; two 2x2 VALID max-pooling layers; ReLU activations; five BN layers; Dropout (0.3), Dropout (0.3) and Dropout (0.5); FC (1024).
(c) Arch3: Input; convolutions 3x3(48) SAME, 3x3(48) VALID, 3x3(96) SAME, 3x3(96) VALID, 3x3(192) SAME and 3x3(192) VALID; three 2x2 VALID max-pooling layers; ReLU activations; five Dropout (0.4) layers; FC (512); FC (256).]
Figure 4: CNN architectures used in our experiments. LRN, BN, MP and FC stand for local response
normalization, batch normalization, max pooling and fully connected, respectively. All convolutional
layers use a stride of 1x1 and all max pooling layers use a stride of 2x2.
5 Experimental Results
In this section we present our experiments to validate our solution to the problem of image
preprocessing automation. We start by comparing the accuracy of
image classifiers with and without preprocessing. Then, we evaluate the robustness of the classifiers
with respect to distorted images at test time. In addition, we also provide some insights on the
behaviour of the reinforcement learning framework.
We select for our study four data sets with different levels of complexity and noise. MNIST [12]
is a very clean 10-class data set with 70K 28x28x1 images divided into 55K/5K/10K for training,
validation and testing, respectively. SVHN [15] is a 10-class data set that is noisier than MNIST with
∼864K 32x32x3 images divided into ∼598K/6K/26K. CIFAR [5] is a 10-class data set that is noisier
still than SVHN with 60K 32x32x3 images divided into 45K/5K/10K. Finally, DOGCAT [6] is the
noisiest of all four data sets; it has 2 classes with 25K 100x100x3 images divided into 20K/1K/4K.
With respect to the transformation set, we implemented two operations, namely image rotation
and flipping, for simplicity, because they trivially satisfy the symmetry property
and thus do not require a memory mechanism. Concretely, there are 11 transforma-
tions consisting of 3 flips (horizontal, vertical, and both) and 8 rotations (with angles
−1, −2, −4, −8, +8, +4, +2, +1 degrees). The parameter max_len specifying the maximum length
of a transformation chain is set to 10 in our experiments. Other general parameters include
optimizer = Adam, learning_rate = 0.0001 and regularization_coefficient = 0.001. For
each experiment, we perform 5 runs with 5 different initializations and report results as mean ± std.
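For concreteness, the transformation set could be defined as in the sketch below, using NumPy and SciPy rather than the authors' implementation. Each action's inverse is also contained in the set, which is the symmetry property exploited by the recovery mechanism.

```python
import numpy as np
from scipy.ndimage import rotate

# 3 flips and 8 small rotations; every transformation's inverse is also in the set.
TRANSFORMATIONS = {
    "flip_horizontal": lambda img: np.flip(img, axis=1),
    "flip_vertical":   lambda img: np.flip(img, axis=0),
    "flip_both":       lambda img: np.flip(img, axis=(0, 1)),
}
for angle in (-8, -4, -2, -1, 1, 2, 4, 8):
    TRANSFORMATIONS[f"rotate_{angle}"] = (
        lambda img, a=angle: rotate(img, a, reshape=False, mode="nearest")
    )
```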
5.2 Performance of Image Classifiers
Performance results in terms of accuracy are shown in Table 1. It can be seen that in most cases,
the bare convolutional neural network classifier (NN) produces the worst performance while the
reinforcement learning classifier (RL) yields higher accuracy. The accuracy is improved
further by the CNN classifier that continues learning (CL) from the trained RL classifier. We note
that the accuracy reported in Table 1 does not reach state-of-the-art performance, as the networks
used in our experiments were relatively simple and not adapted to the data sets; nevertheless,
it is worth noting that the RL framework improves the accuracy of the
baseline methods, notably without increasing the size of the training set. Moreover, it is interesting
to observe that the accuracy difference between the NN classifier and the RL classifier increases
for noisier and more complex data sets. On the one hand, for MNIST, simple preprocessing
techniques such as rotation and flipping do not improve accuracy; they can even decrease
it, as some digits change their meaning when rotated or flipped. On the other hand,
on the much noisier DOGCAT data set, the RL classifier is much more successful in increasing the
accuracy of the baseline CNN.
In order to evaluate the robustness of image classifiers, we distort each test image with 50% probability
by applying a random chain of transformations. Robustness results in terms of accuracy are shown
in Table 2. As we can see, the results are consistent in all cases in the sense that the NN classifier
is less robust (its accuracy decreases significantly on the test set with distortions), compared to the
performance on clean test data reported in Table 1. On the other hand, the RL classifier is much more
robust as its performance only slightly degrades on the distorted test data. Note that only 50% of the
test images were distorted; hence, the robustness difference between the two classifiers would be
even larger if all test images had been distorted. The robustness of the CL classifier is not as high as
that of the RL classifier, but still substantially higher than that of the NN classifier. This reflects a trade-off
between accuracy and robustness when choosing between the RL and the CL classifiers.
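The distortion protocol used in this robustness test amounts to the following sketch; it assumes the TRANSFORMATIONS dictionary defined earlier, and the bound on the random chain length is a hypothetical choice, as the paper does not state it.

```python
import random

def distort_test_image(image, transformations, max_chain_len=3, p_distort=0.5):
    """With 50% probability, apply a random chain of transformations to a test image."""
    if random.random() >= p_distort:
        return image                       # leave half of the test set untouched
    for _ in range(random.randint(1, max_chain_len)):
        image = random.choice(list(transformations.values()))(image)
    return image
```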
5.4 A Deeper Look into the Operation of the Framework
In order to visualize how the reinforcement learning framework preprocesses distorted images, we
run another experiment on MNIST with coarser distortions: rotations are performed with large angles
of ±90 degrees, and flipping operations are applied as in the previous experiment. For each
distorted image, we trace the operation of the framework and obtain the transformation chain that
the framework automatically generates for the image. An illustration for a few images is shown
in Figure 5. It is interesting that most images are either classified directly or transformed to their
original version before being classified. The exact recovery is possible thanks to the symmetry
property of transformation actions. Although the framework is able to recover distorted images, it is
not guaranteed to find the optimal chain of transformations in terms of the shortest recovery path. In
addition, there is a small number of images that confuse the framework, as shown in the
bottom row of Figure 5. These are the main source of misclassification errors of the reinforcement
learning classifier.
Figure 5: Illustration of how the reinforcement learning framework preprocesses distorted images.
6 Discussion
The key contributions of this paper are three-fold. Firstly, we developed the idea of automated data
preprocessing using a reinforcement learning framework. While we demonstrated and evaluated it
for image data, it is applicable to other types of structured and unstructured data as well. Secondly,
the proposed system is iterative and therefore it provides explainable data preprocessing, i.e. one
can inspect which transformations were applied to each data instance during the preprocessing.
Thirdly, compared with traditional data augmentation approaches, our system follows a more efficient
approach to produce a clean training data set that can be used effectively for training highly accurate
and robust machine learning models.
Despite being of high practical value, the automation of data preprocessing has so far drawn little
interest from the machine learning research community. Although we suggest in this paper a
novel approach for this problem, there is still a lot of room to extend this work. Firstly, the set of
transformations may contain more advanced preprocessing techniques such as rotations with learnable
angles, cropping/scaling with learnable ratios, image segmentation, object detection, etc. While it is
easy to integrate continuous actions with learnable parameters into the framework as described in
Section 3.1.2, more complicated actions like image segmentation and object detection may require more
effort. For example, one could select only a small number of segments or objects as the simplified
representation of an image for the next iteration after applying those actions. Secondly, one could
boost the performance of the reinforcement learning framework by replacing the current simple DQN
policy network. In addition, CNNs derived from the policy network (as described in Figure 3(c)) may
be a way to obtain better performance in terms of accuracy.
7 Conclusions
We have presented in this paper a novel approach to the problem of automating data preprocessing,
which is of high potential value for real-world data science and machine learning projects. The
approach is based on a reinforcement learning framework to find sequences of preprocessing transfor-
mations for each data instance individually. We showed in our experiments that even with simple
preprocessing actions such as rotation and flipping, image classifiers can benefit significantly with
respect to their accuracy and particularly their robustness. Thanks to the iterative nature of the
framework, our solution also provides a certain level of explainability, i.e. we can trace exactly how an
image is preprocessed via a chain of transformations. In summary, we believe that this is a promising
research approach to address the problem of automating data preprocessing. Future work should address
continuous actions and transformations that require memorization, and demonstrate the
framework on other types of data such as text or time series.
References
[1] A. Gherega, M. Radulescu, M. Udrea, “A Q-Learning Approach to Decision Problems in Image
Processing”, International Conferences on Advances in Multimedia, Pages 60-66, 2012.
[2] B. Bhanu, J. Peng, “Adaptive Integrated Image Segmentation and Object Recognition”, IEEE
Transactions on Systems, Man and Cybernetics, Pages 427-441, 2000.
[3] B. Bilalli, A. Abello, T. Aluja-Banet, R. Wrembel, “Automated Data Pre-processing via Meta-
learning”, International Conference on Model and Data Engineering, Pages 194-208, 2016.
[4] B. Wang, D. Klabjan, “Regularization for Unsupervised Deep Neural Nets”, AAAI Conference
on Artificial Intelligence, 2017.
[5] CIFAR Data Set, https://www.cs.toronto.edu/~kriz/cifar.html, 2018.
[6] DOGCAT Data Set, https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/data, 2018.
[7] I. Goodfellow, Y. Bengio, A. Courville, “Deep Learning”, MIT Press, 2017.
[8] J.C. Caicedo, S. Lazebnik, “Active Object Localization with Deep Reinforcement Learning”,
IEEE International Conference on Computer Vision, Pages 2488-2496, 2015.
[9] K.K. Pal, K.S. Sudeep, “Preprocessing for Image Classification by Convolutional Neural
Networks”, IEEE International Conference on Recent Trends in Electronics, Information &
Communication Technology, 2016.
[10] L. Perez, J. Wang, “The Effectiveness of Data Augmentation in Image Classification using
Deep Learning”, arXiv:1712.04621, 2017.
[11] M.A. Munson, “A Study on the Importance of and Time Spent on Different Modeling Steps”,
ACM SIGKDD Explorations, Pages 65-71, 2011.
[12] MNIST Data Set, http://yann.lecun.com/exdb/mnist/, 2018.
[13] M. Paulin, J. Revaud, Z. Harchaoui, F. Perronnin, C. Schmid, “Transformation Pursuit for
Image Classification”, IEEE Conference on Computer Vision & Pattern Recognition, Pages
3646-3653, 2014.
[14] R.S. Sutton, A.G. Barto, “Reinforcement Learning: An Introduction”, MIT Press, 2017.
[15] SVHN Data Set, http://ufldl.stanford.edu/housenumbers/, 2018.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller,
“Playing Atari with Deep Reinforcement Learning”, NIPS Deep Learning Workshop, 2013.
[17] V. Mnih et al., “Human-level Control through Deep Reinforcement Learning”, Nature,
Pages 529-533, 2015.
[18] Y. Gal, Z. Ghahramani, “A Theoretically Grounded Application of Dropout in Recurrent Neural
Networks”, Advances in Neural Information Processing Systems, Pages 1019-1027, 2016.
[19] Y. Kubo, G. Tucker, S. Wiesler, “Compacting Neural Network Classifiers via Dropout Training”,
arXiv:1611.06148, 2017.
[20] Y. Ma, D. Klabjan, “Convergence Analysis of Batch Normalization for Deep Neural Nets”,
arXiv:1705.08011, 2017.
[21] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, N. de Freitas, “Dueling Network Ar-
chitectures for Deep Reinforcement Learning”, International Conference on Machine Learning,
Pages 1995-2003, 2016.