Deep Cleaner-A Few Shot Image Dataset Cleaner Using Supervised Contrastive Learning
ABSTRACT Images are increasingly used for AI-based diagnosis and analysis of many conditions, such as cervical cancer, mouth cancer and glucose analysis from the retina. In many cases, data collection is done by specialised camera modules which capture images of affected areas. As with any other source of data, this process is error-prone, and the captured images may contain unwanted objects and regions that need to be cleaned out. Outliers in these kinds of datasets may adversely affect the performance of machine learning models, so cleaning the data before training the model is of utmost importance. Manual cleaning would be a tedious task, especially when the data is collated from different sources. In this paper, we propose a Few-Shot learning based model, pre-trained in a supervised contrastive learning setting, to automate the process of data cleaning. Our model learns the dataset distribution and distinguishes the accurate data points from noisy data points. We also show that scaling up the model can greatly improve the Few-Shot performance. On the noisy MobileODT cervical data, which was collected from Kaggle, an EfficientNet architecture obtained 52% accuracy on the classification task without data cleaning, whereas the same architecture with ROI cropping achieved an accuracy of 76.56% after cleaning through the proposed Deep Cleaner approach, which requires only 100 clean images. The proposed approach performs 2.74% better than a denoising auto-encoder, which is considered a powerful anomaly detection technique.
INDEX TERMS Cervical cancer, cervix, data cleaning, deep learning, few shot learning, medical imaging,
supervised contrastive learning.
The associate editor coordinating the review of this manuscript and approving it for publication was Vishal Srivastava.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023
M. B. BIJOY et al.: Deep Cleaner-A Few Shot Image Dataset Cleaner Using Supervised Contrastive Learning

I. INTRODUCTION
Digital image capturing devices are widespread these days, and the technology is cheap. Images captured by a standard camera, or by an enhanced device with a different form factor, can be used for analysing and predicting medical conditions. There have been many attempts to create datasets for such cases. As deep learning based algorithms require large amounts of data to build accurate models, many robot-based automatic image capturing techniques have also been deployed. Images captured with an endoscopic camera are also commonly used to create large image databases. Example datasets for different diseases include cervical cancer data, oral cavity images, retina images and skin lesion data. As they are generated in large volumes, some of the images created by these methods will not be suitable for use due to the problems inherent in the automated image capturing process.
Some of the common problems that may be found in such datasets are:
1) They have many extraneous objects present in the image
2) They might be zoomed-out images with the Region Of Interest (ROI) in a corner, or the ROI is very small
3) There might be irrelevant areas in the image due to the lens going out of focus in the field of view, which was set by mistake while capturing the image

These kinds of uncleaned data, or data with noise, will create problems when training Machine Learning (ML) models. The accuracy of a model trained using these data will not be good enough for a medical imaging application. Cleaning of
the dataset or creating the clean data with a clearly defined ROI marks the first step of a pipeline for building any AI model. So we are trying to build a Deep Learning based automated approach for cleaning the data to enhance the ROI present in the image.
In an ML dataset, noise can basically be of two types, i.e. class noise (label noise) and attribute noise (feature noise) [1]. Wrongly labelled data contribute to label noise. This can happen by mistake or due to wrong interpretation of the data by the curator. The other type, attribute noise or feature noise, is the corruption occurring in the observed data and can be attributed mainly to the data creation process, which is the image capturing process in most cases. Both types of noise can degrade the performance of an ML model [2], [3]. In many cases label noise is found to be more problematic, and many studies [4], [5], [6] have addressed it. Since feature noise is less explored, we are trying to mitigate the impact of this noise on the training of Deep Learning (DL) models.
Noise enters image data in many ways. Broadly, it can be introduced by three different sources.

1) Introduced by the environment in which the image is captured
2) Introduced by the device itself
3) Introduced by the operator of the device.

For example, if the image is captured in low light or bad ambient lighting with fluorescent or coloured light, the image quality can suffer. Nowadays, digital image capturing devices come with a wide variety of settings. The quality of the captured images will be bad if settings like sensor ISO, shutter speed and aperture are not properly set [7]. Even with the best available device settings and environment, the person capturing the image can take pictures at wide angles where the ROI does not appear centred, or the pictures may contain extraneous objects.
Images containing the cervical region of a woman are known as cervigrams [8]. We have selected cervigram images as the primary dataset for our experiment, though we also used other datasets for benchmarking the results. Cervigrams are captured through colposcopes or by any other digital image capturing device.
Cervical cancer is considered a severe public health problem, predominantly affecting women of lower socioeconomic status [9]. Cervical cancer affects women aged between 15-44 years, which is considered the most productive age group for any individual. It is preventable by early detection through various screening methods. The cervix is the lower part of the uterus, which has a portion protruding into the vagina called the portio vaginalis. Cervical cancer is caused by the abnormal growth of cells within the cervix, the lower portion of the uterus that serves as a connection to the vagina. The primary culprit in most cervical cancer is the Human Papilloma Virus (HPV). Coming to the actual treatment of cervical cancer, knowing the cervix type, which is specific to the geometry of the cervix, is essential in deciding the treatment.
Cervigrams can be utilised for examining the cervix and checking for cervical cancer. Cervigrams are found to have high intra-observer and inter-observer variability among medical specialists. With the rising use of AI and DL, one can easily classify the cervix types if enough training samples are given. In an extensive primary screening program, the data collected, i.e. the cervigrams, can suffer from all the types of noise mentioned above. In this work, we mainly concentrate on a different type of noise (the third type), the extra objects, and then on identifying the ROI for training a DL model. The cervical cancer dataset was collected from the Intel Kaggle competition [10]. The available images were of high resolution, on average 2400 × 3000 pixels. Other artefacts are also present in this image set, such as gloves, fingers and human faces, and some of the images are not even of the cervix. There were potential concerns with this data. These datasets were primarily used for cervix type detection.
To enhance the performance of the cervix type detection and cervical cancer detection models, we have to clean the collected data before using it. In this work, we have applied a Few-Shot learning model with state of the art supervised contrastive pretraining to clean the datasets. Our model learns the dataset distribution and distinguishes accurate data points from noisy data points. We also show that scaling up the model can significantly improve the Few-Shot performance. With noisy MobileODT cervical data, it is shown that classification accuracy is improved from 52% to 76.56% after cleaning through the proposed Deep Cleaner approach. Our proposed method requires only 100 clean images for training the model.
The rest of the paper is organised as follows. Section II contains some of the related works that have been reported in this domain. Section III describes the methodology used in the work. Section IV presents the results and performance analysis in detail. Section V contains further analysis and discussion, and Section VI concludes the work.

II. BACKGROUND
For our experiment on outlier detection, the primary dataset is the cervical image dataset. In the pre-processing stage, the natural images are cleaned up and the ROI present in the data is isolated. The cervical images captured by colposcopic cameras are good enough to check for the presence of cervical cancer. Nevertheless, these cervigrams have high variability issues even among highly skilled medical experts. This section highlights recent research on techniques for identifying noise in image data, outlier detection, and the object detection and segmentation used for finding the ROI when cleaning the image.
Different types of noise can creep into digital images at every step, from capturing the image to transmitting it through different media. Kaur [11] described different noise removal techniques, like linear and non-linear filters and image processing techniques, for denoising the image
before going ahead with other processing. Many attempts have been made to find the noise in medical images. Karimi et al. [12] explain the various types of noise in medical image datasets, such as annotation noise, masking noise and inter-observer variability. They demonstrated the effects of noisy labels on three different datasets. Jiang et al. [13] also experimented with noisy labels, using both natural and synthetic noise while comparing different parameters like noise level and noise type and their effect on deep learning, benchmarking their results on the CIFAR [14] and WebVision [15] datasets.
The Deep Autoencoding Gaussian Mixture Model (DAGMM) is another method, proposed for unsupervised anomaly detection by Zong et al. [16]. A deep autoencoder is used to generate a low-dimensional representation of the data, and the reconstruction error is also calculated for each input data point. The output of this stage is fed to a Gaussian Mixture Model (GMM). DAGMM does not use the traditional two-stage training or the standard Expectation-Maximisation (EM) algorithm. Instead, it optimises the parameters of the deep autoencoder and the mixture model together. The joint optimisation balances autoencoding reconstruction, density estimation of the latent representation, and regularisation, which helps the autoencoder escape local optima and further reduces reconstruction errors.
Zhang et al. [17] proposed an image data cleaning framework called ImageDC, using Deep Neural Networks, for improving the quality of image datasets. Based on minority classes, they removed the images of the seldom used classes and images with a low rate of recognition from the noisy data, in turn increasing the recognition rate. The framework Probabilistic End-to-end Noise Correction in Labels, i.e. PENCIL [18], takes another approach: the noisy labels are used for initialising label distributions, and the labels are then dynamically corrected by updating the network loss function and label distributions. The framework did not use a fixed absolute value for the label but modelled the label as a distribution among all possible label values [19]. The label distributions were maintained and updated, both for network parameter learning (where the distributions act as labels) and for label learning (where the distributions are updated to correct noise). Northcutt et al. [20] developed a new strategy called confident learning to find noisy labels. Their method is based on pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. They achieved about 7% improvement over competing models with CIFAR data [14]. Kertész [21] developed an automated pipeline for cleaning the ImageNet data using heuristics built on model consensus and confident learning, to remove incorrectly labelled data and ambiguous images from the data.
Using multiple pre-trained networks as feature extractors was suggested by Guérin and Boots [22]. The performance of any multi-view clustering algorithm is expected to improve by using their method. They also tried a deep multi-view clustering approach on different datasets and found it to be effective. Parikh [23] proposed a technique where the whole dataset is divided into subsets, clusters are formed on the similarity matrix, and then the results are combined to get the actual clusters. Deep Adaptive Clustering (DAC), which recasts the clustering problem into binary pairwise classification, was given by Chang et al. [24]. They used this binary pairwise framework to judge whether pairs of images belong to the same cluster. They calculated the similarity as the cosine distance between label features generated by a deep convolutional network.
Object detection algorithms can identify the ROI and use this information to crop the image so that it contains less noise, which can be the first step in creating a new cleaned dataset for training the next pipeline stage. A group of deep learning Region Proposal Networks (RPN) was proposed and developed to accurately segment an image and detect objects from it semantically. The Region CNN algorithm, i.e. R-CNN, uses rich feature hierarchies to get accurate detection. To speed up the detection process, Girshick et al. [25] proposed a technique where the section of the image containing the Region of Interest is captured from the input image. The Faster R-CNN algorithm by Ren et al. [26] reduced the computational time by merging the region proposal and CNN modules. To realise real-time object detection, Redmon et al. [27] proposed the You Only Look Once (YOLO) algorithm. An instance-based method extending Faster R-CNN, named Mask R-CNN, was proposed by He et al. and is achieved by merging the object prediction and bounding box detection techniques.
Duan et al. [28] built a framework based on CornerNet [29]; instead of a pair of key points, their approach, CenterNet, detects each object as a triplet of key points. Their modules, cascade corner pooling and centre pooling, enrich the information at the top-left and bottom-right corners. CenterNet achieved an AP of 47.0% on the MS-COCO dataset [30], better than all existing one-stage detectors and quite comparable to the top-ranked two-stage detectors. Tan et al. [32] proposed a weighted Bi-directional Feature Pyramid Network (BiFPN), making fast feature fusion possible. A compound scaling method is used for scaling the resolution, depth, and width of the box prediction network. The resulting family of detectors, named EfficientDet [32], achieved the best accuracy with fewer parameters.
Autoencoders [33], [34] are an excellent method for anomaly detection. If an autoencoder is trained with some image dataset, it will give a low reconstruction loss for any image from the training data and reconstruct the image close to the original. For any image that the model has not seen, it cannot perform the reconstruction effectively, as the latent attributes are not adapted to this particular image. Because of this, an outlier gives a very high reconstruction loss and can be identified as an anomaly once a proper threshold is chosen. Denoising Autoencoders (DAE)
III. METHODOLOGY
One of the major hurdles in training a model is dealing with outliers that pull the model away from the ideal optimum. To mitigate the effects of such noise and outliers in the data used, we explore the idea of automating the dataset cleaning process, which can be a pre-processing step that yields better results. In this section we discuss the workflow of the proposed pre-processing step.
Inspired by recent developments in DL techniques for forming better feature spaces that capture essential data features, we propose a supervised contrastive learning mechanism to clean the datasets. The approach trains a model on a large dataset of images which represents a vast variety of image distributions. Due to the variability of distributions in the dataset, the supervised contrastive model is forced to learn compact representations of images in such a way that it can distinguish among different classes. The trained supervised contrastive model is then fine-tuned with very few images to identify the desired outliers. Fig. 1 shows the entire data cleaning pipeline.
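To make the pre-training objective concrete, the following is a minimal sketch of the supervised contrastive (SupCon) loss [37] that this section defines in equations (1) and (2). It is an illustration under stated assumptions, not the authors' training code: the function name `supcon_loss`, the plain-list input format and the temperature value 0.1 are all hypothetical, and a real implementation would run on batched tensors in a deep learning framework.

```python
import math

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss of equations (1) and (2) for one batch.

    embeddings: list of at least two equal-length float vectors (the 2N views).
    labels: list of class labels, one per embedding.
    tau: temperature; 0.1 is an illustrative value, not taken from the paper.
    """
    # L2-normalise each embedding so dot products are cosine similarities.
    z = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v))
        z.append([x / norm for x in v])

    m = len(z)
    total = 0.0
    for i in range(m):
        # Scaled similarities z_i . z_k / tau for every k != i.
        sims = {k: sum(a * b for a, b in zip(z[i], z[k])) / tau
                for k in range(m) if k != i}
        # Log of the denominator in eq. (2), computed stably (log-sum-exp).
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        # Positives: other views that share the label of anchor i.
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # len(positives) plays the role of the normaliser 2*N_y - 1 in eq. (2).
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
    return total  # eq. (1): sum of the per-anchor losses

# Toy batch: two views near (1, 0) and two views near (0, 1).
views = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_matching = supcon_loss(views, [0, 0, 1, 1])  # labels agree with geometry
loss_mixed = supcon_loss(views, [0, 1, 0, 1])     # labels contradict geometry
```

In this toy batch the loss is small when same-class views already lie close together and large when the labels group distant views, which is the pressure that pulls same-distribution images together during pre-training.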
The architecture is trained in such a way that the similarity of the two extracted feature vectors is high for I1 and I2, but low for image pairs that share no similarity with each other. Intuitively, this learning pulls the embeddings of two augmentations of the same image (positive pair) closer and pushes the embeddings of two augmentations from different images (negative pair) apart. It uses a special loss function called SupCon [37], which encourages normalised embeddings from the same class to be pulled closer, while embeddings from different classes are pushed apart.
The prepared dataset is used for training the model in the supervised contrastive setting with the loss function given in equations (1) and (2). Images from different distributions are considered as different classes, which forces the model to learn the differences between the distributions. Clustering of the latent space generated using a trained model is plotted and shown in the results section.

$$L^{sup} = \sum_{i=1}^{2N} L_i^{sup} \tag{1}$$

$$L_i^{sup} = \frac{-1}{2N_{\hat{y}_i} - 1} \sum_{j=1}^{2N} \mathbb{1}_{i \neq j} \cdot \mathbb{1}_{\hat{y}_i = \hat{y}_j} \cdot \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \cdot \exp(z_i \cdot z_k / \tau)} \tag{2}$$

B. FEW-SHOT LEARNING
The trained supervised contrastive model is now fine-tuned with very few images to capture the desired target distribution. In our case, we made a new fine-tuning dataset by sampling a few images from both the target distribution (cervigrams) and the non-target distributions (images from ImageNet, Cataract Surgery images, Blood Cell images, Endoscopic Sinus Surgery images, and Dermatoscopic images of Pigmented Skin Lesions). Except for the last few layers, the rest of the model is frozen in the fine-tuning stage. Because the model is pretrained on a large dataset in the supervised contrastive setting, it does not require many images to capture the differences between the outliers and the target distribution.

C. CLEANING THE DATASET
Once the model has gone through the fine-tuning phase with Few-Shot learning, we can apply this model to the dataset to find the outliers. It is already understood that the model's accuracy greatly depends on the training data's accuracy. So we clean the original dataset by removing the data identified as noise in the previous stage of the pipeline. Now a new clean dataset is available for training the main task model.

D. MAIN TASK
The main task can be anything, and once the model is adequately designed, it can be trained on the clean dataset. Now, this model can learn the data representation more accurately for the task, and it should improve the model's overall accuracy compared to the uncleaned dataset. The details of the improvement will be discussed in the results section with some example use cases.

IV. RESULTS
The proposed method takes a few clean images to learn a robust representation of the target distribution using the backbone pre-trained in the supervised contrastive setting. Intuitively, a wide distribution of data is treated as noise while training the Few-Shot cleaners. The wider the noise distribution, the more generalised the representations will be, and the more easily the model can capture the differences between distributions. This section focuses on the performance analysis of the deep learning model with and without the proposed preprocessing step. Anomaly detection techniques using Auto Encoders and Denoising Auto Encoders have been trained and compared with our proposed work.

A. DESCRIPTION OF DATASET
The training dataset for supervised contrastive learning comprises ImageNet [38] with all 1000 classes, 5121 cataract surgery images [39], 366 blood cell images [40], 9543 endoscopic sinus surgery images [41], 5000 dermatoscopic images of pigmented skin lesions [42] and 1480 cervical images [10]. The dataset is meant to closely represent a realistic set of medical images. To the clean dataset we introduced variations in colours, brightness and contrast. In addition to this, further outliers were added, like mobile screens, gloves and other images that are found in some medical datasets.

B. RESULTS OF SUPERVISED CONTRASTIVE TRAINING
For the feature extraction model in contrastive training, we chose EfficientNet B2 [31] as our base encoder. The projection head, a small Multi Layer Perceptron (MLP) with one hidden layer, is used to map the representations from the base encoder to a single-dimensional latent space where the contrastive loss is applied. We used input images of size 512 × 512 × 3 for the above framework. The parameters of the model are given in Table 1.

TABLE 1. Training parameters for supervised contrastive model.

According to recent work [36], contrastive learning models benefit from large batch sizes and much stronger augmentation (the augmentations used are shown in Fig 2). Gradient accumulation is used to update the model with an
accumulated gradient over multiple steps. 1024 mini batches are used in a single batch using gradient accumulation. Higher augmentation intensity improves the performance of the contrastive learners.
All images which were predicted as noise from the cervical dataset, and subsets of the remaining datasets, are used in clustering the latent space. We used the trained contrastive model for generating embeddings and t-SNE for dimensionality reduction to visualise its latent space. Fig 3 shows the t-SNE projections of the representations learned in the supervised contrastive setting.

C. RESULTS OF FEW SHOT CLEANERS
The proposed cleaning model was initially pre-trained in the contrastive setting, and later a classifier was trained on top of the frozen representations. As proposed, Few-Shot learning can make use of a very small number of clean images to learn the differences between the target distribution and outliers.
We used data from ImageNet, Cataract Surgery images, Dermatoscopic images of Pigmented Skin Lesions, Blood Cell images and Endoscopic Sinus Surgery images as noise, and cervigrams as the target distribution, to make the Few-Shot learners learn the distributions. Some images which were predicted as noise with Few-Shot learning (using 100 cervigrams) are shown in Fig 4.

D. POST CLEANING TASK COMPARISONS
The cleaned dataset was then used to train Efficient-CenterDet (see footnote 1) for cervix type prediction as our main deep learning task. The above mentioned method uses self supervision for cropping out the ROI and training the classifier. We used pretrained weights from the above work for ROI cutting and trained a classifier. The EfficientNet B4 architecture is used for training the model. The classifier was trained with learning rate decay and early stopping, along with some data augmentation. The proposed approach achieved 77% accuracy compared to 62% without any cleaning, as shown in Fig 6.

E. COMPARISON WITH ANOMALY DETECTION METHODS
The works that relate most closely to dataset cleaning are on anomaly detection or outlier detection. Samples that are not fruitful in the learning process and could degrade model performance should be identified and removed. Often, these samples deviate from the statistics of the ideal training data distribution and appear as anomalies. We compare the proposed approach with commonly used anomaly detection techniques using Auto Encoders (AE) and Denoising Autoencoders (DAE).
For both AE and DAE, EfficientNet B4 was chosen as the encoder; the decoder is built using a few convolution and upsampling layers. A combination of Mean Squared Error and Binary Cross Entropy is used as the loss function for this architecture. The parameters of the autoencoder are shown in Table 3.
On cleaning the dataset with different techniques, we find that the proposed approach performs 2.74% better than a DAE, as shown in Fig 6. It also performs 14% better than no cleaning, which is a significant gain in accuracy.
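The AE and DAE baselines above flag an image as an outlier when its reconstruction is poor. The sketch below illustrates one way such a score could be computed from the two loss terms named in this subsection. Everything in it is an assumption made for illustration: the function names, the equal weighting of the Mean Squared Error and Binary Cross Entropy terms, and the quantile rule for choosing a threshold are hypothetical, and the reconstructions are taken as given rather than produced by a trained autoencoder.

```python
import math

def anomaly_score(image, reconstruction, eps=1e-7):
    """Mean squared error plus binary cross entropy between two flat images.

    Both arguments are equal-length lists of pixel values in [0, 1].
    The equal weighting of the two terms is an assumption.
    """
    n = len(image)
    mse = sum((x - r) ** 2 for x, r in zip(image, reconstruction)) / n
    bce = 0.0
    for x, r in zip(image, reconstruction):
        r = min(max(r, eps), 1.0 - eps)  # keep the logarithms finite
        bce -= x * math.log(r) + (1.0 - x) * math.log(1.0 - r)
    return mse + bce / n

def fit_threshold(clean_scores, quantile=0.95):
    """Pick the threshold as a quantile of scores measured on known-clean images."""
    ordered = sorted(clean_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]

def flag_outliers(scores, threshold):
    """Return True for every image whose score exceeds the threshold."""
    return [s > threshold for s in scores]

# Toy 4-pixel image with a faithful and a poor reconstruction.
image = [0.0, 1.0, 1.0, 0.0]
score_faithful = anomaly_score(image, [0.05, 0.95, 0.9, 0.1])
score_poor = anomaly_score(image, [0.9, 0.1, 0.2, 0.8])
```

A faithful reconstruction yields a low score and a poor one a high score, so images whose scores exceed a threshold fitted on clean data would be treated as anomalies.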
1 https://github.com/BhanuPrakashPebbeti/FewShot-DataCleaners-using-Supervised-Contrastive-Learning

V. DISCUSSION
The results discussed above provide conclusive evidence that the proposed approach can cluster out of distribution data

FIGURE 11. Images predicted as noise using anomaly detection using AE.
FIGURE 12. Image reconstructions and their anomaly scores by AutoEncoder.
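In contrast to the reconstruction-based scores shown in the figures above, the proposed cleaner separates target images from outliers with a classifier fitted on a few labelled examples (Sections III-B and III-C). The following is a simplified sketch of that idea, not the authors' implementation: it assumes the encoder is fully frozen and only a logistic-regression head is trained, whereas the paper fine-tunes the last few layers. The function names, the 2-D toy embeddings and the 0.5 decision threshold are hypothetical.

```python
import math

def predict_proba(weights, bias, x):
    """Probability that embedding x belongs to the target distribution."""
    logit = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def train_head(embeddings, is_target, epochs=300, lr=0.5):
    """Fit a logistic-regression head on frozen embeddings by gradient descent."""
    dim, n = len(embeddings[0]), len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, is_target):
            err = predict_proba(weights, bias, x) - y  # gradient of the log loss
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def clean_dataset(names, embeddings, weights, bias, threshold=0.5):
    """Split a dataset into kept (target-like) and removed (outlier) items."""
    kept, removed = [], []
    for name, x in zip(names, embeddings):
        target_like = predict_proba(weights, bias, x) >= threshold
        (kept if target_like else removed).append(name)
    return kept, removed

# Toy usage with hypothetical 2-D embeddings standing in for encoder outputs.
support = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
support_labels = [1, 1, 0, 0]  # 1 = cervigram, 0 = outlier
weights, bias = train_head(support, support_labels)
kept, removed = clean_dataset(
    ["img_a", "img_b"], [[0.95, 0.05], [0.05, 0.95]], weights, bias
)
```

Because the embeddings are assumed to be well separated already, a handful of labelled examples is enough for the head to split the two groups, which is the property the few-shot stage relies on.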
VI. CONCLUSION
As the requirement for huge amounts of data for training bigger and bigger models arises, it is becoming difficult to clean datasets at this scale. Noisy data points pose a threat to the overall performance and integrity of these huge models. As we move to an AI driven ecosystem of systems, it is vital to deal with outliers and noise in datasets. This study offers an automated solution to this problem. In cases like medical images, where the dataset distribution is more bounded, a statistic driven approach for cleaning could be preferred. This approach can also be extended to other areas of computer vision where noisy data poses a threat. The above results and visualisations show that Deep Learning based representation learning methods can automate the cleaning procedure. Future work will include training the model with a greater variety of datasets to improve it.

ACKNOWLEDGMENT
The authors would like to thank the Department of Computer Science Engineering, NIT Calicut, for giving their support for the work completion. They would also like to thank the Centre for Computational Modeling and Simulation (CCMS) and the Central Computer Centre (CCC), NIT Calicut, for providing the NVIDIA DGX station facility for training their models. Finally, they thank AI Club, NIT Calicut, for extending their help in implementing the proposed model.

REFERENCES
[1] X. Zhu and X. Wu, "Class noise vs. attribute noise: A quantitative study," Artif. Intell. Rev., vol. 22, no. 3, pp. 177–210, Nov. 2004.
[2] B. Frenay and M. Verleysen, "Classification in the presence of label noise: A survey," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 5, pp. 845–869, May 2013.
[3] B. Frénay and A. Kaban, "A comprehensive introduction to label noise," in Proc. Eur. Symp. Artif. Neural Netw., Comput. Intell. Mach. Learn. (ESANN), 2014, pp. 667–676.
[4] R. Hataya and H. Nakayama. (2019). Investigating CNNs' Learning Representation Under Label Noise. [Online]. Available: https://openreview.net/forum?id=H1xmqiAqFm
[5] M. Pechenizkiy, A. Tsymbal, S. Puuronen, and O. Pechenizkiy, "Class noise and supervised learning in medical domains: The effect of feature extraction," in Proc. 19th IEEE Symp. Computer-Based Med. Syst. (CBMS), 2006, pp. 708–713.
[6] D. F. Nettleton, A. Orriols-Puig, and A. Fornells, "A study of the effect of different types of noise on the precision of supervised learning techniques," Artif. Intell. Rev., vol. 33, no. 4, pp. 275–306, 2010.
[7] B. Brummer and C. De Vleeschouwer, "Natural image noise dataset," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2019, pp. 1777–1784, doi: 10.1109/CVPRW.2019.00228.
[8] J. Jeronimo, R. Long, L. Neve, D. Ferris, K. Noller, M. Spitzer, S. Mitra, J. Guo, B. Nutter, P. Castle, R. Herrero, A. C. Rodriguez, and M. Schiffman, "Preparing digitized cervigrams for colposcopy research and education: Determination of optimal resolution and compression parameters," J. Lower Genital Tract Disease, vol. 10, no. 1, pp. 39–44, 2006.
[9] WHO. (2020). Cervical Cancer. [Online]. Available: https://www.who.int/reproductivehealth/topics/cancers/en/
[10] MobileODT. (2017). Cervical dataset. Intel and MobileODT Cervical Cancer Screening. [Online]. Available: https://www.kaggle.com/c/intel-mobileodt-cervical-cancer-screening/data
[11] S. Kaur, "Noise types and various removal techniques," Int. J. Adv. Res. Electron. Commun. Eng., vol. 4, no. 2, pp. 226–230, 2015.
[12] D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, "Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis," 2019, arXiv:1912.02911.
[13] L. Jiang, D. Huang, M. Liu, and W. Yang, "Beyond synthetic noise: Deep learning on controlled noisy labels," 2019, arXiv:1911.09781.
[14] A. Krizhevsky, V. Nair, and G. Hinton. (2010). CIFAR-10 (Canadian Institute for Advanced Research). [Online]. Available: http://www.cs.toronto.edu/kriz/cifar.html
[15] W. Li, L. Wang, W. Li, E. Agustsson, and L. Van Gool, "WebVision database: Visual learning and understanding from web data," 2017, arXiv:1708.02862.
[16] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, "Deep autoencoding Gaussian mixture model for unsupervised anomaly detection," in Proc. Int. Conf. Learn. Represent., 2018, pp. 1–19. [Online]. Available: https://openreview.net/pdf?id=BJJLHbb0-
[17] Y. Zhang, Z. Jin, F. Liu, W. Zhu, W. Mu, and W. Wang, "ImageDC: Image data cleaning framework based on deep learning," in Proc. IEEE Int. Conf. Artif. Intell. Inf. Syst. (ICAIIS), Mar. 2020, pp. 748–752.
[18] K. Yi and J. Wu, "Probabilistic end-to-end noise correction for learning with noisy labels," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 7017–7025.
[19] B.-B. Gao, C. Xing, C.-W. Xie, J. Wu, and X. Geng, "Deep label distribution learning with label ambiguity," IEEE Trans. Image Process., vol. 26, no. 6, pp. 2825–2838, Mar. 2017, doi: 10.1109/TIP.2017.2689998.
[20] C. Northcutt, L. Jiang, and I. Chuang, "Confident learning: Estimating uncertainty in dataset labels," J. Artif. Intell. Res., vol. 70, pp. 1373–1411, Apr. 2021.
[21] C. Kertész, "Automated cleanup of the ImageNet dataset by model consensus, explainability and confident learning," 2021, arXiv:2103.16324.
[22] J. Guérin and B. Boots, "Improving image clustering with multiple pretrained CNN feature extractors," 2018, arXiv:1807.07760.
[23] D. Parikh, "Similarity-based clustering for enhancing image classification architectures," 2020, arXiv:2011.04728.
[24] J. Chang, L. Wang, G. Meng, S. Xiang, and C. Pan, "Deep adaptive image clustering," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 5879–5887.
[25] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[26] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.
[27] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[28] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, "CenterNet: Keypoint triplets for object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 6569–6578.
[29] H. Law and J. Deng, "CornerNet: Detecting objects as paired keypoints," Int. J. Comput. Vis., vol. 128, pp. 642–656, Mar. 2020.
[30] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. Lawrence Zitnick, and P. Dollár, "Microsoft COCO: Common objects in context," 2014, arXiv:1405.0312.
[31] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proc. Int. Conf. Mach. Learn., 2019, pp. 6105–6114.
[32] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and efficient object detection," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10781–10790.
[33] P. Baldi, "Autoencoders, unsupervised learning, and deep architectures," in Proc. ICML Workshop Unsupervised Transf. Learn., vol. 27, Jul. 2012, pp. 37–49. [Online]. Available: https://proceedings.mlr.press/v27/baldi12a.html
[34] D. Bank, N. Koenigstein, and R. Giryes, "Autoencoders," 2020, arXiv:2003.05991.
[35] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. 25th Int. Conf. Mach. Learn. (ICML), 2008, pp. 1096–1103.
[36] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in Proc. Int. Conf. Mach. Learn., 2020, pp. 1597–1607.
[37] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, "Supervised contrastive learning," 2020, arXiv:2004.11362.
[38] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, ‘‘ImageNet large scale visual recognition challenge,’’ Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[39] K. Schoeffmann, M. Taschwer, S. Sarny, B. Münzer, M. J. Primus, and D. Putzgruber, Cataract-101: Video Dataset 101 Cataract Surgeries. New York, NY, USA: Association for Computing Machinery, 2018, pp. 421–425, doi: 10.1145/3204949.3208137.
[40] P. Mooney. (2018). Blood Cell Images (Kaggle Dataset). [Online]. Available: https://www.kaggle.com/paultimothymooney/blood-cells
[41] S. Lin, F. Qin, R. A. Bly, K. S. Moe, and B. Hannaford. (2020). UW Sinus Surgery Cadaver/Live Dataset. [Online]. Available: https://digital.lib.washington.edu/researchworks/handle/1773/45396
[42] P. Tschandl, C. Rosendahl, and H. Kittler, ‘‘The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,’’ Sci. Data, vol. 5, no. 1, pp. 1–9, Aug. 2018.
[43] Z. Qin, Q. Zeng, Y. Zong, and F. Xu, ‘‘Image inpainting based on deep learning: A review,’’ Displays, vol. 69, Sep. 2021, Art. no. 102028. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0141938221000391
[44] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, ‘‘Context encoders: Feature learning by inpainting,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2536–2544.

M. B. BIJOY (Member, IEEE) received the M.Tech. degree in computer science and engineering. He is a Research Scholar with the National Institute of Technology, Calicut, Kerala, India, and a Joint Director with the Centre for Development of Advanced Computing (C-DAC), Bengaluru, India. He is involved in designing and developing cloud computing infrastructures and security tools. He has been instrumental in setting up and operating a scientific cloud service for Indian researchers and scientists. He is the author of many research studies published in national and international journals as well as conference proceedings. His research interests include artificial intelligence, cloud computing, HPC technologies, and quantum computing.

BHANU PRAKASH PEBBETI is currently pursuing the B.Tech. degree in electronics and communication engineering with the National Institute of Technology at Calicut. He is also part of the AI Club NITC, working on various projects. As an Intern at a financial services company, he worked on a project to solve a critical use case. His research interests include deep learning, computer vision, reinforcement learning, natural language processing, and embedded AI.

S. ABDUL FATHAAH received the B.Tech. degree in electronics and communication engineering from the National Institute of Technology at Calicut. He is currently working with the e-commerce industry as a Product Engineer, enabling sales through technology across India. He is also an experienced trainer and mentor in deep learning who has taught over 200 students. His research interests include image processing, deep learning, and synthetic data generation.

AKASH RAUT received the B.Tech. degree in electrical and electronics engineering from the National Institute of Technology at Calicut. He is currently working with a global software product company as a Senior Software Developer, applying various technologies, including AI, to build products that solve problems on a global scale. His research interests include deep learning, deep reinforcement learning, natural language processing, and robotics.

P. N. POURNAMI (Senior Member, IEEE) received the Ph.D. degree from the National Institute of Technology at Calicut, Kerala, India, in 2018. She is currently an Assistant Professor with the Department of Computer Science and Engineering, National Institute of Technology at Calicut. She has published several research articles in international and national journals. Her research interests include computer vision and deep learning.