Self-Supervised Learning For Medical Imaging
https://doi.org/10.22214/ijraset.2022.44324
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VI June 2022- Available at www.ijraset.com
Abstract: Deep learning is playing a growing role in medical imaging: as models reach promising accuracy, the early detection
and mitigation of disease becomes easier. As a result, deep learning algorithms are attracting considerable interest for solving
problems in medical imaging. In ophthalmology, one example is detecting diseases or anomalies from images and classifying
them into disease types or severity levels. This kind of task has been tackled with a variety of optimized machine learning
algorithms, using both theoretical and empirical approaches. Diabetic retinopathy is one such disease, where early detection
plays a critical role because the condition can lead to vision loss.
Diabetic retinopathy recognition is an active and challenging research area in the field of image processing. Deep learning
techniques can help with disease recognition, but they also face an obstacle: creating a model in a supervised manner requires
a large labelled dataset, which is very costly to obtain. To overcome this problem, we have implemented a self-supervised model
for the detection of diabetic retinopathy that works with a very limited labelled dataset. The model is pre-trained on one
pretext/proxy task, image rotation prediction, and is built on the DenseNet architecture. It is then fine-tuned on subsets of
the original dataset of various sizes, and the results are compared with one another.
Keywords: Self-supervised model, pretext/proxy task, DenseNet architecture, fine-tuning
I. INTRODUCTION
The ophthalmology field has benefited from recent advances in deep learning, particularly deep convolutional neural networks
(CNNs) applied to large data sets such as two-dimensional (2D) fundus photography, a low-cost imaging technology that captures
the back of the eye. Recently, self-supervised learning has achieved remarkable success in the field of computer vision. In
particular, self-supervised learning can serve medical imaging well, because large quantities of labelled data are usually
unavailable there. The input to our model is diabetic retinopathy fundus images.
There are different types of proxy tasks; in this model we use rotation prediction. The self-supervised model, pre-trained on the
image rotation pretext/proxy task and built on the DenseNet architecture [2], predicts diabetic retinopathy [1] and detects its
different stages. To explore various settings, we fine-tuned the model on subsets of 5%, 10%, 25%, 50% and 100% of the original
dataset, with a batch size of 32, 5 repetitions, and a simple multiclass prediction architecture. We evaluate the model using the
quadratic weighted kappa score used on Kaggle (the "Kappa-Kaggle score") [3].
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2532
Image rotation prediction [10] -- Learns image features by training the model to recognize the 2D rotation applied to the image
it receives as input.
Image reconstruction [11] -- A visual feature learning algorithm driven by context-based pixel prediction.
Object saliency [12] -- Learns background-agnostic representations by performing salient object detection in a self-supervised
manner.
Contrastive prediction [13] -- Learns representations by predicting the future in latent space using powerful autoregressive
models. It uses a probabilistic contrastive loss that induces the latent space to capture information that is maximally useful
for predicting future samples.
All of the techniques mentioned achieve state-of-the-art results and continue to improve with time and research. This report,
however, deals with only one of them: image rotation prediction.
The model is then fine-tuned on some labelled data to determine the severity of the DR, using the following grades:
1) NO DR
2) MILD DR
3) MODERATE DR
4) SEVERE DR
5) PROLIFERATIVE DR
The model is evaluated using the quadratic weighted kappa score (the "Kappa-Kaggle score").
Dataset Preparation: The data is obtained from the Diabetic Retinopathy 2019 Kaggle challenge. It contains retinal fundus
images resized to 224 x 224 and categorized into five types: no DR, mild, moderate, severe and proliferative. For the proxy
task we combine all the types, as the task does not require any labels. For fine-tuning, we use 5%, 10%, 25% and 50% of the
original data set to check the efficiency at each data size.
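One way to draw such labelled subsets is sketched below. It is only an illustration under stated assumptions: the paper does not say how the subsets were sampled, so the per-class (stratified) sampling, the function name `stratified_subset`, and the synthetic label vector are all hypothetical.

```python
import numpy as np

def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a random subset keeping roughly `fraction` of each grade.

    Stratified sampling is an assumption made for this sketch, not the authors' stated method.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    chosen = []
    for grade in np.unique(labels):
        idx = np.flatnonzero(labels == grade)
        # Keep at least one image per grade so that no class disappears from a small subset.
        n_keep = max(1, int(round(fraction * idx.size)))
        chosen.append(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(chosen))

# Hypothetical labels: 1000 images with grades 0..4 (not the real class distribution).
labels = np.random.default_rng(42).integers(0, 5, size=1000)
subsets = {f: stratified_subset(labels, f) for f in (0.05, 0.10, 0.25, 0.50)}
for f, idx in subsets.items():
    print(f"{int(f * 100):>3d}% -> {idx.size} labelled images")
```

The proxy task would still see every image, since it ignores the grades; only the fine-tuning stage is restricted to the selected indices.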
Data Pre-processing: The data is resized to 244 x 244 and subjected to geometric transformations, namely rotations by
multiples of 90 degrees (0, 90, 180 and 270 degrees). This pre-processed data is then fed to the ConvNet model with a
DenseNet121 encoder architecture. The model was trained with a learning rate of 1e-5 for 200 epochs with a batch size of 32.
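The rotation step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `make_rotation_batch`, the counter-clockwise rotation direction, and the small 64 x 64 stand-in images are assumptions made for the example. Each image is rotated by the four angles listed above, and the index of the angle becomes the label the network must predict.

```python
import numpy as np

ANGLES = (0, 90, 180, 270)  # rotation classes 0, 1, 2, 3

def make_rotation_batch(images):
    """Build the pretext-task data from unlabelled images.

    images: array of shape (N, H, W, C) with H == W (square images).
    Returns the rotated images, shape (4N, H, W, C), and the rotation labels, shape (4N,).
    """
    images = np.asarray(images)
    rotated, targets = [], []
    for k, _angle in enumerate(ANGLES):
        # np.rot90 rotates counter-clockwise by k * 90 degrees in the (H, W) plane.
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        targets.append(np.full(len(images), k, dtype=np.int64))
    return np.concatenate(rotated), np.concatenate(targets)

# Random stand-in data: 8 small "images" instead of real fundus photographs.
batch = np.random.default_rng(0).random((8, 64, 64, 3), dtype=np.float32)
x, y = make_rotation_batch(batch)
print(x.shape, y.shape)  # (32, 64, 64, 3) (32,)
```

No disease grade is used anywhere in this step, which is why the proxy task can be trained on the full, unlabelled image pool.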
Fine-tuning with Labelled Data: The ConvNet is trained to predict the geometric transformation applied to each image, which
amounts to learning the characteristics of the image. At this point it is almost ready for classification but lacks knowledge
of the categories, which it acquires when the model is fine-tuned with labelled data. To explore various settings, we
fine-tuned the model on subsets of 5%, 10%, 25%, 50% and 100% of the original dataset, with a batch size of 32, 5 repetitions,
and a simple multiclass prediction architecture.
Classification: The final model, generated after fine-tuning, can classify the stages of diabetic retinopathy. It is evaluated
with the "qw_kappa_kaggle" score and obtained very promising results relative to the size of the labelled dataset.
IV. RESULT
Here is the sample code and result.
The final model recognizes and categorizes the retinal fundus data into one of the five classes. The Kaggle dataset contains
approximately 3600 images, each of which has been graded by a clinician on a scale of 0 to 4 (no DR, mild, moderate, severe,
proliferative). To assess our performance on this benchmark, we pre-trained the model on all of the dataset's images. The model
was then fine-tuned on the same Kaggle data with varied subset sizes, resulting in a data-efficient evaluation. Compared with
other transfer learning methods that use a large corpus, the results of this data-efficient evaluation are not up to par. The
model is evaluated using 5-fold cross-validation. The metric for the task is the quadratic weighted kappa, which measures how
well two ratings agree. Its values range from 0 (random agreement) to 1 (total agreement), and it can become negative if there
is less agreement than expected by chance.
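The metric can be computed directly from its definition, as in the following sketch. It is not the authors' "qw_kappa_kaggle" code: the function name and the toy ratings are hypothetical, and the sketch assumes integer grades from 0 to 4. The score is one minus the ratio of the weighted observed disagreement to the weighted disagreement expected by chance, where the weight grows with the squared distance between the two grades.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted kappa between two integer ratings in 0..n_classes-1."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    # observed[i, j]: number of images graded i by the clinician and j by the model.
    observed = np.zeros((n_classes, n_classes), dtype=np.float64)
    np.add.at(observed, (y_true, y_pred), 1.0)
    # expected[i, j]: counts if the two ratings were independent (outer product of marginals).
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic penalty: 0 on the diagonal, 1 for the largest possible disagreement.
    grades = np.arange(n_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (n_classes - 1) ** 2
    # Undefined (division by zero) if both raters assign one single grade to every image.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Toy ratings (hypothetical): two images per grade.
truth = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(quadratic_weighted_kappa(truth, truth))                   # perfect agreement
print(quadratic_weighted_kappa(truth, [4 - t for t in truth]))  # systematically reversed grades
```

Perfect agreement gives a score of 1, while the reversed ratings give a score of about -1, which illustrates how the value drops below zero when agreement is worse than chance.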
Figure: Average QW kappa scores versus percentage of labelled images, comparing the rotation technique with baseline values.
V. CONCLUSION
The final model, implemented using self-supervised learning and a DenseNet121 convolutional network, is able to detect the
different stages of diabetic retinopathy from the given fundus data. We used rotation prediction as the proxy task and measured
the model's performance using the Kappa-Kaggle score. Our findings, particularly in the low-data regime, show that in the
medical imaging sector, where data and annotation scarcity is a problem, it is possible to reduce the manual annotation labour
required. We believe that using deep learning to diagnose diabetic retinopathy helps mitigate the risk of vision loss for many
patients and makes regular check-ups more cost-effective.
REFERENCES
[1] The Four stages of Diabetic Retinopathy https://modernod.com/articles/2019-june/the-four-stages-of-diabeticretinopathy?c4src=article:infinite-scroll
[2] Densenet - 121 Architecture https://www.kaggle.com/datasets/pytorch/densenet121
[3] The five stages of Kappa – Kaggle score https://www.kaggle.com/code/aroraaman/quadratic-kappa-metric-explained-in-5-simple-steps/notebook
[4] Self – Supervised Learning https://neptune.ai/blog/self-supervised-learning
[5] Proxy task https://medium.com/analytics-vidhya/what-is-self-supervised-learning-in-computer-vision-a-simple-introduction-def3302d883d
[6] Fine tuning with Keras And Deep Learning https://pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/
[7] C. Doersch, A. Gupta and A. A. Efros, "Unsupervised Visual Representation Learning by Context Prediction," 2015 IEEE International Conference on
Computer Vision (ICCV), 2015, pp. 1422-1430, doi: 10.1109/ICCV.2015.167. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic
Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv:1412.7062 [cs], Dec. 2014.
[8] Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. arXiv:1603.09246v3 (arxiv.org).
[9] Zhang R., Isola P., Efros A.A. (2016) Colorful Image Colorization. In: Leibe B., Matas J., Sebe N., Welling M. (eds) Computer Vision – ECCV 2016. ECCV
2016. Lecture Notes in Computer Science,vol 9907. Springer, Cham. https://doi.org/10.1007/978-3-319-46487-9_40
[10] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. CoRR, abs/1803.07728, 2018. URL
http://arxiv.org/abs/1803.07728.
[11] Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei Efros. Context encoders: Feature learning by inpainting. In The IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), June 2016.
[12] Jiawei Wang, Shuai Zhu, Jiao Xu, and Da Cao. The retrieval of the beautiful: Self supervised salient object detection for beauty product retrieval. In Proceedings
of the 27th ACM International Conference on Multimedia, MM ’19, page 2548–2552, New York, NY, USA, 2019. Association for Computing Machinery.
ISBN 9781450368896. doi: 10.1145/3343031.3356059. URL: https://doi.org/10.1145/3343031.3356059.
[13] Aäron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. CoRR, abs/1807.03748, 2018. URL
http://arxiv.org/abs/1807.03748.