
TAIZ UNIVERSITY (جامعة تعز)
College of Engineering and Information Technology (كلية الهندسة وتقنية المعلومات)
Optic Disc Segmentation and Glaucoma Detection Using Deep Learning
Done by:
Hajer Abdul-Jabbar Yaseen Manwar Alahdal
Mirfat Ameen Abdo Qassim Alabsi
Rawia Sameer Mohammed Hassan
Wafa Mohammed Saeed Saeed Alsabri

Supervisor:
Dr. Mujeeb AL Hakimi
Thanks and Appreciation
We thank God Almighty, on whom we depend, for the completion of this project. We also thank our parents, who have always supported and helped us, and our professors at the College of Engineering, Taiz University.

ABSTRACT
Glaucoma is a serious eye disease that damages the optic nerve. This damage is often caused by abnormally high pressure in the eye. Glaucoma is one of the leading causes of blindness. It can occur at any age but is more common in older adults. Because vision lost to glaucoma cannot be recovered, regular eye exams that include measurements of eye pressure are important so that a diagnosis can be made in the early stages and the disease treated appropriately. Because traditional diagnosis is costly, effortful, and error-prone, we resorted to artificial intelligence using CNN techniques in two stages. The first stage segments the optic disc (OD) using four models (MobilenetV2_unet, Resnet50_unet, Vgg16_unet, Vgg19_unet) applied to two datasets: the ORIGA dataset, where they achieve accuracies of 98.46%, 96.87%, 97.88% and 98.88% respectively, and the PRIM dataset, where they achieve accuracies of 99.32%, 98.86%, 99.66% and 99.07% respectively. The second stage is classification using seven CNN models (Resnet50, Vgg16, Vgg19, Alexnet, Googlenet, MobilenetV2, Densenet121) applied to three datasets: ACRIMA, ORIGA and PRIM. On the first dataset they achieve accuracies of 97%, 85%, 81%, 90%, 95%, 98% and 95% respectively; on the second dataset 81%, 82%, 83%, 82%, 82%, 91% and 85% respectively; and on the third dataset 85%, 86%, 89%, 88%, 89%, 96% and 93% respectively. The best classification model is MobilenetV2, which achieves the highest accuracy on all three datasets.

الملخص (Abstract, translated from Arabic)
Glaucoma is a serious disease that affects the eye and damages the optic nerve. This damage is often caused by abnormally high pressure in the eye. Glaucoma is considered one of the leading causes of blindness. It can occur at any age but is more common in older adults. Since vision lost to glaucoma cannot be recovered, it is important to perform regular eye examinations that include eye-pressure measurements so that a diagnosis can be made in the early stages and treated appropriately. Because traditional diagnosis is weak, costly and effortful, we resorted to using CNN techniques from artificial intelligence and divided the work into two stages. The optic disc detection stage uses four models (MobilenetV2_unet, Resnet50_unet, Vgg16_unet, Vgg19_unet) applied to two datasets: ORIGA, achieving accuracies of 98.46%, 96.87%, 97.88% and 98.88% respectively, and PRIM, achieving accuracies of 99.32%, 98.86%, 99.66% and 99.07% respectively. The classification stage uses seven models (Resnet50, Vgg16, Vgg19, Alexnet, Googlenet, MobilenetV2, Densenet121) applied to three datasets: ACRIMA, achieving accuracies of 97%, 85%, 81%, 90%, 95%, 98% and 95% respectively; ORIGA, achieving accuracies of 81%, 82%, 83%, 82%, 82%, 91% and 85% respectively; and PRIM, achieving accuracies of 85%, 86%, 89%, 88%, 89%, 96% and 93% respectively. The best classification result was obtained with the MobilenetV2 model, which achieved the highest accuracy on the three datasets.

Table of Contents
1.CHAPTER1: OVERVIEW………….……………………………………….……………9
1.1: introduction……………………………………………..………………………………9
1.2: Traditional devices used to detect glaucoma …………….……….……….………. ….9
1.3: Research Objectives ……………………………………………………….…………10
1.4: Research problem ………………………………………………………….……… …11
1.5: Research Questions ……………………………………………………………..….…11
1.6: The importance of the research………………………………….……………………11
1.7: Research methodology ………………………….……………………………………12
1.8: Related work……………………...………………….…………...…….…………….13
2. CHAPTER 2: LITERATURE REVIEW…………………………………………….….14
2.1: Introduction…………………………………………………………...………………14
2.2: background……………………………………………………………………………14
2.3: related work ………………….……….……………………………………...……….16
3 CHAPTER 3: Methodology ………………………………….………….………..…….24
3.1:Introduction……………………………………………………………………………24
3.2: Deep Learning…………………………………………………………..………….…24
3.3: Convolutional Neural Networks.……………………………………..………….…..,.24
3.3.1: Layers in a Convolutional Neural Network…………………………..………….…24
3.4: semantic segmentation.……………………………………………..………………..,.27
3.5: U-net segmentation ……………………………………………………….………….…27
3.6: Convolutional Neural Network models.…………………………………………........28
3.6.1: MobilenetV2……………………………………………….……………………..…28
3.6.2:DenesNet121.………………………………………………………………………..29
3.6.3: ResNet…………………………………………..……….……………………….…31
3.6.4:Alexnet.……………………………………………………………………………...32
3.6.5:Googlenet.……….…………………………………………………..…………..…..33
3.6.6: Vgg………………………………………………..………………………….…..…33
3.7:Color Fundus Retinal photography.….………………….…………..………………...35
3.7.1:Fundus photography.….……………………….………...…………………………..35
3.7.2:FundusCamera.….……………………………….…………..……………………....35
3.8:Dataset.….…………………………………………………………….……………….36

3.8.1:acrima..….…………………………………………………………………………...36
3.8.2:Origa.….……………………………………………………………...………….…..37
3.8.3:Prim.….…………………………………………………………….………….…….37
4. CHAPTER4:Implementation……….………….….…………………………………....38
4.1:Methodology.….………………………………………………………………………38
4.1.1: preprocessing dataset for segmentation.….………………………………………….38
4.1.2: OD Segmentation Using U-net Semantic Segmentation.….……………………….....38
4.1.3: Classification using Pretrained Deep CNNs.….…………………………………...…39
4.2:Experiments.….……………………………………………………………………......39
4.2.1:Segmentation….……………………………………………………………………..40
4.2.1.1:Origa segment result….……….……………………………………..…………….40
4.2.1.2: prim segment result…………………………………..…………………………....41
4.2.2: classification.….……………………………………………………..……………...41
4.2.2.1:Acrima classification result………………………………………………………..42
4.2.2.2: origa classification result……...…………………………………………………..42
4.2.2.3: prim classification result……………………………………………………….….43
4.2.2.4 Interface for training using Acrima…………………………………………….…..43
4.2.2.5 Interface for training using origa……………………………………………….….46
4.2.2.6 Interface for training using prim……………………………………………….…..48
4.3: Discussion………………………………………………………………………….….51
5:CHAPTER5:Conclusion and Future.….………………………………………………....52
5.1:Introduction.….………………………………………………………………………..52
5.2:Conclusion.….……………………………………………………………..…………..52
5.3: Difficulties.….………………………….…………………………….………….…..,.52
5.4: FutureWork.….……………………………………………………………..………....52
Reference…………………………………………………………………………………..53
Appendix…………………………………………………………………………………..57

List of Figures
Figures 1.1: Traditional devices used to detect glaucoma ……………………...….…….10
Figure1.2:flow chart of segment and classification system ……………………….…….12
Figure 2.1: Healthy optic disc and optic disc with glaucoma……………….…………... .16
Figure2.2:Grading of glaucoma diseases… ………………… ……..………..…… ……..16
Figure3.1:feature map…………………………….………….………………………….…25
Figures3.2:relu activation…… ……… …… … ……………………… …………………25
Figure3.3:relu activation work… ……… ………… ………….… ………….…… … ….25
Figure 3.4: unit architecture……… ……………… ……………… ……………… … ….28
Figure 3.5: mobilenetv2 architecture …………… …………… ………….… …………..29
Figure 3.6: densenet architecture…… ……… …………………… ……………… ……..30
Figure 3.7: resnet50 architecture ……… ………… …………… …………… ………....31
Figure 3.8: Alex net architecture……… …………………… ………………… ………..30
Figure 3.9: Googlenet architecture ………… …………… …………. …… ……… …..33
Figure 3.10: Vgg net architecture…………… ………………………………… ……… …34
Figure 3.11: a fundus camera ……… ………………………….…………… …………… 36
Figure 4.1: eyes images and masks………………….…………………………….………38
Figures 5.1: loss and accuracy acrima dataset resnet50 model…………………………….43
Figures 5.2: loss and accuracy acrima dataset vgg16 model ………….…………………..44
Figure 5.3: loss and accuracy acrima dataset vgg19 model ………….………………. ….44
Figure 5.4: loss and accuracy acrima dataset Alexnet model ……………………….……44
Figure 5.5: loss and accuracy acrima dataset googlenet model …………………………….45
Figure 5.6: loss and accuracy acrima dataset mobilenetV2 model…………..…….………45
Figure 5.7: loss and accuracy acrima dataset dnsenet121 model ………..…….………….45
Figure 5.8: loss and accuracy origa dataset resnet50 model …….………………….………46
Figure 5.9: loss and accuracy origa dataset vgg16 model ……….….…………………….46
Figure 5.10: loss and accuracy origa dataset vgg19 model ……….……...……….………47
Figure 5.11: loss and accuracy origa dataset alexnet model ……………….……………….47
Figure 5.12: loss and accuracy origa dataset googlenet model…….……………….………47
Figures 5.13: loss and accuracy origa dataset mobilenetv2 model …………..……………..48
Figures 5.14: loss and accuracy origa dataset densenet121 model ……….………………..48
Figure 5.15: loss and accuracy prim dataset resnet50 model ..…………………………….49

Figure 5.16: loss and accuracy prim dataset vgg16 model …………………………………49
Figure 5.17: loss and accuracy prim datasetvgg19 model ……….………………….…….49
Figure 5.18: loss and accuracy prim dataset alexnet model ……………………….………50
Figure 5.19: loss and accuracy prim dataset googlenet model …………………………….50
Figure 5.20: loss and accuracy prim dataset mobilenetV2 model ………………….………50
Figure 5.21: loss and accuracy prim dataset densenet121 model……….…………………51

List of Tables
Table 3.1: model summary… ………………………… ………...………………………35
Table 5.1: result segment origa dataset model ………………………………….………….40
Table 5.2: result segment prim dataset model ………………………………….………….41
Table 5.3: result classify acrima dataset model ………………………..………………….42
Table 5.4: result classify origa dataset model ……………………… ………….….…….42
Table 5.5: result classify prim dataset model ………………………………………… ….43

CHAPTER 1: OVERVIEW
1.1 Introduction
Eyes are the most sensitive and delicate organ we possess, so an eye examination is
important to check for human eye diseases and other problems that may lead to vision loss.
Glaucoma is a disease that affects the eye and is one of the main causes of blindness; it is associated with increased intraocular pressure (IOP). The optic disc (OD) is the part of the retinal image that plays an important role in the detection of glaucoma, so it is an integral
part of the screening system. Clinically, intraocular pressure measurement and visual field
testing to evaluate the optic nerve head are currently used to screen for glaucoma, but only
evaluation of the optic nerve head can detect glaucoma in its early stages [1].
This research presents a two-stage glaucoma screening system. The first stage segments the OD using the U-net architecture with four backbones: MobileNetV2, ResNet50, Vgg16 and Vgg19. The second stage classifies glaucoma with seven CNN models: Densenet121, Resnet50, Alexnet, Googlenet, Vgg16, Vgg19 and MobilenetV2. Both stages are evaluated on three public datasets: ACRIMA, ORIGA and PRIM.
1.2 Traditional devices used to detect glaucoma
Tonometry measures the pressure inside the eye (intraocular pressure, or IOP): a small amount of pressure is applied to the eye by a small device. The average range for eye pressure is 12–22 mm Hg ("mm Hg" refers to millimeters of mercury, a scale used to record eye pressure). The device is used to screen for glaucoma and also to measure how well glaucoma treatment is working [2], as shown in figure 1.1(A).
Ophthalmoscopy: examination of the shape and color of the optic nerve with a special lens, also called a dilated eye exam. A direct ophthalmoscope is a device that produces an unreversed or upright image at around 15 times magnification. The direct ophthalmoscope is a critical tool used to inspect the back portion of the interior eyeball, which is called the fundus; the examination is usually best carried out in a darkened room. An indirect ophthalmoscope is a device that produces a reversed or inverted image at two to five times magnification [3]. Compared with the direct ophthalmoscope, the indirect ophthalmoscope delivers a stronger source of light, greater opportunity for stereoscopic inspection of the eyeball interior, and a specifically designed objective lens, as shown in figure 1.1(B).
Indirect ophthalmoscopes have proven to be an exceptionally valuable device for the diagnosis and treatment of retinal detachments, holes, and tears. For satisfactory use of an indirect ophthalmoscope, the patient's pupils must be completely dilated, as shown in figure 1.1(C).
Gonioscopy: Gonioscopy is performed during the eye exam to evaluate the internal
drainage system of the eye, also referred to as the anterior chamber angle. The "angle" is
where the cornea and the iris meet. This is the location where fluid inside the eye (aqueous
humor) drains out of the eye and into the venous system [4], as shown in figure 1.1(D).

Figure 1.1 Traditional devices used to detect glaucoma (panels A–D)
1.3 Research Objectives
The main goal of this study is to identify the optic disc for early detection of glaucoma
and reducing the burden on physicians to examine patients with glaucoma.
Sub-objectives:
- Apply different CNN models and choose the best ones.
- Improve the performance of glaucoma detection systems.
- Apply the material we studied during our coursework.

1.4 Research Problem
Glaucoma is the second leading cause of blindness in the world; it is a serious disease that affects the eye, specifically the retina. The symptoms of glaucoma
appear when the disease reaches an advanced stage. Therefore, early detection is very
important for delaying the progression of the disease and delaying blindness. The need for
an automated system to detect these diseases is important to save time and significantly
reduce the workload of ophthalmologists. Screening for glaucoma is expensive, time-
consuming, and prone to human error. It needs highly experienced professionals. To handle
a large number of data and reduce human errors in examination, it is important to develop
an automatic glaucoma detection system, which can provide actionable, accurate, rapid and
interpretable diagnoses. However, creating a robust automated OD-detection method is
difficult as the OD's size varies from one image to another; the OD also varies due to retinal diseases. The goal of this thesis is the development of an automatic, novel, ultrafast, and
robust OD detection and segmentation method. The accuracy and the performance of the
proposed method should be comparable to the accuracy and performance of other automatic
algorithms.
1.5 Research Questions
- What is the importance of our study in the field of ophthalmology?
- Why do we use optic disk segmentation before classification?
- What is the difference between previous methods and our method in this study?
1.6 The importance of the research
Our research is of great importance in the field of ophthalmology, as it contributes to two main tasks. First, it contributes to the early detection of the second most dangerous disease that may affect the eye, glaucoma, whose neglect or late detection may result in blindness, because it is a disease that gradually extinguishes sight. Second, our project also contributes to identifying the optic nerve region of the eye, which is the region most damaged by glaucoma. Detecting glaucoma and localizing the optic nerve is of very great importance, as it assists the doctor's diagnosis and saves time, effort, and money for doctors and patients, because such examinations require a long time for the results to appear and are expensive and not easy for every patient to undergo.
1.7 Research Methodology
The method in this research aims to segment the OD region in retinal images and then use CNN models to distinguish between normal and glaucomatous eyes in the segmented OD region. For OD segmentation, we use the MobilenetV2_unet, Resnet50_unet, Vgg16_unet and Vgg19_unet semantic segmentation models to localize and extract the optic disc from retinal images. The classification stage uses the Resnet50, Vgg16, Vgg19, Alexnet, Googlenet, MobilenetV2 and Densenet121 models to classify the segmented disc as healthy or glaucomatous.

[Flow chart: input image → image preprocessing → OD segmentation (MobileNet, ResNet50, Vgg16, Vgg19 U-nets) → cropped OD region → classification → output]

Figure 1.2 Flow chart of the segmentation and classification system


1.8 Related work
Cheng et al. [5] segmented the OD region, using superpixel based segmentation, and then
CDR was computed based on the change of intensity within the cropped OD image. The
method was tested on two datasets, SiMES dataset and SCES dataset, separately and
achieved AUC of 83% and 88% respectively.
Chakravarty et al. [6] used Hough transform to detect OD region, extracted the texture of
projection and bag of words features from the detected OD, and finally trained SVM
classifier to discriminate between healthy and glaucoma OD. They obtained an accuracy of
76.8% and AUC of 78.0% on the DRISHTI-GS1 dataset.
Karkuzhali et al. [7] used superpixel segmentation to retrieve OD and OC regions, and
measured CDR, inferior-superior-nasal-temporal (ISNT), distance between central OD and
optic nerve head, the area of blood vessels inside optic nerve head and intensity value
between central OD and optic nerve head. Neural networks were trained using the
measurements as attributes to identify the abnormality, obtaining 100% accuracy on 26 images. However, their results were limited by validation on a small number of images.

CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
Automated retinal image analysis to detect glaucoma has been researched intensively in
recent years with variable success. The methods vary from simpler machine learning
approaches to sophisticated and advanced deep learning approaches. We use two steps for glaucoma detection: the first step is OD segmentation and the second step is glaucoma classification.

2.2 Background
Glaucoma is a chronic eye disease which damages the optic nerve and can lead to blindness.
Glaucoma is considered one of the main causes of blindness, affecting approximately 80 million people around the world [8]. According to the American Academy of Ophthalmology
(AAO) people may be at heightened risk of glaucoma if they are over 40 years old, are Black,
Hispanic, or Asian, have a family history of glaucoma, are nearsighted or farsighted, have
other chronic eye conditions, have injured their eye in the past, have diabetes, have high
blood pressure, have poor blood circulation, have used corticosteroid medications for
prolonged periods of time [9]. Therefore, regular eye examination once per year is essential
and recommended for early glaucoma screening, particularly for people over 40 years old.
Early signs of eye disease and changes in vision may start to occur at this age. Angle-closure
glaucoma and open-angle glaucoma are the two common glaucoma types and present
different warning signs.
Primary Angle-Closure Glaucoma is also called Narrow-Angle Glaucoma. In Acute Angle-
Closure Glaucoma, the intraocular pressure rises very quickly, causing noticeable symptoms
such as eye pain, blurry vision, redness, rainbow-colored rings (“haloes”) around lights, and
nausea and/or vomiting. Acute Angle-Closure Glaucoma is a medical emergency. It can
cause permanent vision damage and requires immediate medical attention [2].

On the other hand, Primary Open-Angle Glaucoma, the most common form of glaucoma
and also called “the sneak thief of sight”, is a lifelong condition that accounts for at least
90% of all glaucoma cases. There are no early warning signs of Open-Angle Glaucoma. It
develops slowly and sometimes without noticeable sight loss for many years. If Open-Angle
Glaucoma is not diagnosed and treated, it can cause gradual loss of vision. With regular eye
exams, Open-Angle Glaucoma may be found early and usually responds well to treatment
to preserve vision. Glaucoma is not curable, and vision lost cannot be restored. With
medication, laser treatment and surgery, it is possible to slow or stop further loss of vision.
Since Open-Angle Glaucoma cannot be cured, it must be monitored for life. Diagnosis is the
first step to preserving your vision [2].
Most other types of glaucoma are variations of the open-angle or angle-closure types. These
glaucoma types can occur in one or both eyes. Normal-Tension Glaucoma (NTG),
Secondary Glaucoma, Pigmentary Glaucoma, Congenital Glaucoma, Exfoliative Glaucoma,
Neovascular Glaucoma, Uveitic Glaucoma, Traumatic Glaucoma [2].
This silent disease affects the optic disc (OD) region and causes OD abnormalities, for example an enlarged cup-to-disc ratio, a pale color, hemorrhage, or changes in the vicinity of the OD. Figure 2.1 shows the noticeable differences of the OD in a healthy and a glaucoma
eye. Various stages of glaucoma can be observed in Figure 2.2. Thus, optic nerve assessment
in retinal images becomes an essential and standard test for glaucoma detection. There is
currently no cure because retinal neurons that die do not regenerate, however, progression
of the disease may be slowed with drugs that lower intraocular pressure (IOP).

Figure 2.1 Healthy optic disc and optic disc with glaucoma

Figure 2.2 Grading of glaucoma diseases


2.3 Related Work
In 2010, Daniel Welfer et al. [10] introduced a new adaptive method based on mathematical morphology to identify some important optic disk features, namely the optic
disk locus and the optic disk rim. The proposed method was tested on two publicly available
databases, DRIVE and DIARETDB1. For the DRIVE database, we obtained correct optic
disk location in 100% of the images, with a mean optic disk overlap of 41.47%. For the DIARETDB1 database, the optic disk was correctly located in 97.75% of the images with a
mean overlap of 43.65%. These results indicate an improvement over other methods proposed in the literature, especially because their method tries not only to detect the optic disk but also to detect the optic disk contour (i.e., its boundaries), without having to assume any predefined shape (e.g., a circle of a predefined size). In order to evaluate their
results quantitatively, the optic disk segmentation results were compared to other approaches
using quantitative measures such as Mean Absolute Distance (MAD) and area overlap,
confirming the benefits offered by their approach. Furthermore, their method has been
designed to detect optic disk features even when diabetic lesions or illumination artifacts are
present in the retina image. However, it was verified experimentally that large opaque lesions
(e.g. large hemorrhages) tend to reduce the vascular tree visibility, and may impact
negatively on our method results. The experimental results are promising, and the proposed
method appears to be robust to the different imaging conditions existing in both image
databases tested, since the same parameter values were used in all experiments. Besides, the
method does not require a vessels elimination stage in order to reduce the vessels influence
in the optic disk location or in the detection of its rim. Usually, the optic disk location
requires some parameter fine tuning in most methods available in the literature, but this
additional parameter adjustment is not necessary in their method. On the other hand, their
proposed approach requires the computation of specific preprocessing stages (e.g.,
enhancement and smoothing steps to detect the vascular tree). For optic disk locus and boundary detection, a success rate of 95% in correct optic disk location and an accuracy of 70% in optic disk boundary detection (modeling the optic disk as a circle) have also been reported by an approach that utilizes texture descriptors and a regression-based method to find the most likely circle fitting the optic disk.
In 2012, H. Yu et al. [11] noted that the optic disk (OD) center and margin are typically requisite
landmarks in establishing a frame of reference for classifying retinal and optic nerve
pathology. Reliable and efficient OD localization and segmentation are important tasks in
automatic eye disease screening. This paper presents a new, fast, and fully automatic OD
localization and segmentation algorithm developed for retinal disease screening. First, OD
location candidates are identified using template matching. The template is designed to adapt
to different image resolutions. Then, vessel characteristics (patterns) on the OD are used to
determine OD location. Initialized by the detected OD center and estimated OD radius, a
fast, hybrid level set model, which combines region and local gradient information, is applied
to the segmentation of the disk boundary. Morphological filtering is used to remove blood
vessels and bright regions other than the OD that affect segmentation in the peripapillary
region. Optimization of the model parameters and their effect on the model performance are
considered. Evaluation was based on 1200 images from the publicly available MESSIDOR
database. The OD location methodology succeeded in 1189 out of 1200 images (99%
success). The average mean absolute distance between the segmented boundary and the
reference standard is 10% of the estimated OD radius for all image sizes. Its efficiency,
robustness, and accuracy make the OD localization and segmentation scheme described
herein suitable for automatic retinal disease screening in a variety of clinical settings. They divided the database into four subsets, each graded by a different single observer using computer software, and an overall accuracy of 90.32% was reported.
In 2017, Artem et al. [12] used Contrast Limited Adaptive Histogram Equalization (CLAHE) as a pre-processing step for segmenting the optic disc and cup. It equalizes contrast by changing the color of image regions and interpolating the result. They then used a convolutional neural network built upon U-Net, on the publicly available DRIONS-DB, RIM-ONE v.3 and DRISHTI-GS datasets. The results were (IOU = 0.93, Dice = 0.97) for the optic disc and (IOU = 0.93, Dice = 0.97) for the optic cup.
In 2018 Guo et al. [13] first segmented OD and OC regions using U-net, then extracted
eight morphologic features and fed them as an input to a random forest classifier. They tested
using ORIGA dataset and obtained accuracy of 76.9% and AUC of 83.1%.
In 2018, Yaroub Elloumi et al. [14] presented a method and its associated
software towards the development of an Android smartphone app based on a previously
developed ONH detection algorithm. The development of this app and the use of the D-Eye
lens which can be snapped onto a smartphone provide a mobile and cost-effective computer-
aided diagnosis (CAD) system in ophthalmology. In particular, this CAD system would
allow eye examination to be conducted in remote locations with limited access to clinical
facilities. A pre-processing step is first carried out to enable the ONH detection on the
smartphone platform. Then, the optimization steps taken to run the algorithm in a
computationally and memory efficient manner on the smartphone platform are discussed. The
smartphone code of the ONH detection algorithm was applied to the STARE and DRIVE
databases resulting in about 96% and 100% detection rates, respectively, with an average
execution time of about 2s and 1.3s. In addition, two other databases captured by the D-Eye
and iExaminer snap-on lenses for smartphones were considered resulting in about 93% and
91% detection rates, respectively, with an average execution time of about 2.7s and 2.2s,
respectively.
In 2019, Andres et al. [15] employed five different ImageNet-trained models (VGG16,
VGG19, InceptionV3, ResNet50 and Xception) for automatic glaucoma assessment using
fundus images. Results from an extensive validation using cross-validation and cross-testing
strategies were compared with previous works in the literature. Using five public databases
(1707 images), an average AUC of 0.9605 with a 95% confidence interval of 95.92–97.07%,
an average specificity of 0.8580 and an average sensitivity of 0.9346 were obtained after
using the Xception architecture, significantly improving the performance of other
state-of-the-art works.
In 2019, Yuan Gao et al. [16] presented an automated glaucoma detection scheme in terms of different evaluation parameters. These parameters require precise segmentation information for the OD and the OC, obtained by two proposed methods. For extracting an accurate boundary of the OD, a novel segmentation model is presented.
First, LSACM is introduced to deal with the commonly occurred intensity inhomogeneity
phenomenon. Then, we make full use of the multi-view information based on the appearance
and shape of OD achieving accurate OD detection in varied conditions. Meanwhile, a novel
OC segmentation method is also presented. First, considering the special structure
relationship between the OD and the OC, a novel preprocessing approach is used to modify
LSACM (MLSACM) to guide the OC contour evolution in an effective region and reduce
the negative effect of non-objects, and it can overcome the difficulty which is that the
traditional ACM cannot directly segment the OC. Second, we extend the MLSACM model
by integrating the local image probability information around the point of interest from the
multi-dimensional feature space to remedy the insufficiency of the single-feature space. Finally,
the shape priori constraint information fused in proposed model becomes a stronger cue than
the intensity information in some regions maintaining the intrinsic anatomical structure of
the OC. The DRISHTI-GS database is applied to evaluate the performance of two novel
models for the OD and the OC. The average F-Score/average boundary distance which are
0.950/8.320 achieved by the proposed OD segmentation method and the average F-
Score/average boundary distance which are 0.852/20.390 obtained by the proposed OC
segmentation method are superior to those acquired by other state-of-the-art approaches.
In 2019 Lio et al. [17] proposed a new deep learning method, named AG-CNN, for
automatic glaucoma detection and pathological area localization upon fundus images. AG-
CNN model is composed of the subnets of attention prediction, pathological area localization
and glaucoma classification. As such, glaucoma could be detected using the deep features
highlighted by the visualized maps of pathological areas, based on the predicted attention
maps. For training the AG-CNN model, they established the LAG database of 5,824 fundus images labeled as either positive or negative glaucoma, along with their attention maps for glaucoma detection; the database contains 2,392 positive and 3,432 negative glaucoma samples obtained from Beijing Tongren Hospital. In glaucoma detection, the metric of sensitivity is more important than that of specificity. In AG-CNN, an attention prediction subnet is designed to generate the
attention maps of the fundus images, which are then used for pathological area localization
and glaucoma detection. Specifically, the input of the attention prediction subnet is the RGB
channels of a fundus image, which is represented by the tensor (size: 224 × 224 × 3). Then,
the input tensor is fed to one convolutional layer with kernel size of 7 × 7, followed by one
max-pooling layer. Subsequently, the features flow into 8 building blocks for extracting the
hierarchical features; further details of the building blocks are given in [17].
In 2019, Bajwa et al. [18] proposed a two-stage solution for OD localization and glaucoma screening classification. They developed a heuristic method for approximating
the location of OD in retinal images. In this way, the baseline truth of translation was
established for all seven data sets (Messidor, DRIONS, DRIVE, DIARETDB1, OCT & CFI,
HRF, Origa). Fully automated localization was then adopted using Faster R-CNN, a unified object detection network; the model consists of three main
modules: a region proposal network (RPN), a CNN classifier and a bounding box regression.
The network was trained with the pre-trained VGG16 as a classifier. These fully automated systems performed across the seven data sets with an IOU greater than 50% and a classification accuracy of 79.67%.
In 2019, Mohamed et al. [19] preprocessed images to remove noise and enhance the
contrast, then segmented the OD and OC regions using simple linear iterative clustering
superpixels. Finally, CDR was computed to determine the presence of glaucoma. They
reported a CDR value between 0.4 and 0.6 for the class of non-glaucomatous images and greater than 0.6 for glaucomatous images. The proposed method has been tested on the
RIM-One database. The experimental results have successfully distinguished optic disc and
optic cup from the background with an average accuracy and sensitivity of 98.6% and 92.3%,
respectively, tested with a linear kernel.
In 2020, Orlando et al. [20] used two different CNNs in their experiments, namely OverFeat and VGG-S, on the Drishti-GS1 and DRIVE datasets. The first strategy uses the outputs of a fully connected layer as feature vectors to train a new classifier explicitly intended for the new task. The second strategy is to not only replace the classifier
and retrain it on top of the CNN on the new data set, but also to adjust the weights of the
previously tested network by continuing the reverse propagation process. Depending on the
21
new task, it is possible to fine-tune all network layers or keep some of the previous layers
fixed, while restricting fine-tuning to some of the network's top-level layers. In this paper, OverFeat features performed better than VGG-S (AUC = 0.7212 vs AUC = 0.6655, respectively).
In 2021, Manal et al. [21] proposed models based on CNNs. All the developed CNN
models contain convolutional layers followed by fully connected layers. The output of the
last fully connected layers is used as input to a softmax classifier for diagnosing glaucoma.
The first model presented in Section II-A utilizes transfer learning which involves using a
pre-trained CNN along with a limited-size labeled glaucoma dataset. The second model
presented in Section II-B combines transfer learning (Section II-A) with a self-learning
technique to perform semi-supervised learning. The final model in Section II-C combines a
convolutional auto encoder ( unsupervised learning) with a supervised learning stage to
perform semi-supervised learning. The systems are based on different deep learning techniques, including supervised and semi-supervised training, namely TCNN (Transfer Convolutional Neural Network model), SCNN (Semi-supervised Convolutional Neural Network model), and SCNN-DAE (Semi-supervised Convolutional Neural Network model with a convolutional auto-encoder); the best performance was achieved by the SCNN-DAE model.
In 2021, Xi Xu et al. [22] introduced a transfer learning technique that leverages the
fundus feature learned from similar ophthalmic data to facilitate diagnosing glaucoma.
Specifically, a Transfer Induced Attention Network (TIA-Net) for automatic glaucoma
detection is proposed, which extracts the discriminative features that fully characterize the
glaucoma-related deep patterns under limited supervision. By integrating the channel-wise
attention and maximum mean discrepancy, our proposed method can achieve a smooth
transition between general and specific features, thus enhancing the feature transferability.
Different from previous studies applied classic CNN models to transfer features from the
non-medical dataset, they leverage knowledge from the similar ophthalmic dataset and
propose an attention-based deep transfer learning model for the glaucoma diagnosis task.
Extensive experiments on two real clinical datasets show that our TIA-Net outperforms other
state-of-the-art methods, and meanwhile, it has certain medical value and significance for
the early diagnosis of other medical tasks. To delimit the boundary between general and
specific features precisely, they first investigate how many layers should be transferred
during training with the source dataset network. Next, they compare their proposed model to previously mentioned methods and analyze the performance. Finally, with the advantages of the model design, they provided a transparent and interpretable transfer visualization by highlighting the key specific features in each fundus image. They evaluate the effectiveness of TIA-Net on two real clinical datasets and achieve an accuracy of 85.7%/76.6%, sensitivity of 84.9%/75.3%, specificity of 86.9%/77.2%, and AUC of 0.929/0.835, far better than other state-of-the-art methods.

Chapter 3: Methodology
3.1 Introduction
This chapter presents the deep learning technology and the CNN models used to segment the OD and to classify glaucoma disease. The U-net model is used for semantic segmentation, pretrained models are used for classification, and the fundus-image datasets used in the research are described.
3.2 Deep Learning
Deep Learning or Deep Neural Network refers to Artificial Neural Networks (ANN) with
multi layers. Over the last few decades, it has been considered to be one of the most powerful
tools, and has become very popular in the literature as it is able to handle a huge amount of
data. At its core, deep learning relies on iterative methods of teaching machines to imitate
human intelligence. An artificial neural network implements this iterative method through
several hierarchical levels. Initial levels help machines learn simple information, and as the levels increase, the information continues to build. At each new level, the machine picks up more information and combines it with what it learned at the previous level. At the end of the process, the system assembles a final, composite piece of information. This information has passed through several hierarchies and resembles complex logical reasoning.
[23]
3.3 Convolutional Neural Networks
CNN (convolutional neural network) is a class of deep learning designed for working
with two-dimensional image data. They use a special technique called convolution to
analyze visual imagery [23]. CNNs have emerged as the master algorithm in computer vision
in recent years, and developing recipes for designing them has been a subject of considerable
attention. The history of convolutional neural network design started with LeNet-style
models.
3.3.1 Layers in a Convolutional Neural Network
There are four types of layers in a Convolutional Neural Network [24]:
• Convolutional layers
Convolutional layers are comprised of filters and feature maps.
1. Filters
The filters are essentially the neurons of the layer. They have both weighted inputs and
generate an output value like a neuron.
2. Feature Maps
The feature map is the output of one filter applied to the previous layer. A given filter is
drawn across the entire previous layer, moved one pixel at a time. Each position results in
an activation of the neuron, and the output is collected in the feature map, as shown in figure 3.1.

Figure 3.1 feature map


• ReLU layer
ReLU has been used after every Convolution operation. ReLU stands for the Rectified Linear
Unit and is a non-linear operation. Its output is given by f(x) = max(0, x), as shown in figure 3.2.

Figure 3.2 relu activation

ReLU is an element-wise operation (applied per pixel) and replaces all negative pixel values
in the feature map by zero. The purpose of ReLU is to introduce non-linearity in our ConvNet
since most of the real-world data we would want our ConvNet to learn would be non-linear. (Convolution is a linear operation of element-wise matrix multiplication and addition, so we account for non-linearity by introducing a non-linear function like ReLU.)

For example, applying ReLU element-wise to a 3x3 feature map:

    [-1  3  2]            [0  3  2]
    [ 4 -2  1]  --ReLU--> [4  0  1]
    [-1  1  3]            [0  1  3]

Figure 3.3 ReLU activation applied element-wise


• Pooling Layers
The Pooling layers follow a sequence of one or more convolutional layers. As such, pooling
may be considered a technique to compress or generalize feature representations and
generally reduce the overfitting of the training data by the model. Pooling layers are often
very simple, taking the average or the maximum of the input value in order to create its own
feature map. Spatial Pooling (also called subsampling or downsampling) reduces the
dimensionality of each feature map but retains the most important information. Spatial
Pooling can be of different types: Max, Average, Sum, etc.
• Fully Connected Layers
Fully connected layers are the normal flat feedforward neural network layer. These layers
may have a nonlinear activation function or a softmax activation in order to output
probabilities of class predictions. Fully connected layers are used at the end of the network
after feature extraction and consolidation has been performed by the convolutional and
pooling layers. They are used to create final nonlinear combinations of features and for
making predictions by the network.
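To make the four layer types above concrete, the following is a minimal Keras sketch of a small CNN; the layer counts and sizes are illustrative only and are not the architectures used later in this report.

```python
# Minimal illustration of the four layer types described above (sizes are illustrative).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),           # RGB input image
    layers.Conv2D(32, (3, 3), padding="same"),   # convolutional layer: 32 filters -> 32 feature maps
    layers.ReLU(),                               # ReLU layer: set negative activations to zero
    layers.MaxPooling2D((2, 2)),                 # pooling layer: halve the spatial dimensions
    layers.Flatten(),
    layers.Dense(64, activation="relu"),         # fully connected layer
    layers.Dense(2, activation="softmax"),       # class probabilities (e.g. healthy vs glaucoma)
])
model.summary()
```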

3.4 Semantic segmentation
Semantic segmentation is one of the most important tasks in medical imaging. Before the
revolution of deep learning in computer vision, traditional handcrafted features have been
utilized in semantic segmentation. During the last few years, deep learning-based approaches
have achieved outstanding results in image segmentation. These networks can be divided
into two main groups, i.e., encoder-decoder structures and the models using spatial pyramid
pooling [9]. The encoder-decoder networks have been successfully utilized for semantic
segmentation. These networks include encoding and decoding paths. The encoder generates
a large number of feature maps with reduced dimensionality, and the decoder produces
segmentation maps by recovering the spatial information. Image segmentation is a long-
standing computer Vision problem. Quite a few algorithms have been designed to solve this
task, such as the Watershed algorithm, Image thresholding, K-means clustering, Graph
partitioning methods, etc. Many deep learning architectures (like fully connected networks
for image segmentation) have also been proposed.
Encoding phase: The aim of this phase is to extract essential information from the image.
This is done using a pretrained Convolutional Neural Network.
Decoding phase: The information extracted in the encoding phase is used here to reconstruct an output of appropriate dimensions.
3.5 U-net Segmentation
A U-shaped architecture consists of a specific encoder-decoder scheme: The encoder reduces
the spatial dimensions in every layer and increases the channels. On the other hand, the
decoder increases the spatial dims while reducing the channels. The tensor that is passed in
the decoder is usually called bottleneck. Encoder (left side): It consists of the repeated
application of two 3x3 convolutions. Each conv is followed by a ReLU and
batch normalization. Then a 2x2 max pooling operation is applied to reduce the spatial
dimensions. Again, at each down sampling step, we double the number of feature channels,
while we cut in half the spatial dimensions.
Decoder path (right side): Every step in the expansive path consists of an upsampling of the
feature map followed by a 2x2 transpose convolution, which halves the number of feature
channels. We also have a concatenation with the corresponding feature map from the
contracting path, and usually a 3x3 convolutional (each followed by a ReLU). At the final
layer, a 1x1 convolution is used to map the channels to the desired number of classes. [25]

Figure 3.4 unet architecture
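The following is a minimal Keras sketch of one encoder step and one decoder step of this U-shaped scheme, assuming a small illustrative input size; the networks used in this project replace the plain encoder with pretrained backbones (MobileNetV2, ResNet50, Vgg16, Vgg19), so this only illustrates the pattern.

```python
# One encoder step and one decoder step of a U-shaped network, as described above.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # two 3x3 convolutions, each followed by batch normalization and ReLU
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def encoder_step(x, filters):
    skip = conv_block(x, filters)                  # kept for the skip connection
    return skip, layers.MaxPooling2D(2)(skip)      # 2x2 max pooling halves spatial dims

def decoder_step(x, skip, filters):
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)  # upsample, halve channels
    x = layers.Concatenate()([x, skip])            # concatenate with the corresponding encoder map
    return conv_block(x, filters)

inputs = layers.Input((128, 128, 3))
s1, d1 = encoder_step(inputs, 64)
s2, d2 = encoder_step(d1, 128)
bottleneck = conv_block(d2, 256)
u2 = decoder_step(bottleneck, s2, 128)
u1 = decoder_step(u2, s1, 64)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(u1)   # 1x1 conv -> binary OD mask
unet = Model(inputs, outputs)
```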


3.6 Convolution neural network models
3.6.1 mobileNet:
The MobileNet model is based on depthwise separable convolutions which is a form of
factorized convolutions which factorize a standard convolution into a depthwise convolution
and a 1×1 convolution called a pointwise convolution. For MobileNets the depthwise
convolution applies a single filter to each input channel. The pointwise convolution then
applies a 1×1 convolution to combine the outputs of the depthwise convolution. A standard
convolution both filters and combines inputs into a new set of outputs in one step. The
depthwise separable convolution splits this into two layers, a separate layer for filtering and
a separate layer for combining. This factorization has the effect of drastically reducing
computation and model size. depthwise separable convolution are made up of two layers:
depthwise convolutions and pointwise convolutions. It uses depthwise convolutions to apply
a single filter for each input channel (input depth). In pointwise convolution, a simple 1×1
convolution is then used to create a linear combination of the output of the depthwise layer.
MobileNets use both batchnorm and ReLU nonlinearities for both layers. All layers are
followed by a batchnorm [13]and ReLU nonlinearity with the exception of the final fully
connected layer which has no nonlinearity and feeds into a softmax layer for classification.
A layer with regular convolution, batchnorm and ReLU nonlinearity can be contrasted with the factorized layer of depthwise convolution and 1×1 pointwise convolution, each followed by batchnorm and ReLU. Down-sampling is handled with strided
convolution in the depthwise convolutions as well as in the first layer. A final average
pooling reduces the spatial resolution to 1 before the fully connected layer. Counting
depthwise and pointwise convolutions as separate layers, MobileNet has 28 layers. [26]

figure 3.5 mobilenetV2 architecture
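A minimal Keras sketch of the depthwise separable convolution described above; the filter counts are illustrative, and in this work the pretrained MobileNetV2 backbone is used rather than a hand-built network.

```python
# Depthwise separable convolution: depthwise 3x3 (one filter per input channel)
# followed by a pointwise 1x1 convolution, each with batchnorm and ReLU.
from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters, stride=1):
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(pointwise_filters, 1, padding="same", use_bias=False)(x)  # pointwise
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return x
```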


3.6.2 DenseNet121
is a new CNN architecture that reached State-Of-The-Art (SOTA) results on classification
datasets (CIFAR, SVHN, ImageNet) using less parameters. Thanks to its new use of residual
it can be deeper than the usual networks and still be easy to optimize. General Architecture:
DenseNet is composed of Dense blocks. In those blocks, the layers are densely connected
together: Each layer receive in input all previous layers output feature maps

Figure 3.6 dense net Architecture
This extreme use of residual creates a deep supervision because each layer receives more
supervision from the loss function thanks to the shorter connections.
1. Dense block: A dense block is a group of layers connected to all their previous layers. A
single layer looks like this: Batch Normalization, ReLU activation and 3x3 Convolution.
2. Transition layer: Instead of summing the residual like in ResNet, DenseNet concatenates
all the feature maps. It would be impracticable to concatenate feature maps of different sizes
(although some resizing may work). Thus, in each dense block, the feature maps of each
layer have the same size. However down-sampling is essential to CNN. Transition layers
between two dense blocks assure this role. A transition layer is made of batch normalization, a 1x1 convolution and average pooling. Concatenating residuals instead of summing them has a downside when the model is very deep: it generates a lot of input channels!
Despite this, DenseNet has fewer parameters than usual SOTA networks. There are two reasons. First of all, a DenseNet convolution generates a low number of feature maps; the authors recommend 32 for optimal performance but show SOTA results with only 12 output channels! The number of output feature maps of a layer is defined as the growth rate.
DenseNet has less need for wide layers because, as layers are densely connected, there is
little redundancy in the learned features. All layers of a same dense block share a collective
knowledge. The growth rate regulates how much new information each layer contributes to
the global state.
The second reason DenseNet has few parameters despite concatenating many residuals
together is that each 3x3 convolution can be upgraded with a bottleneck. [27]
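A minimal Keras sketch of a dense block and a transition layer following this description; the growth rate and block depth are illustrative, and the project itself uses the pretrained DenseNet121.

```python
# Dense block and transition layer: each layer's output is concatenated with,
# not added to, all previous feature maps.
from tensorflow.keras import layers

def dense_layer(x, growth_rate):
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    return layers.Concatenate()([x, y])          # concatenate rather than sum

def dense_block(x, num_layers, growth_rate=32):
    for _ in range(num_layers):
        x = dense_layer(x, growth_rate)
    return x

def transition_layer(x, compression=0.5):
    channels = int(x.shape[-1] * compression)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(channels, 1, use_bias=False)(x)   # 1x1 convolution
    return layers.AveragePooling2D(2)(x)                # down-sampling between dense blocks
```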
3.6.3 ResNet
ResNet is a short name for residual network. Deep convolutional neural networks have achieved human-level image classification results. Deep networks extract low-, middle- and high-level features and classifiers in an end-to-end multi-layer fashion, and the number of stacked layers can enrich the "levels" of features. The depth of the stacked layers is of crucial importance, as the ImageNet results show. When the deeper network starts to converge, a degradation problem
has been exposed: with the network depth increasing, accuracy gets saturated (which might
be unsurprising) and then degrades rapidly. Such degradation is not caused by overfitting; rather, adding more layers to an already deep network leads to higher training error. The deterioration of
training accuracy shows that not all systems are easy to optimize. To overcome this problem,
Microsoft introduced a deep residual learning framework. Instead of hoping every few
stacked layers directly fit a desired underlying mapping, they explicitly let these layers fit a
residual mapping. The formulation of F(x)+x can be realized by feedforward neural
networks with shortcut connections. Shortcut connections are those that skip one or more layers. The shortcut connections perform identity mapping, and their
outputs are added to the outputs of the stacked layers. By using the residual network, there
are many problems which can be solved such as:
• ResNets are easy to optimize, but the “plain” networks (that simply stack layers)
show higher training error when the depth increases.

• ResNets can easily gain accuracy from greatly increased depth, producing results
which are better than previous networks. [28]

Figure 3.7 resnet50 Architecture
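A minimal Keras sketch of the F(x) + x shortcut described above, using a basic two-convolution residual block; ResNet50 itself uses a deeper three-layer bottleneck variant and is used pretrained in this work.

```python
# Basic residual block: output = ReLU(F(x) + x) with an identity shortcut.
# Assumes the input already has `filters` channels so the addition is valid.
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])    # F(x) + x
    return layers.ReLU()(y)
```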


3.6.4 Alexnet
AlexNet consists of 11×11, 5×5 and 3×3 convolutions, max pooling, dropout, data augmentation, ReLU activations, and SGD with momentum. It attaches ReLU activations after every convolutional and fully-connected layer. As depicted in figure 3.8, AlexNet contains eight layers with weights; the first five are convolutional and the remaining three are
fully connected. The output of the last fully-connected layer is fed to a 1000-way softmax
which produces a distribution over the 1000 class labels. The network maximizes the
multinomial logistic regression objective, which is equivalent to maximizing the average
across training cases of the log-probability of the correct label under the prediction
distribution. The kernels of the second, fourth, and fifth convolutional layers are connected
only to those kernel maps in the previous layer which reside on the same GPU. The kernels
of the third convolutional layer are connected to all kernel maps in the second layer. The
neurons in the fully-connected layers are connected to all neurons in the previous layer. In
short, AlexNet contains 5 convolutional layers and 3 fully connected layers. Relu is applied
after very convolutional and fully connected layer. Dropout is applied before the first and

32
the second fully connected year. The network has 62.3 million parameters and needs 1.1
billion computation units in a forward pass. [29]

Figure3.8: AlexNet model architecture


3.6.5 googlenet
GoogLeNet is a 22-layer deep convolutional neural network that’s a variant of the Inception
Network, a Deep Convolutional Neural Network developed by researchers at Google. The
GoogLeNet architecture presented in the ImageNet Large-Scale Visual Recognition
Challenge 2014(ILSVRC14) solved computer vision tasks such as image classification and
object detection [30]

Figure 3.9 googlenet model architecture


3.6.6 Vgg
This architecture is from VGG group, Oxford. It makes the improvement over AlexNet by
replacing large kernel-sized filters (11 and 5 in the first and second convolutional layer,

33
respectively) with multiple 3X3 kernel-sized filters one after another. With a given receptive
field (the effective area size of input image on which output depends), multiple stacked
smaller size kernel is better than the one with a larger size kernel because multiple non-linear
layers increases the depth of the network which enables it to learn more complex features,
and that too at a lower cost. For example, three 3x3 filters stacked on top of each other with stride 1 have a receptive field of size 7, but the number of parameters involved is 3*(9C^2) = 27C^2 in comparison to the 49C^2 parameters of a single kernel of size 7. Here, it is assumed that the number of input
and output channel of layers is C. Also, 3X3 kernels help in retaining finer level properties
of the image. The network architecture is given in the table. You can see that in VGG-D,
there are blocks with same filter size applied multiple times to extract more complex and
representative features. This concept of blocks/modules became a common theme in the
networks after VGG. The VGG convolutional layers are followed by 3 fully connected
layers. The width of the network starts at a small value of 64 and increases by a factor of 2
after every sub-sampling/pooling layer. It achieves the top-5 accuracy of 92.3 % on
ImageNet. [31]

Figure 3.10 vgg model architecture

Table 3.1 Model summary [32]
Network        Depth   Size (MB)   Parameters (×10^6)
MobileNetV2    88      14          56
ResNet50       -       98          26
DenseNet121    121     33          8
Alexnet        8       227         61
Googlenet      22      27          7
Vgg16          23      582         138
Vgg19          26      594         143
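Since all seven classification networks are used as pretrained models (Chapter 4), the following PyTorch sketch shows how such a backbone can be loaded and its final layer replaced for the two-class (healthy/glaucoma) task; the exact attribute names differ per architecture, and only three of the seven are shown here for brevity.

```python
# Load an ImageNet-pretrained backbone and adapt its final layer for 2 classes.
import torch.nn as nn
from torchvision import models

def build_classifier(name="mobilenet_v2", num_classes=2):
    if name == "mobilenet_v2":
        net = models.mobilenet_v2(pretrained=True)   # newer torchvision uses weights=...
        net.classifier[1] = nn.Linear(net.classifier[1].in_features, num_classes)
    elif name == "resnet50":
        net = models.resnet50(pretrained=True)
        net.fc = nn.Linear(net.fc.in_features, num_classes)
    elif name == "vgg16":
        net = models.vgg16(pretrained=True)
        net.classifier[6] = nn.Linear(net.classifier[6].in_features, num_classes)
    else:
        raise ValueError(f"unsupported backbone: {name}")
    return net
```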

3.7 Color Fundus Retinal Photography


3.7.1 fundus photography
Fundus imaging is a crucial measure of an ophthalmic practice. It documents the retina of
the eye, the neurosensory tissue which transforms the visually captured images into the electrical impulses our brain understands. Fundus imaging is performed through the pupil, which is used both as an entry and exit point for the fundus camera's imaging light rays. Fundus imaging plays an important role in analyzing and understanding the medical condition of the human eye; it helps to diagnose the major
fundus abnormalities such as Glaucoma, Diabetic Retinopathy (DR) and Age-related
Macular Degeneration (AMD). [33] The retina is imaged to document conditions such as
diabetic retinopathy, age related macular degeneration, macular edema and retinal
detachment. Fundus photography is also used to help interpret fluorescein angiography as
certain retinal landmarks visible in fundus photography are not visible on a fluorescein
angiogram.
3.7.2 Fundus Camera
Fundus photography uses a fundus camera to record color images of the condition of the interior surface of the
eye, in order to document the presence of disorders and monitor their change over time. A
fundus camera or retinal camera is a specialized low power microscope with an attached
camera designed to photograph the interior surface of the eye, including the retina, retinal
vasculature, optic disc, macula, and posterior pole (i.e. the fundus). Fundus camera is used
to get fundus photographs. It consists of a specialized low power microscope and attached
camera that is designed to photograph the interior surface of the eye, including the retina,
OD, macula (i.e. the fundus). The optical design of fundus cameras is based on the principle
of indirect ophthalmoscope. A fundus camera provides an upright, magnified view of the
fundus. A typical camera views 30 to 50 degrees of retinal area considered the normal angle
of view, with a magnification of 2.5 times larger than life. It allows some modification of
this relationship through zoom or auxiliary lenses from 15 degrees which provides 5x
magnification to 140 degrees with a wide-angle lens which minimize the image by half. [34]

Figure 3.11 a fundus camera


3.8 Datasets
3.8.1 Acrima
This new dataset [35] consists of a total of 705 fundus images with 396 glaucoma images
and 309 normal images taken with centered optic disc. The dataset does not provide any
annotations for OD and OC segmentation. Relatively balanced proportion of normal and

glaucomatous images in this dataset makes it particularly suitable for training DL based
classifiers.
3.8.2 Origa
Online Retinal Fundus Image database for Glaucoma Analysis and research (ORIGA) [36]
is one of the largest and most commonly used datasets for glaucoma detection made public
since 2010. This dataset consists of 650 images (168 glaucoma, 482 healthy) collected by
Singapore Eye Research Institute between 2004 and 2007. The dataset provides class labels
for healthy and glaucoma, OD and OC contours and CDR values for each image.
3.8.3 Prim
This dataset consists of 971 images (460 glaucoma, 511 healthy).

Chapter4: Implementation
4.1 Methodology
4.1.1 Preprocessing the dataset for segmentation
There is no ready-made dataset with optic-disc masks for our images, so we annotated the
data ourselves: the OD was labelled manually with the LabelMe tool and a binary mask was
generated for every image, as shown in Figure 4.1. A minimal sketch of converting a LabelMe
annotation into a mask is given after the figure.

Figure 4.1 eye images and masks
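The following is a minimal, hypothetical sketch of this conversion, assuming the standard LabelMe JSON format; the file names, and the assumption that every annotated polygon is the optic disc, are ours.

import json
import numpy as np
import cv2

def labelme_json_to_mask(json_path, out_path):
    with open(json_path) as f:
        ann = json.load(f)
    h, w = ann["imageHeight"], ann["imageWidth"]
    mask = np.zeros((h, w), dtype=np.uint8)
    for shape in ann["shapes"]:            # every polygon we drew is the OD
        pts = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [pts], 255)     # OD region -> white, background -> black
    cv2.imwrite(out_path, mask)

labelme_json_to_mask("image_001.json", "image_001_mask.png")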


4.1.2 OD Segmentation Using Unet Semantic Segmentation
Before segmentation we preprocess the images and masks using the TensorFlow/OpenCV pipeline:
1- resize(): resizes the input image to the desired size (512 × 512).
2- image.astype(): casts the scaled image to float32, to avoid distorting image intensities.
3- The Adam optimization algorithm: an extension of stochastic gradient descent that has
recently seen broad adoption for deep learning applications in computer vision and natural
language processing. Adam is popular because it achieves good results quickly.
We then train four U-Net models (MobileNetV2, ResNet50, VGG16 and VGG19 encoders) for
8 epochs each; a condensed preprocessing sketch is given below.
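A minimal sketch of this preprocessing and of compiling a U-Net with Adam, following the appendix listing (512 × 512 input, learning rate 1e-4, 8 epochs):

import cv2
import numpy as np
from tensorflow.keras.optimizers import Adam

H, W = 512, 512

def preprocess_image(path):
    x = cv2.imread(path, cv2.IMREAD_COLOR)
    x = cv2.resize(x, (W, H))               # resize to the desired size
    return (x / 255.0).astype(np.float32)   # keep intensities in [0, 1]

# build_resnet50_unet (or any of the four U-Net variants) is defined in the appendix:
# model = build_resnet50_unet((H, W, 3))
# model.compile(loss=dice_loss, optimizer=Adam(1e-4), metrics=[dice_coef])
# model.fit(train_dataset, validation_data=valid_dataset, epochs=8)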
4.1.3 Classification using Pretrained Deep CNNs
First, we augment the origa dataset because it contains comparatively few glaucoma images;
after augmentation the number of glaucoma images reaches 486. For this we use the following
TensorFlow function (a short usage sketch follows the parameter list):
▪ ImageDataGenerator (): Generate batches of tensor image data with real-time data
augmentation, it has parameters:
▪ rotation_range: Int. Degree range for random rotations.
▪ width_shift_range: fraction of total width.
▪ height_shift_range: fraction of total height.
▪ shear_range: Float. Shear Intensity (Shear angle in counter-clockwise direction in
degrees).
▪ zoom_range: Range for random zoom.
▪ channel_shift_range: Range for random channel shifts.
▪ horizontal_flip: Randomly flip inputs horizontally.
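A minimal usage sketch (the parameter values here are illustrative, not necessarily the exact ones used):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    channel_shift_range=10.0,
    horizontal_flip=True,
)

# "origa_glaucoma" is a hypothetical folder with a single "glaucoma/" subfolder
gen = datagen.flow_from_directory(
    "origa_glaucoma",
    target_size=(224, 224),
    batch_size=16,
    class_mode=None,
    save_to_dir="origa_augmented",   # write each generated batch back to disk
)
for _ in range(20):                  # each call saves one augmented batch
    next(gen)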
Then, for all datasets, the images are preprocessed with the following functions from the
torchvision library:
1- Resize(): resizes the input image to the desired size (224 × 224).
2- ToTensor(): converts the input image to a Torch tensor.
3- Normalize(): takes a tensor image and normalizes it with a mean and standard deviation;
it has 3 parameters: mean, std and inplace.
4- RandomRotation(): rotates the image randomly by an angle.
Seven pretrained models (AlexNet, GoogleNet, ResNet-50, MobileNetV2, DenseNet121, VGG16
and VGG19) were evaluated. Each CNN was trained with the SGD optimizer for 30 epochs; a
condensed sketch of this transfer-learning setup follows.
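A condensed sketch of the setup used for every classifier (the full version is in the appendix): the pretrained backbone is frozen, the final layer is replaced with a 2-class head, and only that head is trained with SGD.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                     # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)       # 2 classes: glaucoma / normal

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
# train(30, model, optimizer, criterion, use_cuda, "resnet50.pt")   # see the appendix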
4.2 Experiments
4.2.1 Segmentation

In segmentation, for the two datasets and the MobilenetV2_unet, Resnet50_unet, Vgg16_unet
and Vgg19_unet models, we calculate the performance of each model (accuracy, recall,
precision, F1-score) with the following equations:
• Accuracy:
Accuracy = ((TP + TN) / (TP + TN + FP + FN)) × 100
• Recall:
Recall = (TP / (TP + FN)) × 100
• Precision:
Precision = (TP / (TP + FP)) × 100
• F1-score:
F1-score = (2TP / (2TP + FP + FN)) × 100
Where: TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative.
The reported scores use a weighted average: the metrics are calculated for each label and
then averaged, weighted by support (the number of true instances for each label). A short
sketch of this computation is given below.
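A minimal sketch of these weighted-average metrics, matching the scikit-learn calls used in the appendix:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = np.array([0, 1, 1, 0, 1])   # flattened ground-truth mask (toy example)
y_pred = np.array([0, 1, 0, 0, 1])   # flattened predicted mask

print("Accuracy :", accuracy_score(y_true, y_pred) * 100)
print("F1       :", f1_score(y_true, y_pred, average="weighted") * 100)
print("Recall   :", recall_score(y_true, y_pred, average="weighted") * 100)
print("Precision:", precision_score(y_true, y_pred, average="weighted") * 100)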
4.2.1.1 Origa segment result:

Table 4.1 Segmentation results on the origa dataset
method Accuracy % F1-score % Recall % Precision %
MobilenetV2_unet 98.46 98.13 98.46 98.49
Resnet50_unet 96.87 95.38 99.09 96.90
Vgg16_unet 97.88 97.21 98.15 97.92
Vgg19_unet 98.88 98.70 98.92 98.90
4.2.1.2 Prim segment result:
Table 4.2 Segmentation results on the prim dataset
method Accuracy % F1-score % Recall % Precision %
MobilenetV2_unet 99.32 99.23 99.34 99.32
Resnet50_unet 98.86 98.63 98.94 98.87
Vgg16_unet 99.66 99.64 99.67 99.66
Vgg19_unet 99.07 98.89 99.12 99.09
4.2.2 Classification
In classification, for the three datasets and the ResNet50, Vgg16, Vgg19, Alexnet,
Googlenet, MobilenetV2 and Densenet121 models, we calculate the performance of each model
(accuracy, sensitivity, specificity) with the following equations:
• Accuracy:
Accuracy = ((TP + TN) / (TP + TN + FP + FN)) × 100
• Sensitivity:
Sensitivity = (TP / (TP + FN)) × 100
• Specificity:
Specificity = (TN / (TN + FP)) × 100
Where: TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative.
A short sketch of this computation is given below.
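A minimal sketch of accuracy, sensitivity and specificity computed from a 2×2 confusion matrix, equivalent to the per-class loop in the appendix classification code:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0])   # 0 = normal, 1 = glaucoma (toy example)
y_pred = np.array([0, 1, 1, 1, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
sensitivity = tp / (tp + fn) * 100
specificity = tn / (tn + fp) * 100
print(f"accuracy={accuracy:.1f}%  sensitivity={sensitivity:.1f}%  specificity={specificity:.1f}%")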
4.2.2.1 Acrima classification result
Table 4.3 Classification results on the acrima dataset
method Accuracy % Sensitivity % Specificity % Loss % Train accuracy %
ResNet50 97 100 100 13.55 96.36
Vgg16 85 100 66.67 45.72 88.87
Vgg19 81 87.5 50 48.83 88.87
Alexnet 90 75 100 24.07 93.72
Googlenet 95 100 100 9.64 94.94
MobilenetV2 98 100 80 27.84 100
Densenet121 95 100 75 8.64 94.94
4.2.2.2 Origa Classification Result
Table 4.4 Classification results on the origa dataset
method Accuracy % Sensitivity % Specificity % Loss % Train accuracy %
ResNet50 81 66.67 75 47.09 88.30
Vgg16 82 100 100 43.33 80.44
Vgg19 83 100 100 31.99 84.30
Alexnet 82 100 75 37.86 84.44
Googlenet 82 60 100 65.80 86.37
MobilenetV2 91 66.67 100 36.76 100
Densenet121 85 66.67 100 41.76 87.85

4.2.2.3 Prim classification
Table 4.5 Classification results on the prim dataset
method Accuracy % Sensitivity % Specificity % Loss % Train accuracy %
ResNet50 85 83.33 100 18.20 88.82
Vgg16 86 100 80 55.31 85.59
Vgg19 89 100 100 19.54 86.47
Alexnet 88 100 80 36.96 90.74
Googlenet 89 100 100 22.69 92.50
MobilenetV2 96 100 100 5.02 99.85
Densenet121 93 100 83.33 18.98 92.21
4.2.2.4 Interface for training using Acrima
In all of the following figures, the two curves show the training (Train) and validation (Valid) results.

ResNet50 model

Figure 4.2 loss and accuracy acrima dataset resnet50 model

Vgg16 model

Figure 4.3 loss and accuracy acrima dataset vgg16 model


Vgg19 model

Figure 4.4 loss and accuracy acrima dataset vgg19 model


Alexnet model

Figure 4.5 loss and accuracy acrima dataset alexnet model


Googlenet model

Figure 4.6 loss and accuracy acrima dataset googlenet model


MobilenetV2 model

Figure 4.7 loss and accuracy acrima dataset mobilenetV2 model


Densenet121 model

Figure 4.8 loss and accuracy acrima dataset densenet121 model


4.2.2.5 Interface for training using Origa
In all of the following figures, the two curves show the training (Train) and validation (Valid) results.

ResNet50 model

Figure 4.9 loss and accuracy origa dataset resnet50 model


Vgg16 model

Figure 4.10 loss and accuracy origa dataset vgg16 model

Vgg19 model

Figure 4.11 loss and accuracy origa dataset vgg19 model


Alexnet model

Figure 4.12 loss and accuracy origa dataset alexnet model


Googlenet model

Figure 4.13 loss and accuracy origa dataset googlenet model

MobilenetV2 model

Figure 4.14 loss and accuracy origa dataset mobilenetv2 model


Densenet121 model

Figure 4.15 loss and accuracy origa dataset densenet121 model


4.2.2.6 Interface for training using Prim
In all of the following figures, the two curves show the training (Train) and validation (Valid) results.

ResNet50 model

Figure 4.16 loss and accuracy prim dataset resnet50 model


Vgg16 model

Figure 4.17 loss and accuracy prim dataset vgg16 model


Vgg19 model

Figure 4.18 loss and accuracy prim dataset vgg19 model

Alexnet model

Figure 4.19 loss and accuracy prim dataset alexnet model


Googlenet model

Figure 4.20 loss and accuracy prim dataset googlenet model


MobilenetV2 model

Figure 4.21 loss and accuracy prim dataset mobilenetV2 model

Densenet121 model

Figure 4.22 loss and accuracy prim dataset densenet121 model


4.3 Discussion
In our research we used CNN models for segmentation followed by classification. In the
segmentation results, the best model on the origa dataset is vgg19_unet and the best on the
prim dataset is vgg16_unet. In the classification results, the best model on all three
datasets is mobilenetV2.

Chapter5: Conclusion and Future Work
5.1 Introduction
This chapter summarizes the work and the difficulties we faced, in addition to possible
future studies.
5.2 Conclusion
In this study, glaucoma is detected early using a two-stage algorithm. The first stage
segments the OD using four models (mobilenetV2_unet, resnet50_unet, vgg16_unet and
vgg19_unet) applied to two datasets; the best result on origa is the vgg19_unet model, with
98.88% accuracy, 98.92% recall and 98.90% precision, and the best on prim is the vgg16_unet
model, with 99.66% accuracy, 99.67% recall and 99.66% precision. The second stage classifies
glaucoma using seven models (mobilenetV2, resnet50, densenet121, googlenet, alexnet, vgg16
and vgg19) applied to three datasets; the best result on all three datasets is mobilenetV2,
with 98% accuracy, 100% sensitivity and 80% specificity on acrima, 91% accuracy, 66.67%
sensitivity and 100% specificity on origa, and 96% accuracy, 100% sensitivity and 100%
specificity on prim.
5.3 Difficulties
1- Lack of a large dataset available for training, and the unavailability of a ready-made
annotated dataset for segmentation.
2- The difficulty of training, which took a lot of time and effort.
3- Long training time, and software and technical errors.
4- Poor Internet connectivity during the research and training process.

5.4 Future work


• Enhance the results by using large datasets.
• Develop the system to make it classify the glaucoma disease into mild, moderate and
severe.
Reference:
[1] Syna Sreng, Noppadol Maneerat, Kazuhiko Hamamoto, and Khin Yadanar Win. (2020)
Deep Learning for Optic Disc Segmentation and Glaucoma Diagnosis on Retinal Images.
[2] Understanding and Living with Glaucoma (book). Source: www.glaucoma.org
[3] URL: https://cementanswers.com/how-does-spondylosis-occur
[4] URL: www.glaucoma.org
[5] Cheng, J; Yin, F.; Wong, D.W.K.; Tao, D.; Liu, J. Sparse dissimilarity-constrained
coding for glaucoma screening. IEEE Trans. Biomed. Eng. 2015, 62, 1395–1403
[6] Chakravarty, A.; Sivaswamy, J. Glaucoma classification with a fusion of segmentation
and image-based features. In Proceedings of the 2016 IEEE 13th International Symposium
on Biomedical Imaging (ISBI), Prague, Czech Republic, 13–16 April 2016; pp. 689–692
[7] Karkuzhali, S.; Manimegalai, D. Computational intelligence-based decision support
system for glaucoma detection. Biomed. Res. 2017, 28, 976–1683.
[8] Quigley, H.A.; Broman, A.T. The number of people with glaucoma worldwide in 2010 and
2020. Br. J. Ophthalmol. 2006, 90, 262-267.
[9] URL: https://www.healthline.com/health/glaucoma#treatments
[10] Daniel Welfer, Cleyson M. Kitamura, Melissa M. Dal Pizzol. Segmentation of the optic
disk in color eye fundus images using an adaptive morphological approach.
[11] Honggang Yu, E.S. Barriga, Carla Agurto. Automatic optic disc localization and
segmentation in retinal images by a line operator and level set.
[12] Artem Sevastopolsky, Stepan Drapak, Konstantin Kiselev, Blake Snyder. Stack-U-Net:
refinement network for improved optic disc and cup image segmentation.

[13] Guo, F.; Mai, Y.; Zhao, X.; Duan, X.; Fan, Z.; Zou, B. and Xie, B. Yanbao: A mobile
app using the measurement of clinical parameters for glaucoma screening. IEEE Access
2018, 6, 77414–77428.

[14] Yaroub Elloumi, Mohamed Akil, Nasser Kehtarnavaz. A Mobile Computer Aided System
for Optic Nerve Head Detection.
[15] Andres Diaz-Pinto, Sandra Morales, Valery Naranjo, Thomas Köhler. CNNs for
automatic glaucoma assessment using fundus images: An extensive validation.
[16] Yuan Gao, Xiaosheng Yu, Chengdong Wu, Wei Zhou. Accurate and Efficient
Segmentation of Optic Disc and Optic Cup in Retinal Images Integrating Multi-View
Information
[17] Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu. Attention-Based Glaucoma
Detection: A Large-Scale Database and CNN Model.
[18] Bajwa, M.N.; Malik, M.I.; Siddiqui, S.A.; Dengel, A.; Shafait, F.; Neumeier, W.;
Ahmed, S. Two-stage framework for optic disc localization and glaucoma classification in
retinal fundus images using deep learning.
[19]Mohamed, N.A.; Zulkifley, M.A.; Zaki, W.M.D.W.; Hussain, A. An automated
glaucoma screening system using cup-to-disc ratio via Simple Linear Iterative Clustering
superpixel approach.
[20] Orlando, J.I.; Prokofyeva, E.; del Fresno, M.; Blaschko, M.B. Convolutional neural
network transfer for automated glaucoma identification. SPIE: International Society for
Optics and Photonics.
[21] Manal Alghamdi, and Mohamed Abdel-Mottaleb, A Comparative Study of Deep
Learning Models for Diagnosing Glaucoma from Fundus Images.
[22] Xi Xu, Yu Guan, Jianqiang Li, Zerui Ma, Li Zhang, Li Li. Automatic glaucoma detection
based on transfer induced attention network.
[22] Juneja, M.; Singh, S.; Agarwal, N.; Bali, S.; Gupta, S.; Thakur, N.; Jindal, P. Automated
detection of Glaucoma using deep learning convolution network (G-net).
[23] Marina Chatterjee. Deep Learning Tutorial: What it Means and What's the Role of Deep
Learning. [https://www.mygreatlearning.com/]

[24] Jason Brownlee. Deep Learning with Python: Develop Deep Learning Models on Theano
and TensorFlow Using Keras.
[25] Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional
networks for biomedical image segmentation. In International Conference on Medical image
computing and computer-assisted intervention (pp. 234-241). Springer, Cham.
[26] Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted
residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer vision
and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–45206.
[27] Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected
convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
[28] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In
Proceedings of the IEEE Conference on Computer vision and Pattern Recognition, Las
Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
[29] Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep
convolutional neural networks. In Proceedings of the Advances in Neural Information
Processing Systems, San Francisco, CA, USA, 3–8 December 2012; pp. 1097–1105
[30] Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.;
Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June
2015; pp. 1–9
[31] Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-
Scale Image Recognition.
[32] URL: https://keras.io/api/applications/
[33] Shanthi Rajaji, S. Prabakaran. Fundus Abnormalities and Image Acquisition
Techniques – A Survey

[34] URL: https://ophthalmology.med.ubc.ca/patient-care/ophthalmic-photography/color-fundus-photography/
[35] Diaz-Pinto, A.; Morales, S.; Naranjo, V.; Köhler, T.; Mossi, J.M.; Navea, A. CNNs for
automatic glaucoma assessment using fundus images: An extensive validation. Biomed.
Eng. Online 2019, 18, 29.
[36] Zhang, Z.; Yin, F.S.; Liu, J.; Wong, W.K.; Tan, N.M.; Lee, B.H.; Cheng, J.; Wong,
T.Y. Origa-light: An online retinal fundus image database for glaucoma analysis and
research. In Proceedings of the 2010 Annual International Conference of the IEEE
Engineering in Medicine and Biology, Buenos Aires, Argentina, 31 August–4 September
2010; pp. 3065–3068

APPENDIX
Segmentation code
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

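# IoU metric: intersection-over-union of the ground-truth and predicted masks,
# computed with NumPy inside tf.numpy_function so Keras can track it during training.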
def iou(y_true, y_pred):
    def f(y_true, y_pred):
        intersection = (y_true * y_pred).sum()
        union = y_true.sum() + y_pred.sum() - intersection
        x = (intersection + 1e-15) / (union + 1e-15)
        x = x.astype(np.float32)
        return x
    return tf.numpy_function(f, [y_true, y_pred], tf.float32)

smooth = 1e-15
def dice_coef(y_true, y_pred):
    y_true = tf.keras.layers.Flatten()(y_true)
    y_pred = tf.keras.layers.Flatten()(y_pred)
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2. * intersection + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPool2D, Conv2DTranspose, Concatenate, Input
from tensorflow.keras.models import Model
from tensorflow.keras.applications import ResNet50

def conv_block(input, num_filters):
    x = Conv2D(num_filters, 3, padding="same")(input)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = Conv2D(num_filters, 3, padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    return x

def decoder_block(input, skip_features, num_filters):
    x = Conv2DTranspose(num_filters, (2, 2), strides=2, padding="same")(input)
    x = Concatenate()([x, skip_features])
    x = conv_block(x, num_filters)
    return x

def build_resnet50_unet(input_shape):
""" Input """
inputs = Input(input_shape)

""" Pre-trained ResNet50 Model """


resnet50 = ResNet50(include_top=False, weights="imagenet", input_tensor=inputs)

""" Encoder """


s1 = resnet50.get_layer("input_1").output ## (512 x 512)
s2 = resnet50.get_layer("conv1_relu").output ## (256 x 256)
s3 = resnet50.get_layer("conv2_block3_out").output ## (128 x 128)
s4 = resnet50.get_layer("conv3_block4_out").output ## (64 x 64)

""" Bridge """


b1 = resnet50.get_layer("conv4_block6_out").output ## (32 x 32)

""" Decoder """


d1 = decoder_block(b1, s4, 512) ## (64 x 64)
d2 = decoder_block(d1, s3, 256) ## (128 x 128)
d3 = decoder_block(d2, s2, 128) ## (256 x 256)
d4 = decoder_block(d3, s1, 64) ## (512 x 512)

""" Output """


outputs = Conv2D(1, 1, padding="same", activation="sigmoid")(d4)

model = Model(inputs, outputs, name="ResNet50_U-Net")


return model

if __name__ == "__main__":
input_shape = (H,W, 3)
model = build_resnet50_unet(input_shape)
model.summary()
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import numpy as np
import cv2
from glob import glob
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger, ReduceLROnPlateau, EarlyStopping, TensorBoard
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import Recall, Precision

""" Global Parameters """


H = 512
W=512
def shuffling(x, y):
x, y = shuffle(x, y, random_state=42)
return x, y

def load_data(path, split=0.2):


images = sorted(glob(os.path.join(path, "g", "*.jpg")))
masks = sorted(glob(os.path.join(path, "mask", "*.png")))

split_size = int(len(images) * split)

train_x, valid_x = train_test_split(images, test_size=split_size, random_state=42)


train_y, valid_y = train_test_split(masks, test_size=split_size, random_state=42)

train_x, test_x = train_test_split(train_x, test_size=split_size, random_state=42)


train_y, test_y = train_test_split(train_y, test_size=split_size, random_state=42)

return (train_x, train_y), (valid_x, valid_y), (test_x, test_y)

def read_image(path):
path = path.decode()
x = cv2.imread(path, cv2.IMREAD_COLOR)
x = cv2.resize(x, (W, H))
x = x/255.0
x = x.astype(np.float32)
return x

def read_mask(path):
path = path.decode()
x = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
x = cv2.resize(x, (W, H))
x = x/255.0
x = x.astype(np.float32)
x = np.expand_dims(x, axis=-1)
return x

def tf_parse(x, y):


def _parse(x, y):
x = read_image(x)
y = read_mask(y)
return x, y

x, y = tf.numpy_function(_parse, [x, y], [tf.float32, tf.float32])


x.set_shape([H, W, 3])
y.set_shape([H, W, 1])
return x, y

def tf_dataset(X, Y, batch_size=4):


dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.map(tf_parse)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(10)
return dataset

if __name__ == "__main__":
""" Seeding """
np.random.seed(42)
tf.random.set_seed(42)

""" Hyperparameters """


batch_size = 4
lr = 1e-4
num_epochs =8

model_path = os.path.join("/content/drive/MyDrive/dataset/origa/segmentation",
"resnet50.h5")
csv_path = os.path.join("/content/drive/MyDrive/dataset/origa/segmentation",
"data_resnet50.csv")

""" Dataset """


dataset_path = "/content/drive/MyDrive/dataset/origa/segmentation/data/"
(train_x, train_y), (valid_x, valid_y), (test_x, test_y) = load_data(dataset_path)
train_x, train_y = shuffling(train_x, train_y)

print(f"Train: {len(train_x)} - {len(train_y)}")


print(f"Valid: {len(valid_x)} - {len(valid_y)}")
print(f"Test: {len(test_x)} - {len(test_y)}")

train_dataset = tf_dataset(train_x, train_y, batch_size)


valid_dataset = tf_dataset(valid_x, valid_y, batch_size)

train_steps = len(train_dataset)
valid_steps = len(valid_dataset)

""" Model """


metrics = [dice_coef, iou, Recall(), Precision()]
model.compile(loss=dice_loss, optimizer=Adam(lr), metrics=metrics)

callbacks = [
ModelCheckpoint(model_path, verbose=1, save_best_only=True),
ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=1e-7,
verbose=1),
CSVLogger(csv_path),
TensorBoard(),
EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=False)
]

model.fit(
train_dataset,
epochs=num_epochs,
validation_data=valid_dataset,
steps_per_epoch=train_steps,
validation_steps=valid_steps,
callbacks=callbacks
)
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import numpy as np
import cv2
import pandas as pd
from glob import glob
from tqdm import tqdm
import tensorflow as tf
from tensorflow.keras.utils import CustomObjectScope
from sklearn.metrics import accuracy_score, f1_score, jaccard_score, precision_score, recall_score

def read_image(path):
x = cv2.imread(path, cv2.IMREAD_COLOR)
x = cv2.resize(x, (H, W))
ori_x = x
x = x/255.0
x = x.astype(np.float32)
x = np.expand_dims(x, axis=0)
return ori_x, x

def read_mask(path):
x = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
x = cv2.resize(x, (H, W))
ori_x = x
x = x/np.max(x)
x = x.astype(np.int32)
return ori_x, x

def save_result(ori_x, ori_y, y_pred, save_image_path):


line = np.ones((H, 10, 3)) * 255.0

ori_y = np.expand_dims(ori_y, axis=-1) ## (256, 256, 1)


ori_y = np.concatenate([ori_y, ori_y, ori_y], axis=-1) ## (256, 256, 3)

y_pred = np.expand_dims(y_pred, axis=-1) ## (256, 256, 1)


y_pred = np.concatenate([y_pred, y_pred, y_pred], axis=-1) * 255.0 ## (256, 256, 3)

cat_images = np.concatenate([ori_x, line, ori_y, line, y_pred], axis=1)


cv2.imwrite(save_image_path, cat_images)

if __name__ == "__main__":
""" Seeding """
np.random.seed(42)
tf.random.set_seed(42)

""" Loading the model """


with CustomObjectScope({"iou": iou, "dice_coef": dice_coef, "dice_loss": dice_loss}):
    model = tf.keras.models.load_model(
        "/content/drive/MyDrive/dataset/origa/segmentation/resnet50.h5")

(train_x, train_y), (valid_x, valid_y), (test_x, test_y) = load_data(dataset_path)

""" Predict the mask and calculate the metrics values """
SCORE = []
for x, y in tqdm(zip(test_x, test_y), total=len(test_x)):

name = x.split("/")[-1]

""" Reading the image and mask. """


ori_x, x = read_image(x)
ori_y, y = read_mask(y)

""" Predict the mask """


    y_pred = model.predict(x)[0]
    y_pred = np.squeeze(y_pred, axis=-1)
    y_pred = (y_pred > 0.5).astype(np.int32)   # threshold the sigmoid output to a binary mask

# save_image_path = f"/content/drive/MyDrive/result/{name}"
# save_result(ori_x, ori_y, y_pred, save_image_path)

""" Flattening the numpy arrays. """


y = y.flatten()
y_pred = y_pred.flatten()

""" Calculating metrics values """


acc_value = accuracy_score(y, y_pred)
f1_value = f1_score(y, y_pred, labels=[0, 1], average="weighted")
jac_value = jaccard_score(y, y_pred, labels=np.unique(y_pred), average="weighted")
recall_value = recall_score(y, y_pred, labels=np.unique(y_pred), average="weighted")
precision_value = precision_score(y, y_pred, labels=np.unique(y_pred),
average="weighted")
SCORE.append([name, acc_value, f1_value, jac_value, recall_value, precision_value])

""" Metrics values """


score = [s[1:]for s in SCORE]
score = np.mean(score, axis=0)
print(f"Accuracy: {score[0]:0.5f}")
print(f"F1: {score[1]:0.5f}")
print(f"Jaccard: {score[2]:0.5f}")
print(f"Recall: {score[3]:0.5f}")
print(f"Precision: {score[4]:0.5f}")

""" Saving all the results """


df = pd.DataFrame(SCORE, columns=["Image", "Accuracy", "F1", "Jaccard", "Recall",
"Precision"])
df.to_csv("/content/drive/MyDrive/dataset/origa/segmentation/score_resnet.csv")
Classification code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torchvision import transforms, datasets, models
from torch.utils.data.sampler import SubsetRandomSampler
import os
from PIL import Image
print(os.listdir("/content/drive/MyDrive/dataset/acrima/acrima"))
train_transforms = transforms.Compose([transforms.Resize(size=(224,224)),
transforms.ToTensor(),
transforms.Normalize([0.485,0.456,0.406],
[0.229,0.224,0.225] ),
transforms.RandomRotation(30)]
)
validation_transforms = transforms.Compose([transforms.Resize(size=(224,224)),
transforms.ToTensor(),
transforms.Normalize([0.485,0.456,0.406],
[0.229,0.224,0.225] ),
transforms.RandomRotation(30)])
test_transforms = transforms.Compose([transforms.Resize(size=(224,224)),
transforms.ToTensor(),
transforms.Normalize([0.485,0.456,0.406],
[0.229,0.224,0.225] ),
transforms.RandomRotation(30) ])
img_dir = '/content/drive/MyDrive/dataset/acrima/acrima'
train_data = datasets.ImageFolder(img_dir, transform=train_transforms)
# number of subprocesses to use for data loading
num_workers = 0
# percentage of training set to use as validation
valid_size = 0.2
test_size = 0.1

# convert data to a normalized torch.FloatTensor


transform = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
])
# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
valid_split = int(np.floor((valid_size) * num_train))
test_split = int(np.floor((valid_size + test_size) * num_train))
valid_idx, test_idx, train_idx = indices[:valid_split], indices[valid_split:test_split], indices[test_split:]

print(len(valid_idx), len(test_idx), len(train_idx))

# define samplers for obtaining training and validation batches


train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)
test_sampler = SubsetRandomSampler(test_idx)

# prepare data loaders (combine dataset and sampler)


train_loader = torch.utils.data.DataLoader(train_data, batch_size=10,
sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=10,
sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(train_data, batch_size=10,
sampler=test_sampler, num_workers=num_workers)
global val_loss_plot
global train_loss_plot
val_loss_plot=[]
train_loss_plot=[]
epoch_plot=[]
test_plot=[]
train_plot=[]
nb_classes=2
def confusion_Training_matrix(preds, labels):
    preds = torch.argmax(preds, 1)
    conf_matrix = torch.zeros(nb_classes, nb_classes)
    for p, t in zip(preds, labels):
        conf_matrix[p, t] += 1
    print('Confusion_Training_matrix\n', conf_matrix)
    TP = conf_matrix.diag()
    for c in range(nb_classes):
        idx = torch.ones(nb_classes).byte()
        idx[c] = 0
        TN = conf_matrix[idx.nonzero()[:, None], idx.nonzero()].sum()
        FP = conf_matrix[c, idx].sum()
        FN = conf_matrix[idx, c].sum()
        sensitivity = (TP[c] / (TP[c] + FN))
        specificity = (TN / (TN + FP))
        print('class {}\n TP {}, TN {}, FP {}, FN {}'.format(c, TP[c], TN, FP, FN))
        print('sensitivity = {}'.format(sensitivity))
        print('specificity = {}'.format(specificity))
def confusion_testing_matrix(preds, labels):
    preds = torch.argmax(preds, 1)
    conf_matrix = torch.zeros(nb_classes, nb_classes)
    for p, t in zip(preds, labels):
        conf_matrix[p, t] += 1
    print('Confusion_testing_matrix\n', conf_matrix)
    TP = conf_matrix.diag()
    for c in range(nb_classes):
        idx = torch.ones(nb_classes).byte()
        idx[c] = 0
        TN = conf_matrix[idx.nonzero()[:, None], idx.nonzero()].sum()
        FP = conf_matrix[c, idx].sum()
        FN = conf_matrix[idx, c].sum()
        sensitivity = (TP[c] / (TP[c] + FN))
        specificity = (TN / (TN + FP))
        print('class {}\n TP {}, TN {}, FP {}, FN {}'.format(c, TP[c], TN, FP, FN))
        print('sensitivity = {}'.format(sensitivity))
        print('specificity = {}'.format(specificity))
def train_accuracy(model, criterion, use_cuda , train_loss):
# monitor test loss and accuracy
#train_loss = 0.
correct = 0.
total = 0.

for batch_idx, (data, target) in enumerate(train_loader):


# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
# forward pass: compute predicted outputs by passing inputs to the model
output = model(data)
# calculate the loss
loss = criterion(output, target)
pred = output.data.max(1, keepdim=True)[1]
# compare predictions to true label
correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
total += data.size(0)
train_accuracy=100. * correct / total
print('\ntrain Accuracy: %2.2f%% (%2d/%2d)' % (
100. * correct / total, correct, total))
return train_accuracy

def valid_accuracy(model, criterion, use_cuda):


# monitor test loss and accuracy
test_loss = 0.
correct = 0.
total = 0.

for batch_idx, (data, target) in enumerate(valid_loader):


# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
# forward pass: compute predicted outputs by passing inputs to the model
output = model(data)
# calculate the loss
loss = criterion(output, target)
# update average test loss
test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
# convert output probabilities to predicted class
pred = output.data.max(1, keepdim=True)[1]

# compare predictions to true label


correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
total += data.size(0)
test_accuracy=100. * correct / total
print('\n validation Accuracy: %2.2f%% (%2d/%2d)' % (
100. * correct / total, correct, total))
return test_accuracy
def test(model, criterion, use_cuda):
# monitor test loss and accuracy
test_loss = 0.
correct = 0.
total = 0.

for batch_idx, (data, target) in enumerate(test_loader):


# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
# forward pass: compute predicted outputs by passing inputs to the model
output = model(data)
# calculate the loss
pred_test=torch.argmax(output,1)
conf_matrix=torch.zeros(nb_classes,nb_classes)
for p,t in zip(pred_test,target):
conf_matrix[p,t]+=1
loss = criterion(output, target)
test_loss=test_loss+((1/(batch_idx +1))*(loss.data-test_loss))
pred = output.data.max(1, keepdim=True)[1]
correct+=np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
total+=data.size(0)
test_accuracy=100.*correct/total
print('\n testAccuracy: %2d%% (%2d/%2d)' % (
100. * correct / total, correct, total))
print('test loss:{:.6f}\n'.format(test_loss))
print('Confusion_matrix_test\n',conf_matrix)
TP=conf_matrix.diag()
for c in range(nb_classes):
idx=torch.ones(nb_classes).byte()
idx[c]=0
TN=conf_matrix[idx.nonzero()[:,None],idx.nonzero()].sum()
FP=conf_matrix[c,idx].sum()
FN=conf_matrix[idx,c].sum()

        sensitivity = (TP[c] / (TP[c] + FN))
        specificity = (TN / (TN + FP))
        print('class {}\n TP {}, TN {}, FP {}, FN {}'.format(c, TP[c], TN, FP, FN))
        print('sensitivity = {}'.format(sensitivity))
        print('specificity = {}'.format(specificity))

return test_accuracy

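# Training loop: for each epoch, train on train_loader and validate on valid_loader,
# record the loss/accuracy curves for plotting, and save the weights whenever the
# validation loss improves.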
def train(n_epochs, model, optimizer, criterion, use_cuda, save_path):

# initialize tracker for minimum validation loss


valid_loss_min = np.Inf
for epoch in range(1, n_epochs + 1):
# initialize variables to monitor training and validation loss
train_loss = 0.0
valid_loss = 0.0

###################
# train the model #
###################
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()

# initialize weights to zero


optimizer.zero_grad()
output = model(data)
# calculate loss
loss = criterion(output, target)

# back prop
loss.backward()
# grad
optimizer.step()

train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))

if batch_idx % 100 == 0:
print('Epoch %d, Batch %d loss: %.6f' %
(epoch, batch_idx + 1, train_loss))

######################
# validate the model #
######################
model.eval()
for batch_idx, (data, target) in enumerate(valid_loader):
# move to GPU
if use_cuda:
data, target = data.cuda(), target.cuda()
## update the average validation loss
output = model(data)
confusion_Training_matrix(output,target)
loss = criterion(output, target)
valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))

# print training/validation statistics


print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
epoch,
train_loss,
valid_loss
))
#test(model, criterion, use_cuda)
val_loss_plot.append(valid_loss)
train_loss_plot.append(train_loss)
test_plot.append(test(model, criterion, use_cuda))
train_plot.append(train_accuracy(model, criterion, use_cuda, train_loss))
epoch_plot.append(epoch)
if valid_loss < valid_loss_min:
torch.save(model.state_dict(), save_path)
print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
valid_loss_min,
valid_loss))
valid_loss_min = valid_loss

val_loss_plotarr=np.array(val_loss_plot)
train_loss_plotarr=np.array(train_loss_plot)
test_plotarr=np.array(test_plot)
train_plotarr=np.array(train_plot)
plt.plot(train_loss_plotarr)
plt.plot(val_loss_plotarr)
plt.title('training and validation loss')
plt.xlabel('epoch')
plt.ylabel('training and validation loss')
plt.show()
plt.plot(train_plotarr)
plt.plot(test_plotarr)
plt.title('training and validation accuracy')
plt.xlabel('epoch')
plt.ylabel('training and validation accuracy')
plt.show()
return model

def load_input_image(img_path):
image=Image.open(img_path)
prediction_transform=transforms.Compose([transforms.Resize(size=(224,224)),
transforms.ToTensor(),
transforms.Normalize([0.485,0.456,0.406],
[0.229,0.224,0.225])])
    image = prediction_transform(image)[:3, :, :].unsqueeze(0)
return image

def predict_glaucoma(model,class_names,img_path):
img=load_input_image(img_path)
model=model.cpu()
model.eval()
idx=torch.argmax(model(img))
return class_names[idx]
model = models.resnet50(pretrained=True)
#12 30
for param in model.parameters():
param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2, bias=True)

fc_parameters = model.fc.parameters()
for param in fc_parameters:
param.requires_grad = True

model
use_cuda = torch.cuda.is_available()
if use_cuda:
model = model.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
train(30, model, optimizer, criterion, use_cuda,
'/content/drive/MyDrive/dataset/acrima/classification/resnet50.pt')
model.load_state_dict(torch.load(
    '/content/drive/MyDrive/dataset/acrima/classification/resnet50.pt'))
test(model, criterion, use_cuda)
