
Vol 13, Issue 06, JUNE/ 2022

ISSN NO: 0377-9254

THYROID DISEASE DETECTION USING MACHINE LEARNING TECHNIQUES
1Mr.M.Ram Chandra, Assistant Professor, BTech, Department of CSE, [email protected]

2Nunchu Ayushi Yadav, BTech, Department of CSE, [email protected]

3Gadey Yeshwanth Reddy, BTech, Department of CSE, [email protected]

4S.Sai Abhinav, BTech, Department of CSE, [email protected]

ABSTRACT: Thyroid disease is a major concern in medical diagnosis and prediction, which is a complicated problem in medical science. The thyroid gland is one of the body's main organs. Thyroid hormone secretions are responsible for regulating metabolism. Hyperthyroidism and hypothyroidism are the two prominent disorders of the thyroid, which produces the hormones that control the body's metabolism. Machine learning is critical to the disease prediction process, and this work studies the classification models used for thyroid disease on the basis of data obtained from hospital datasets. A sound knowledge base must be built and used as a hybrid model to solve complex learning tasks such as medical diagnosis and prognosis. Basic machine learning techniques are used for the identification and prevention of thyroid disease. The models are trained on the data set using Random Forest Classifier, XG Boost, KNN Classifier, and Logistic Regression. The Random Forest Classifier is used to predict the thyroid condition of the patient. The dataset is used to train the algorithms and measure their accuracy, and data cleaning is done to improve the accuracy. If the patient is at risk of thyroid disease, our system gives suggestions such as foods to eat, foods to avoid, and medication.

Keywords- Random Forest Classifier, XG Boost, KNN Classifier, Logistic Regression.

1. INTRODUCTION

Advances in computational biology are being applied in the healthcare industry. They allow stored patient data to be collected and used for disease prediction. Prediction algorithms are available for diagnosing disease at an early stage. Medical information systems are rich in datasets, but only a few intelligent systems can easily analyze the disease. Over time, machine learning algorithms have come to play a crucial role in resolving the complex and non-linear problems that arise in model development. Disease prediction models draw on features selected from different datasets so that healthy patients are classified as accurately as possible. If this is not done, misclassification can lead to a healthy patient getting unnecessary treatment.

The thyroid gland is an endocrine gland in the human neck, beneath the Adam's apple, that secretes thyroid hormones, which influence the rate of metabolism and protein synthesis. Thyroid hormones help determine how briskly the heart beats and how fast we burn calories. The thyroid secretes two active hormones, thyroxine (T4) and triiodothyronine (T3). These hormones help regulate body temperature. They also aid energy production and transmission in every part of the body and are decisive in protein management. Iodine is considered the main building block of thyroid hormones. The gland is prone to a few specific problems. An undersupply of these hormones leads to hypothyroidism. Hypothyroidism, or an underactive thyroid, has many causes, including certain medications, thyroid surgery, exposure to ionizing radiation, chronic inflammation of the thyroid, iodine deficiency, and a lack of the enzymes needed to make thyroid hormones.

www.jespublication.com Page No:237




Fig.1: Machine learning techniques for Thyroid detection

Dosing levothyroxine is not an easy task, since treatment can vary greatly and depends strongly on the patient's residual thyroid function, body weight, and thyroid-stimulating hormone levels [12]. For this reason, levothyroxine should be administered over the patient's lifetime, with the dose adjusted for physiological changes (e.g., weight or hormonal changes) throughout life and for concomitant medical conditions (e.g., pregnancy). This requires continuous monitoring of the patient's status through clinical and laboratory assessment and appropriate adjustment of the levothyroxine therapy. Predicting treatment trends could therefore be a useful support to the endocrinologist and can improve the patient's quality of life. Machine learning techniques can effectively support endocrinologists while monitoring patients. Such techniques have been applied successfully to classification and prediction and, for this reason, have been widely used in the diagnosis of many different conditions such as heart disease [17], diabetes [14] and Parkinson's disease [3, 4], reducing the time and cost of treating a patient. This study proposes an approach based on machine learning techniques that exploits hormonal parameters related to the thyroid, together with other clinical data about the patient, to predict whether the patient's treatment needs to be increased, decreased, or left unchanged.

2. LITERATURE REVIEW

2.1 Interactive Thyroid Disease Prediction System using Machine Learning Techniques:

Thyroid disease is a major concern in medical diagnosis and prediction, which is a difficult problem in medical research. The thyroid gland is one of the most important organs in the body. The secretion of thyroid hormones is responsible for controlling metabolism. Hyperthyroidism and hypothyroidism are the two common diseases of the thyroid, which releases the hormones that regulate the body's metabolic rate. Data cleansing techniques were applied to make the data simple enough for analytics that show a patient's risk of developing thyroid disease. Machine learning plays a decisive role in disease prediction, and this paper covers the analysis and classification models used for thyroid disease, based on information from a dataset taken from the UCI machine learning repository. It is important to ensure a sound knowledge base that can be built and used as a hybrid model for solving complex learning tasks, such as medical diagnosis and prognosis. The paper also proposes different machine learning techniques and diagnostic approaches for the prevention of thyroid disease. Support vector machine (SVM), K-NN, and decision tree algorithms were used to estimate a patient's risk of developing thyroid disease.

2.2 Comparison Study of Radiomics and Deep-Learning Based Methods for Thyroid Nodules Classification using Ultrasound Images:

Thyroid nodules have a high prevalence, and a small percentage are malignant. Many non-invasive methods have been developed with the help of the Internet of Things to improve the detection rate of malignant nodules. These methods fall roughly into two classes: radiomics based and deep learning based approaches. In general, deep learning methods based on convolutional neural networks have achieved promising performance in many medical image analysis and classification applications; however, no existing comparison has been made between radiomics based and deep learning based approaches. Therefore, in this paper, we aim to compare the performance of radiomics and deep learning based methods for the classification of thyroid nodules from ultrasound images. On one hand, we developed a radiomics based method, which extracts high-throughput 302-dimensional statistical features from pre-processed images. Dimension reduction was then performed using mutual information and linear discriminant analysis, respectively, to achieve the final classification. On the other hand, a deep learning based method was also developed and



tested by fine-tuning a pre-trained VGG16 model. A total of 3120 ultrasound images (1841 benign nodules and 1393 malignant nodules) from 1040 cases were retrospectively collected. The dataset was divided into 80% training and 20% testing data. The highest accuracies on the testing data for the radiomics and deep learning based methods were 66.81% and 74.69%, respectively. The comparison demonstrated that the deep learning based method can achieve better performance than radiomics.

2.3 Prediction of Thyroid Disease Using Machine Learning Techniques:

The paper presents several methods of feature selection and classification for thyroid disease diagnosis, framed as machine learning classification problems. The two common diseases of the thyroid gland, which releases the hormones that regulate the body's metabolic rate, are hyperthyroidism and hypothyroidism. Classifying these thyroid diseases is a considerable task. An important problem in pattern recognition is extracting or selecting the feature set, which is part of the pre-processing stage. The proposed feature selection methods are Univariate Selection, Recursive Feature Elimination and Tree Based Feature Selection. Three classification techniques were used, namely Naïve Bayes, Support Vector Machines and Random Forest. The results show that Support Vector Machines are the most accurate technique, and hence this was used as the classifier to separate the symptoms of thyroid disease into 4 classes, namely Hypothyroid, Hyperthyroid, Sick Euthyroid and Euthyroid (negative).

2.4 Segmentation of Thyroid Gland in Ultrasound image using Neural Network:

The thyroid gland is a highly vascular organ and lies in the anterior part of the neck just below the thyroid cartilage. Ultrasound imaging is most commonly used to detect and classify abnormalities of the thyroid gland. Other modalities (CT/MRI) are also used. Segmenting ultrasound images is challenging because they are often blurred and noisy, while other modalities such as CT involve ionizing radiation and are expensive. Thus, there is a need for a method that automatically segments the objects well for later analysis without making any assumptions about the object's topology. Various methods are used for automatic segmentation of the thyroid gland, but the application of neural networks in image processing provides a better solution to the segmentation problem. In this paper we use a feedforward neural network to classify the region using feature extraction and then segment it. Experiments and results are shown.

2.5 Diagnosis of thyroid disease using artificial neural network methods:

Proper interpretation of thyroid gland functional data is an important issue in the diagnosis of thyroid disease. The primary role of the thyroid gland is to help regulate the body's metabolism, which it does through the thyroid hormone it produces. Production of too little thyroid hormone (hypothyroidism) or too much thyroid hormone (hyperthyroidism) defines the type of thyroid disease. In this work, various neural network methods were used to help diagnose thyroid disease.

2.6 A novel hybrid method based on artificial immune recognition system (AIRS) with fuzzy weighted preprocessing for thyroid disease diagnosis:

Proper interpretation of thyroid gland functional data is an important issue in the diagnosis of thyroid disease. The primary role of the thyroid gland is to help regulate the body's metabolism, which it does through the thyroid hormone it produces. Production of too little thyroid hormone (hypothyroidism) or too much thyroid hormone (hyperthyroidism) defines the type of thyroid disease. Artificial immune systems (AISs) are a new but effective branch of artificial intelligence. Among the systems proposed in this field so far, the artificial immune recognition system (AIRS), proposed by A. Watkins, has shown effective and intriguing performance on the problems to which it has been applied. This study aims at diagnosing thyroid disease with a new hybrid machine learning method that includes this classification system. By hybridizing AIRS with a purpose-built fuzzy weighted pre-processing step, a method is obtained that solves this diagnosis problem by classification. The robustness of the method with regard to sampling variations is examined using cross-validation. We used the thyroid disease dataset taken from the UCI machine learning repository. We obtained a classification accuracy of 85%, which is the highest reached so far. The classification accuracy was obtained via 10-fold cross-validation.

3. IMPLEMENTATION

In the prediction process, machine learning plays a key role, and this paper studies the classification



models used in thyroid disease detection. The data set is taken from the Kaggle website. We also propose different approaches for machine learning and thyroid diagnosis. The machine learning algorithms Support Vector Machine, XG Boost, Random Forest, and K-NN classifier were used to calculate an estimated probability of a patient having thyroid disease. We also suggest what foods to eat and what foods to avoid. We use the Random Forest Classifier algorithm for the front end. The back end and front end are connected by Flask. Data cleaning is done to improve the accuracy.

Fig.2: Workflow diagram

Data preparation is the technique of cleaning and organizing the raw data to make it suitable for building and training machine learning models.

Predicting thyroid disease requires analyzing the blood report. The thyroid blood test data set is analyzed using various supervised machine learning classifiers. Based on the accuracy of the different algorithms, the most accurate one is chosen to fetch the result. For the first part, the thyroid data set is taken from the UCI repository. The dataset of hyperthyroidism and hypothyroidism is used, where hyper and hypo are the two labels. The data set needs to be checked before it is fed to training. It may contain null or unnecessary data, which data cleaning removes. The cleaned data is split into training data and test data, which are fed as input to the algorithm. The algorithm extracts features from the data to classify it according to the labels. To check the accuracy of the prediction, the test data is fed to the algorithm. Based on the extracted features, a probability is generated for each test record by comparing its features with those learned in training. Each record is assigned to the label with the highest probability, either hyperthyroidism or hypothyroidism.

We used four algorithms: Random Forest classifier, XG Boost, Logistic Regression, and K-Nearest Neighbor. Of these, Random Forest and XG Boost achieved the highest accuracies, 98% and 99% respectively. As Random Forest is a robust algorithm, it has been used for the front-end implementation. The main aim of the project is to detect thyroid disease at an early stage, with a minimum of parameters and with accurate results. The website is designed so that even a layman can use it easily. The user logs in to the website with a unique ID and password, which are already stored in the database.

After logging in, the user enters values such as age and gender and answers a series of yes-or-no questions: whether the patient is sick, whether he has previously undergone thyroid surgery, whether he is taking thyroxine, whether he has goitre, and whether reports indicate hypothyroid or hyperthyroid. The most important parameters are TSH (Thyroid Stimulating Hormone), T3 (Triiodothyronine), TT4 (Total Thyroxine), and FTI (Free Thyroxine Index). If the FTI value increases, the patient is at greater risk of thyroid disease. We also suggest foods to eat and foods to avoid according to the patient's thyroid hormone levels, and we suggest medicines for a speedy recovery.

4. ALGORITHMS

SVM:

A support vector machine (SVM) is a machine learning algorithm that analyzes data for classification and regression analysis. SVM is a supervised learning method that looks at data and sorts it into one of two categories. An SVM outputs a map of the sorted data with the margins between the two categories as far apart as possible.
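The margin idea described above can be illustrated with a minimal sketch of a linear SVM trained by sub-gradient descent on the regularised hinge loss. This is not the implementation used in the paper, which does not show its code; the function names, the learning rate, the regularisation strength, and the six toy data points (two standardised features with labels -1 and +1) are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Fit (w, b) by sub-gradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation step: shrinking w favours a wider margin.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # The point is inside the margin or misclassified,
                # so move the boundary away from it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, standardised toy data: two features per record, labels -1 / +1.
X = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

On this well-separated toy set the learned hyperplane assigns every training point to its own class. A library implementation would normally be preferred in practice; the sketch only shows how the hinge loss pushes points out of the margin.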

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning. The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional space into classes, so that new data points can easily be put in the correct category in the future. This best decision boundary is called a hyperplane.

Fig.3: SVM model

RANDOM FOREST CLASSIFIER:

Random forest is a supervised machine learning algorithm that is widely used in classification and regression problems. It builds decision trees on different samples and takes their majority vote for classification and their average for regression. For regression, the tree outputs are summarised by a statistic such as the mean or median, depending on how the outputs are distributed. Random forest is also known as a random decision forest and belongs to the category of ensemble methods. A random forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and combines their outputs to improve predictive accuracy on that dataset.

Fig.4: Random forest model

XGBOOST:

XGBoost provides a wrapper class that allows models to be treated like classifiers or regressors in the scikit-learn framework. The XGBoost model for classification is called XGBClassifier. XGBoost, which stands for Extreme Gradient Boosting, is a scalable, distributed gradient-boosted decision tree (GBDT) machine learning library. It provides parallel tree boosting and is a leading machine learning library for regression, classification, and ranking problems.

Fig.5: Xgboost model

KNN CLASSIFIER:

K-Nearest Neighbour is one of the simplest machine learning algorithms based on the supervised learning technique. The K-NN algorithm assumes similarity between the new case and the available cases and puts the new case into the category whose cases it most resembles. K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm. K-NN can be used for regression as well as classification.
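The K-NN procedure described above (store all the data, then label a new point by its most similar stored cases) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `knn_predict`, the choice of Euclidean distance, k = 3, and the six two-feature records are hypothetical and do not come from the paper's dataset.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training records."""
    # Sort stored records by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical records: (feature vector, label).
train = [
    ((1.0, 1.0), "hypothyroid"),
    ((1.5, 2.0), "hypothyroid"),
    ((2.0, 1.5), "hypothyroid"),
    ((5.0, 5.0), "hyperthyroid"),
    ((5.5, 4.5), "hyperthyroid"),
    ((6.0, 5.5), "hyperthyroid"),
]

label_a = knn_predict(train, (1.2, 1.4))
label_b = knn_predict(train, (5.4, 5.1))
```

Because K-NN compares raw distances, features on very different scales would usually be standardised first so that no single measurement dominates the vote.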


Fig.6: KNN model

5. EXPERIMENTAL RESULTS

Fig.9: Correlation matrix

The correlation map is a 4×4 symmetric matrix: all the diagonal entries share the same value, and the half below the diagonal mirrors the half above it. The terrain colormap defines the colours of the map, and the annotation option is set to True so that the values are visible. The gcf function, which returns the current figure, helps lay out the plot.
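The symmetry and constant diagonal described above follow directly from how a Pearson correlation matrix is computed, as the following sketch shows. The paper does not say which four columns the map uses, so the choice of TSH, T3, TT4 and FTI is an assumption, and the five values per column are invented purely for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (math.sqrt(var_a) * math.sqrt(var_b))


def correlation_matrix(columns):
    """Return the square matrix of pairwise correlations, in column order."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]


# Hypothetical measurements; the column names are an assumed choice.
columns = {
    "TSH": [1.3, 4.1, 0.9, 7.8, 2.2],
    "T3": [2.5, 2.0, 2.2, 1.1, 2.4],
    "TT4": [125.0, 102.0, 109.0, 61.0, 118.0],
    "FTI": [109.0, 95.0, 120.0, 70.0, 110.0],
}

corr = correlation_matrix(columns)
```

Each diagonal entry is the correlation of a column with itself and is therefore 1, and entry (r, c) equals entry (c, r). The description of the figure suggests a plotting call along the lines of `seaborn.heatmap(corr, annot=True, cmap="terrain")` on a figure obtained with `matplotlib.pyplot.gcf()`, although the paper's plotting code is not shown.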

Fig.7: Dataset

Fig.10: Home screen


Fig.8: Data visualization

Fig.11: User report



Fig.12: Disease detection screen

Fig.13: Medication screen

Fig.14: Medication output

6. CONCLUSION

Thyroid Detection using Machine Learning is a project that aims to provide a smart and precise way to predict thyroid disease. We have made use of the logistic regression algorithm to train on our dataset and to predict thyroid disease with greater accuracy. The machine is trained to detect whether the person is normal or has hyperthyroidism or hypothyroidism, based on the user's input. When the user enters data in the web app, the data is processed in the backend (model) and the result is displayed on the screen. Our objective was to give society an efficient and precise machine learning method that can be used in applications for disease detection.

7. FUTURE SCOPE

Further development can be done by applying image processing to ultrasound scans of the thyroid to predict thyroid nodules and cancer, which cannot be recognized from a blood test report. By combining both results, thyroid disease prediction can cover all thyroid-related diseases.

REFERENCES

[1] Ankita Tyagi and Ritika Mehra (2018). “Interactive Thyroid Disease Prediction System using Machine Learning Techniques,” published on ResearchGate.

[2] YongFeng Wang (2020). “Comparison Study of Radiomics and Deep-Learning Based Methods for Thyroid Nodules Classification using Ultrasound Images,” published on IEEE Access.

[3] Sunila Godara (2018). “Prediction of Thyroid Disease Using Machine Learning Techniques,” published on IJEE.

[4] Hitesh Garg (2013). “Segmentation of Thyroid Gland in Ultrasound image using Neural Network,” published on IEEE.

[5] L. Ozyılmaz and T. Yıldırım (2002). “Diagnosis of thyroid disease using artificial neural network methods,” in Proceedings of ICONIP’02, 9th International Conference on Neural Information Processing (Singapore: Orchid Country Club), pp. 2033–2036.

[6] K. Polat, S. Sahan and S. Gunes (2007). “A novel hybrid method based on artificial immune recognition system (AIRS) with fuzzy weighted preprocessing for thyroid disease diagnosis,” Expert Systems with Applications, vol. 32, pp. 1141-1147.

[7] F. Saiti, A. A. Naini, M. A. Shoorehdeli, and M. Teshnehlab (2009). “Thyroid Disease Diagnosis Based on Genetic Algorithms Using PNN and SVM,” in 3rd International Conference on Bioinformatics and Biomedical Engineering, ICBBE 2009.



[8] G. Zhang and L.V. Berardi (1998). “An investigation of neural networks in thyroid function diagnosis,” Health Care Management Science, pp. 29-37.

[9] V. Vapnik (2012). Estimation of Dependences Based on Empirical Data, Springer, New York.

[10] Obermeyer Z and Emanuel EJ (2016). “Predicting the future - big data, machine learning, and clinical medicine,” N Engl J Med, 375:1216-1219.

[11] Breiman L (2001). “Statistical Modeling: the two cultures,” Stat Sci, 16:199-231.

[12] Ehrenstein V, Nielsen H, Pedersen AB, Johnsen SP and Pedersen L (2017). “Clinical epidemiology in the era of big data: new opportunities, familiar challenges,” Clin Epidemiol, 9:245-250.

[13] S. Godara and R. Singh (2016). “Evaluation of Predictive Machine Learning Techniques as Expert Systems in Medical Diagnosis,” Indian Journal of Science and Technology, Vol. 910.

[14] Sunila, Rishipal Singh and Sanjeev Kumar (2016). “A Novel Weighted Class based Clustering for Medical Diagnostic Interface,” Indian Journal of Science and Technology, Vol. 9.
