Disease Predictor Based On Symptoms Using Machine Learning
https://doi.org/10.22214/ijraset.2022.44408
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VI June 2022- Available at www.ijraset.com
Abstract: Given how important the health sector is in treating patients' problems, Disease Prediction based on Symptoms with
Machine Learning is a system that predicts diseases from the symptoms a user reports, with the aim of giving reliable
results. It is intended for cases where the patient is not in immediate danger and the user simply wants to learn something
about a minor ailment. The system offers users medical guidance and helps them work out which ailment they may have. It
benefits both the healthcare sector and individuals who do not want to travel to a hospital or clinic for an initial
diagnosis: the user can learn about the predicted condition by entering symptoms and other relevant information, and
healthcare providers can use the same method by asking a patient for symptoms and obtaining a predicted diagnosis. To
achieve Disease Prediction based on Symptoms, we used Machine Learning techniques, Python programming with a Tkinter
interface, and a dataset acquired from hospitals.
Keywords: Disease Predictor, Machine Learning, Tkinter Interface
I. INTRODUCTION
A well-functioning healthcare system is critical to the economy and to human well-being. The world we live in now differs
substantially from the world of a few decades ago, and daily life has become more disorderly. In this situation, doctors and
nurses are doing everything they can to save people's lives, even when it means putting their own lives in danger. Virtual
doctors are board-certified doctors who choose to practice online through video and phone consultations rather than in
person, although this is not always practicable in an emergency. Machines are often considered to have an advantage over
humans because they are not subject to human error and can do jobs faster while maintaining a consistent level of precision.
A disease predictor, sometimes called a virtual doctor, can predict a patient's illness without human involvement. During
outbreaks of contagious diseases such as COVID-19 and Ebola, a disease predictor can help save lives by assessing a person's
health without physical contact. Virtual doctors are available on the market today, but they do not provide the precision
that is required. In this disease prediction project, we forecast illness using hospital data and Machine Learning methods
implemented in the Python programming language with a Tkinter interface. Doctors may make errors when diagnosing a patient's
disease; disease prediction systems based on machine learning algorithms can help produce more accurate results in these
situations. For this project, we employed a mix of approaches, algorithms, and technologies to develop a system that can
forecast a patient's condition from their symptoms. The symptoms are compared with the information already stored in the
system, and by combining those datasets with the patient's symptoms the system can estimate how likely each disease is. The
dataset is loaded into the system's prediction model, where the data is pre-processed before the user picks the features and
enters the symptoms. The data is then classified using a variety of algorithms, such as Decision Tree, KNN, and Naive Bayes.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2549
Fig. 1 Comparison of the accuracy values of the different ML algorithms. The Weighted KNN model gave the highest accuracy of
all the ML algorithms, and the RUSBoosted trees were the least accurate model. The Fine KNN performed better than the
Subspace, Medium, and Coarse KNN models; Coarse KNN was the least accurate KNN model. The Gaussian and Kernel Naive Bayes
algorithms had comparable accuracy, though lower than that of the KNN models. The Fine tree had a higher accuracy than the
Medium and Coarse decision tree models.
IV. DATASET
This work used data from a Columbia University study conducted at New York-Presbyterian Hospital in 2004. The disease is
given in the first column, followed by the symptom in the second column. The strongest associations between the 150 most
prevalent diseases and their symptoms have been identified, and the symptoms are ranked according to the strength of the
association. UMLS codes for the diseases and symptoms were generated using the MedLEE natural language processing system, and
the associations were then examined using statistical techniques based on frequency and co-occurrence.
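To illustrate how a two-column disease/symptom listing of this kind can be turned into the binary form used for training, the
following minimal Python sketch may help. The sample pairs and the function name build_symptom_matrix are hypothetical and are
not taken from the study's data or code.

```python
from collections import defaultdict

# Hypothetical (disease, symptom) pairs in the two-column layout described above.
pairs = [
    ("influenza", "fever"),
    ("influenza", "cough"),
    ("migraine", "headache"),
    ("migraine", "nausea"),
    ("gastritis", "nausea"),
]


def build_symptom_matrix(pairs):
    """Return (symptoms, matrix); matrix[disease] is a 0/1 list over symptoms."""
    # Fix a column order by sorting the distinct symptom names.
    symptoms = sorted({symptom for _, symptom in pairs})
    index = {symptom: i for i, symptom in enumerate(symptoms)}
    matrix = defaultdict(lambda: [0] * len(symptoms))
    for disease, symptom in pairs:
        matrix[disease][index[symptom]] = 1  # mark the symptom as present
    return symptoms, dict(matrix)


symptoms, matrix = build_symptom_matrix(pairs)
print(symptoms)
for disease, row in matrix.items():
    print(disease, row)
```

Each disease becomes one row of 0s and 1s over a fixed symptom order, which is the shape of input the classifiers in this paper
operate on.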
A. Decision Tree
The decision tree is one of the most widely used tools for classification and prediction. A decision tree resembles a
flowchart: each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf
node holds a class label.
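As a rough illustration of how an attribute test can be chosen for an internal node, the sketch below scores each binary
symptom by the weighted Gini impurity of the two branches it produces. Gini impurity is one common splitting criterion and is
an assumption here; the paper does not state which criterion was used. The rows, labels, and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, score) for the binary test with the lowest weighted impurity."""
    best_feature, best_score = None, float("inf")
    for feature in range(len(rows[0])):
        # One branch per test outcome: symptom present (1) or absent (0).
        present = [lab for row, lab in zip(rows, labels) if row[feature] == 1]
        absent = [lab for row, lab in zip(rows, labels) if row[feature] == 0]
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
        if score < best_score:
            best_feature, best_score = feature, score
    return best_feature, best_score


# Hypothetical training rows; columns are [fever, cough, headache].
rows = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = ["influenza", "influenza", "migraine", "migraine"]
feature, score = best_split(rows, labels)
print(feature, score)
```

A full tree would apply the same selection recursively to each branch until the leaves are pure or a depth limit is reached.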
B. Random Forest
Random Forest is a well-known supervised learning algorithm that can be used for both classification and regression problems.
It is based on ensemble learning, a method that addresses a complicated problem by combining numerous classifiers in order to
improve the model's performance.
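The ensemble idea can be sketched in a few lines of Python. The example below is a deliberately simplified stand-in for a
Random Forest: each "tree" is only a one-level stump trained on a bootstrap sample with a randomly chosen feature, and the
final prediction is a majority vote. A real Random Forest grows deeper trees and picks the best of several random features at
each node. All names and data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """One-level tree: store the majority label for each value (0 or 1) of one feature."""
    fallback = majority(labels)
    sides = {}
    for value in (0, 1):
        side = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        sides[value] = majority(side) if side else fallback
    return feature, sides


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) examples with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    """Combine the individual classifiers by majority vote."""
    votes = [sides[row[feature]] for feature, sides in forest]
    return majority(votes)


# Hypothetical training rows; columns are [fever, cough, headache].
rows = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = ["influenza", "influenza", "migraine", "migraine"]
forest = train_forest(rows, labels)
prediction = predict(forest, [1, 1, 0])
print(prediction)
```

Averaging many weak, partly independent classifiers in this way is what lets the ensemble outperform any single member.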
C. KNN
The K-Nearest Neighbour (K-NN) approach is one of the most fundamental Machine Learning algorithms and is based on Supervised
Learning. K-NN assumes that a new case is similar to the existing cases and assigns it to the category it most closely
resembles.
The K-NN method stores all of the available data and classifies a new data point according to how similar it is to the
existing data. As fresh data arrives, the K-NN algorithm can therefore readily place it in a well-suited category. Although
this method can be used for both regression and classification, classification is the more common use.
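A minimal sketch of K-NN classification on binary symptom vectors is given below. It measures similarity with the Hamming
distance (the number of positions in which two vectors differ), which is an assumption made for illustration; the paper does
not specify the distance measure. The training rows and names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training rows."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: hamming(pair[0], query),
    )
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical training data; columns are [fever, cough, headache, nausea].
train_rows = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
train_labels = ["influenza", "influenza", "influenza", "migraine", "migraine"]

prediction = knn_predict(train_rows, train_labels, [0, 1, 0, 0], k=3)
print(prediction)
```

Because the method only stores the data and defers all work to prediction time, there is no separate training step to run.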
D. Naïve Bayes
The Naive Bayes algorithm is a supervised learning approach for classification tasks that is based on Bayes' theorem. It is
commonly used for problems with large training datasets, such as text classification. The Naive Bayes Classifier is a simple
yet effective classification method that allows machine learning models to be built quickly. It is a probabilistic
classifier, meaning that it makes predictions based on the probability that an object belongs to each class. The Naive Bayes
algorithm can be used for spam filtering, sentiment analysis, and article classification, to name a few applications.
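The probabilistic reasoning can be made concrete with a small sketch. The code below implements a Bernoulli variant of Naive
Bayes for 0/1 symptom vectors with Laplace smoothing; the choice of variant and smoothing is an assumption for illustration
(Fig. 1 refers to Gaussian and Kernel Naive Bayes). The data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bernoulli_nb(rows, labels):
    """Estimate P(class) and P(symptom present | class) from binary rows."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i] += value
    model = {}
    for label, count in class_counts.items():
        prior = count / len(labels)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(feature_counts[label][i] + 1) / (count + 2) for i in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict_nb(model, row):
    """Return the class with the highest log posterior for one binary row."""
    scores = {}
    for label, (prior, probs) in model.items():
        log_score = math.log(prior)
        for value, p in zip(row, probs):
            log_score += math.log(p if value == 1 else 1.0 - p)
        scores[label] = log_score
    return max(scores, key=scores.get)


# Hypothetical training rows; columns are [fever, cough, headache].
rows = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = ["influenza", "influenza", "migraine", "migraine"]
model = train_bernoulli_nb(rows, labels)
print(predict_nb(model, [1, 1, 0]))
```

Working in log space avoids numerical underflow when many symptom probabilities are multiplied together.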
D. Data Preprocessing
This step removes punctuation, HTML markup, hashtags, URLs, @names, extra whitespace, and stop words, and then lemmatizes and
stems the text.
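The cleaning part of this step might look like the following sketch, which uses only the Python standard library. The
stop-word list is a tiny illustrative sample, the function name clean_text is hypothetical, and lemmatizing and stemming are
left out because they would normally rely on an external NLP library.

```python
import re
import string

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "i", "have", "of", "is", "my", "with"}


def clean_text(text):
    """Strip markup, URLs, @names, hashtags, and punctuation; return lowercase tokens."""
    text = re.sub(r"<[^>]+>", " ", text)                 # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # @names and hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()                        # split() also collapses whitespace
    return [token for token in tokens if token not in STOP_WORDS]


sample = "<p>I have a HIGH fever and a dry cough!</p> @drsmith #flu https://example.com/info"
print(clean_text(sample))
```

URLs are removed before punctuation is stripped, since stripping punctuation first would break them into fragments that no
longer match the URL pattern.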
E. Training
As the user enters symptoms, the system compares them with the dataset, which is encoded as binary 0s and 1s. Once the model
has assessed all of the user's symptoms, it predicts the disease associated with them.
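The binary encoding of the user's input can be sketched as follows. The symptom vocabulary and the function name
encode_symptoms are hypothetical; in the actual system the vocabulary would come from the dataset and the selections from the
Tkinter interface.

```python
# Hypothetical symptom vocabulary; the column order must match the training data.
SYMPTOMS = ["cough", "fever", "headache", "nausea"]


def encode_symptoms(selected, vocabulary=SYMPTOMS):
    """Return a 0/1 list with a 1 at each position whose symptom the user selected."""
    chosen = {symptom.strip().lower() for symptom in selected}
    unknown = chosen - set(vocabulary)
    if unknown:
        # Reject input the model has never seen rather than silently ignoring it.
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if symptom in chosen else 0 for symptom in vocabulary]


vector = encode_symptoms(["Fever", " cough "])
print(vector)
```

The resulting vector has the same layout as a row of the training data, so it can be passed directly to any of the trained
classifiers.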
F. Results/ Outputs
To demonstrate the accuracy of the models, we created a confusion matrix. The system then reports the predicted disease for
the patient.
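For readers unfamiliar with the confusion matrix, the sketch below shows how one can be computed together with the overall
accuracy. The actual and predicted labels are made-up values for illustration and are not results from this study.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in the order of labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for illustration only.
labels = ["influenza", "migraine"]
actual = ["influenza", "influenza", "migraine", "migraine", "migraine"]
predicted = ["influenza", "migraine", "migraine", "migraine", "influenza"]

cm = confusion_matrix(actual, predicted, labels)
for name, row in zip(labels, cm):
    print(name, row)
print("accuracy:", accuracy(actual, predicted))
```

The diagonal entries count correct predictions for each disease, while the off-diagonal entries show which diseases are
confused with one another.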
VII. CONCLUSION
In conclusion, this project, Disease Prediction Using Machine Learning, is relevant to everyone's daily lives, and especially
to those in the healthcare industry, who can use such systems to predict patients' diseases from their general information and
symptoms. Because the health industry plays such a large role in treating patients' diseases, the system is helpful to the
industry as a way of informing users. It is also useful for users who do not want to travel to a hospital or clinic, because
they can learn about the disease they may be suffering from simply by entering their symptoms and any other relevant
information. If the healthcare industry adopts this approach, doctors' workload can be reduced and they will be better
equipped to anticipate a patient's illness. Disease prediction is a technique for anticipating the onset of a range of common
diseases that, if left untreated or ignored, can result in death and many other problems for patients and their families.