Disease Prediction Using Machine Learning Algorithms
https://doi.org/10.22214/ijraset.2023.53026
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 11 Issue V May 2023- Available at www.ijraset.com
Abstract: The objective of this project is to develop a machine learning model that can predict the disease of a patient based on
their symptoms. While data mining has been successfully applied in many areas, such as market analysis and e-commerce, the
medical field still lacks powerful analytical tools to uncover hidden relationships and trends in data. Medical data contains a
wealth of information, but this knowledge is often not effectively utilized. Machine learning is a field of study that involves
developing algorithms that can improve automatically through experience and data. These algorithms use training data to build
a model that can make predictions or decisions without being explicitly programmed. In this project, techniques such as
association rule mining, classification, and clustering will be used to explore various general health problems. Classification is a
crucial problem in data mining, and decision trees are a popular classifier used to create class models. The ID3 Decision Tree
algorithm is commonly used for information classification. However, this algorithm can be inaccurate, so techniques such as
entropy-based cross-validation and partitioning will be used to improve the accuracy of the model. Finally, the results will be
compared to determine the best model.
Keywords: Data mining, Data processing, Disease prediction, General body diseases, Prediction system.
I. INTRODUCTION
The importance of computers in our daily lives cannot be overstated. Computers comprise various hardware and software
components, with software being a crucial component designed to perform specific tasks. However, developing software is a
complex process that involves a team of professionals working on a project. The term "project" is an acronym for Planning,
Resource, Operating, Joint effort, Engineering, Co-operation, and Technique. Planning involves conceptualizing and identifying the
necessary steps to achieve project goals. Resource refers to addressing the financial aspects and acquiring the resources needed for
the project.
Operating entails the systematic procedure for carrying out project tasks. Joint effort relates to the collaborative effort of individuals
working towards achieving project goals. Engineering signifies the importance of having well-educated professionals in the project
team to produce optimal results.
Co-operation is essential for the success and timely completion of the project. Finally, technique denotes the importance of utilizing
suitable methodologies to achieve project objectives. In conclusion, software development is a critical process that requires a
project-based approach comprising planning, resource acquisition, operating procedures, joint effort, engineering, co-operation, and
technique. This approach ensures the successful completion of software development projects.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 5690
IV. METHODOLOGY
1) Step 1: Data collection and dataset preparation
The first step involves gathering medical records from various sources such as hospitals, patient discharge slips, and the
UCI repository. After data collection, pre-processing is applied to remove unnecessary data and extract the important features.
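The pre-processing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `preprocess`, the column names, and the cleaning rules (drop incomplete rows, normalise text, drop duplicates) are assumptions chosen for the example.

```python
def preprocess(records, feature_names, label_name):
    """Keep only the chosen columns and drop incomplete or duplicate records.

    `records` is a list of dicts, one per patient record (hypothetical layout).
    """
    columns = list(feature_names) + [label_name]
    seen = set()
    cleaned = []
    for record in records:
        # Keep only the columns of interest; all other fields are discarded.
        row = {name: record.get(name) for name in columns}
        # Skip records in which any selected value is missing or blank.
        if any(value is None or str(value).strip() == "" for value in row.values()):
            continue
        # Normalise text so that "Yes " and "yes" count as the same value.
        row = {name: str(value).strip().lower() for name, value in row.items()}
        key = tuple(row[name] for name in columns)
        # Skip exact duplicates of a record that was already kept.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Toy records with made-up values, used only to exercise the function.
raw = [
    {"fever": "Yes", "cough": "no", "ward": "A", "disease": "Flu"},
    {"fever": "yes ", "cough": "No", "ward": "B", "disease": "flu"},  # duplicate
    {"fever": "", "cough": "no", "ward": "A", "disease": "cold"},     # blank value
    {"fever": "no", "cough": "yes", "disease": "Cold"},
]
clean = preprocess(raw, ["fever", "cough"], "disease")
```

Here the `ward` column plays the role of unnecessary data: it is not listed among the features, so it never reaches the cleaned records.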
2) Step 2: Developing a probabilistic modeling and deep learning approach (RNN) for Disease Prediction
The second step involves the development of a probabilistic modeling and deep learning approach based on RNN that can
run efficiently on large healthcare databases. This approach will generate a decision tree and handle a large number of
input variables without discarding any of them.
A. ID3 Algorithm
The ID3 algorithm starts with the root node representing the initial dataset. At each iteration, it evaluates the entropy or information
gain (IG(A)) of each unused attribute in the dataset. The algorithm then selects the attribute with the smallest entropy or,
equivalently, the largest information gain. The dataset is then split into disjoint subsets based on the selected attribute (e.g.,
marks < 50, 50 <= marks < 100, marks >= 100). The ID3 algorithm recursively applies this process to each subset while considering
only the attributes that have not been selected before.
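The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: attributes are assumed to be categorical, the function names and the nested-dict tree layout are hypothetical, and the four-row dataset is made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label_key):
    """Entropy of the whole set minus the weighted entropy after splitting on `attribute`."""
    labels = [row[label_key] for row in rows]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[label_key])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, attributes, label_key):
    """Build a tree as nested dicts; leaves are plain class labels."""
    labels = [row[label_key] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]                       # pure subset: stop here
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # no attributes left: majority vote
    best = max(attributes, key=lambda a: information_gain(rows, a, label_key))
    remaining = [a for a in attributes if a != best]  # never reuse an attribute
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, label_key)
    return {"attribute": best, "branches": branches}


# Toy data with made-up values: "fever" separates the classes, "cough" does not.
rows = [
    {"fever": "yes", "cough": "yes", "disease": "flu"},
    {"fever": "yes", "cough": "no", "disease": "flu"},
    {"fever": "no", "cough": "yes", "disease": "cold"},
    {"fever": "no", "cough": "no", "disease": "cold"},
]
tree = id3(rows, ["fever", "cough"], "disease")
```

On this toy data, splitting on `fever` yields two pure subsets, so it has the larger information gain and becomes the root.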
B. C4.5 Algorithm
The C4.5 algorithm is an extension of the ID3 algorithm that is used to generate decision trees. It improves on the ID3 algorithm by
handling both continuous and discrete attributes, handling missing values, and pruning the trees after construction. The decision
trees produced by C4.5 can be used for classification, and C4.5 is therefore often referred to as a statistical classifier. The process
of creating decision trees using C4.5 is similar to that of the ID3 algorithm [17].
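Two of the ways C4.5 differs from ID3 can be sketched as follows: it normalises information gain by the split information (the gain ratio), and it handles a continuous attribute by choosing a binary threshold. This is a simplified sketch, not a full C4.5 implementation: missing-value handling and pruning are omitted, candidate thresholds are taken as midpoints between consecutive distinct values, and the function names and temperature values are assumptions made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    # A split that puts every row in one group carries no information.
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values, labels):
    """Return (threshold, gain_ratio) for the best binary split `value <= threshold`."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2           # midpoint between neighbouring values
        ratio = gain_ratio([v <= threshold for v in values], labels)
        if ratio > best[1]:
            best = (threshold, ratio)
    return best


# Toy data with made-up values: body temperature against a two-class label.
temperatures = [36.5, 37.0, 38.5, 39.0]
labels = ["healthy", "healthy", "flu", "flu"]
threshold, ratio = best_threshold(temperatures, labels)
```

With these values the chosen threshold falls between 37.0 and 38.5, the only cut that separates the two classes cleanly.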
E. SVM Algorithm
Support Vector Machines (SVMs) have gained significant attention and have been actively used in various domains for applications
such as classification, regression, and ranking [20]. SVMs are built upon statistical learning theory and the principle of structural risk
minimization, with the objective of determining the location of the decision boundaries or hyperplanes that produce the best
separation of classes [21]. Maximizing the margin, that is, the distance between the separating hyperplane and the nearest
instances on either side, has been proven to reduce the expected generalization error [22]. The efficiency of
SVM-based classification is not solely dependent on the dimension of the classified entities.
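The margin that an SVM maximizes can be made concrete with a short sketch. The code below does not train an SVM; it only computes the geometric margin of a given hyperplane w·x + b = 0 on labelled points, so that two candidate hyperplanes can be compared. The function name and the toy points are assumptions made up for the example.

```python
import math


def geometric_margin(weights, bias, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every point lies on the
    correct side; a larger value means a wider margin.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, point)) + bias) / norm
        for point, label in zip(points, labels)
    )


# Toy one-dimensional layout embedded in 2-D: negatives on the left, positives on the right.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

centred = geometric_margin((1.0, 0.0), -2.0, points, labels)   # boundary at x = 2
shifted = geometric_margin((1.0, 0.0), -1.5, points, labels)   # boundary at x = 1.5
```

Both hyperplanes separate the classes, but the boundary at x = 2 sits midway between the closest points of each class and therefore has the larger margin. That is the hyperplane a maximum-margin classifier would prefer.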
F. Approach
The General body disease prediction system employs data mining techniques, using the ID3 algorithm to predict diseases.
Decision trees are considered easily interpretable models because a logical justification can be given for each decision. Knowledge
models under this paradigm can be directly transformed into a set of IF-THEN rules, which are one of the most popular forms of
knowledge representation [2].
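The transformation of a decision tree into IF-THEN rules can be sketched as a walk from the root to each leaf, collecting the attribute tests along the way. The nested-dict tree layout, the function name, and the toy tree below are assumptions for illustration only; they are not taken from the system described here and carry no medical meaning.

```python
def tree_to_rules(tree, conditions=()):
    """Return one IF-THEN rule (as a string) per leaf of the tree.

    Internal nodes are dicts of the form
    {"attribute": name, "branches": {value: subtree, ...}};
    leaves are plain class labels.
    """
    if not isinstance(tree, dict):
        # Leaf: the tests collected on the way down form the rule's premise.
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN disease = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        test = f"{tree['attribute']} = {value}"
        rules.extend(tree_to_rules(subtree, conditions + (test,)))
    return rules


# Hypothetical toy tree.
toy_tree = {
    "attribute": "fever",
    "branches": {
        "yes": {"attribute": "cough", "branches": {"yes": "flu", "no": "malaria"}},
        "no": "common cold",
    },
}
rules = tree_to_rules(toy_tree)
```

Each rule corresponds to exactly one root-to-leaf path, so the rule set covers the same cases as the tree and can be read without knowing the tree structure.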
1) Admin
The DPS administrator has the following capabilities:
a) Login: The admin can log in to the system by selecting the user type and entering the required information.
b) System Training: The admin must train the system by uploading the dataset into the system. Experiments were conducted to
assess the performance and usefulness of various classification algorithms for predicting the disease present in a patient. The
performance of the learning techniques is highly dependent on the characteristics of the training data. Confusion matrices are
extremely helpful for assessing classifiers. The columns represent the predictions, and the rows represent the actual class [4].
2) User
The DPS user has the following abilities:
a) User login: A pre-registered user must log in to the system to access the services.
b) Enter Symptoms: The user selects the symptoms here.
c) Prediction and precaution: The model's computed result based on the rule set is shown here.
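The confusion matrix mentioned under System Training can be sketched as follows, with rows for the actual class and columns for the predicted class, as described above. The function names and the five toy predictions are assumptions made up for the example; they are not results from the system.

```python
def confusion_matrix(actual, predicted, classes):
    """Count outcomes: matrix[i][j] = cases of actual class i predicted as class j."""
    index = {name: position for position, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of cases on the diagonal, i.e. predicted class equals actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels, made up for illustration.
classes = ["flu", "cold"]
actual = ["flu", "flu", "cold", "cold", "cold"]
predicted = ["flu", "cold", "cold", "cold", "flu"]

matrix = confusion_matrix(actual, predicted, classes)
score = accuracy(matrix)
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure cannot reveal.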
VI. RESULT
The disease prediction system's results are illustrated through four snapshots. The first displays the system prompt shown when the
patient name is not found. The second shows a prediction based on only two symptoms. The third shows the prompt that appears
when the user enters fewer than two symptoms. The fourth and final snapshot shows a prediction based on all five symptoms.
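The prompt shown for too few symptoms implies an input check before prediction. A minimal sketch of such a check is given below, using the two-to-five symptom range the paper reports. The function name, the normalisation rules, and the prompt wording are assumptions for illustration and are not taken from the implemented system.

```python
MIN_SYMPTOMS = 2  # lower limit reported for the system
MAX_SYMPTOMS = 5  # upper limit reported for the system


def validate_symptoms(symptoms):
    """Return (True, cleaned_symptoms) or (False, prompt_message).

    Blank entries are ignored, and repeated symptoms are counted once.
    """
    # dict.fromkeys removes repeats while keeping the order of first appearance.
    cleaned = list(dict.fromkeys(s.strip().lower() for s in symptoms if s.strip()))
    if len(cleaned) < MIN_SYMPTOMS:
        return False, f"Please select at least {MIN_SYMPTOMS} symptoms."
    if len(cleaned) > MAX_SYMPTOMS:
        return False, f"Please select at most {MAX_SYMPTOMS} symptoms."
    return True, cleaned


too_few = validate_symptoms(["Fever"])
accepted = validate_symptoms(["Fever", "Cough "])
```

Counting repeated entries once means that entering the same symptom twice does not satisfy the two-symptom minimum.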
VII. CONCLUSION
The disease prediction system has been implemented with an accuracy of 86.67% on a dataset of 120 patient records. Currently, the
system covers only commonly occurring diseases, but the plan is to include diseases of higher fatality, such as various cancers, in
the future. This would enable earlier prediction and treatment, which could reduce the fatality rate of deadly diseases such as cancer
and bring an economic benefit in the long run.
The disease prediction project is useful in day-to-day life, especially for the healthcare sector. Health professionals can use the
system to predict patients' diseases from their general information and symptoms, and it can help patients who do not want to visit
a hospital or clinic. The system provides a user-friendly environment, is easy to use, and can reduce the workload of doctors. The
main aim of the project is to predict diseases based on symptoms: the system takes the user's symptoms as input and outputs a
disease prediction with a reported average accuracy probability of 100%. The system was implemented using the Grails framework
and, being a web application, can be accessed from anywhere at any time.
To predict heart disease effectively, a system based on machine learning techniques is needed. In this study, the accuracy scores of
the Decision Tree, Logistic Regression, Random Forest, and Naive Bayes algorithms for predicting heart disease on the UCI
machine learning repository dataset were compared. The results indicate that the Random Forest algorithm performs best, with an
accuracy score of 90.16%. A web application based on the Random Forest algorithm can be developed in the future using a larger
dataset to provide better results and help health professionals predict heart disease more effectively.
Manually determining the odds of getting heart disease based on risk factors is challenging. Machine learning techniques, such as
the Naive Bayes algorithm, can be useful in predicting the output from existing data. However, the effectiveness of the model is
constrained by the size of the datasets and by noisy, incorrect, or missing data values. The prototype developed so far has been
tested mainly by computer experts and not by medical experts. Medical experts must therefore be involved in testing the prototype
before the system can be deployed in practice to support clinical decisions.
The disease prediction system takes symptoms from the user as input and predicts the disease as output. The user can select
between two and five symptoms. The accuracy of the system increases with the number of symptoms entered and is lowest when
only two symptoms are entered.
REFERENCES
[1] Aditya Tomar, “Disease Prediction System using data mining techniques”, in International Journal of Advanced Research in computer and Communication
Engineering, ISO 3297, July 2016.
[2] Dr. Srinivasan, K. Pavya, “A study on data mining prediction techniques in healthcare sector”, in International Research Journal of Engineering and
Technology (IRJET), March 2016.
[3] Megha Rathi, Vikas Pareek, “An integrated hybrid data mining approach for healthcare”, in IRACST -International Journal of Computer Science and
Information Technology Security (IJCSITS), ISSN: 2249-9555, Vol.6, No.6, Nov-Dec 2016.
[4] Feixiang Huang, Shengyong Wang, and Chien-Chung Chan, “Predicting Disease by Using Data Mining Based on Healthcare Information System”, in IEEE
2012.
[5] M.A. Nishara Banu, B Gomathy, “An approach to devise an Interactive software solution for smart health prediction using data mining”, in International Journal
of Technical Research and Applications, eISSN, Nov-Dec 2013.
[6] “Implementing WEKA for medical data classification and early disease prediction”, in 3rd IEEE International Conference on Computational Intelligence
and Communication Technology (IEEE-CICT 2017).
[7] “Predictions in Heart Disease Using Techniques of Data Mining”, in 2015 1st International Conference on Futuristic Trend in Computational Analysis and
Knowledge Management (ABLAZE-2015).
[8] Marjia Sultana, Afrin Haider, and Mohammad Shorif Uddin “Analysis of Data Mining Techniques for Heart Disease Prediction”
[9] Dr. M.S. Shashidhara, M. Giri, Girija D.K “Data mining approach for prediction of fibroid Disease using Neural Networks,”
[10] Uma Ojha, Dr. Savita Goel. “Study on prediction of Breast cancer recurrence using Data mining techniques”
[11] Disease Prediction and Doctor Recommendation System by www.irjet.net 46
[12] GDPS - General Disease Prediction System by www.irjet.net
[13] Disease Prediction Using Machine Learning by International Research Journal of Engineering and Technology (IRJET).
[14] Kaveeshwar, S.A., and Cornwall, J., 2014, “The current state of diabetes mellitus in India”. AMJ, 7(1), pp. 45-48.
[15] Dean, L., McEntyre, J., 2004, “The Genetic Landscape of Diabetes” [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); Chapter 1,
Introduction to Diabetes. 2004 Jul 7.
[16] Machine Learning Methods Used in Disease by www.wikipedia.com
[17] https://www.researchgate.net/publication/325116774_disease_prediction_using_machine_learning_techniques
[18] https://ieeexplore.ieee.org/document/8819782/disease_prediction
[19] Algorithms Details from www.dataspirant.com 20. https://www.youtube.com/disease_prediction
[20] https://www.slideshare.com/disease_prediction
[21] https://en.wikipedia.org/machine_learning_algorithms
[22] https://en.wikipedia.org/wiki/Python_(programming_language)
[23] https://wiki.python.org/TkInter
[24] https://creately.com/lp/uml-diagram-tool/
[25] https://app.diagrams.net/