Bachelor of Technology
in
Computer Science & Engineering
By
November, 2024
HEART DISEASE PREDICTION
CERTIFICATE
It is certified that the work contained in the project report titled “HEART DISEASE PREDICTION”
by T.TARUNN TEZAA (22UECM2022), G.NIKHIL KUMAR (22UECT2005), and K.UJWAL (22UECM2003)
has been carried out under my supervision and that this work has not been submitted elsewhere for a
degree.
Signature of Supervisor
Mrs.A.SATHYA
B.E,M.E.
Computer Science & Engineering
School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science & Technology
November, 2024
DECLARATION
We declare that this written submission represents our ideas in our own words and where others’
ideas or words have been included, we have adequately cited and referenced the original sources. We
also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute and can also
evoke penal action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.
(Signature)
T.TARUNN TEZAA
Date: / /
(Signature)
G.NIKHIL KUMAR
Date: / /
(Signature)
K.UJWAL
Date: / /
APPROVAL SHEET
This project report entitled HEART DISEASE PREDICTION by T.TARUNN TEZAA (22UECM2022),
G.NIKHIL KUMAR (22UECT2005), K.UJWAL (22UECM2003) is approved for the degree of B.Tech
in Computer Science & Engineering.
Examiners Supervisor
Ms.A.SATHYA, B.E,M.E.,
Date: / /
Place:
ACKNOWLEDGEMENT
We express our deepest gratitude to our Honorable Founder Chancellor and President Col.
Prof. Dr. R. RANGARAJAN B.E. (Electrical), B.E. (Mechanical), M.S (Automobile), D.Sc., and
Foundress President Dr. R. SAGUNTHALA RANGARAJAN M.B.B.S., Vel Tech Rangarajan Dr.
Sagunthala R&D Institute of Science and Technology, for their blessings.
We express our sincere thanks to our respected Chairperson and Managing Trustee
Mrs. RANGARAJAN MAHALAKSHMI KISHORE, B.E., Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology, for her blessings.
We are very grateful to our beloved Vice Chancellor Prof. Dr. RAJAT GUPTA, for providing
us with an environment to complete our project successfully.
We record our indebtedness to our Professor & Dean, Department of Computer Science & Engineering,
School of Computing, Dr. S P. CHOKKALINGAM, M.Tech., Ph.D., and Associate Dean,
Dr. V. DHILIP KUMAR, M.E., Ph.D., for their immense care and encouragement throughout
the course of this project.
We are thankful to our Professor & Head, Department of Computer Science & Engineering,
Dr. N. VIJAYARAJ, M.E., Ph.D., and Associate Professor & Assistant Head, Dr. M. S. MURALI
DHAR, M.E., Ph.D., for providing immense support in all our endeavors.
We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor
Mrs. A. SATHYA, B.E., M.E., for her cordial support, valuable information, and guidance, which
helped us complete this project through its various stages.
We thank our department faculty, supporting staff, and friends for their help and guidance in
completing this project.
ABSTRACT
Prediction of heart disease has become one of the most challenging tasks in the
medical sector. The heart is one of the most essential organs of the body: it pumps
and circulates blood throughout the body. Heart disease is widespread across the
world, and approximately one person dies of it every minute. Because predicting heart
disease is a complicated task, there is a need to automate the prediction process in
order to avoid the pitfalls associated with it and to warn the patient in advance. The
model was built using machine learning algorithms such as random forest, K-nearest
neighbors (KNN), logistic regression, and decision tree. The study shows that,
compared with the other ML techniques, logistic regression and KNN provide better
prediction accuracy in a shorter amount of time. The heart disease prediction GUI
lets the user enter values such as age, gender, and cholesterol, and displays the
result on the page after the values are submitted.
Keywords:
- Heart Disease Prediction
- Machine Learning in Healthcare
- Cardiovascular Risk Assessment
- Risk Factors for Heart Disease
- Artificial Intelligence (AI) in Cardiology
- Predictive Analytics
- Logistic Regression
- Classification Algorithms
- Deep Learning for Health Prediction
- Decision Trees in Medical Diagnosis
- Support Vector Machine (SVM)
- Random Forest for Disease Prediction
- Neural Networks in Medicine
- Healthcare Data Analysis
- UCI Heart Disease Dataset
- Framingham Risk Score
- Medical Feature Engineering
- Healthcare Data Integration
- Explainable AI (XAI) in Healthcare
- Clinical Decision Support Systems (CDSS)
LIST OF FIGURES
6.1 Output 1
6.2 Output 2
LIST OF TABLES
LIST OF ACRONYMS AND ABBREVIATIONS
TABLE OF CONTENTS
ABSTRACT
1 INTRODUCTION
1.1 Introduction
1.2 Aim of the project
1.3 Project Domain
1.4 Scope of the Project
2 LITERATURE REVIEW
2.1 Literature Review
2.2 Gap Identification
3 PROJECT DESCRIPTION
3.1 Existing System
3.2 Problem statement
3.3 System Specification
3.3.1 Hardware Specification
3.3.2 Software Specification
3.3.3 Standards and Policies
4 METHODOLOGY
4.1 Proposed System
4.2 General Architecture
4.3 Design Phase
4.3.1 Data Flow Diagram
4.3.2 Use Case Diagram
4.3.3 Class Diagram
4.3.4 Sequence Diagram
4.3.5 Collaboration diagram
4.3.6 Activity Diagram
4.4 Algorithm & Pseudo Code
4.4.1 Algorithm
4.4.2 Pseudo Code
4.4.3 Data Set / Generation of Data (Description only)
4.5 Module Description
4.5.1 Module1
4.5.2 Module2
4.5.3 Module3
8 PLAGIARISM REPORT
Appendices
Chapter 1
INTRODUCTION
1.1 Introduction
The Heart Disease Prediction System is a response to the rising tide of cardiovascular
diseases and the critical need for proactive healthcare solutions. The project applies
predictive analytics to datasets ranging from medical histories to lifestyle factors in
order to provide personalized risk assessments aimed at early detection and prevention.
Its user-centric design is intended to make complex health insights accessible to all
and to foster a proactive mindset towards heart health.
1.3 Project Domain
The heart disease prediction project lies within the healthcare and medical informatics
domain, where the goal is to predict the likelihood of heart disease in individuals
based on various health metrics. By leveraging machine learning and predictive
analytics, this project seeks to analyze patient data (such as blood pressure, cholesterol
levels, age, and lifestyle factors) to identify patterns associated with heart disease.
This predictive approach is valuable for preventive care and early diagnosis, as it can
assist healthcare providers in identifying high-risk patients and intervening before
the condition worsens. In a broader sense, this project supports the development
of Clinical Decision Support Systems (CDSS), which aid medical professionals in
decision-making, and it can also be integrated into telemedicine platforms to enable
remote monitoring. Ultimately, a heart disease prediction system can empower both
doctors and patients to make informed, data-driven decisions that improve health
outcomes and reduce risks associated with heart disease.
1.4 Scope of the Project
The scope of a heart disease prediction system is vast and impactful, extending across
various dimensions of healthcare. By leveraging advanced technologies such as machine
learning and data analytics, these systems have the potential to revolutionize
cardiovascular health management. They offer early detection of potential risks,
allowing for timely intervention and preventive measures. The scope also encompasses
personalized risk assessments, tailoring predictions based on individual health
profiles, including genetic predispositions and lifestyle factors. Beyond individual care,
these systems contribute to population health management by identifying trends and
risk factors within specific demographics. Moreover, they facilitate remote monitoring
through wearable devices and telehealth technologies, extending their reach to
remote or underserved areas. The overall impact includes not only improved health
outcomes for individuals but also a reduction in healthcare costs, as preventive
measures prove more cost-effective than treating advanced cardiovascular conditions.
Chapter 2
LITERATURE REVIEW
2.1 Literature Review
Heart disease remains a leading cause of mortality worldwide, driving extensive
research into effective prediction and prevention strategies. Various studies have
explored machine learning techniques and data-driven approaches to enhance the
accuracy and efficiency of heart disease prediction. This literature review summarizes
key findings and methodologies from notable works in the field.
Early heart disease prediction models primarily employed statistical methods such
as logistic regression and decision trees. For instance, studies have demonstrated that
logistic regression effectively identifies risk factors associated with coronary artery
disease by analyzing historical patient data. However, these models often struggle to
capture non-linear relationships among features, limiting their predictive power.
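To make this limitation concrete, the following minimal sketch shows how a logistic regression model turns patient features into a risk probability. The weights, bias, and feature order are made-up values for illustration only, not coefficients fitted in this project. Because the score is a weighted sum of the features, the model is linear in the log-odds, which is why it cannot represent non-linear relationships unless extra features are engineered for it.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Logistic regression: probability = sigmoid(w . x + b)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical weights for (age, cholesterol, resting blood pressure),
# each feature assumed to be already scaled to the range [0, 1].
WEIGHTS = [1.2, 0.8, 0.5]
BIAS = -1.0

patient = [0.6, 0.7, 0.4]
probability = predict_proba(patient, WEIGHTS, BIAS)
label = 1 if probability >= 0.5 else 0  # 1 = high likelihood, 0 = low likelihood
```

The 0.5 threshold is the conventional default; a screening tool might reasonably choose a lower threshold to reduce missed cases.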
• Support Vector Machines (SVM): Research has shown that SVM can effectively
classify patients based on complex feature sets, outperforming traditional
methods in terms of accuracy.
• Random Forests: This ensemble method has gained popularity due to its ability
to handle large datasets and its robustness against overfitting. Studies report
improved accuracy and feature importance insights, making it a favorable choice
for heart disease prediction.
• Neural Networks: Deep learning approaches have emerged as powerful tools in
predictive modeling. Research indicates that artificial neural networks can learn
complex patterns from vast datasets, yielding high accuracy in diagnosing heart
disease.
3. Hybrid Models
Recent works have investigated hybrid models that combine multiple machine
learning algorithms to improve prediction performance. For example, studies have
integrated decision trees with neural networks, leveraging the strengths of both
approaches to achieve superior results in heart disease prediction.
With the rise of mobile health technology, several studies have focused on developing
real-time prediction applications. These applications leverage machine learning
algorithms to provide instant risk assessments based on user-inputted health data.
Research indicates that such tools can improve patient engagement and empower
individuals to manage their heart health proactively.
As machine learning models become more complex, the need for interpretability
grows. Recent research highlights the importance of explainable AI in healthcare,
emphasizing that practitioners must understand how predictions are made.
Techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local
Interpretable Model-Agnostic Explanations) have been developed to provide insights
into model predictions, ensuring transparency in clinical decision-making.
on integrating electronic health records with machine learning models, utilizing real-
time data for improved predictions, and enhancing user engagement through intuitive
interfaces.
2.2 Gap Identification
Identifying gaps in the heart disease prediction project is essential for improving the
model’s effectiveness, usability, and overall impact. Several potential gaps are listed below.
1. Data Limitations
- Quality of Data: Existing datasets may contain missing values, outliers, or inaccuracies,
which can compromise the prediction accuracy.
- Diversity of Data: Many datasets are not representative of diverse populations, leading
to potential biases. Gaps in demographic representation (age, gender, ethnicity)
can affect model performance across different groups.
2. Model Complexity and Performance
- Algorithm Selection: The current model might use simpler algorithms that cannot capture
complex relationships in the data. More advanced techniques (e.g., deep learning,
ensemble methods) could improve accuracy.
- Overfitting/Underfitting: There may be issues related to model overfitting
(performing well on training data but poorly on unseen data) or underfitting (failing
to capture underlying patterns).
3. Interpretability
- Lack of Insights: Many existing models provide limited interpretability, making it
difficult for users to understand the reasoning behind predictions. Enhancing
interpretability can help build trust and guide users in making informed health decisions.
4. User Accessibility and Engagement
- Limited User Interface: The existing user interface may not be intuitive or engaging,
which can discourage users from interacting with the system. Improving usability can
enhance user experience.
- Accessibility for Non-Experts: Healthcare providers or patients without technical
backgrounds may struggle to utilize the system effectively. Educational resources or
simplified interfaces can help bridge this gap.
5. Preventive Recommendations
- Generic Advice: Current systems may provide broad recommendations without
personalization based on individual risk factors. Enhancing the recommendation engine
to deliver tailored health tips could improve user engagement and effectiveness.
6. Integration with Healthcare Systems
- Lack of Integration: The prediction system may operate in isolation, lacking integration
with electronic health records (EHR) or other healthcare management systems. Integration
can facilitate seamless access to patient data and improve care coordination.
7. Continuous Learning and Updates
- Static Model: Existing systems may not adapt to new data or trends over time.
Implementing a continuous learning framework can ensure the model stays current with
emerging research and changing health patterns.
Chapter 3
PROJECT DESCRIPTION
3.1 Existing System
The existing systems focus on heart disease prediction, utilizing a range of approaches
and technologies. Notable examples include the Framingham Heart Study,
a pioneering cohort study providing foundational insights. The American College
of Cardiology and the American Heart Association’s ASCVD Risk Estimator and
the MESA Risk Score are widely used tools assessing cardiovascular risk based on
various factors. Machine learning-based models, including those applying logistic
regression and neural networks, analyse extensive datasets for accurate risk assessments.
Mobile applications like Cardiogram’s Heart Check leverage machine learning
for heart health monitoring using wearable device data. IBM Watson Health
provides solutions for cardiovascular risk assessment, incorporating artificial
intelligence and analytics. Google Health Studies engages users in research studies to
gather health data from wearables for conditions such as heart disease. These systems
collectively represent diverse approaches, showcasing the evolving landscape
of heart disease prediction with a blend of traditional methodologies and cutting-edge
technologies.
DISADVANTAGES
1. Data Limitations:
- Quality issues with incomplete, outdated, or biased datasets.
- Poor generalization across diverse populations.
2. Complexity of Use:
- Complicated user interfaces that are hard for non-experts to navigate.
- Limited accessibility for older adults and those without technology.
3. Interpretability and Transparency:
- Complex models often lack transparency and explainability.
- Generic recommendations that are not personalized for individual risk factors.
4. Integration Challenges:
- Poor interoperability with electronic health records and healthcare technologies.
- Fragmented data from various sources leading to a disjointed view of health.
5. Static Models:
- Inability to adapt or learn from new data and trends over time.
3.2 Problem statement
Heart disease remains one of the leading causes of mortality worldwide, necessitating
effective prediction and early intervention strategies. Current methods for
assessing cardiovascular risk often rely on traditional clinical evaluations and static
algorithms, which may not adequately capture the complexity of individual health
profiles or adapt to emerging health trends.
Existing systems tend to suffer from limitations such as reliance on outdated or biased
datasets, lack of personalized insights, and inadequate user engagement, leading
to suboptimal decision-making in both clinical and personal contexts. Furthermore,
many prediction models lack transparency and interpretability, making it challenging
for healthcare providers and patients to trust and understand the outcomes.
This project aims to develop a robust heart disease prediction system that leverages
advanced machine learning techniques to analyze diverse and comprehensive
health data. The system will focus on providing accurate predictions, personalized
recommendations, and an intuitive user interface to enhance accessibility and user
engagement. By addressing the gaps in current methodologies and integrating innovative
approaches, this system seeks to improve early detection, reduce the incidence
of heart disease, and ultimately contribute to better health outcomes for individuals
at risk.
Advantages of Proposed system
1. Enhanced Accuracy:
- Utilizes advanced machine learning algorithms to analyze diverse datasets, leading to
more accurate predictions of heart disease risk compared to traditional methods.
2. Personalized Insights:
- Provides tailored health recommendations based on individual risk factors, lifestyle,
and medical history, enhancing user engagement and encouraging proactive health management.
3. Improved Data Handling:
- Incorporates comprehensive data preprocessing techniques to handle missing values,
outliers, and normalization, resulting in cleaner and more reliable input for the
predictive model.
4. User-Friendly Interface:
- Features an intuitive and accessible user interface that allows users, regardless of
technical expertise, to easily input data and understand results.
3.3 System Specification
3.3.1 Hardware Specification
* Processor: Intel Core i3, i5, or i7, 2 GHz minimum
* RAM: 4 GB or above
* Hard Disk: 10 GB or above
* Input: Keyboard and mouse
* Output: Monitor
3.3.2 Software Specification
* Operating System: Windows 8 or higher
* Platform: Google Colaboratory, Anaconda Prompt
* Programming Language: Python (with the Flask web framework)
3.3.3 Standards and Policies
Anaconda Prompt
Anaconda Prompt is a command-line interface that ships with the Anaconda distribution.
It is used to manage environments and the Python packages needed for machine learning (ML).
Anaconda Navigator is available on Windows, Linux, and macOS and gives access to
several IDEs, which makes coding easier. The UI can also be implemented in Python.
Standard Used: ISO/IEC 27001
Jupyter
Jupyter is an open-source web application for creating and sharing documents that
contain live code, equations, visualizations, and narrative text. It can be used for
data cleaning and transformation, numerical simulation, statistical modeling, data
visualization, and machine learning.
Standard Used: ISO/IEC 27001
Chapter 4
METHODOLOGY
4.1 Proposed System
* The major challenge in heart disease is its detection. Instruments that can predict
heart disease are available, but they are either expensive or not efficient at estimating
a person's chance of heart disease. Many poor people cannot afford them and do not
consult a doctor because of financial problems.
* Since a large amount of data is available today, various machine learning algorithms
can be used to analyze it for hidden patterns. These hidden patterns can be used for
health diagnosis from medical data.
4.2 General Architecture
Initially, the patient registers by providing certain parameters. The registered data
is collected and stored in a database. When the patient later checks their health
condition, the stored values are retrieved from the database and the relevant features
are extracted using feature extraction techniques. The extracted data then undergoes
further processing, after which the disease is predicted and a report is generated.
This is the overview of the heart disease prediction system using machine learning techniques.
4.3 Design Phase
4.3.1 Data Flow Diagram
This is the initial idea for the flow of data. The user enters details and sends them
to the server, and the server returns the disease prediction to the user. All
communication takes place between the user and the server.
4.3.2 Use Case Diagram
A use case diagram represents all the steps, from user registration to the final generation of the report,
with users as the actors. Users register by providing certain parameters, log in to their accounts, and enter
their health conditions and values, which are collected and stored in the database. Whenever needed, the data
is extracted and matched against the stored values to check for and predict the disease, and finally a report
is generated.
4.3.3 Class Diagram
A class diagram is a type of static structure diagram in the Unified Modeling Language (UML) that illustrates
the structure and relationships of the classes in a system. It provides a visual representation of the classes, their
4.3.4 Sequence Diagram
The data flows from the user to the server, where the user's input is matched against
the data sets (training data). The probability of disease is found by comparing the
values, and the report is then generated.
4.3.5 Collaboration diagram
The collaboration involves several key components working together to provide users
with health predictions and recommendations. The User interacts with the User
Interface (UI) by entering health-related data, such as age and cholesterol levels.
This data is then sent to the Flask Application Server, which manages the overall
data flow. The server forwards the information to the Data Preprocessing Module,
responsible for cleaning and preparing the data for analysis. Once processed, the
data is sent to the Machine Learning Model, which predicts the likelihood of heart
disease based on the input features. After receiving the prediction, the Flask server
communicates with the Recommendation Engine, which generates personalized
health tips based on the model’s output.
4.3.6 Activity Diagram
The activity diagram has the user as its actor. The user first selects the dataset
on which classification is to be performed, and can then build a classification model
from the loaded dataset. Once the model is built, new patient details (symptoms) can
be entered through the predictor frame. Once the predictor is appropriately populated,
the user can find out the heart disease status.
4.4 Algorithm & Pseudo Code
4.4.1 Algorithm
1. Set Up: Import required libraries (Flask, numpy, pickle) and initialize the Flask
app.
2. Load Model: Load the pre-trained model (lr.pkl) using pickle.
3. Home Route (/): Display the main page (HOMEhtml.html) with a form for user
input.
4. Prediction Route (/predict):
- Collect and convert form input to a NumPy array.
- Predict the likelihood of heart disease using the model.
- Display the result:
- 0: Show “Low likelihood of heart disease.”
- 1: Show “High likelihood of heart disease” and health tips.
5. Run App: Start the Flask app with debugging enabled (debug=True).
4.4.2 Pseudo Code
import pickle

import numpy as np
from flask import Flask, request, jsonify, render_template
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Create Flask app
app = Flask(__name__)

# Load the pre-trained logistic regression model
model = pickle.load(open("lr.pkl", "rb"))

# Home route: show the input form
@app.route("/", methods=["GET", "POST"])
def Home():
    return render_template("HOMEhtml.html")
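Step 4 of the algorithm (the /predict route) does not appear in the listing above. The sketch below shows only its framework-independent core, under stated assumptions: the field names in FEATURE_ORDER and the StubModel class are hypothetical stand-ins for the real form fields and for the model loaded from lr.pkl, and the wording of the health tip is illustrative.

```python
FEATURE_ORDER = ["age", "sex", "chol"]  # hypothetical form field names

def form_to_features(form):
    """Convert submitted form values (strings) to floats in a fixed order."""
    return [float(form[name]) for name in FEATURE_ORDER]

def prediction_text(label):
    """Map the model's 0/1 output to the message shown on the page."""
    if label == 0:
        return "Low likelihood of heart disease."
    return "High likelihood of heart disease. Please consult a doctor."

class StubModel:
    """Stand-in for the pickled classifier; flags cholesterol above 240 mg/dl."""
    def predict(self, rows):
        return [1 if row[2] > 240 else 0 for row in rows]

model = StubModel()
form = {"age": "54", "sex": "1", "chol": "260"}
features = form_to_features(form)
message = prediction_text(model.predict([features])[0])
```

Inside a Flask view, `request.form` would take the place of the `form` dictionary, and the message would be passed to `render_template` as a template variable.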
4.4.3 Data Set / Generation of Data (Description only)
The dataset for a heart disease prediction model typically includes medical and
lifestyle-related features known to influence heart health. Common datasets, such as
the Cleveland Heart Disease Dataset from the UCI Machine Learning Repository,
contain records of patients with both numerical and categorical data on various risk
factors.
1. Demographic Information:
- Age: Age of the individual, as heart disease risk generally increases with age.
- Gender: Male or female, as gender can influence risk levels differently.
2. Medical Metrics:
- Blood Pressure: Resting blood pressure (in mm Hg), as high blood pressure is a
known risk factor.
- Cholesterol Levels: Serum cholesterol (mg/dl), since high cholesterol can lead to
artery blockage.
- Resting Electrocardiographic Results: ECG results to detect abnormalities in heart
function.
- Fasting Blood Sugar: Blood sugar levels after fasting, to indicate potential diabetes
risks.
3. Lifestyle-Related Factors:
- Exercise-Induced Angina: Indicates if chest pain occurs during physical activity.
- Physical Activity Levels: Captures general activity level, which influences heart
health.
4. Target Variable:
- Presence of Heart Disease: The outcome (often 0 or 1), indicating the presence or
absence of heart disease.
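As an illustration of this layout, the sketch below parses a few records and separates the input features from the target variable. The column names follow abbreviations commonly used for the UCI heart disease data but are an assumption here, and the two sample rows are invented for the example rather than taken from the project's dataset.

```python
import csv
import io

# Invented sample rows; column names are assumed, not confirmed by this report.
SAMPLE_CSV = """age,sex,trestbps,chol,fbs,restecg,exang,target
63,1,145,233,1,0,0,1
41,0,130,204,0,0,0,0
"""

def load_records(text):
    """Parse CSV text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]

def split_features_target(records, target="target"):
    """Separate the input features from the 0/1 outcome column."""
    X = [[value for key, value in rec.items() if key != target] for rec in records]
    y = [int(rec[target]) for rec in records]
    return X, y

records = load_records(SAMPLE_CSV)
X, y = split_features_target(records)
```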
4.5 Module Description
4.5.1 Module1
• Purpose: Gather data from various sources like medical records, patient
questionnaires, or real-time health monitors.
• Components:
– Input Interfaces: Forms for patient input, integration with electronic health
records (EHR), wearables, or health monitoring devices.
– Data Validation: Ensures data accuracy and completeness (e.g., checking
for missing values or outliers).
• Types of Data:
– Demographic Data: Age, gender, ethnicity.
– Medical History: Previous conditions, family history of heart disease.
– Clinical Data: Blood pressure, cholesterol levels, ECG results.
– Lifestyle Data: Smoking, alcohol use, physical activity, diet.
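The data validation component could be sketched as follows. This is a minimal illustration, not the project's actual validation code: the field names and the plausibility limits are hypothetical and would need to be agreed with clinicians before real use.

```python
REQUIRED_FIELDS = ["age", "trestbps", "chol"]  # hypothetical field names
# Hypothetical inclusive plausibility limits, used only for this illustration.
LIMITS = {"age": (1, 120), "trestbps": (50, 250), "chol": (80, 700)}

def validate(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: missing value")
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field}: not a number")
            continue
        low, high = LIMITS[field]
        if not low <= number <= high:
            problems.append(f"{field}: {number} outside {low}-{high}")
    return problems

ok = validate({"age": 54, "trestbps": 130, "chol": 246})
bad = validate({"age": 54, "trestbps": "", "chol": 5000})
```

Returning a list of problems instead of raising on the first error lets an input form report every invalid field to the user at once.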
4.5.2 Module2
• Purpose: Prepare raw data for analysis and prediction by cleaning and
transforming it into a usable format.
• Components:
– Data Cleaning: Handling missing values, noise, and inconsistencies.
– Feature Engineering: Extracting relevant features (e.g., creating a “risk
factor” score).
– Normalization/Standardization: Scaling data (important for models like
SVM, neural networks).
– Encoding Categorical Data: Convert categorical variables into numerical
values using techniques like one-hot encoding or label encoding.
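The three preprocessing steps named above can be sketched in plain Python. This is an illustrative sketch with invented values: mean imputation is one assumed choice among several for missing data, and the min-max formula is the same one that scikit-learn's MinMaxScaler applies with its default range.

```python
def fill_missing(values):
    """Data cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Normalization: rescale to [0, 1]; a constant column maps to zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def one_hot(values):
    """Encoding: one 0/1 indicator per category, categories in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

chol = fill_missing([200, None, 300, 250])   # invented cholesterol readings
scaled = min_max_scale(chol)
chest_pain = one_hot(["typical", "atypical", "typical"])
```

In a real pipeline the minimum, maximum, and category list should be learned from the training data only and then reused unchanged on new patient input.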
4.5.3 Module3
• Purpose: Identify and select the most relevant features that contribute to heart
disease prediction, improving model accuracy.
• Components:
– Correlation Analysis: Identify which features are strongly correlated with
heart disease.
– Dimensionality Reduction: Techniques like PCA (Principal Component
Analysis) or LDA (Linear Discriminant Analysis) to reduce the number of
features.
– Feature Ranking: Use statistical methods or machine learning techniques
(e.g., Recursive Feature Elimination) to rank features based on importance.
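Correlation analysis and feature ranking can be illustrated with a small sketch. It ranks features by the absolute Pearson correlation with the target, which is one simple statistical method and not necessarily the one used in this project; the toy data is invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Invented toy data: four patients, two features, and the 0/1 outcome.
columns = {"age": [40, 50, 60, 70], "chol": [220, 180, 240, 200]}
target = [0, 0, 1, 1]
ranking = rank_features(columns, target)
```

Pearson correlation only measures linear association, so a feature with a low score here can still be useful to a non-linear model such as a random forest.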
Chapter 5
5.2 Testing
Input
Test result
Input
Test result
Input
Test Result
5.3.4 Test Result
Figure 5.2: Web application interface of the heart disease predictor. After the input values are entered, the
predicted result is displayed.
Chapter 6
preprocessing may be minimal, with limited handling of missing values, outliers, or normalization
techniques, which can reduce prediction accuracy.
2. Algorithms and Models:
- Many traditional systems use simple models such as Logistic Regression, Decision Trees, or
Naive Bayes. These models are computationally inexpensive but often lack the sophistication to
capture complex, non-linear relationships within the data.
- Some systems may use statistical analysis rather than machine learning, which can be less
adaptable to new patterns and may generalize poorly on new patient data.
3. Prediction Accuracy:
- These systems usually achieve moderate accuracy. Due to simpler models and basic feature
engineering, they may fail to capture intricate dependencies and interactions between variables,
resulting in limited predictive power.
- Accuracy often stagnates around 70-80%.
4. Interpretability and Insights:
- Most existing models provide only basic interpretability. For instance, logistic regression
might highlight some feature importance, but decision trees and simpler models lack nuanced
explanations.
- Insights are often restricted to general risk levels (high or low) without detailed
recommendations or tailored insights for individual risk factors.
5. User Interface and Accessibility:
- Existing systems are often limited to hospital or clinical environments and may not have
user-friendly interfaces. The use of specialized software or dashboards may require medical
expertise to operate, reducing accessibility for non-expert users or patients directly.
- These systems are generally not interactive or accessible outside medical facilities, which
limits their use for preventive care and early self-assessment.
6. Preventive Recommendations:
- Most traditional systems simply provide a risk score without offering personalized health
recommendations or actionable steps to reduce heart disease risk.
- As a result, these systems serve as diagnostic aids but lack the capability to empower
patients with preventive strategies.
Summary of Limitations
While existing systems for heart disease prediction provide valuable insights, they are limited
in terms of data handling, accuracy, interpretability, accessibility, and preventive guidance.
import numpy as np
from flask import Flask, request, jsonify, render_template
import pickle
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Create flask app
app = Flask(__name__)
model = pickle.load(open("lr.pkl", "rb"))

# (Lines 10-19 of the original listing are missing from this copy.)

    else:
        return render_template(
            "HOMEhtml.html",
            prediction_text="You are likely to have heart disease {}".format(float(prediction)),
        )

if __name__ == "__main__":
    app.run(debug=True)
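The listing above does not include the request-handling lines, so the exact logic is not known. As a minimal sketch only, the following shows one way the posted form values could be turned into the message passed as prediction_text. It assumes that the form fields are numeric strings, that the model follows the scikit-learn predict convention, and that class 0 means "no heart disease". The names StubModel and build_prediction_text and the wording of the negative message are hypothetical and do not come from the report; Flask is left out so the sketch runs on its own.

```python
from typing import List, Mapping, Sequence


class StubModel:
    """Hypothetical stand-in for the pickled classifier (not the report's model).

    Predicts 1 when the mean of a row's features exceeds 0.5, otherwise 0.
    """

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        return [1 if sum(row) / len(row) > 0.5 else 0 for row in rows]


def build_prediction_text(form: Mapping[str, str], model) -> str:
    """Turn posted form values into the message shown on the page."""
    # Form values arrive as strings; convert them to floats in field order.
    features = [float(value) for value in form.values()]
    # scikit-learn style models expect 2-D input: one row per patient.
    prediction = model.predict([features])[0]
    if prediction == 0:
        # Assumed wording for the negative branch (not present in the report).
        return "You are not likely to have heart disease {}".format(float(prediction))
    else:
        # Wording of the positive branch is taken from the listing.
        return "You are likely to have heart disease {}".format(float(prediction))


# Example with made-up, already scaled feature values.
high_risk = build_prediction_text({"f1": "0.9", "f2": "0.8"}, StubModel())
low_risk = build_prediction_text({"f1": "0.1", "f2": "0.2"}, StubModel())
print(high_risk)
print(low_risk)
```

In a Flask route, the mapping passed as form would be request.form and the returned string would be handed to render_template as prediction_text.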
Output
Figure 6.2: Output 2
Chapter 7
7.1 Conclusion
After training and testing several machine learning models, we found that logistic regression achieved the highest accuracy among the methods we compared. Each algorithm's performance was evaluated using its confusion matrix, error metrics, and accuracy score. Logistic regression reached 93.55% accuracy on data taken from the UCI repository, and KNN performed nearly as well, with an accuracy of 93.01%.
In the future, heart disease prediction could be improved by incorporating more data sources such as genetics, lifestyle, and environmental factors. Machine learning algorithms could be used to identify patterns in this richer data and predict the risk of heart disease. Additionally, artificial intelligence (AI) could help clarify the complex relationships between different risk factors and their impact on heart health, and could support personalized treatments based on each individual's risk profile.
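The evaluation measures named in the conclusion (confusion matrix, error metrics, accuracy score) can be illustrated with a small sketch. The labels below are made up for illustration and are not the report's data or results; the function names confusion_counts and summarize are hypothetical, and the label convention (1 = disease, 0 = no disease) is an assumption.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = disease)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1  # disease correctly flagged
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1  # healthy correctly cleared
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1  # healthy wrongly flagged
        else:
            counts["fn"] += 1  # disease missed
    return counts


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, error rate, precision, and recall from the counts."""
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only; these are not the report's results.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
print(counts)
print(metrics)
```

For a screening task such as this one, recall is worth reporting alongside accuracy, because a false negative (a missed case of heart disease) is usually more costly than a false positive.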
Chapter 8
PLAGIARISM REPORT
To obtain a plagiarism report for the heart disease prediction project, you can use plagiarism detection tools that compare your content with extensive databases and online sources. Uploading your project content to tools like Turnitin, Grammarly Premium, Quetext, or Copyscape will yield a detailed report, highlighting any sections that match other texts and providing a similarity score. If you're looking for free options, platforms such as SmallSEOTools Plagiarism Checker or PlagScan allow you to paste text for a quick analysis, though these may not be as comprehensive as paid services. Additionally, if you're affiliated with an academic institution, you may have access to premium software like iThenticate or Turnitin, both commonly used for academic work, which can offer a more thorough plagiarism report. These tools will help ensure the originality of your project and provide insights into any areas that may need rephrasing.
Appendices
Appendix A
The contents...
References
1. Machine Learning in Heart Disease Prediction: Chen, X., Lin, X. (2019). "Machine Learning Techniques for Heart Disease Prediction: A Survey." International Journal of Healthcare Information Systems and Informatics, 14(1), 1-19. This paper surveys various machine learning techniques used in heart disease prediction.
2. Risk Factors and Models for Cardiovascular Disease: Yusuf, S., Hawken, S., Ounpuu, S., et al. (2004). "Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study." The Lancet, 364(9438), 937-952. This large-scale study identifies key risk factors for heart disease, serving as a foundation for predictive models.
3. Framingham Heart Study: D'Agostino, R. B., Vasan, R. S., Pencina, M. J., et al. (2008). "General cardiovascular risk profile for use in primary care: the Framingham Heart Study." Circulation, 117(6), 743-753. The Framingham study provides a widely used risk score model for predicting heart disease based on longitudinal data.
4. Heart Disease Data Set (UCI Machine Learning Repository): UCI Machine Learning Repository. (1988). Heart Disease Data Set. Available at https://archive.ics.uci.edu/ml/datasets/Heart+Disease. This dataset is commonly used in machine learning projects for heart disease prediction.
5. Use of Artificial Intelligence in Cardiovascular Risk Prediction: Esteva, A., Robicquet, A., Ramsundar, B., et al. (2019). "A guide to deep learning in healthcare." Nature Medicine, 25(1), 24-29. This article discusses the application of AI, including deep learning, in healthcare, with a focus on predictive modeling for cardiovascular conditions.
General Instructions
• The cover page should be printed in color as per the template, and the next page should also be printed in color as per the template
• Wherever figures appear in the report, that page should be printed in color
• Do not include general content; write more technical content
• Each chapter should contain a minimum of 3 pages
• Draw the notation of diagrams properly
• Every paragraph should start with one tab space
• The literature review should be properly cited and described with content related to the project
• All diagrams should be properly described; do not include general information about any diagram
• All diagrams and figures should be numbered according to the chapter number
• Test cases should be written with test input and test output
• All references should be cited in the report
• Strictly do not change the font style or font size of the template, and do not customize the LaTeX code of the report
• The report should be prepared according to the template only
• Any deviations from the report template will be summarily rejected
• For Standards and Policies, refer to the link below:
https://law.resource.org/pub/in/manifest.in.html
• Plagiarism should be less than 15%