Final Report
A
Project Report
On
“HEART ATTACK RISK PREDICTION USING RETINAL
EYE IMAGES”
SUBMITTED IN PARTIAL FULFILLMENT FOR THE AWARD OF DEGREE OF
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
SUBMITTED BY
2024 - 2025
|| Jai Sri Gurudev ||
Sri Adichunchanagiri Shikshana Trust ®
SJB INSTITUTE OF TECHNOLOGY
No.67, BGS Health & Education City, Dr.Vishnuvardhan Rd, Kengeri, Bengaluru, Karnataka 560060
CERTIFICATE
Certified that the Project Work entitled “HEART ATTACK RISK PREDICTION USING
RETINAL EYE IMAGES”, carried out by DEEKSHITH KUMAR S, DHANUSH B G, DHANUSH M
R, and HARSHAN GOWDA L V, bearing USN 1JB21CS038, 1JB21CS039, 1JB21CS040, and
1JB21CS055, bonafide students of SJB Institute of Technology, is in partial fulfilment
of the 7th semester of BACHELOR OF ENGINEERING in COMPUTER SCIENCE AND
ENGINEERING of the Visvesvaraya Technological University, Belagavi, during the
academic year 2024-25. It is certified that all corrections/suggestions indicated for Internal
Assessment have been incorporated in the report deposited in the departmental library. The
project report has been approved as it satisfies the academic requirements in respect of
project work prescribed for the said degree.
1. __________________________ ______________________________
2.___________________________ ______________________________
ACKNOWLEDGEMENT
We would like to express our profound gratitude to His Divine Soul Jagadguru Padmabhushan Sri Sri
Sri Dr. Balagangadharanatha Mahaswamiji and His Holiness Jagadguru Sri Sri Sri Dr.
Nirmalanandanatha Mahaswamiji for providing us with an opportunity to complete our academics in this
esteemed institution.
We would also like to express our profound thanks to Revered Sri Sri Dr. Prakashnath Swamiji, BGS
& SJB Group of Institutions, for his continuous support in providing amenities to carry out this Project
Work in this admired institution.
We express our gratitude to Dr. Puttaraju, Academic Director, BGS & SJB Group of Institutions, for
providing us with excellent facilities and an academic ambience, which have helped us in the satisfactory
completion of our Project work.
We express our gratitude to Dr. K. V. Mahendra Prashanth, Principal, SJB Institute of Technology, for
providing us with excellent facilities and an academic ambience, which have helped us in the satisfactory
completion of our Project work.
We extend our sincere thanks to Dr. Babu N V, Dean Academic, SJB Institute of Technology, for
providing us with invaluable support throughout the period of our Project work.
We extend our sincere thanks to Dr. Krishan A N, Head of the Department, Computer Science and
Engineering, for providing us with invaluable support throughout the period of our Project work.
We wish to express our heartfelt gratitude to our Project Coordinator & guide Dr. Arun Kumar D R,
Assistant Professor, Department of CSE, for his valuable guidance, suggestions and cheerful
encouragement during the entire period of our Project work.
Finally, we take this opportunity to extend our earnest gratitude and respect to our parents, the teaching
and non-teaching staff of the department, the library staff, and all our friends, who have directly or
indirectly supported us during the period of our Project work.
Regards,
DEEKSHITH KUMAR S [1JB21CS038]
DHANUSH B G [1JB21CS039]
DHANUSH M R [1JB21CS040]
HARSHAN GOWDA L V [1JB21CS055]
ABSTRACT
The structure and function of the microvasculature are significantly influenced by hypertension and
heart attacks, two key cardiovascular disease risk factors. Images taken with a fundus camera can be
used to spot irregularities in the blood vessels of the retina that indicate the extent of injury inflicted
on the blood vessels by hypertension and heart attacks. Machine learning and AI techniques can detect
preclinical signs that fall below the threshold of a human observer. The proposed methodology aims to
investigate the effects of hypertension and heart attacks on the morphological characteristics of retinal
blood vessels. Retinal images are collected from patients diagnosed with hypertension and heart attack.
Interference data, i.e., information about structures other than the retinal vasculature, is removed using
a vessel segmentation method, leaving only morphological details of the retinal blood vessels. The
method aims to create a system for visual image-based heart disease detection, especially in young
people. In this study, a dataset of retinal images is used, and retinal vessel segmentation is applied to
separate the vessels in the images. In a number of specialties, such as laryngology, neurosurgery, and
ophthalmology, the analysis of blood vessels is crucial for diagnosis, therapy planning and execution,
and assessment of clinical outcomes. Vessel segmentation is therefore a crucial method for detecting
heart disease from retinal images. Changes in the eyes may be a sign of many conditions.
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
The heart is a muscular organ that pumps blood through the body and is the
central part of the body’s cardiovascular system, which works in close coordination with
the lungs. The cardiovascular system also comprises a network of blood vessels, such as
veins, arteries, and capillaries, which deliver blood all over the body. Abnormalities in the
normal flow of blood from the heart cause several types of heart disease, commonly known
as cardiovascular diseases (CVD). Heart diseases are the leading cause of death worldwide.
According to a survey by the World Health Organization (WHO), 17.5 million global
deaths occur because of heart attacks and strokes. More than 75% of deaths from
cardiovascular diseases occur in middle-income and low-income countries, and 80%
of the deaths that occur due to CVDs are because of stroke and heart attack. Therefore,
predicting cardiac abnormalities at an early stage, together with tools for predicting heart
disease, can save many lives and help doctors design effective treatment plans, which
ultimately reduces the mortality rate due to cardiovascular diseases.
Due to the development of advanced healthcare systems, a large amount of patient data is
nowadays available (i.e., Big Data in Electronic Health Record systems) which can be used
for designing predictive models for cardiovascular diseases. Data mining, or machine
learning, is a discovery method for analyzing big data from assorted perspectives and
encapsulating it into useful information. “Data mining is the non-trivial extraction of implicit,
previously unknown and potentially useful information from data.” Nowadays, a huge
amount of data pertaining to disease diagnosis, patients, etc. is generated by healthcare
industries. Data mining provides a number of techniques that discover hidden patterns or
similarities in data.
Therefore, in this paper, a machine learning algorithm is proposed for the
implementation of a heart disease prediction system, validated on two open-access
heart disease prediction datasets. Data mining is the computer-based process of extracting
useful information from enormous databases, and it is most helpful in explorative
analysis because it surfaces non-trivial information from large volumes of evidence.
However, the available raw medical data are widely distributed, voluminous, and
heterogeneous in nature. These data need to be collected in an organized form and can
then be integrated to form a medical information system. Data mining provides a
user-oriented approach to novel and hidden patterns in the data. Data mining tools are
useful for answering business questions and for predicting various diseases in the
healthcare field; disease prediction plays a significant role in data mining. This paper
analyzes heart disease predictions using classification algorithms. These invisible
patterns can be utilized for health diagnosis in healthcare data.
Data mining technology affords an efficient approach to discovering new and previously
unknown patterns in data. The information identified can be used by healthcare
administrators to improve services. Heart disease is the most crucial cause of death in
countries like India and the United States. In this project, we predict heart disease
using classification algorithms. Machine learning techniques such as classification
algorithms, including DNN classification and Logistic Regression, are used to explore
different kinds of heart-based problems.
1.2 MOTIVATION
Many hospital information systems are designed to support patient billing, inventory
management and generation of simple statistics. Some hospitals use decision support systems,
but they are largely limited. They can answer simple queries like “What is the average age of
patients who have heart disease?”, “How many surgeries had resulted in hospital stays longer
than 10 days?” “Identify the female patients who are single, above 30 years old, and who
have been treated for cancer.” However, they cannot answer complex queries like “Identify
the important preoperative predictors that increase the length of hospital stay”, “Given patient
records on cancer, should treatment include chemotherapy alone, radiation alone, or both
chemotherapy and radiation?”, and “Given patient records, predict the probability of patients
getting a heart disease.”
Clinical decisions are often made based on doctors’ intuition and experience rather
than on the knowledge-rich data hidden in the database. This practice leads to unwanted
biases, errors, and excessive medical costs, which affect the quality of service provided to
patients. Wu et al. proposed that integration of clinical decision support with computer-based
patient records could reduce medical errors, enhance patient safety, decrease unwanted
practice variation, and improve patient outcomes [17]. This suggestion is promising, as data
modeling and analysis tools, e.g., data mining, have the potential to generate a knowledge-
rich environment which can help to significantly improve the quality of clinical decisions.
The main objective of this research is to develop a prototype Intelligent Heart Disease
Prediction System (IHDPS) using three data mining modeling techniques, namely, Decision
Trees, Naïve Bayes and Neural Network.
IHDPS can discover and extract hidden knowledge (patterns and relationships)
associated with heart disease from a historical heart disease database. It can answer complex
queries for diagnosing heart disease and thus assist healthcare practitioners in making
intelligent clinical decisions, which traditional decision support systems cannot. By providing
effective treatments, it can also help reduce treatment costs.
1.5 SCOPE
The scope of the project is to develop a fully functional, real-time healthcare application that
integrates machine learning models with an intuitive user interface. The key features
include:
1. User Authentication:
The application will include a robust authentication system to ensure that only authorized
users (doctors, scientists, patients) can access their respective dashboards. Users will be
able to sign up, log in, and reset their passwords as needed.
2. Role-Based Dashboards:
Three types of users will interact with the system: doctors, scientists, and patients. Each user
type will have a personalized dashboard:
• Doctors will have access to patient reports, predictions, and tools for making medical
decisions.
• Scientists will have access to aggregated health data for research purposes, with
advanced analytics tools for exploring trends and patterns.
• Patients will have a simple, intuitive interface to submit their medical reports and view
predictions about their health.
Patients will be able to submit medical reports, including lab results, test reports, and
historical health data. This data will be used to generate predictions and track health
progress over time.
The system will use machine learning models to generate predictions based on the data
submitted by patients. Streamlit will be used to create interactive visualizations and graphs
that represent the predictions, trends, and health data, helping users better understand their
health status.
CHAPTER 2
LITERATURE SURVEY
Machine Learning techniques are used to analyze and predict medical data from
information resources. Diagnosis of heart disease is a significant and tedious task in medicine.
The term heart disease encompasses the various diseases that affect the heart. Identifying
heart disease from various factors or symptoms is an issue that is not free from false
presumptions and is often accompanied by unpredictable effects. Data classification based on
supervised machine learning algorithms yields better accuracy. Here we use DNN
classification as the training algorithm to train on the heart disease dataset and to predict
heart disease. The results showed that the designed prediction system, together with the
medicinal prescription, is capable of predicting heart attacks successfully. Machine Learning
techniques have been used to indicate early mortality by analyzing heart disease patients and
their clinical records (Richards, G. et al., 2001). Sung, S.F. et al. (2015) applied two machine
learning techniques, a k-nearest neighbor model and an existing multi-linear regression, to
predict the stroke severity index (SSI) of patients. Their study shows that k-nearest neighbor
performed better than the multi-linear regression model. Arslan, A. K. et al. (2016)
suggested various machine learning techniques such as support vector machines (SVM)
and penalized logistic regression (PLR) to predict heart stroke. Their results show that SVM
produced the best prediction performance when compared to the other models. Boshra
Brahmi et al. [20] developed different machine learning techniques to evaluate the prediction
and diagnosis of heart disease. The main objective is to evaluate different classification
techniques such as J48, Decision Tree, KNN, and Naïve Bayes, after which performance
measures such as accuracy, precision, sensitivity, and specificity are evaluated. Deep Neural
Networks (DNN), Support Vector Machines (SVM), k-Nearest Neighbors (KNN), and Naïve
Bayes have shown promising results in predicting heart disease with high accuracy. Studies
have demonstrated that SVMs perform exceptionally well in prediction tasks, while KNN
outperforms multi-linear regression models in certain scenarios such as stroke severity
index prediction.
Data Source
Clinical databases have collected a significant amount of information about patients and
their medical conditions. Records set with medical attributes were obtained from the
Cleveland Heart Disease database. With the help of the dataset, the patterns significant to the
heart attack diagnosis are extracted. The records were split equally into two datasets: training
dataset and testing dataset. A total of 303 records with 76 medical attributes were obtained.
All the attributes are numeric-valued. We are working on a reduced set of attributes, i.e. only
14 attributes.
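As a minimal sketch of how such a record set might be loaded and split, assuming the standard UCI “processed.cleveland.data” file layout and pandas (the file name and column names are assumptions, not details from this report):

```python
import pandas as pd

# Hypothetical loading of the 14-attribute Cleveland subset; the column names
# follow the UCI convention and are assumptions for illustration.
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach",
        "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv("processed.cleveland.data", header=None, names=cols, na_values="?")

# Split the records equally into training and testing sets, as described above.
train = df.sample(frac=0.5, random_state=42)
test = df.drop(train.index)
```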
1. Arti Gupta, Maneesh Shreevastava, IJETAE, 2011. Medical Diagnosis using Back
Propagation Algorithm. In this paper, a feed-forward Back Propagation algorithm is
described, which is used as a classifier to distinguish between infected and non-infected
persons in medical diagnosis. The back propagation algorithm presented in the paper
trains a multilayer neural network with a very small learning rate, especially when using
a large training set size.
2. Shraddha Subhash Shirsath, Prof. Shubhangi Patil, IJIRSET, June 2018. Disease
Prediction using Machine Learning over Big Data. This paper discusses a machine
learning algorithm used for accurate disease prediction. A latent factor model is used to
handle incomplete data. A DNN algorithm is used for classification of large volumes of
hospital data, and then a Convolutional Neural Network based Multimodal Disease Risk
Prediction (CNN-MDRP) algorithm provides the disease prediction result.
3. Nikita Kamble, Manjiri Harmalkar, Manali Bhoir, Supriya Chaudhary, IJSRCSEIT, 2017.
Smart Health Prediction System Using Machine Learning. The paper presents an
overview of Machine Learning techniques with their applications and the medical and
educational aspects of clinical predictions. In medical and health care areas, due to
regulations and the availability of computers, a large amount of data is becoming
available. Such a large amount of data cannot be processed by humans in a short time to
make diagnoses and treatment schedules. A major objective is to evaluate Machine
Learning techniques in clinical and health care applications to develop accurate decisions.
It also gives a detailed discussion of medical Machine Learning techniques which can
improve various aspects of clinical predictions. Machine learning is a powerful technology
of high interest in the computing world. It is a subfield of computer science that uses
existing data in different databases and transforms it into new research and results. It
combines learning algorithms and database management to extract new patterns from
large data sets, along with the knowledge associated with these patterns. The actual task is
to extract data by automatic or semi-automatic means. The different techniques included in
Machine Learning include clustering, forecasting, path analysis, and predictive analysis.
4. Nilesh Borisagar, Dipa Barad, Priyanka Raval, Conference paper (PICCN), April 2017.
Chronic Kidney Disease Prediction using Back Propagation Neural Network
Algorithm. In this paper, various training algorithms such as Levenberg-Marquardt,
Bayesian regularization, Scaled Conjugate Gradient, and Resilient back propagation are
discussed. After the neural network is trained using back propagation algorithms, the
trained network is used for detection of kidney disease in the human body. The back
propagation algorithms presented here are capable of distinguishing between infected
patients and non-infected persons.
5. Sellappan Palaniappan, Rafiah Awang, IEEE, 2008. Intelligent Heart Disease Prediction
System Using Machine Learning Technique. This paper discusses the development of a
prototype using machine learning techniques, namely Decision Trees, Naïve Bayes, and
Neural Network. It can answer complex “what if” queries which traditional decision
support systems cannot. It is web-based, user-friendly, scalable, reliable, and expandable.
6. M.A. Nishara Banu, B. Gomathy, IJTRA, Dec 2013. Disease Prediction System Using
Machine Learning Techniques. This paper analyzes heart disease predictions using
different classification algorithms. Medical machine learning techniques like Association
Rule Mining, Clustering, and Classification algorithms such as DNN classification and
the C4.5 algorithm are implemented to analyze different kinds of heart-based problems.
The Maximal Frequent Itemset Algorithm (MAFIA) is used for mining maximal frequent
itemsets from a transactional database.
The literature survey on heart disease prediction using Machine Learning (ML) techniques
has provided several significant insights into the effectiveness, challenges, and practical
applications of these methods:
2. Evaluation Metrics
• Accuracy: Widely used but may be misleading for imbalanced datasets where the
majority class dominates.
• Precision, Recall, and F1-Score: These metrics provide a more nuanced
understanding of model performance, especially for datasets with class imbalances
(e.g., heart disease vs. no heart disease cases).
• Specificity and Sensitivity: Critical for medical applications to minimize false
negatives (missing a diagnosis) and false positives (incorrectly diagnosing heart
disease).
• ROC-AUC Curve: Provides a comprehensive measure of the model's ability to
discriminate between classes across decision thresholds; a sketch of computing these
metrics follows.
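As a brief illustration of how these metrics can be computed, here is a hedged sketch using scikit-learn; the label and probability arrays are toy values, not results from this project:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy ground-truth labels, hard predictions, and predicted probabilities.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_prob = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]

print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1-score:", f1_score(y_true, y_pred))
print("roc-auc:", roc_auc_score(y_true, y_prob))  # uses probabilities, not labels
```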
3. Data Challenges
• High Dimensionality: Medical datasets often contain numerous features, requiring
dimensionality reduction techniques such as Principal Component Analysis (PCA) or
feature selection methods.
• Class Imbalance: Heart disease datasets frequently exhibit imbalanced distributions,
with significantly fewer positive cases. Techniques like SMOTE (Synthetic Minority
Oversampling Technique) are employed to address this (see the sketch after this list).
• Missing Data: Clinical records often have missing values, necessitating imputation
methods or careful preprocessing to avoid bias.
• Heterogeneous Data: Data may come from various sources (e.g., electronic health
records, imaging, or wearable devices), requiring standardization and integration.
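The SMOTE technique mentioned above can be sketched as follows with the imbalanced-learn package; the synthetic dataset stands in for real patient records and is purely illustrative:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic, deliberately imbalanced data (about 9:1 negative to positive).
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=42)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating between neighbors.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_res))  # classes are now balanced
```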
4. Preprocessing
• Feature selection (e.g., age, blood pressure, cholesterol, ECG results) plays a crucial
role in improving model performance.
• Scaling techniques (e.g., normalization or standardization) are essential for algorithms
sensitive to feature magnitudes, such as SVM and KNN.
• Temporal data (e.g., time-series analysis of patient history) requires specialized
preprocessing techniques to capture trends over time.
5. Applications
• ML models have been integrated into decision support systems for clinicians, aiding in
early diagnosis and personalized treatment planning.
• Predictive models are being used in public health for identifying high-risk populations
and designing preventive interventions.
• Continuous monitoring systems leveraging wearable devices and ML algorithms
enable real-time prediction and alerts for heart disease patients.
7. Emerging Trends
These observations highlight the potential of ML in transforming heart disease diagnosis and
prevention while emphasizing the need for addressing specific challenges to maximize its
impact. These insights underscore the transformative role of Machine Learning in
revolutionizing heart disease diagnosis and prediction. By enabling early detection,
personalized treatment planning, and real-time monitoring, ML has the potential to
significantly improve patient outcomes and reduce mortality rates. However, to fully harness
its capabilities, it is essential to address critical challenges such as ensuring high-quality and
diverse datasets, enhancing model interpretability for clinical adoption, mitigating overfitting
and computational overhead, and integrating ML models seamlessly into existing healthcare
infrastructures. With continued advancements in algorithms, data processing techniques, and
interdisciplinary collaboration, ML can pave the way for more precise, accessible, and
impactful healthcare solutions.
CHAPTER 3
SYSTEM REQUIREMENTS SPECIFICATION
1. Operating System:
• Windows: For development and testing environments.
• macOS: For compatibility with developers using Apple devices.
• Linux (Ubuntu Recommended): Preferred for server deployment due to
stability and compatibility with Python libraries and machine learning
frameworks.
2. Programming Language:
• Python: Version 3.8 or higher to ensure compatibility with the latest machine
learning and deep learning libraries.
3. Frameworks and Libraries:
• Deep Learning Frameworks:
• TensorFlow: For building and training deep learning models.
• PyTorch: Alternative framework for experimentation and model
prototyping.
• Machine Learning Libraries:
• Scikit-learn: For preprocessing and classical machine learning tasks.
• XGBoost: For high-performance gradient boosting models.
• Image Processing Libraries:
• OpenCV: For handling and preprocessing retinal images.
• PIL (Pillow): For image manipulation and augmentation.
• Data Analysis and Visualization Libraries:
• Numpy and Pandas: For numerical computations and data handling.
• Matplotlib and Seaborn: For creating plots and analyzing visual data
trends.
• Plotly: For interactive dashboards and advanced data visualization.
4. Database Management System (DBMS):
1. Development Machine:
o Processor: Intel i5 or higher / AMD Ryzen 5 or higher.
o RAM: Minimum 16 GB for efficient model training and data processing.
o Storage: SSD with at least 512 GB for storing datasets and models.
o GPU: NVIDIA GPU with CUDA support (e.g., NVIDIA GTX 1660 or higher).
2. Server Requirements:
o Processor: Intel Xeon or equivalent with multi-core architecture.
o RAM: Minimum 32 GB for handling concurrent user requests and large
datasets.
o Storage: 2 TB HDD for logs and backups, 1 TB SSD for operational data.
o GPU: NVIDIA Tesla series or equivalent for deep learning inference.
3. User Devices:
o Desktop or Laptop:
▪ Processor: Intel i3 or equivalent.
▪ RAM: 4 GB or higher.
▪ Browser: Latest version of Chrome, Firefox, or Edge.
o Mobile Devices:
▪ Android or iOS with at least 2 GB of RAM and a modern browser.
1. Data Acquisition:
o Validate image quality and provide feedback if images are blurry or improperly
formatted.
2. Preprocessing:
3. Model Training:
o Validate model performance using metrics like accuracy, precision, recall, and
F1-score.
5. Data Storage:
6. User Interface:
o Notify users about results via email or SMS (if opted in).
1. Performance:
2. Scalability:
3. Security:
4. Availability:
5. Usability:
6. Maintainability:
1. Input Data:
2. Output Data:
3. Training Data:
1. Compliance:
2. Bias Mitigation:
3. Transparency:
ENVIRONMENT
Anaconda is a powerful distribution of Python and R that is widely used for data science,
machine learning, and scientific computing tasks. It simplifies package management and
deployment, especially when working with multiple projects or datasets. Anaconda
provides a complete and self-contained environment for Python development, making it
the ideal platform for data scientists and researchers. It includes Python, essential libraries,
tools, and an easy-to-use package manager called conda.
This section outlines the specific software requirements for setting up Python with
Anaconda as the development environment.
1. Development Machine:
o Processor: Intel i5 or AMD Ryzen 5 (or higher) for efficient model training and
development.
o RAM: Minimum of 16 GB of RAM is recommended to handle large datasets
and models.
o Storage: SSD with a minimum of 512 GB to store datasets, models, and
intermediate results.
o Graphics Processing Unit (GPU): NVIDIA GPU with CUDA support (e.g.,
NVIDIA GTX 1660 or higher) for accelerated model training using libraries
like TensorFlow and PyTorch.
2. Server Requirements (if using Anaconda for server-side applications or
deployments):
o Processor: Intel Xeon or equivalent, multi-core architecture for handling large-
scale operations and concurrent user requests.
o RAM: 32 GB or higher to manage large data and simultaneous processes.
o Storage: 1 TB SSD for operational data and 2 TB HDD for logs and backups.
o GPU: NVIDIA Tesla series or equivalent for deep learning model inference.
3. User Devices:
o Desktop or Laptop:
▪ Processor: Intel i3 or equivalent.
▪ RAM: 4 GB or higher for running small models or viewing results.
▪ Browser: Latest versions of Chrome, Firefox, or Edge for accessing
Jupyter Notebooks or web applications.
o Mobile Devices:
▪ Android or iOS with at least 2 GB of RAM for interacting with
deployed applications or APIs.
2. Access Control:
• Role-based access control (RBAC) can be implemented for restricting access to certain
parts of the application or environment, especially in shared or multi-user settings.
The system requirements outlined for the project highlight the critical role that
software, hardware, and data play in the development and deployment of an AI-based
solution for heart attack risk prediction from retinal images. The chosen environment, such
as Anaconda, Python, and associated libraries, ensures a stable and efficient platform for
machine learning and deep learning tasks. Additionally, the hardware specifications,
including high-performance GPUs for model training and robust server capabilities for
handling large-scale user data and concurrent requests, are crucial for the seamless
functioning of the system.
The functional requirements emphasize the need for accurate image preprocessing,
effective model training, and reliable risk predictions that are essential for the clinical
decision-making process. The system should ensure that all patient data is handled
securely, complying with privacy regulations like HIPAA and GDPR, which is a core
component of building trust in such a system.
In summary, the combination of advanced technologies, rigorous regulatory
adherence, and a user-centered approach will make this system not only efficient and
scalable but also trustworthy and effective in predicting heart attack risks, ultimately
improving patient care and health outcomes.
CHAPTER 4
SYSTEM DESIGN
A use case diagram shows the system’s functionality and the interactions between users
(or external systems) and the system itself. For the heart attack risk prediction system,
the following use cases can be defined:
Actors:
• Patient/User: The end user who uploads retinal images and views the results.
• Admin: The user managing the backend, including training the model, monitoring
system performance, and maintaining user accounts.
• Model (System): The machine learning model that processes retinal images and
provides risk predictions.
Use Cases:
• Upload Retinal Image: A patient uploads an image of their retina in a supported
format (e.g., JPEG, PNG).
• Preprocess Image: The system preprocesses the uploaded image (grayscale
conversion, resizing).
• Predict Heart Attack Risk: The system uses the trained model to predict the risk
of heart attack based on the processed retinal image.
• View Prediction Results: The patient views the heart attack risk prediction,
categorized as low, medium, or high.
• Store Data: The system stores the image and its corresponding prediction in the
database.
• Generate Report: The system generates a detailed risk assessment report.
• Admin Dashboard: Admin monitors system logs, updates the model, and
manages user data.
Diagram Overview:
The diagram will show the relationships between the actors and use cases, with the
patient primarily interacting with the system to upload images and view results. The
admin would interact with the backend to monitor, update, and maintain the system.
An activity diagram represents the workflow of the system for a specific use case,
detailing the steps and decisions involved in processing an uploaded retinal image and
predicting the heart attack risk.
Workflow for Predicting Heart Attack Risk:
1. Start
2. Upload Retinal Image:
o The patient uploads a retinal image.
3. Preprocessing:
o Convert image to grayscale.
o Resize image to a standard size (e.g., 224x224 pixels).
o Augment image (optional).
4. Image Validation:
o Check if the image format is correct (JPEG/PNG).
o Validate the image quality (e.g., ensure it is not blurry).
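A minimal sketch of these preprocessing and validation steps, assuming OpenCV; the blur threshold (100) and target size (224x224) are illustrative choices, not values fixed by this report:

```python
import cv2

def preprocess_retinal_image(path):
    # Load the uploaded image; None signals an unsupported or corrupt file.
    img = cv2.imread(path)
    if img is None:
        raise ValueError("Unsupported or corrupt image file")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Variance of the Laplacian is a common sharpness measure:
    # low values indicate a blurry image that should be rejected.
    if cv2.Laplacian(gray, cv2.CV_64F).var() < 100:
        raise ValueError("Image is too blurry for analysis")
    # Resize to the standard input size used by the model.
    return cv2.resize(gray, (224, 224))
```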
A sequence diagram illustrates the interaction between objects or components over time
for a specific process. For this project, a sequence diagram can describe the process of
predicting heart attack risk from the moment the patient uploads a retinal image to when
the prediction is displayed.
Sequence of Operations:
1. Patient: Uploads the retinal image.
2. System: Receives the image and initiates preprocessing.
o Converts the image to grayscale.
o Resizes the image to the correct dimensions.
o Validates the image quality.
3. Model: Processes the preprocessed image and returns the risk score.
4. System: Categorizes the result (low, medium, or high risk).
Diagram Overview:
• The sequence diagram will show each step as a vertical lifeline (e.g., Patient,
System, Model) with horizontal arrows representing interactions between them
(e.g., uploading the image, calling the model, displaying the result).
In Level 0, the DFD represents the entire system as a single process. This provides a broad
overview of how data flows from collection to the output.
Processes:
1. Collection
o External Entity: Patient
o Description: The patient uploads retinal images for analysis.
o Data Flow: Retinal Image Data → System
2. Processing
o External Entity: Data Processing Service
o Description: The system processes the image data to generate predictions.
It involves all necessary steps like preprocessing, applying algorithms, and
extracting features from the image.
o Data Flow: Retinal Image Data → Processed Data → Prediction Model
3. Random Selection
o External Entity: Algorithm Selection Module
o Description: Randomly selects a specific model or set of features for
training, which helps in optimizing and testing various configurations of the
model.
o Data Flow: Processed Data → Random Selection → Trained Model
4. Trained Dataset
o External Entity: Model Training
o Description: The processed data is used to train the deep learning model.
After model training, the dataset is applied for predictions to determine
heart attack risks.
o Data Flow: Trained Data → Model Training → Predictions → Output
Results
In Level 1 of the Data Flow Diagram (DFD), we break down the high-level processes
identified in Level 0 into smaller, more detailed components. This provides an in-depth
look at how data is processed within each phase and the interactions between the
processes, data stores, and the system components. Below is an elaboration of each process
and its corresponding flow.
1. Collection Process
Input:
• Retinal Images uploaded by the patient: These images are captured by the
patient and uploaded to the system for analysis. The images may come in various
formats such as JPEG, PNG, or TIFF.
Process:
• Receive and store retinal images:
o The Collection process begins when the patient uploads their retinal images
to the system. These images are first received by the system and stored in a
Data Store for further processing.
o The system may verify the format and quality of the images to ensure they
are suitable for further processing.
Output:
• Image Data: The images are stored and organized in the system for later use in the
Preprocessing stage. They are saved in a Retinal Image Data Store where they
can be easily accessed by subsequent processes.
Data Flow:
• Retinal Image Data → System: The images are transferred from the patient to the
system, initiating the collection process.
2. Preprocessing
Input:
• Retinal Image Data: The raw retinal images collected in the first step are fed into
the preprocessing system for further refinement.
Process:
• Preprocessing:
o The Preprocessing stage is essential for improving the quality of the
images before extracting meaningful features. The process includes several
steps:
▪ Noise Removal: Filters are applied to remove any irrelevant or
distracting data (e.g., blur or distortion) from the images.
▪ Resizing: The images are resized to a uniform dimension to make
them easier to analyze and reduce computational load during later
stages.
▪ Grayscale Conversion: The images are converted to grayscale, as
this reduces complexity by eliminating color data and focusing the
analysis on structural detail such as vessel patterns.
3. Feature Extraction
Input:
• Preprocessed Image Data: The processed images (post noise removal, resizing,
and grayscale conversion) are fed into the Feature Extraction system for the next
phase of analysis.
Process:
• Feature Extraction:
o During the Feature Extraction process, specific patterns and attributes are
identified in the retinal images that are most relevant for heart attack risk
prediction. These features help the machine learning model to understand
the image and make accurate predictions.
o Common features that might be extracted from retinal images include:
▪ Blood Vessel Patterns: Analysis of blood vessel morphology
(shape, branching patterns) which can indicate cardiovascular
conditions.
▪ Anomalies: Detection of abnormal regions such as hemorrhages,
exudates, or other vascular irregularities, which are indicative of
heart problems.
o Specialized algorithms like edge detection or texture analysis could be used
to identify these features from the images.
Output:
• Extracted Features: After processing, the key features of the image (such as blood
vessel patterns or anomalies) are isolated and stored in a structured format. These
extracted features will be used as input for the predictive algorithm in the next
stage.
Data Flow:
• Preprocessed Image → Extracted Features: The preprocessed image is passed to
the feature extraction system, which generates a set of meaningful features.
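As a hedged sketch of one possible handcrafted approach, the following uses OpenCV contrast enhancement and edge detection to derive a crude vessel-density feature; the specific steps and the `vessel_density` feature are illustrative assumptions, not the project's fixed pipeline:

```python
import cv2
import numpy as np

def extract_vessel_features(path):
    # The green channel of a fundus image usually shows vessels with best contrast.
    img = cv2.imread(path)
    green = img[:, :, 1]
    # Contrast-limited adaptive histogram equalization enhances vessel visibility.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(green)
    # Edge detection as a rough proxy for vessel boundaries.
    edges = cv2.Canny(enhanced, 50, 150)
    # Simple handcrafted feature: fraction of edge pixels (a vessel-density proxy).
    return {"vessel_density": float(np.count_nonzero(edges)) / edges.size}
```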
4. Apply Algorithm
Input:
• Extracted Features: These are the relevant patterns and attributes that have been
identified from the retinal images. They serve as the input for the machine learning
model to generate predictions.
Process:
• Apply Algorithm:
o The Apply Algorithm process is where the actual prediction happens. The
extracted features are input into a machine learning model or algorithm to
evaluate the risk of a heart attack.
o Various types of algorithms can be used for this prediction, including:
▪ Convolutional Neural Networks (CNNs): These are deep learning
algorithms particularly suited for image analysis, as they can
automatically detect patterns like blood vessel structures or
anomalies in the retinal images.
▪ Other Machine Learning Models: Depending on the architecture
of the system, other algorithms such as decision trees, random
forests, or support vector machines may also be applied to the
extracted features.
o The machine learning model processes the features and outputs a prediction
about the heart attack risk. This prediction could be a:
▪ Numerical Score: A value representing the likelihood of the patient
experiencing a heart attack (e.g., 0.75 means 75% risk).
▪ Classification: The risk could be categorized into low, medium, or
high based on thresholds set during training.
Output:
• Risk Prediction Result: This is the final output of the system, where a prediction
about the heart attack risk is made. The result is typically a numerical value or a
categorical classification (low, medium, high).
Data Flow:
• Extracted Features → Algorithm → Prediction Model → Prediction Result:
The extracted features are passed into the prediction model (algorithm), which
generates a risk prediction that flows to the final output stage.
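To make the CNN option concrete, here is a minimal Keras sketch for three-way risk classification (low/medium/high); the architecture, input size (224x224 grayscale), and hyperparameters are illustrative assumptions rather than the project's final model:

```python
from tensorflow.keras import layers, models

def build_risk_cnn():
    model = models.Sequential([
        layers.Input(shape=(224, 224, 1)),        # preprocessed grayscale image
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(3, activation="softmax"),    # low / medium / high risk
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```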
4.6 ALGORITHMS
LogR models the data points using the standard logistic function, an S-shaped curve
also called the sigmoid curve, given by the equation:

f(z) = 1 / (1 + e^(-z))

• For a binary regression, factor level 1 of the dependent variable should
represent the desired outcome.
• Although logistic (logit) regression is frequently used for binary dependent
variables (2 classes), it can also be used for categorical dependent variables with
more than 2 classes; in this case it is called Multinomial Logistic Regression.
Random Forest builds an ensemble of decision trees:
• First, start with the selection of random samples from a given dataset.
• Next, the algorithm constructs a decision tree for every sample and then obtains a
prediction result from every decision tree.
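A short scikit-learn sketch of fitting both algorithms; the synthetic data stands in for features extracted from retinal images, and all values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a tabular feature matrix and binary risk labels.
X, y = make_classification(n_samples=400, n_features=14, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

logr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Logistic Regression accuracy:", logr.score(X_test, y_test))
print("Random Forest accuracy:", rf.score(X_test, y_test))
```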
The Heart Attack Risk Prediction System analyzes retinal images to predict heart disease
risk and health indicators like age group, SBP, BMI, and HbA1c. The system begins with an
input layer, where retinal images are fed into the model. In the preprocessing layer, images
are normalized, resized, and cleaned for better quality. Feature extraction is done through
methods like handcrafted feature extraction, focusing on key retinal characteristics such as
blood vessel patterns. The extracted features are then passed to a Recurrent Neural
Network (RNN), which classifies the data and predicts health metrics. The RNN model
assigns probabilities to risk levels (No Risk, Low Risk, Mild Risk, High Risk) and
continuous values like SBP, BMI, and HbA1c. A softmax function is used for classification
probabilities, while regression techniques predict continuous outputs. The output layer
displays the final predictions for heart attack risk and health metrics.
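A hedged Keras sketch of such a two-headed model, with a softmax head for the four risk levels and a linear head for the continuous metrics; the recurrent layer size, input shape, and losses are assumptions for illustration:

```python
from tensorflow.keras import Model, layers

# Assumed input: a sequence of 10 feature vectors of length 16 per patient.
inputs = layers.Input(shape=(10, 16))
x = layers.SimpleRNN(32)(inputs)

# Softmax head: probabilities over No/Low/Mild/High risk.
risk = layers.Dense(4, activation="softmax", name="risk")(x)
# Regression head: continuous outputs such as SBP, BMI, and HbA1c.
health = layers.Dense(3, activation="linear", name="health_metrics")(x)

model = Model(inputs, [risk, health])
model.compile(optimizer="adam",
              loss={"risk": "sparse_categorical_crossentropy",
                    "health_metrics": "mse"})
```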
1. Input Features:
o Features extracted from the retinal images, such as blood vessel density, optic disk
size, and other biomarkers.
o These features are numerical values that represent potential indicators of heart attack
risk.
2. Trained RNN Model:
o A pre-trained RNN model that has learned the relationship between the input features
and the risk levels.
o The model uses its layers to process the input features and predict the risk
probabilities.
3. Output Probabilities:
o The model outputs a probability distribution across the four risk levels:
▪ No Risk: P1
▪ Low Risk: P2
▪ Mild Risk: P3
▪ High Risk: P4
4. Classification Logic:
o The logic selects the class with the highest probability.
o Example: If the probabilities are [0.1, 0.6, 0.2, 0.1], the classification result is Low
Risk.
5. Final Risk Classification:
o The final output of the system, which could be displayed to the user as No Risk, Low
Risk, Mild Risk, or High Risk.
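The classification logic above amounts to an argmax over the output probabilities, as in this small sketch:

```python
import numpy as np

RISK_LEVELS = ["No Risk", "Low Risk", "Mild Risk", "High Risk"]

def classify_risk(probabilities):
    # probabilities: model output [P1, P2, P3, P4], assumed to sum to 1.
    return RISK_LEVELS[int(np.argmax(probabilities))]

print(classify_risk([0.1, 0.6, 0.2, 0.1]))  # -> "Low Risk"
```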
CHAPTER 5
IMPLEMENTATION
5.1 DJANGO
Django is a high-level Python web framework that promotes rapid development and clean,
pragmatic design. It’s one of the most popular frameworks for building web applications
because it helps developers efficiently create robust, scalable, and secure websites.
Django follows the Model-View-Template (MVT) pattern:
Model: Defines the data structure and manages interaction with the database.
Template: Handles the presentation layer (HTML).
View: Contains the business logic and interacts with the model and template.
Built-in Admin Interface: Django provides an automatic and customizable admin interface
for managing the website’s data.
ORM (Object-Relational Mapping): Django includes a powerful ORM to interact with the
database. Instead of writing raw SQL queries, you can manipulate the database using Python
code.
Security: Django includes built-in protection against common security threats, such as SQL
injection, cross-site scripting (XSS), cross-site request forgery (CSRF), and clickjacking.
Scalability: Django is designed to handle high traffic, making it suitable for large-scale
applications.
Batteries-Included: Django comes with many built-in features like user authentication,
session management, caching, and more, so developers don’t have to reinvent the wheel.
5.1.1 FRONTEND
The frontend of the application is built using Django, which handles the presentation layer,
user interface, and data interaction. Key components of the frontend include user
authentication, role-specific dashboards, and data submission.
1. User Authentication:
• Signup and Login: The signup and login forms are created using Django forms. Users
can register by providing basic details, and after successful registration, they are
redirected to their respective dashboards.
• Role-Based Authentication: Based on the user’s role (Doctor, Scientist, or Patient), they
are assigned a role-specific dashboard after login. This is achieved through Django’s
Group and Permission system, ensuring that users only access data and features relevant
to their roles.
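A minimal sketch of such a role check with Django's Group system; the group name "Doctor" and the view/template names are assumptions for this project, not its actual code:

```python
from django.contrib.auth.decorators import login_required, user_passes_test
from django.shortcuts import render

def is_doctor(user):
    # True only if the logged-in user belongs to the "Doctor" group.
    return user.groups.filter(name="Doctor").exists()

@login_required
@user_passes_test(is_doctor)
def doctor_dashboard(request):
    # Rendered only for authenticated users with the Doctor role.
    return render(request, "dashboard/doctor.html")
```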
2. Dashboard:
• Doctor Dashboard: Displays a list of patients, their medical reports, and prediction
results.
Doctors can add, update, or view patient information and manage appointments.
• Patient Dashboard: Patients can view their personal health data, submitted medical
reports, and any predictions made by the system. They can also upload new reports for
further analysis.
• Scientist Dashboard: Scientists have access to anonymized patient data for research
purposes, allowing them to analyze trends and run predictive models for healthcare
research.
• Data Display: The dashboard displays key metrics, trends, and predictions in a
user-friendly format, such as tables, charts, and graphs, making it easier for users to
interpret the data.
3. Data Submission:
• Medical Reports: Patients can submit medical reports directly through the dashboard.
The submission form allows patients to upload files, such as lab reports or diagnostic
tests, in a structured format.
• Data Validation: Django’s form validation ensures that the data submitted by patients is
accurate and complete. Upon successful submission, the reports are stored in the
database and linked to the corresponding patient record.
• File Handling: The reports are stored as files in the backend, with file paths saved in the
database. Django handles the file management efficiently, ensuring easy access and
retrieval when required.
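As a sketch of how such report storage might be modeled, assuming Django's ORM; the model and field names are illustrative, not the project's actual schema:

```python
from django.contrib.auth.models import User
from django.db import models

class MedicalReport(models.Model):
    # Each report is linked to the patient who uploaded it.
    patient = models.ForeignKey(User, on_delete=models.CASCADE)
    # The file is stored on disk; only its path is saved in the database.
    report_file = models.FileField(upload_to="reports/%Y/%m/")
    uploaded_at = models.DateTimeField(auto_now_add=True)
```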
5.1.2 BACKEND
The backend of the system is responsible for data processing, integrating machine learning
models for predictions, and managing communication between the frontend and Streamlit
for real-time interactions.
1. Data Processing:
• Handling User Data: Upon receiving data from the frontend, Django processes the
information before storing it in the database. This includes ensuring that the data entered
by users (e.g., medical reports, patient history) is correctly formatted and validated.
• Integration with Machine Learning Models: Django integrates machine learning models
to make predictions based on the patient’s medical data. When a patient submits a report,
the system processes the data and runs it through pre-trained models to generate
predictions, such as the likelihood of a disease, possible health risks, or treatment
recommendations.
• Prediction Flow: Once predictions are made, they are stored in the database and
displayed to the user on their dashboard. This helps doctors, patients, and scientists
make informed decisions based on real-time analysis.
• Data Communication: Django communicates with Streamlit through API calls, ensuring
that the system can pass patient data to Streamlit for real-time predictions and
visualizations.
• Real-Time Predictions: When a patient submits a medical report, the backend processes
the data, sends it to Streamlit for prediction, and then displays the results in the
frontend. This allows for a seamless flow of data between the backend and the Streamlit
app, providing instant predictions and visualizations.
• Streamlit API Integration: Streamlit is embedded within the Django application, where it
is accessed through an iframe or URL redirection. This ensures that the predictions and
visualizations provided by Streamlit are easily accessible within the user interface.
Streamlit serves as the core tool for real-time predictions and interactive data visualizations.
It helps make machine learning models and predictions accessible to users, offering a simple
and intuitive interface.
1. User-Friendly Interface:
2. Real-Time Visualizations:
• Prediction Results: The real-time predictions generated by machine learning models are
displayed in the form of charts, graphs, and statistical analyses. These visualizations help
doctors and patients understand the implications of the predictions and how they can take
appropriate action.
• Health Data Trends: Streamlit allows users to visualize trends in health data, such as the
progression of a disease or the effects of treatments over time. These visualizations are
crucial for both doctors and scientists to track patient health and make informed
decisions.
• Model Deployment: Streamlit is used to deploy machine learning models that analyze
patient data and provide predictions in real-time. The models may include algorithms for
disease prediction, risk analysis, or treatment recommendations.
• Instant Predictions: When new data is entered or when a patient submits a medical
report, Streamlit processes this data using the deployed machine learning models and
instantly generates predictions. These predictions are then displayed to the user via the
interactive dashboard.
• Instant Results: As the patient submits new data or modifies existing information,
Streamlit updates the visualizations and predictions in real-time. This ensures that
healthcare professionals can access the most up-to-date information at any time.
• Charts and Graphs: Streamlit provides built-in support for rendering dynamic charts and
graphs (e.g., line charts, bar graphs, and pie charts), making it easier for users to interpret
complex medical data visually.
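A minimal Streamlit sketch of this upload-and-predict flow; the hard-coded probabilities stand in for the project's trained model, which is assumed to be wired in elsewhere:

```python
import pandas as pd
import streamlit as st

st.title("Heart Attack Risk Prediction")

uploaded = st.file_uploader("Upload a retinal image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    st.image(uploaded, caption="Uploaded retinal image")
    # Placeholder output; in the real system the preprocessed image would be
    # passed to the trained model to obtain these probabilities.
    risk = {"No Risk": 0.1, "Low Risk": 0.6, "Mild Risk": 0.2, "High Risk": 0.1}
    st.bar_chart(pd.DataFrame.from_dict(risk, orient="index", columns=["probability"]))
```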
Transformer Architecture
The Transformer consists of two main components: the encoder and the decoder. GPT-2,
however, uses only the decoder portion of the Transformer, which is optimized for
generating sequences of text.
• Pre-training: In this phase, the model is trained on a massive corpus of text data using an
unsupervised learning approach. During pre-training, GPT-2 learns to predict the next
word in a sentence given the previous words, thereby learning grammar, facts about the
world, and some level of reasoning. The pre-training dataset for GPT-2 includes a wide
variety of internet text sources, such as articles, books, and websites. This diverse
training data enables GPT-2 to generate coherent text across various domains.
GPT-2 is available in several versions, each with a different number of parameters, ranging
from 117 million to 1.5 billion parameters. These parameters refer to the weights in the
neural network that the model learns during training. The larger the model, the more
complex patterns and relationships it can capture, enabling it to generate more coherent and
contextually appropriate text. However, larger models require more computational
resources for both training and inference.
The largest GPT-2 model, with 1.5 billion parameters, is capable of generating high-quality
text that is often indistinguishable from text written by humans. It can handle a wide range
of NLP tasks, including writing essays, generating code, creating poetry, and answering
questions.
2. Text Generation: One of GPT-2’s most notable features is its ability to generate human-
like text based on a given prompt. Given an initial seed, GPT-2 generates a continuation
that fits naturally with the prompt, showing a high level of fluency and creativity.
4. Scalability: The performance of GPT-2 improves as the size of the model increases.
Larger models, with more parameters, can handle more complex language tasks and
generate more sophisticated outputs.
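For reference, text generation with GPT-2 can be sketched in a few lines with the Hugging Face Transformers library; the model name and prompt are illustrative choices:

```python
from transformers import pipeline

# Downloads the 117M-parameter "gpt2" checkpoint on first use.
generator = pipeline("text-generation", model="gpt2")
result = generator("Retinal imaging can reveal", max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])
```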
Applications
• Content Creation: GPT-2 is widely used in automated content generation for articles and
other written media.
• Code Generation: GPT-2 has been used to generate code snippets based on user input,
assisting in programming tasks.
Ethical Considerations
While GPT-2 is powerful, it raises important ethical concerns. One significant concern is the
potential for misuse in generating misleading or harmful content, such as fake news or
deepfakes. OpenAI initially withheld the release of the largest version of GPT-2, citing
these concerns. Additionally, GPT-2’s ability to generate text with little oversight means
that it could be used to create harmful content on a large scale, leading to issues related to
misinformation, bias, and fairness.
The system design for the heart attack risk prediction model emphasizes the importance of
using advanced machine learning techniques to analyze diverse patient data and predict heart
attack risks. By considering factors such as medical history, age, gender, blood pressure,
cholesterol levels, smoking habits, and physical activity, the model leverages predictive
analytics to provide accurate risk assessments. The architecture ensures that data is securely
collected, processed, and analyzed, prioritizing patient confidentiality and compliance with
healthcare standards.
Furthermore, the design is highly modular, allowing for easy updates and integration with
new medical research or datasets. The use of APIs ensures seamless interaction with other
healthcare systems, facilitating quick access to real-time patient information. The application
also provides actionable insights through visualizations and personalized recommendations,
helping healthcare professionals make more informed decisions.
CHAPTER 6
TESTING
Testing and evaluation are critical steps in ensuring that the system functions as expected, meets
performance requirements, and provides a positive user experience. This chapter outlines the testing
strategies used to evaluate the functionality, performance, and usability of the healthcare
application.
Functional testing focuses on verifying that the application performs its intended functions correctly.
This includes testing the core features like user authentication, data submission, and prediction
accuracy.
1. User Authentication:
• Sign-Up and Login: Test the entire user authentication flow to ensure users can sign up with
valid credentials, log in successfully, and are directed to their respective dashboards based on
their roles (Doctor, Patient, or Scientist).
• Role-Based Access: Ensure that the application correctly assigns permissions and displays role-
specific content. This involves verifying that a user with the Doctor role can access doctor-
related functionalities, while a Patient can only access personal health data and prediction
results.
2. Data Submission:
• Medical Report Submission: Verify that the data submission feature works correctly. Patients
should be able to upload medical reports without errors. The reports should be validated for
completeness and accuracy before being stored in the database.
• Database Integrity: Test the system’s ability to store and retrieve patient data accurately. Ensure
that each medical report is linked to the correct patient and that all relevant data (such as patient
history, medical details, etc.) is stored appropriately.
3. Prediction Accuracy:
• Machine Learning Model Testing: Evaluate the performance and accuracy of the machine
learning models integrated within the Streamlit application. This includes assessing how well
the models predict health outcomes based on patient data and medical reports.
• Test Dataset: Use a test dataset with known outcomes to verify the accuracy of predictions.
Compare the predictions with actual outcomes and compute performance metrics such as
accuracy, precision, recall, and F1 score.
• Real-Time Predictions: Test the real-time prediction feature to ensure that predictions are
accurate and updated promptly when new data is submitted.
Performance testing ensures that the application can handle the required load and performs
efficiently, even under heavy usage conditions. This section evaluates the responsiveness,
scalability, and efficiency of the system.
1. Response Time:
• Prediction Response Time: Measure the system’s response time when a user submits data for
predictions. The system should be able to provide predictions in real-time or within an
acceptable timeframe.
• Load Time: Test the time it takes for the application to load user dashboards and display data.
Any delay in loading dashboards or visualizations could hinder user experience.
2. Scalability:
• User Load Testing: Simulate a large number of users accessing the system simultaneously.
Measure how well the application handles multiple users interacting with the system at the same
time. The system should scale to accommodate a growing user base without significant
degradation in performance.
• Data Volume Testing: Evaluate the system's ability to handle large volumes of data, such as
medical reports and predictions, especially when dealing with multiple patient records. The
database should perform well under heavy data loads, and the application should continue to
function smoothly.
3. Stress Testing:
• Peak Load: Simulate peak load conditions to test the system’s resilience. Test how the system
behaves under stress and if it can recover gracefully in case of high traffic or overload situations.
• Server Resources: Monitor server resources (CPU, memory, network bandwidth) during stress
tests to ensure the system does not exceed resource limits and crash during high traffic.
Usability testing assesses how easy and intuitive the system is for users to interact with. It involves
testing the application with real users from different roles (Doctors, Patients, Scientists) to gather
feedback and ensure that the system meets user expectations.
1. User Interface:
• Ease of Use: Evaluate the user interface (UI) for intuitiveness and ease of navigation. Ensure that users can easily understand how to perform tasks such as signing up, logging in, submitting reports, and viewing predictions.
2. Interactive Visualizations:
• Visualization Clarity: Collect feedback on the clarity and usefulness of the interactive visualizations (charts, graphs, etc.) displayed in the Streamlit dashboards. Ensure that predictions and trends are presented in a way that is easy for users to understand and act upon.
• Interactivity: Test the interactive features, such as adjusting input parameters to see real-time prediction updates. Ensure that users can interact with the visualizations smoothly, without delays or errors (a sketch of this behavior appears after the usability items below).
3. User Feedback:
• Patient Feedback: Gather feedback from patients regarding the ease of submitting reports and
accessing their predictions. Patients should be able to navigate the application without difficulty
and find the information they need quickly.
• Doctor and Scientist Feedback: Doctors and scientists should provide feedback on how useful
the system is for patient analysis, including the accuracy of predictions and the clarity of
visualizations. Their feedback is essential in refining the system’s medical features.
• Satisfaction Survey: After conducting usability testing, administer a satisfaction survey to gauge
overall user experience. This can help identify areas for improvement, such as UI design,
functionality, or data presentation.
4. Accessibility Testing:
• Cross-Platform Testing: Ensure that the application works seamlessly on various devices and
platforms (e.g., desktop, mobile, and tablet) to cater to different user needs.
• Accessibility Features: Test the application’s accessibility features for users with disabilities,
such as providing alternative text for images, ensuring keyboard navigation works, and testing
color contrast for readability.
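The interactive behavior referenced under the usability items above can be sketched as follows, assuming a scikit-learn style model serialized to a hypothetical model.pkl file; because Streamlit reruns the script on every widget change, the prediction updates as soon as an input parameter is adjusted.

import pickle

import streamlit as st

@st.cache_resource
def load_model():
    with open("model.pkl", "rb") as f:  # hypothetical model file
        return pickle.load(f)

model = load_model()

age = st.slider("Age", 18, 90, 45)
systolic = st.slider("Systolic blood pressure (mmHg)", 80, 220, 120)

# predict_proba returns [P(no risk), P(risk)] for the single input row.
risk = model.predict_proba([[age, systolic]])[0][1]
st.metric("Predicted heart attack risk", f"{risk:.0%}")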
Once testing is completed, the next step is to evaluate the results based on the functional,
performance, and usability tests. The findings will be analyzed to identify areas of improvement,
such as:
• Optimizing Prediction Models: If the prediction accuracy is not up to the required standard, the
machine learning models can be fine-tuned or retrained using a larger dataset.
• Improving System Performance: If the system’s response time is slower than desired, it may
require optimization through better database indexing, query optimization, or caching strategies.
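The following is a minimal sketch of the indexing and caching strategies just mentioned; the model and field names are illustrative assumptions, not the project's actual schema.

from django.core.cache import cache
from django.db import models

class MedicalReport(models.Model):
    patient_id = models.IntegerField(db_index=True)  # speeds up per-patient lookups
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [models.Index(fields=["patient_id", "created_at"])]

def cached_latest_reports(patient_id):
    # Cache a patient's recent reports for 60 s to avoid repeated queries.
    key = f"reports:{patient_id}"
    reports = cache.get(key)
    if reports is None:
        reports = list(
            MedicalReport.objects.filter(patient_id=patient_id)
            .order_by("-created_at")[:10]
        )
        cache.set(key, reports, timeout=60)
    return reports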
This chapter presents the outcomes of the testing phase, followed by a detailed discussion on the
effectiveness of combining Django and Streamlit for the healthcare application. The focus is on
assessing the functionality, performance, and accuracy of the prediction models, as well as the
challenges faced during development.
7.1 RESULTS
The results section presents the findings from the functional, performance, and machine learning model tests.
1. Functional Test Results:
• User Authentication: The authentication system performed as expected. All users (Doctors,
Patients, Scientists) were able to sign up, log in, and access their role-based dashboards without
any issues. The role-based access control worked seamlessly, ensuring that users could only
view and interact with relevant data.
• Data Submission: Patients were able to successfully upload medical reports to the system. The
submission feature was tested across various file formats, and the system accurately processed
and stored the data in the database without errors.
• Real-Time Predictions: Real-time predictions worked effectively. When medical reports were
submitted, the machine learning models provided predictions almost instantaneously through
Streamlit. The predictions were displayed on interactive visualizations, allowing users to
analyze the results in real time.
2. Performance Benchmarks:
• Response Time: The application’s response time for data submission and real-time prediction
was satisfactory. On average, it took about 3–5 seconds for predictions to be generated after data
submission, which is well within the expected range for such applications.
• Scalability: During load testing with a high number of simultaneous users, the system
demonstrated scalability. The application was able to handle 100+ concurrent users without
significant delays or performance degradation. Server resource usage was monitored, and the
application did not experience any memory or CPU spikes during peak loads.
• Stress Testing: Stress tests revealed that the system could handle a high volume of data
submissions (medical reports) without any performance degradation. However, as the volume
of data increased exponentially, the system showed slight slowdowns in data retrieval times,
indicating a need for further optimization in database queries and indexing.
3. Machine Learning Model Accuracy:
The accuracy of the machine learning models used for predictions was evaluated using a test dataset. The models achieved an accuracy of 93%, which is satisfactory for a real-time healthcare prediction system. The predictions were particularly accurate in diagnosing common conditions but showed some variability for more complex cases.
• Precision and Recall: For disease prediction tasks, the models demonstrated high precision and recall (93% and 100%, respectively), meaning the system was good at predicting positive cases and minimizing false negatives.
• Real-Time Prediction Evaluation: In real-time, predictions were generated based on the data
submitted by users. The results were consistent with offline testing, where the models produced
reliable predictions for disease diagnosis and patient health analysis.
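For reference, the F1 score implied by the reported precision and recall follows directly from F1 = 2PR / (P + R):

# F1 implied by the reported precision = 0.93 and recall = 1.00
precision, recall = 0.93, 1.00
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # 0.964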
7.2 DISCUSSION
This section discusses the effectiveness of combining Django and Streamlit in this healthcare application and highlights some of the challenges encountered during development.
1. Effectiveness of the Integration:
• Seamless Integration: Combining Django for backend and frontend functionalities with
Streamlit for real-time predictions proved to be highly effective. Django handled user
authentication, data submission, and database management efficiently, while Streamlit provided
a user-friendly interface for displaying real-time predictions and visualizations. This separation
of concerns allowed for easier management of both the application’s backend and its interactive
frontend.
• Real-Time Data Interaction: Streamlit's ability to display real-time predictions and interactive
visualizations added significant value to the application. Doctors and scientists could analyze
patient reports in real time, make quick decisions, and provide timely medical interventions. The
integration of machine learning models in Streamlit allowed for instant feedback based on the
submitted medical data, which improved the overall user experience.
• User Experience: The role-specific dashboards (for Doctors, Patients, and Scientists) allowed
users to focus on their respective tasks and access relevant data. Streamlit’s interactive
visualizations helped doctors and scientists easily interpret predictions and spot potential issues.
Patients were also able to view their health predictions and access their medical reports with
minimal navigation.
2. Challenges Faced:
• Real-Time Predictions with Streamlit: One of the main challenges was integrating real-time predictions within the Streamlit interface. While Streamlit is designed for quick and easy deployment of machine learning models, ensuring that predictions updated in real time as users interacted with the system required careful handling of the data flow between Django and Streamlit. This meant establishing an efficient data pipeline to send data from Django to Streamlit without introducing latency (a minimal pipeline sketch appears after this list).
• Handling Large Datasets: Another challenge encountered during the development phase was
handling large datasets. As the number of medical reports and patient records grew, the system
faced slowdowns in retrieving data from the database and generating predictions. This issue was
mitigated by optimizing database queries and indexing, but it highlighted the need for future
improvements in data storage and retrieval strategies to handle large volumes of healthcare data.
• Model Performance on Complex Cases: While the machine learning models performed well for
common diseases, they showed some inconsistencies when predicting more complex medical
conditions. This is a known issue in healthcare prediction systems, as data quality and model
training play a significant role in performance. More diverse training datasets and model tuning
would be required to enhance the accuracy of predictions for complex or rare conditions.
• User Interface Improvements: Although the system provided role-specific dashboards, feedback
from user testing indicated that some users found the interface slightly cluttered, especially when
viewing predictions. Future iterations of the system will focus on improving the layout and
simplifying the visual elements to make it more user-friendly.
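Returning to the real-time prediction challenge noted in the first item above, the following is a minimal sketch of one possible Django-to-Streamlit pipeline: Django exposes the latest submission as JSON, and Streamlit polls it with a short cache lifetime. The endpoint URL and response shape are assumptions for illustration, not the project's actual API.

# Django side (sketch): expose the latest report as JSON
# from django.http import JsonResponse
# def latest_report(request, patient_id):
#     report = MedicalReport.objects.filter(patient_id=patient_id).latest("created_at")
#     return JsonResponse({"patient_id": patient_id, "features": report.features})

# Streamlit side: poll the endpoint and refresh the displayed data
import requests
import streamlit as st

@st.cache_data(ttl=5)  # re-fetch at most every 5 s: balances freshness and load
def fetch_latest(patient_id: int) -> dict:
    resp = requests.get(
        f"http://localhost:8000/api/reports/{patient_id}/latest/",  # assumed URL
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()

data = fetch_latest(patient_id=1)
st.write("Latest submitted features:", data["features"])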
3. Future Improvements:
• Improving Prediction Models: One of the primary future goals is to improve the accuracy of the
machine learning models, particularly for complex medical conditions. This can be achieved by
incorporating more diverse datasets, experimenting with different algorithms, and conducting
further model training.
• Enhanced Scalability: To address the performance issues related to large datasets, the system
will benefit from improved database architecture, such as introducing sharding or partitioning
strategies for large patient records and reports. Additionally, optimizing the deployment
environment for handling higher traffic will ensure the system remains performant as the number
of users increases.
• AI-based Assistance: Future versions of the system could incorporate AI-based assistance to help doctors and patients interpret the predictions. A recurrent neural network (RNN) could, for example, be used to generate reports and explanations for the predictions, making the results more comprehensible and actionable.
7.3 SNAPSHOTS
Fig 7.1 shows the home page of the Heart Attack Risk Prediction System, which serves as an introduction to the project, displaying a concise abstract that outlines its purpose and functionality. It gives users a brief overview of how the system predicts the likelihood of a heart attack from various medical factors, and is designed to offer an easy-to-understand summary of the project's goal.
Fig 7.2 shows the Admin Login Page, a secure gateway for administrators to access and manage the Heart Attack Risk Prediction System. It ensures that only authorized personnel can view, update, and maintain the system's data, including user records and system configurations.
Fig 7.3 shows the Retinal Image Upload Page, which enables users to upload retinal images for analysis as part of the heart attack risk prediction process. The page provides a simple and intuitive interface, ensuring a seamless upload experience.
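A page of this kind can be sketched in a few lines, assuming it is built with Streamlit's file uploader (the project's actual page may differ in framework and layout):

import streamlit as st
from PIL import Image

uploaded = st.file_uploader("Upload a retinal image", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Uploaded retinal image")
    st.success("Image received and ready for analysis.")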
Fig 7.4 shows the Image Clustering functionality as demonstrated in the command-line interface (CMD), showcasing the system's ability to group similar retinal images based on extracted features. Using clustering algorithms such as K-Means or hierarchical clustering, this step helps organize and analyze the image data efficiently.
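The following is a minimal sketch of such a clustering step, assuming retinal images are resized, converted to grayscale, and flattened into feature vectors before K-Means is applied; the folder name, image size, and number of clusters are illustrative assumptions.

import glob

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def image_to_vector(path, size=(64, 64)):
    # Resize to a fixed shape and flatten into one normalized feature vector.
    img = Image.open(path).convert("L").resize(size)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

paths = sorted(glob.glob("retina_images/*.jpg"))  # hypothetical folder
features = np.stack([image_to_vector(p) for p in paths])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(features)
for path, label in zip(paths, kmeans.labels_):
    print(label, path)  # cluster assignment per image, as printed in CMD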
Fig 7.5 shows the Clustered Image Page, which visually represents the results of the image clustering process by displaying retinal images grouped into their respective clusters. Each cluster is organized around shared features, making it easier to identify patterns and correlations in the data.
Fig 7.6 shows the Prediction Page, which displays the outcomes of the heart attack risk prediction model. Users can view the predicted risk level based on the uploaded retinal image and associated data. The page presents clear and concise results, so users can easily understand the prediction and take necessary action or seek further medical advice.
Fig 7.7 shows the Test Image Upload Page, which allows users to upload retinal images for testing the heart attack risk prediction model. The page ensures a user-friendly experience with straightforward options for selecting and uploading images, and serves as a crucial step in the workflow by facilitating accurate predictions based on the uploaded medical data.
Fig 7.8 shows the Risk Prediction Result Page, which displays the outcome of the heart attack risk analysis based on the uploaded retinal image. The page provides users with clear and concise results, highlighting the risk level with detailed insights, so that users receive actionable information to aid understanding and decision-making regarding their health.
Fig 7.9 shows the Standard Normal Conditions Page for a Healthy Heart, which presents baseline parameters and guidelines that define a healthy cardiovascular state. It serves as a reference against which users can compare their prediction results, promoting awareness of ideal health metrics and empowering users to take proactive measures towards maintaining a healthy heart.
Fig 7.10 shows the Accuracy Page for Test and Training Images, which showcases the performance metrics of the heart attack risk prediction model. It includes visual representations, such as graphs or tables, highlighting the accuracy achieved during the model's training and testing phases, and emphasizes the model's reliability in analyzing and predicting heart health from retinal images.
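A chart of this kind can be reproduced with a few lines of matplotlib; the per-epoch values below are illustrative placeholders, not the project's recorded history (only the final test accuracy of 93% is taken from the results above).

import matplotlib.pyplot as plt

epochs = range(1, 11)
train_acc = [0.70, 0.78, 0.83, 0.86, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94]
test_acc = [0.68, 0.75, 0.80, 0.84, 0.86, 0.88, 0.90, 0.91, 0.92, 0.93]

plt.plot(epochs, train_acc, marker="o", label="Training accuracy")
plt.plot(epochs, test_acc, marker="s", label="Test accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Training vs. test accuracy")
plt.legend()
plt.show()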
The snapshots of the Heart Attack Risk Prediction project visually capture the system's key
functionalities and flow. The Home page presents an overview of the project's objective, providing
users with a concise introduction to the application. The Admin Login page showcases the secure
authentication process, ensuring that only authorized personnel can access sensitive system features.
The Retinal Image Upload page allows users to upload images for analysis, demonstrating the
system's capability to process medical data seamlessly. The Image Clustering in CMD snapshot
highlights the backend processes, where retinal images are clustered for improved data segmentation
and feature extraction. The Clustered Image page displays how the system organizes images into
meaningful clusters to assist in prediction accuracy. The Prediction page serves as the interface for
users to interact with the model and view predictions, while the Risk Prediction Result page clearly
displays the outcome, giving users an understanding of their heart attack risk. The Standard Normal
Conditions page shows the baseline for a healthy heart, offering a comparison point for the
prediction results. The Accuracy for Test and Training Images page demonstrates the system's effectiveness in both training and testing phases, validating its performance.
To conclude, the Heart Attack Risk Prediction System showcases the potential of artificial
intelligence and machine learning in revolutionizing healthcare. By utilizing retinal images for risk
analysis, the system offers a novel, non-invasive method for predicting heart attack risks. The
application of image clustering techniques, coupled with a user-friendly interface for test uploads
and results interpretation, makes it both practical and accessible. Through training with a
comprehensive dataset, the system achieves high accuracy, ensuring reliable predictions. This
project highlights the importance of integrating advanced technologies with healthcare, paving the
way for more proactive and personalized health monitoring. The system's effectiveness and
efficiency could serve as a stepping stone for further developments in predictive healthcare.