First Synopsis of The Project
Submitted to:
THE DEGREE OF
CONTENTS
INTRODUCTION
The healthcare sector generates a large volume of data about patients, diseases,
and diagnoses, but this data is rarely analyzed properly, so it does not deliver
the value it could. Heart disease is a leading cause of death: according to the
World Health Organization, cardiovascular diseases (CVDs) are the largest cause
of mortality globally, claiming an estimated 17.9 million lives each year.
Properly analyzed, the data the healthcare industry already collects could have
a far greater impact on patient health.
Proposed Methodology
The proposed methodology is to build a machine learning system that can predict whether
a person has heart disease or not. Here are the steps involved:
1. Get the data: This involves downloading a dataset containing information about people's
health, including features like age, sex, chest pain type, and blood sugar level, and a target
variable indicating the presence or absence of heart disease.
2. Process the data: The data is preprocessed to make it suitable for the machine learning
model. This may involve handling missing values, converting categorical variables into
numerical ones, and scaling the data.
3. Split the data: The data is split into two sets: training data and testing data. The training
data is used to train the machine learning model, and the testing data is used to evaluate
the model's performance.
4. Train the model: A logistic regression model is chosen because it is suitable for binary
classification tasks like this one. The model is trained on the training data, learning the
relationships between the features and the target variable.
5. Evaluate the model: The model's performance is evaluated on the testing data using
accuracy score. This metric measures the proportion of predictions that the model makes
correctly.
6. Build a predictive system: Once the model is trained and evaluated, it can be used to
make predictions on new data. This involves feeding the features of a new person into the
model and getting a prediction of whether they have heart disease or not.
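The six steps above can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the small synthetic data frame stands in for the real heart-disease CSV (which would be loaded with `pd.read_csv("heart.csv")`), and the column names `age`, `sex`, `cp`, and `fbs` are assumptions following the common UCI heart-disease layout.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Get the data. In the project this would be pd.read_csv("heart.csv");
# a small synthetic stand-in is generated here so the sketch runs anywhere.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(30, 75, 200),   # age in years
    "sex": rng.integers(0, 2, 200),     # 0/1 encoded
    "cp": rng.integers(0, 4, 200),      # chest pain type
    "fbs": rng.integers(0, 2, 200),     # fasting blood sugar flag
})
df["target"] = (df["age"] + 20 * df["cp"] > 90).astype(int)  # synthetic label

# 2. Process the data: impute missing values, then scale the features
X = df.drop(columns="target").fillna(df.median(numeric_only=True))
y = df["target"]
X_scaled = StandardScaler().fit_transform(X)

# 3. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42)

# 4. Train a logistic regression model (suited to binary classification)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 5. Evaluate on the held-out test data using the accuracy score
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {acc:.2f}")

# 6. Predictive system: classify a new, unseen person
pred = model.predict(X_test[:1])[0]
print("Heart disease" if pred == 1 else "No heart disease")
```

With a real dataset, only step 1 changes; the preprocessing, split, training, and prediction calls stay the same.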
PROBLEM STATEMENT
OBJECTIVES
1. Main Objectives
The main objective of this research is to develop a heart disease prediction
system. The system can discover and extract hidden knowledge associated with
diseases from a historical heart data set. The heart disease prediction system
aims to apply data mining techniques to a medical data set to assist in the
prediction of heart disease.
2. Specific Objectives
• Provide a new approach to uncovering concealed patterns in the data.
3. Justification
Clinical decisions are often made based on a doctor’s insight and experience
rather than on the knowledge-rich data hidden in the dataset. This practice
leads to unwanted biases, errors, and excessive medical costs, which affect the
quality of service provided to patients. The proposed system will integrate
clinical decision support with computer-based patient records (data sets). This
will reduce medical errors, enhance patient safety, decrease unwanted practice
variation, and improve patient outcomes. The suggestion is promising because
data modeling and analysis tools, e.g., data mining, have the potential to
generate a knowledge-rich environment which can help to significantly improve
the quality of clinical decisions. The medical domain holds voluminous records,
which makes it necessary to use data mining techniques for decision support and
prediction in healthcare. Medical data mining therefore contributes to business
intelligence that is useful for diagnosing disease.
Scope
The scope of the project is the integration of clinical decision support with
computer-based patient records, which could reduce medical errors, enhance
patient safety, decrease unwanted practice variation, and improve patient
outcomes. This suggestion is promising because data modeling and analysis
tools, e.g., data mining, have the potential to generate a knowledge-rich
environment which can help to significantly improve the quality of clinical
decisions.
Limitations
Medical diagnosis is a significant yet intricate task that needs to be carried
out precisely and efficiently, so automating it would be highly beneficial.
Clinical decisions are often made based on a doctor’s intuition and experience
rather than on the knowledge-rich data hidden in the database. This practice
leads to unwanted biases, errors, and excessive medical costs, which affect the
quality of service provided to patients. Data mining has the potential to
generate a knowledge-rich environment which can help to significantly improve
the quality of clinical decisions.
FEATURES
5. Clinical Tests: Results of the ECG, stress tests, and other diagnostic tests.
• Resting Electrocardiographic Result (restecg): Result of the resting
ECG, which can indicate abnormal heart rhythms or other cardiac
abnormalities.
• ST Depression (oldpeak): The amount of ST-segment depression
observed during exercise stress testing, which can indicate ischemia
(insufficient blood flow to the heart).
• Maximum Heart Rate Achieved (thalach): The maximum heart rate
achieved during exercise stress testing.
• Thallium Stress Test Result (thal): Result of the thallium stress
test, which can indicate areas of reduced blood flow to the heart
muscle.
• Target: Whether the person has heart disease or not.
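For illustration, a single patient record built from the clinical-test features above might look like the following. The values are made up for the example, and restricting the record to these four columns is an assumption; the real dataset also carries the demographic and history features listed earlier.

```python
import pandas as pd

# Hypothetical one-row patient record using the clinical-test feature
# names described above (values are illustrative, not real measurements).
patient = pd.DataFrame([{
    "restecg": 1,    # resting ECG result
    "oldpeak": 2.3,  # ST-segment depression during exercise
    "thalach": 150,  # maximum heart rate achieved
    "thal": 2,       # thallium stress test result
}])
print(patient)
# A model trained on the same columns would consume this row, e.g.
# model.predict(patient), returning 1 (disease) or 0 (no disease).
```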
Software requirements
NumPy: A library for the Python programming language that adds support for
large, multi-dimensional arrays, along with a large collection of high-level
mathematical functions to operate on these arrays.
Pandas: A software library written for the Python programming language for
data manipulation and analysis.
Matplotlib: A plotting library for the Python programming language and its
numerical mathematics extension NumPy.
Scikit-learn (sklearn): A machine learning library; in this project it is used
to fit a logistic regression model, which explains the relationship between one
dependent binary variable and one or more nominal, ordinal, interval, or
ratio-level independent variables.
These are the main software packages used in this project, though other
libraries and tools may also be used in the Colab notebook.
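A quick way to confirm these libraries are installed (in Colab or locally) is to import them and print their versions. This is a sanity-check sketch, not part of the prediction pipeline:

```python
# Environment check for the libraries listed above; each of these
# packages exposes a standard __version__ attribute.
import numpy as np
import pandas as pd
import matplotlib

for name, mod in [("NumPy", np), ("Pandas", pd), ("Matplotlib", matplotlib)]:
    print(f"{name} {mod.__version__}")
```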
HARDWARE REQUIREMENT
The hardware requirements for running the code in the provided Colab notebook
are:
At least 8 GB RAM.
The server needs to have at least 50 Mbps downstream capacity from the
internet.
The connection between client and server should have at least 20 Mbps
bandwidth, and no more than 200ms latency.
For the best performance, enabling swap and using local SSD storage is
recommended; the machine should also have at least 8 GB of storage available.
Note that these requirements apply to running the code on a local machine; in
a remote development environment, the requirements may differ.
TECHNOLOGY LANGUAGES
The technology and language used in this project is Python, a popular
high-level programming language for general-purpose programming. Python is
known for its simplicity, readability, and large standard library, making it a
great choice for a wide range of applications, from web development to data
analysis.
In addition to Python, the code also uses several libraries and tools, including
pandas, numpy, matplotlib, seaborn, and sklearn. These libraries provide
additional functionality for data manipulation, visualization, and machine
learning.
Pandas is a library for data manipulation and analysis, providing data structures
and functions for working with tabular data. Numpy is a library for numerical
computing, providing support for large, multi-dimensional arrays and matrices,
as well as a wide range of mathematical functions. Matplotlib and Seaborn are
libraries for data visualization, providing functions for creating charts, graphs,
and other visual representations of data.
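As a minimal example of the visualization stack, the snippet below plots the class balance of a target column as a bar chart. The data here is a small synthetic series, and the output file name is chosen for the example; a Seaborn `countplot` would produce a similar chart with one call.

```python
# Bar chart of the class balance in a (synthetic) target column,
# using the non-interactive Agg backend so it runs outside notebooks.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

target = pd.Series([1, 0, 1, 1, 0, 1, 0, 1], name="target")
counts = target.value_counts().sort_index()  # count of 0s, then 1s

fig, ax = plt.subplots()
ax.bar(["no disease (0)", "disease (1)"], counts.values)
ax.set_ylabel("number of patients")
ax.set_title("Target class balance")
fig.savefig("target_balance.png")
```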
Sklearn is a library for machine learning, providing a wide range of algorithms
and tools for building predictive models. In this project, Sklearn is used to
train a logistic regression model for predicting heart disease. Overall, the
technology used here is a combination of Python and several popular libraries
for data manipulation, visualization, and machine learning.