
The Identification of Heart Health through a Predictive ML-Based Technique in Integration with an Existing System

Mr. S. Prabakaran, Asst. Prof., Department of Computer Science and Engineering, V.S.B. Engineering College, Karur, India ([email protected])
Harish S, Department of Computer Science and Engineering, V.S.B. Engineering College, Karur, India ([email protected])
Tamilselvan B, Department of Computer Science and Engineering, V.S.B. Engineering College, Karur, India ([email protected])
Praveen Kumar B, Department of Computer Science and Engineering, V.S.B. Engineering College, Karur, India ([email protected])
Gobinath B, Department of Computer Science and Engineering, V.S.B. Engineering College, Karur, India ([email protected])

Abstract— Cardiovascular diseases, the diseases of the heart and blood vessels, are among the leading threats to human health worldwide and a growing burden on public health. To address this increasingly serious problem, many researchers have developed systems that integrate patients' electronic health records with artificial neural network algorithms and related techniques to produce highly accurate diagnoses, particularly in contexts where complete patient data are available for inferring heart disease. This study describes the use of four separate classification methods: a) Multilayer Perceptron (MLP), b) Random Forest (RF), c) Support Vector Machine (SVM), and d) XGBoost. Before model development, the data were pre-processed and features were extracted, in line with accepted best practice. The results were evaluated using accuracy, precision, recall, and F1-score. The XGBoost model proved the most effective, with an accuracy of 88%. This study therefore addresses the high error rate associated with cardiovascular diagnosis while also assessing the validity of the MLP and SVM models.

Keywords— Heart Disease, Machine Learning, MLP, SVM, RF, NB, XGBC.

I. INTRODUCTION

Globally, heart diseases are the leading cause of death and place considerable pressure on existing health facilities. Early and proper diagnosis of cardiac conditions is vital for better health and management of patients, enabling timely interventional therapeutic procedures and specific treatment plans. The availability of machine learning technologies and the ever-growing volume of digital health records have boosted scientific interest in predictive models of cardiovascular disease.

In healthcare predictive analytics, machine learning models bring benefits because of their ability to find intricate patterns and dependencies in data, producing results that are useful to healthcare workers. Regarding the employed process, it is crucial to carefully preprocess the heart disease dataset and identify the features used to train and test every machine learning algorithm. The efficiency of the developed models is evaluated using performance indices such as accuracy scores.

The general objective is to identify the model that achieves the highest accuracy in predicting the occurrence of heart disease after conducting several tests and trials. The application of machine learning in healthcare can dramatically change clinical work: it can help in earlier recognition of cardiological diseases, tailor individual prevention and treatment strategies, and optimize the pace of interventions, all of which can improve outcomes for patients. Through this research, machine learning is highlighted as a promising way to improve statistical models of heart disease and, thus, diagnostic conventions, serving the long-run improvement of patient care and treatment [11].

II. LITERATURE SURVEY

With the aim of obtaining pertinent findings on the critical area of effective heart disease monitoring and prediction, this paper reviewed nine papers. The examined areas included datasets, data collection methods, prediction algorithms, health indicators, and the degree of accuracy of the predictions made.

B. Keerthi Samhitha (2020) conducted a survey based on the analysis of 30 journal papers to reveal trends in heart disease prediction [1]. These works apply wearable smart devices or IoT devices to observe patients' data in real time and transfer the data to cloud servers for analysis by methods such as Support Vector Machines, Random Forest, or Convolutional Neural Networks. The purpose was to assess the metrics that best estimate the state of the cardiovascular system.

Ch Raja Shaker and Anisetti Sidhartha (2022) examined low-power sensor-based cardiac health monitoring devices for home use that capture vital-sign information from the patient and transfer it to personal computers using Bluetooth [10][5]. These data help to identify patients likely to be at serious risk and also enhance the online doctor-patient relationship.

Devansh Shah, Samir Patel, and Santosh Kumar Bharti (2020) give a detailed analysis of work from 1995 to 2020 covering clinical features, imaging, and ECG data for early recognition of heart disease
[3]. Their study compared previous practice and made recommendations on how automated monitoring and diagnosis of the heart could be enhanced [4].

In 2021, Harshit Jindal and Sarthak Agrawal aimed at precisely determining heart disease with the help of CNNs [14]. To evaluate the performance of the presented model, they cleaned the UCI dataset and measured the results on criteria such as accuracy, F1 score, recall, and precision.

Such works show how heart disease prediction is developing in terms of technologies, datasets, and methods. This review is therefore very useful in underlining the need to adopt machine learning in the prediction and monitoring of heart disease, so as to enhance the treatment provided to patients [6].

III. PROPOSED SYSTEM METHODOLOGY

A. Introduction to system
The proposed system uses the XGBoostClassifier as its predictive tool, alongside a Sequential neural network (NN) model [13].

B. Methodology overview and tools utilized
The system was developed in the Python programming language and can be incorporated and executed efficiently within the Jupyter Notebook environment.

Hyperparameter Tuning: The XGBoost parameters were tuned using GridSearchCV, a reliable method of searching through a predefined grid of hyperparameter values. This grid included variations in the number of estimators (n_estimators), the learning rate (learning_rate), and the maximum depth of each tree (max_depth).

Work Flow: The machine learning workflow begins with outlining the project objectives and collecting the data, which is stored in a database. The data are divided into training and test sets. Missing data are dealt with by either dropping the affected records or filling the gaps with appropriate imputation techniques. The model is trained, then checked for performance on the test set using the different algorithms, and the best model is selected for use in the later part of the study.

Five key phases are highlighted [12]:
1) Initiation of the project and preparation of data
2) Data storage and accessibility
3) Division of the training and test data
4) Handling missing data
5) Model training, model evaluation and selection

C. Block diagram
Fig.1. Block Diagram. (The diagram shows the flow: START, dataset input, split the dataset into train and test datasets, then a check: if the dataset has multiple data holes, delete them; if only some data holes are present, fill them. Then: train the model, test the prediction algorithm, check the accuracy score with different algorithms, and save the best accuracy score.)

D. Initiation of the Project and preparation of data
The planning phase and the data collection phase are the two key initial processes in any machine learning project. They lay the foundation so that the desired goal of the project is clearly defined and the dataset for the trained model is consistent, loaded, and ready to be analyzed [8].

E. Data storage and accessibility
Data is a major component of machine learning projects, and it must be both secure and accessible. Picking the right storage option depends on data volume, the type of access required, and the type of database. Cloud storage offers extensibility and plenty of options, while local storage may be used when the datasets are relatively small.

F. Division of the training and Test data
Data division in machine learning is essential for generalizability. It involves splitting the data into two exclusive sets: typically training (70-80%) and testing (20-30%). The model learns from the training data, and the unseen testing data are used to evaluate its effectiveness on new samples.
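The split-and-impute steps described above can be sketched as follows. This is a minimal illustration with made-up values standing in for the study's CSV dataset; the column names follow Table 1, and the 80/20 split falls within the 70-80% / 20-30% range named in section F.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the heart disease dataset (values are invented).
df = pd.DataFrame({
    "Age":          [40, 49, 37, 48, 54, 61, 45, 52, 39, 58],
    "BP":           [140, 160, 130, 138, None, 150, 120, 135, 128, 145],
    "Cholesterol":  [289, 180, 283, 214, 250, None, 210, 230, 195, 260],
    "HeartDisease": [0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
})

# Fill gaps with the column mean (one imputation option named above);
# records riddled with holes could instead be dropped with df.dropna().
df = df.fillna(df.mean(numeric_only=True))

# 80/20 split of the cleaned data into training and test sets.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), len(test_df))  # → 8 2
```

With a real CSV, the inline frame would be replaced by `pd.read_csv(...)`; the imputation and split steps are unchanged.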


G. Handling missing data
Missing data in a machine learning dataset should be handled carefully. Simply dropping the affected records can prove dangerous, as one risks losing important information and biasing the model. It is often suitable to fill gaps with a default value such as the mean, the mode, or whichever value appears most often in the given column. Alternatively, one can use other data points to predict the missing ones, for example with K-Nearest Neighbors techniques. The consequences depend on the specifics of the data, so these matters should be assessed keenly before taking appropriate measures.

H. Model training, model evaluation and selection
Machine learning involves a three-step process: training, evaluation, and selection. In the first step, the model builds up an internal representation of the data during the training phase on the training data. Then it is assessed with unseen test data to determine whether the algorithm can generalize to situations not encountered during training. Finally, according to the assessment results, the best-performing model is selected. This iterative approach ensures that the resulting model learns from the data provided and is workable in practice.

IV. PROJECT DESCRIPTION

A. Reading data
The code retrieves data from a CSV file ('dataset path') using the Pandas library, abbreviated as pd. It then stores the data in a structure a programmer can easily handle, known as a DataFrame (df). Finally, it prints the column names of df, along with the first few rows via df.head(), for a sneak peek at what the data looks like.

Age  Sex  CP Type  BP   Cholesterol  FastingBS  MaxHR  Oldpeak  HeartDisease
40   M    ATA      140  289          0          172    0        0
49   F    NAP      160  180          0          156    1        1
37   M    ATA      130  283          0          98     0        0
48   F    ASY      138  214          0          108    1.5      1

Table.1. Reading data

B. Data Types
In Python's pandas library, df.dtypes is a succinct way of checking the data types of all the columns in a DataFrame named df. It returns a Series object, a one-dimensional labelled array, mapping each column label to that column's data type.

C. Describing the data used
The describe() function applied to a DataFrame gives statistics over all the numerical columns in the DataFrame. It calculates various descriptive statistics for each numeric column, including:
 count: the number of non-missing elements in the column.
 mean: the arithmetic mean of the column.
 std: the standard deviation of the values in the column, which describes how dispersed the data is from the average.
 25%: the lower quartile, or first quartile Q1, of the column.
 50%: the median of the given set of data.
 75%: the value at the 75th percentile, or third quartile Q3, of the column.

For string or categorical data, where column totals do not apply, df.describe() gives mainly the count and the distinct values found in that column. It remains useful for getting an immediate sense of the central tendency, dispersion, and distribution of the numerical data in your DataFrame.

       Age     BP      Cholesterol  HeartRate  Oldpeak  HeartDisease
count  1000    1000    1000         1000       1000     1000
mean   53.54   132.45  198.789      136.78     0.88734  0.55378
std    9.43    18.51   109.87       25.467     1.066    0.498
min    28.001  0.001   0.001        60.001     -2.6001  0.001
max    47.001  77.001  603.001      1.0001     6.2001   1

Table.2. Describing the data used

D. Heart Disease by Age
Visualizing how age plays out against heart disease in a numeric dataset is eased by libraries such as seaborn in the Python environment [9]. To start, an environment is defined, essential libraries are imported, and a DataFrame is assumed to exist with features such as age and whether the patient has heart disease or not. The core of the visualization is a line plot with age on the x-axis and the occurrence of heart disease on the y-axis [2]. The code also pays special attention to clarity: it specifies a reasonable thickness for the drawn lines, informative labels for the axes, and a title for the plot. The background color may be made to complement the text for improved readability. Such a plot can be used to consider prospective trends, such as the probability of heart disease rising with age [7].

Fig.2. Heart disease based on Age
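The reading and inspection steps in sections A-C can be sketched as follows. A minimal illustration: an inline frame stands in for the study's CSV file, and its four rows mirror Table 1.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dataset path').
df = pd.DataFrame({
    "Age":          [40, 49, 37, 48],
    "Sex":          ["M", "F", "M", "F"],
    "CPType":       ["ATA", "NAP", "ATA", "ASY"],
    "BP":           [140, 160, 130, 138],
    "Cholesterol":  [289, 180, 283, 214],
    "HeartDisease": [0, 1, 0, 1],
})

print(df.head())       # first few rows, as in section A
print(df.dtypes)       # per-column data types, as in section B
stats = df.describe()  # count/mean/std/quartiles of numeric columns, section C
print(stats.loc["mean", "Age"])  # → 43.5
```

Note that describe() silently restricts itself to the numeric columns here; the object columns ("Sex", "CPType") only appear in its output if requested explicitly.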
E. Feature Selection
Firstly, the code creates a new DataFrame named X by removing the column named "HeartDisease" from the original DataFrame. Secondly, it extracts the values from the "HeartDisease" column and assigns them to a new variable named y. Essentially, this code separates the data in the "HeartDisease" column from the rest of the DataFrame. The resulting DataFrame X excludes this column, while the variable y holds the isolated values for use in tasks like machine learning, where X represents the features and y the target variable.

F. Scaling the Data
The StandardScaler class from scikit-learn is used to preprocess the data in the DataFrame named X. Standardization is a technique that aims to bring all features in the data to a similar scale. The process starts by creating a StandardScaler object (scaler). This object is then "fitted" to the data in X (scaler.fit(X)), which calculates the mean and standard deviation for each feature (column) within the data. Subsequently, the actual transformation occurs (X_scaled = scaler.transform(X)). During this step, each element in X is standardized by subtracting the previously calculated mean and dividing by the corresponding standard deviation. The resulting standardized data is stored in a new DataFrame named X_scaled, and the code snippet prints the transformed data. In essence, this process removes the average value from each feature and expresses the values in terms of standard deviations from the mean, creating a dataset where all features are on a similar scale, which is often beneficial for machine learning algorithms.

Fig.3. Scaling the Data

G. Training the model and fitting the model
The problem of searching for appropriate hyperparameters for the XGBoost classification model is treated through grid search with cross-validation. Here is how it works. First, the necessary libraries are imported: K-fold cross-validation, GridSearchCV, and the XGBoost classifier used to build the XGBoost model itself. After that, a classifier object is created, and the train command fits a model using the chosen algorithm. The critical point, however, is defining the hyperparameter grid (param_grid); from this dictionary one can see the variety of parameter values to be tested for the XGBoost model. Subsequently, a GridSearchCV object is defined. This object takes the XGBoost classifier and the hyperparameter grid as arguments. It also defines the evaluation metric (scoring='roc_auc'). Additionally, it sets the number of folds of cross-validation (cv=10) to be employed to evaluate the models, and it works across all the splits of the data using all available CPU cores (n_jobs=-1) to elevate the speed of the process.

Fig.4. XGBC. (The diagram shows GridSearchCV wrapping the estimator: GridSearchCV, with Estimator: XGBClassifier, around the XGB Classifier itself.)

H. Prediction
After the best-performing model has been identified by interpreting the results of the grid search and stored as best_model, the next step is to assess how well it fares on new data on which it has not previously been trained. This is accomplished by executing the line of code: y_pred = best_model.predict(X_test). Here is what happens:
 The code requires that the variable best_model already be present; the model uncovered with the help of the grid search is the champion model.
 The predict method of best_model is invoked, passing in the unseen data from the DataFrame X_test.
 The model analyses the features present in X_test and generates predictions for the corresponding class labels.

The point of this phase is the ability to make accurate predictions on data not confronted during the training process. By comparing the predicted values with the actual target values, the performance of the model can be computed appropriately: the accuracy, the precision, the recall, or the F1-score, for example. From this evaluation one is able to identify the model's strengths and flaws, in order to continue the optimization if necessary.

Fig.5. Prediction of Heart Disease

V. EQUATIONS

To recall, let param_grid be the hyperparameter grid and xgb the XGBoost classifier, and let grid_search be the GridSearchCV object and best_model the best model. The process can be expressed as:

best_model = argmax over param_grid of GridSearchCV(XGBClassifier(... parameters ...), param_grid, scoring='roc_auc', cv=10, n_jobs=-1)
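Sections E-H and the expression above can be sketched end to end as follows. This is a minimal, hedged illustration: the data are synthetic, the grid and fold count are reduced for brevity (the paper tunes n_estimators, learning_rate, and max_depth with cv=10), and scikit-learn's GradientBoostingClassifier stands in for XGBClassifier so the sketch runs without the xgboost package; with xgboost installed, XGBClassifier can be dropped in with the same param_grid keys.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart disease DataFrame (values are invented).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(29, 77, n),
    "BP": rng.integers(90, 180, n),
    "Cholesterol": rng.integers(120, 400, n),
})
# Label loosely tied to age so there is a signal to learn.
df["HeartDisease"] = (df["Age"] + rng.integers(-15, 15, n) > 52).astype(int)

# E. Feature selection: X holds the features, y the target.
X = df.drop(columns=["HeartDisease"])
y = df["HeartDisease"].values

# F. Scaling: fit computes per-column mean/std, transform standardizes.
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)

# G. Grid search over the hyperparameter grid with cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
grid_search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                           param_grid, scoring="roc_auc", cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

# H. Prediction and evaluation on unseen data.
y_pred = best_model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
```

The grid search refits the winning parameter combination on the full training split, so best_estimator_ is ready to predict without a separate fit call.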
VI. RESULT ANALYSIS

Scikit-learn provides functions that help in evaluating how well a model performs. The performance indicators used in the evaluation of the developed model include the accuracy of the XGBoostClassifier model, as well as its precision and recall. First, the code imports the necessary functions: precision_score, recall_score, and accuracy_score. Then it computes these measures from the predicted test labels (y_pred) and the actual test labels (y_test). The scores are reported as percentages for readability. Precision expresses the fraction of the model's positive predictions in which the disease is really present. Recall measures the fraction of the actual positives that the model recovers; here the model reaches 100 percent recall, because it places every true positive in the actual positive region. Evaluating these metrics together explains the specifics of the model's performance to the reader.

Fig.6. XGBoostClassifier

VII. CONCLUSION

This project compared machine learning algorithms for the prediction of heart disease; their effectiveness has been tested and good accuracy has been achieved, reaching 88% with some models, which makes them viable for identifying high-risk individuals. However, improvements are still possible. Future work could incorporate richer data sources to obtain a clearer picture of patients' hereditary and environmental influences, along with advanced feature engineering and deep learning exploration, to bring more accuracy.

Health data collection using wearable technology providing real-time information might allow for constant risk evaluation and, in turn, timely prevention. For healthcare, it is a necessity to develop an environment of trust, supported by explainable AI. Validation on other datasets will further strengthen generalizability. Addressing ethical considerations concerning data bias and fairness across populations is paramount. With these factors refined into a comprehensive and practical approach, this project has the potential to advance preventative healthcare and optimal management of the disease and its treatment, improving patient satisfaction and functionality and potentially saving lives.

REFERENCES

[1] B. Keerthi Samhitha, Sarika Priya M.R, Sanjana C, Suja Cherukullapurath Mana and Jithina Jose, "Improving the Accuracy in Prediction of Heart Disease using Machine Learning Algorithms", International Conference on Communication and Signal Processing, July 28-30, 2020, India.
[2] Sathyanarayanan, S., & Srikanta, M.K. (2024). Heart Sound Analysis Using SAINet Incorporating CNN and Transfer Learning for Detecting Heart Diseases. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), 15(2), 152-169. https://doi.org/10.58346/JOWUA.2024.I2.011
[3] Devansh Shah, Samir Patel, Santosh Kumar Bharti, "Heart Disease Prediction using Machine Learning Techniques", SN Computer Science (2020) 1:345. doi:10.1007/s42979-020-00365-y
[4] F. Otoom, E.E. Abdallah, Y. Kilani, A. Kefaye and M. Ashour, "Effective diagnosis and monitoring of heart disease", Int. J. Softw. Eng. Appl., vol. 9, no. 1, pp. 143-156.
[5] S. Neelima, Manoj Govindaraj, K. Subramani, Ahmed Alkhayyat, & Chippy Mohan. (2024). Factors Influencing Data Utilization and Performance of Health Management Information Systems: A Case Study. Indian Journal of Information Sources and Services, 14(2), 146-152. https://doi.org/10.51983/ijiss-2024.14.2.21
[6] Himanshu Sharma, M.A. Rizvi, "Prediction of Heart Disease using Machine Learning Algorithms: A Survey", International Journal on Recent and Innovation Trends in Computing and Communication, vol. 5, no. 8, pp. 99-104, 2017.
[7] Stephen, K. V. K., Mathivanan, V., Manalang, A. R., Udinookkaran, P., De Vera, R. P. N., Shaikh, M. T., & Al-Harthy, F. R. A. (2023). IoT-Based Generic Health Monitoring with Cardiac Classification Using Edge Computing. Journal of Internet Services and Information Security, 13(2), 128-145.
[8] J. Geralds, "Sega Ends Production of Dreamcast", vnunet.com, para. 2, Jan. 31, 2001. [Online]. Available: http://nl1.vnunet.com/news/1116995. [Accessed: Sept. 12, 2004].
[9] K. Deb, S. Agrawal, A. Pratab, T. Meyarivan, "A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multiobjective Optimization: NSGA-II", KanGAL report 200001, Indian Institute of Technology, Kanpur, India, 2000.
[10] Ch Raja Shaker, Anisetti Sidhartha, Anto Praveena, A. Christy, B. Bharati, "An Analysis of Heart Disease Prediction using Machine Learning and Deep Learning Techniques", 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI). doi:10.1109/ICOEI53556.2022.9776745
[11] Bobir, A.O., Askariy, M., Otabek, Y.Y., Nodir, R.K., Rakhima, A., Zukhra, Z.Y., Sherzod, A.A. (2024). Utilizing Deep Learning and the Internet of Things to Monitor the Health of Aquatic Ecosystems to Conserve Biodiversity. Natural and Engineering Sciences, 9(1), 72-83.
[12] Indrajani Sutedja, "Descriptive and Predictive Analysis on Heart Disease with Machine Learning and Deep Learning", 2021 3rd International Conference on Cybernetics and Intelligent System (ICORIS). doi:10.1109/ICORIS52787.2021.9649
[13] Arora, G. (2024). Design of VLSI Architecture for a Flexible Testbed of Artificial Neural Network for Training and Testing on FPGA. Journal of VLSI Circuits and Systems, 6(1), 30-35.
[14] Harshit Jindal, Sarthak Agrawal, Rishabh Khera, Rachna Jain, and Preeti Nagrath, "Heart disease prediction using machine learning algorithms", IOP Conf. Series: Materials Science and Engineering 1022 (2021) 012072. doi:10.1088/1757-899X/1022/1/012072