Disease Prediction Final-1
PROJECT REPORT
ON
Project-II
DEC 2024
DECLARATION
We, Piyush and Prajwal, hereby declare that the report of the project entitled "Disease
Prediction Application Using Machine Learning" has not been presented as part of any other
academic work to obtain a degree or certificate, except at Chandigarh Engineering College
Jhanjeri, Mohali, affiliated to I.K. Gujral Punjab Technical University, Jalandhar, in
fulfillment of the requirements for the degree of B.Tech in Computer Science & Engineering.
Chandigarh Engineering College, Jhanjeri, Mohali
(An Autonomous College)
Department of Computer Science & Engineering
ACKNOWLEDGEMENT
It gives us great pleasure to present this report on Project-II, which we worked on during the
third year of our B.Tech in Computer Science & Engineering, titled "Disease Prediction
Application Using Machine Learning". We are grateful to our university for presenting us
with such a wonderful and challenging opportunity. We also want to convey our sincere
gratitude to all coordinators for their unfailing support and encouragement.
We are extremely thankful to the HOD and the Project Coordinator of Computer Science &
Engineering at Chandigarh Engineering College Jhanjeri, Mohali (Punjab) for their valuable
suggestions and wholehearted cooperation.
We are also grateful to the management of the institute for giving us the opportunity to acquire
this knowledge. We are also appreciative of all our faculty members, who have instructed us
throughout our degree.
ABSTRACT
The healthcare system collects data and reports from hospitals and patient databases, and
machine learning and data-processing techniques can be applied to these records to predict
disease and to generate reports based on the results. Such predictions target diseases that have
been among the leading causes of human death in recent years. Medical reports and data were
extracted from various databases to predict three diseases that are commonly found in people
today, breast cancer, heart disease and diabetes, which can seriously affect their lives.
Technological advances in the healthcare industry have made the process easier for people by
suggesting which hospitals and doctors to visit for treatment, where to be admitted, and which
hospitals are best for treating a particular disease. We have implemented this kind of system in
our application to make people's lives simpler: the user enters certain values from their medical
reports, and the application predicts whether the result is positive or negative for the disease.
Based on the prediction, the user has the option to get recommendations of the best nearby
hospitals and doctors, drawn from the feedback of past users or guardians.
TABLE OF CONTENTS
TITLE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
1. INTRODUCTION
2. LITERATURE SURVEY
4.3 METHODOLOGY
5.1 RESULTS
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
Properly analyzing clinical documents about patients' health helps anticipate the possible
occurrence of various diseases. In addition, acquiring information about specialists in a
particular disease, as required, facilitates proper and efficient diagnosis. This project uses
data-mining techniques, namely logistic regression and the random forest classification
algorithm, to predict disease. The likelihood of a disease is predicted from the patient's medical
profile, such as heart rate and blood pressure measured through sensors, and from other
externally observable symptoms such as fever, cold and headache. The logistic regression and
random forest classifiers take these inputs and predict the disease. Furthermore, all necessary
information about the predicted disease and the recommended doctors is provided.
Recommendation (a future implementation) suggests the location, contact and other necessary
details of disease specialists based on the filters chosen by the user (lower fees, more
experience, nearest location and feedback reviews of the doctors), using the collaborative
filtering algorithm. Thus the user can get appropriate treatment and necessary medical advice as
quickly as possible. Additionally, users provide feedback on the recommended doctors, which is
then added to the analysis in order to make further recommendations based on reviews.
The healthcare industry generates terabytes of data every year. The medical documents
maintained are a pool of information about patients, and extracting useful information from
them for quality healthcare is both tricky and important. By analyzing this voluminous data, we
can predict the occurrence of disease and safeguard people. Thus, an intelligent disease
prediction system plays a major role in controlling disease and maintaining people's health by
providing accurate and trustworthy disease-risk prediction.
CHAPTER 2
LITERATURE SURVEY
Thakkar et al. [4] collected data for swine flu and used a naïve Bayes classifier to classify
swine flu patients into three categories (least possible, probable or most probable), achieving an
accuracy of nearly 63.33%. The datasets used for this classification were limited in number.
CHAPTER 3
Many patients do not get the right treatment at the right time because they are unsure which
hospital or doctor to choose; not knowing what to do next, their condition can become very
serious.
The objective of the project is to serve patients by suggesting the best hospital for the treatment
of their existing disease. The project aims to provide a simple way for patients to get a
recommendation of which doctor or hospital to visit after being diagnosed with a severe disease,
so that they do not have to work out on their own what should be done next. The web
application processes medical reports to make predictions and, based on the results, helps select
the best hospital for treatment, so that more lives can be saved.
The scope of the project is to provide a simple way for patients to get a recommendation of
which doctor or hospital to visit after being diagnosed with a severe disease. We use web
development to build a web application that achieves this goal, with machine learning
algorithms, such as the random forest classifier, to predict the disease, and a collaborative
filtering technique to recommend the best nearby doctors and hospitals.
CHAPTER 4
• Several online healthcare systems have introduced new ideas to benefit people, and many
online applications offer recommendations on hospitals and doctors.
• However, they lack reliability and accuracy, and their features and modules need
improvement. Healthcare providers may not publish people's opinions when the feedback is
negative, and when feedback is collected manually, patients may hesitate to give their
complete opinion of doctors or hospitals in front of others, which lowers the quality of the
feedback.
• Overall, we have not found all of these features and modules together in one application:
there are different applications for different types of disease, and separate applications
for recommending doctors and hospitals.
• In this work we address the issues of the existing systems by improving accuracy,
reliability and efficiency. We have developed features for three diseases, heart disease,
breast cancer and diabetes, which are among the most common diseases, and combined
them in one application. The application predicts these three diseases by analyzing the
values collected from the patient's records, and collects positive and negative opinions
from patients, according to which hospitals and doctors are rated from best to worst.
• Guardians' opinions are also very important, and they can give feedback such as: How did
the staff treat their patients? Was it friendly or strict? How is the hospital managed? Was it
clean? How is the hospitality? Because the feedback is submitted online, patients and
guardians can give both positive and negative opinions completely, without hesitation.
• Based on this, we can provide truthful recommendations of hospitals and doctors and
predict the results. According to the prediction for a particular disease, we recommend the
most suitable hospital and doctor to consult and to be admitted to.
Hardware requirements
Hardware | Minimum requirements
Software requirements
Microsoft .NET Framework v4.6.1 (or higher) | The HelpMaster Web Portal has been written to
use Microsoft IIS ASP.NET technology and as such requires the machine that … Extensibility
4.5 features enabled.
A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds
to one or more database tables, where every column of a table represents a particular variable
and each row corresponds to a given record of the data set in question. The data set lists values
for each of the variables, such as the height and weight of an object, for each member of the
data set. Data sets can also consist of a collection of documents or files. The datasets used in
this project were obtained from Kaggle.com.
The datasets that are used are:
Of the above 7 input fields, we choose only two, Glucose and BMI, based on the Pearson
correlation method.
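The selection step above can be sketched as follows. This is a minimal illustration, assuming the label column is called `Outcome` and that features are ranked by the absolute value of their Pearson correlation with the label; the report does not state the exact selection rule. `select_by_pearson` is a hypothetical helper, and the data is synthetic, not the Kaggle dataset.

```python
import numpy as np
import pandas as pd


def select_by_pearson(df: pd.DataFrame, target: str, k: int = 2) -> list:
    """Return the k feature names with the largest |Pearson r| against the target column."""
    corr = df.corr(method="pearson")[target].drop(target)
    return corr.abs().sort_values(ascending=False).index[:k].tolist()


# Synthetic data: Glucose and BMI depend on the label, the other columns do not.
rng = np.random.default_rng(0)
n = 200
outcome = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "Glucose": 100 + 40 * outcome + rng.normal(0, 10, n),
    "BMI": 25 + 8 * outcome + rng.normal(0, 3, n),
    "Age": rng.normal(40, 10, n),           # unrelated to the label by construction
    "SkinThickness": rng.normal(20, 5, n),  # unrelated to the label by construction
    "Outcome": outcome,
})

print(select_by_pearson(df, "Outcome", k=2))
```

Ranking by the absolute value keeps strongly negative correlations as well as positive ones; a plain descending sort would only keep the positive ones.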
Fig 4.3: Diabetes disease dataset details.
concavity_se 569 non-null float64
concave points_se 569 non-null float64
symmetry_se 569 non-null float64
fractal_dimension_se 569 non-null float64
radius_worst 569 non-null float64
texture_worst 569 non-null float64
perimeter_worst 569 non-null float64
area_worst 569 non-null float64
smoothness_worst 569 non-null float64
compactness_worst 569 non-null float64
concavity_worst 569 non-null float64
concave points_worst 569 non-null float64
symmetry_worst 569 non-null float64
fractal_dimension_worst 569 non-null float64
4.3 METHODOLOGY
The user enters their data, which is stored in the database, and the prediction is made for the
disease the user chooses. After the user's data is retrieved from the database, the chosen
disease is predicted. If the result is negative, the process ends; if it is positive, the user
receives recommendations of hospitals where they can get the best treatment.
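The control flow described above can be sketched as below. This is a minimal illustration, not the application's code: the in-memory list stands in for the database, `run_prediction_flow` and `recommend_hospitals` are hypothetical names, and the "model" is an arbitrary threshold rule (not a clinical criterion) standing in for a trained classifier.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def run_prediction_flow(record: Record, disease: str,
                        models: Dict[str, Callable[[Record], int]],
                        database: List[Record],
                        recommend: Callable[[str], List[str]]) -> List[str]:
    """Store the input, predict the chosen disease, recommend hospitals only if positive."""
    database.append(record)                     # 1. store the user's input
    is_positive = models[disease](record) == 1  # 2. predict the disease the user chose
    if not is_positive:
        return []                               # 3a. negative: the process ends here
    return recommend(disease)                   # 3b. positive: hospital recommendations


def recommend_hospitals(disease: str) -> List[str]:
    # Hypothetical placeholder for the (future) recommendation module.
    return ["Hospital A", "Hospital B"]


# Toy stand-in model: an arbitrary threshold rule instead of a trained classifier.
models = {"diabetes": lambda r: int(r["Glucose"] > 140)}
db: List[Record] = []

print(run_prediction_flow({"Glucose": 160, "BMI": 31.0}, "diabetes", models, db, recommend_hospitals))
print(run_prediction_flow({"Glucose": 95, "BMI": 22.0}, "diabetes", models, db, recommend_hospitals))
```

Passing the models and the recommender in as arguments keeps the flow independent of which algorithm is used for each disease.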
SYSTEM ARCHITECTURE
Architecture design is tied to the goals established for a WebApp, the content to be presented,
the users who will visit, and the navigation philosophy that has been established. Content
architecture focuses on the way in which content objects are structured for presentation and
navigation. WebApp architecture addresses the way the application is structured to manage
user interaction, handle internal processing tasks, effect navigation, and present content.
WebApp architecture is defined within the context of the development environment in which
the application is to be implemented.
MODULES IMPLEMENTED
The application follows the same flow: the user's input is stored in the database, the disease
chosen by the user is predicted from that data, and the process ends if the result is negative. If
the result is positive, the user will get recommendations of hospitals where they can receive the
best treatment (a future implementation). The modules are:
• Application Flowchart.
• Data collection (from the user) to make dataset.
• Importing packages.
• Data pre-processing.
• Data fitting and training.
• Prediction as opted by the user.
• Result or output.
FLOWCHART DIAGRAM
DATA COLLECTION
Data collection is one of the most important tasks in building a machine learning model. We
collect specific data from users, based on the requirements, to build the dataset. The dataset also
contains some unwanted data, so we first need to pre-process it to obtain a clean dataset for the
algorithm.
PACKAGES IMPORTED
• Pandas: a software library written for Python for data manipulation and analysis. It
offers data structures and operations for manipulating numerical tables and time series.
• NumPy: a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.
• Classification Report: the classification report displays the precision, recall, F1 and
support scores for the model.
Syntax: from sklearn.metrics import classification_report
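A short illustration of the classification report, together with `accuracy_score`, which the appendix code also prints. The labels below are made-up toy values for illustration only, not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report

# Toy true and predicted labels (illustration only, not project results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Text report: precision, recall, F1 and support for each class.
print(classification_report(y_true, y_pred))

# The same numbers as a dictionary, convenient for further processing.
report = classification_report(y_true, y_pred, output_dict=True)

# 6 of the 8 predictions are correct.
print(accuracy_score(y_true, y_pred))
```

Here the model predicts class 1 four times and is right three times, so the precision for class 1 is 3/4; it also finds three of the four true class-1 cases, so the recall is 3/4 as well.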
DATA PRE-PROCESSING
Data is gathered as task-related information based on certain target variables, so that it can be
analysed to produce a valuable outcome. However, some of the data may be noisy, i.e. it may
contain inaccurate, incomplete or incorrect values. Hence, the data must be processed before it
is analysed and results are drawn. Data pre-processing includes data cleaning, data
transformation and data selection.
Data pre-processing is the process of preparing raw data and making it suitable for a machine
learning model. It is the first and a crucial step in creating a machine learning model.
When creating a machine learning project, we do not always come across clean and formatted
data, and before performing any operation on the data, it must be cleaned and put into a
formatted way. This is the task of data pre-processing.
Real-world data generally contains noise and missing values, and may be in an unusable format
that cannot be used directly by machine learning models. Data pre-processing cleans the data
and makes it suitable for a machine learning model, which also increases the accuracy and
efficiency of the model.
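The three pre-processing activities named above (cleaning, selection and transformation) can be sketched with pandas and scikit-learn. This is a minimal illustration on a hypothetical ten-row table: the column names and values are assumptions, and median imputation and standard scaling are example choices, not necessarily the steps used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy table with two missing values (NaN).
raw = pd.DataFrame({
    "Glucose": [148, 85, np.nan, 89, 137, 116, 78, 115, 197, 125],
    "BMI": [33.6, 26.6, 23.3, 28.1, np.nan, 25.6, 31.0, 35.3, 30.5, 29.0],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
})

# Data cleaning: drop repeated rows, then fill missing values with each column's median.
clean = raw.drop_duplicates()
clean = clean.fillna(clean.median(numeric_only=True))

# Data selection: keep the chosen input fields and the label.
X = clean[["Glucose", "BMI"]]
y = clean["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Data transformation: fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

Fitting the scaler only on the training split keeps information from the test rows out of the model. The test rows are then scaled with the training mean and standard deviation.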
DATA TRAINING
Model fitting is a measure of how well a machine learning model generalizes to data similar to
the data on which it was trained. A well-fitted model produces more accurate outcomes. An
overfitted model matches the training data too closely, and an underfitted model does not match
it closely enough.
Training data is the initial dataset used to train machine learning algorithms. Models create and
refine their rules using this data. It is a set of data samples used to fit the parameters of a
machine learning model, training it by example. Training data is also known as the training
dataset, learning set or training set. It is an essential component of every machine learning
model and helps the model make accurate predictions or perform a desired task.
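You can see the difference between a well-fitted, an overfitted and an underfitted model by comparing training and test accuracy. The sketch below is a minimal illustration. It assumes synthetic data from scikit-learn's `make_classification` and uses decision trees of different depths, not the project's models or datasets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration); flip_y adds some label noise.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for depth in (1, 4, None):  # None lets the tree grow until every leaf is pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f}, test={scores[depth][1]:.2f}")
```

A fully grown tree memorises the training set and scores perfectly on it. A test score clearly below that is the sign of overfitting. A depth-1 tree is too simple and tends to score lower on both splits, which is the sign of underfitting.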
ALGORITHM SELECTION
• The datasets have been tested with different supervised machine learning algorithms, and
the best accuracy was obtained with:
1. Diabetes – Logistic regression algorithm
2. Heart disease – Random Forest algorithm
3. Breast Cancer – Random Forest algorithm
• For hospital recommendation, the collaborative filtering algorithm is used.
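A comparison like the one above can be sketched with cross-validation. This is a minimal illustration, not the project's evaluation. It uses scikit-learn's bundled copy of the Wisconsin breast cancer dataset (569 samples, the same size as the breast cancer data described earlier). Whether that copy matches the Kaggle file the project used is an assumption, and the sketch reports no particular winner.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Candidate algorithms; logistic regression is wrapped with scaling so it converges reliably.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

mean_acc = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")  # 5-fold cross-validation
    mean_acc[name] = scores.mean()
    print(f"{name}: mean accuracy {mean_acc[name]:.3f}")

best = max(mean_acc, key=mean_acc.get)
print("best:", best)
```

Cross-validation averages the accuracy over five different train/test splits. This makes the choice between algorithms less dependent on one lucky or unlucky split.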
PREDICTION AS OPTED BY THE USER
Logistic regression is one of the most popular machine learning algorithms and comes under
the supervised learning technique. It is used to predict a categorical dependent variable from a
given set of independent variables, so the outcome must be a categorical or discrete value, such
as Yes or No, 0 or 1, or True or False. Instead of giving the exact value 0 or 1, however, it gives
probabilistic values that lie between 0 and 1. Logistic regression is similar to linear regression
except in how it is used: linear regression solves regression problems, whereas logistic
regression solves classification problems. In logistic regression, instead of fitting a regression
line, we fit an "S"-shaped logistic function whose output is bounded by the two extreme values
0 and 1. The curve of the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic regression is a significant machine learning algorithm because it can provide
probabilities and classify new data using both continuous and discrete data. It can classify
observations using different types of data and can easily determine the most effective variables
for the classification. The image below shows the logistic function:
The sigmoid function is a mathematical function used to map predicted values to probabilities.
It maps any real value into a value within the range 0 to 1. Because the output of logistic
regression must lie between 0 and 1 and cannot go beyond this limit, it forms an "S"-shaped
curve, called the sigmoid function or the logistic function. In logistic regression we use the
concept of a threshold value, which decides between 0 and 1: values above the threshold are
mapped to 1, and values below it are mapped to 0.
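In symbols, the standard formulation is shown below. The coefficients β and the threshold t are generic notation, not values from this project.

```latex
z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n, \qquad
\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)

\hat{y} =
\begin{cases}
1, & \sigma(z) \ge t \\
0, & \sigma(z) < t
\end{cases}
```

With the common default t = 0.5, σ(0) = 0.5, so the rule is equivalent to predicting 1 whenever z ≥ 0. For example, z = 2 gives σ(2) = 1/(1 + e^{-2}) ≈ 0.881, which is classified as 1.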
To implement logistic regression in Python, we use the same steps as for other regression
models. Below are the steps:
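The sketch below fits logistic regression with scikit-learn and checks that its probabilities are the sigmoid of the linear score. It is a minimal illustration and assumes synthetic two-feature data standing in for [Glucose, BMI], not the project's dataset. `sigmoid` is a helper written here, and the 0.5 threshold is the common default.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic two-feature data standing in for [Glucose, BMI] (an assumption).
rng = np.random.default_rng(42)
n = 300
y = rng.integers(0, 2, size=n)
X = np.column_stack([100 + 35 * y + rng.normal(0, 20, n),
                     26 + 5 * y + rng.normal(0, 4, n)])

# Split, scale (fit on training data only) and fit the model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Predict by hand: linear score -> sigmoid -> threshold at 0.5.
X_test_s = scaler.transform(X_test)
z = X_test_s @ clf.coef_.ravel() + clf.intercept_[0]
proba = sigmoid(z)                        # estimated P(y = 1 | x)
manual_pred = (proba >= 0.5).astype(int)

print("accuracy:", clf.score(X_test_s, y_test))
```

The manual probabilities and predictions agree with `predict_proba` and `predict`. This shows that the fitted model is exactly the sigmoid of a linear combination of the inputs.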
For heart disease and breast cancer prediction we use the RANDOM FOREST CLASSIFIER. Random
forest is a supervised learning algorithm that can be used for both classification and regression,
although it is mainly used for classification problems. Just as a forest is made up of trees, and
more trees make a more robust forest, the random forest algorithm builds decision trees on
samples of the data, gets a prediction from each of them, and finally selects the best solution by
voting. It is an ensemble method that typically generalizes better than a single decision tree,
because combining the results of many trees reduces overfitting.
Random forest is based on the concept of ensemble learning, the process of combining multiple
classifiers to solve a complex problem and improve the performance of the model. As the name
suggests, a random forest is a classifier that contains a number of decision trees built on various
subsets of the given dataset and combines them to improve predictive accuracy. Instead of
relying on one decision tree, the random forest takes the prediction from each tree and predicts
the final output by majority vote.
A greater number of trees in the forest generally leads to higher, more stable accuracy and helps
to reduce overfitting.
Random forest works in two phases: the first is to create the random forest by combining N
decision trees, and the second is to make predictions with each tree created in the first phase.
The working process can be explained in the steps and diagram below:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new
data points to the category that wins the majority of votes.
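These steps can be sketched by hand on top of scikit-learn's `DecisionTreeClassifier`. The sketch rests on a few assumptions: each random sample is a bootstrap sample of the same size as the training set, each tree considers a random subset of features at every split (`max_features="sqrt"`), and the labels are binary 0/1. `fit_forest` and `predict_forest` are hypothetical helpers. The project itself would use `RandomForestClassifier`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Build n_trees decision trees, each on its own random sample of the training data."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):                                # Steps 3-4: repeat for N trees
        idx = rng.integers(0, len(X), size=len(X))          # Step 1: sample points with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",  # random feature subset at each split
                                      random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))              # Step 2: build a tree on the sample
    return trees


def predict_forest(trees, X):
    """Step 5: majority vote over the trees (assumes 0/1 labels and an odd number of trees)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
forest = fit_forest(X_train, y_train)
acc = (predict_forest(forest, X_test) == y_test).mean()
print(f"test accuracy: {acc:.3f}")
```

An odd number of trees avoids ties in the binary vote. For more than two classes, the vote would count the most frequent label per sample instead.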
A random forest is a collection of trees. Breiman defined the procedure for growing each tree in
the forest as follows. If the number of records in the training set is N, then N records are
sampled at random with replacement from the original data (a bootstrap sample), and this
sample is used to grow the tree. If there are M input variables, a number m << M is chosen such
that, at each node, m variables are selected at random out of the M, and the best split on these m
variables is used to split the node. The value of m is held constant while the forest is grown.
Each tree is grown to the largest extent possible, without pruning. In this way many trees are
added to the forest; the number of trees is set by the ntree parameter, and the number of
variables (m) selected at each node is called "mtry" or k. The depth of the trees can be
controlled by node parameters (for example, the number of leaves).
Each tree gives a class prediction, i.e. it casts a vote for a class, and the votes of all trees are
combined so that the class with the majority of votes is chosen. Because each tree is grown on a
sample drawn with replacement, about one third of the original records are left out of that tree's
sample. These left-out records are called out-of-bag (OOB) data. Each tree has its own OOB
data, which is used to estimate the error of the forest; this is known as the OOB error estimate.
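scikit-learn exposes the OOB estimate through `oob_score=True`. The mapping of its parameters to the names in the text is as follows: `n_estimators` plays the role of ntree, and `max_features` the role of mtry. The sketch assumes the bundled breast cancer dataset as example data. It also simulates bootstrap samples to show where "about one third" comes from: the chance that a record is never drawn is (1 - 1/n)^n, which approaches e^(-1) ≈ 0.368.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# n_estimators ~ "ntree", max_features ~ "mtry"; trees are grown unpruned by default.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, oob_score=True, random_state=0)
forest.fit(X, y)
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")

# Fraction of records left out of a bootstrap sample of size n, averaged over 200 draws.
rng = np.random.default_rng(0)
n = len(X)
left_out = [1 - len(np.unique(rng.integers(0, n, size=n))) / n for _ in range(200)]
theory = (1 - 1 / n) ** n
print(f"mean out-of-bag fraction: {np.mean(left_out):.3f} (theory: {theory:.3f})")
```

Because every record is out-of-bag for roughly a third of the trees, the OOB estimate gives a validation-style accuracy without setting aside a separate test split.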
CHAPTER 5
5.1 RESULTS
Many patients do not get the right treatment at the right time because they are unsure which
hospital or doctor to choose, and their condition can become very serious as a result. The
objective of this project is to help such patients by suggesting the best hospital for the treatment
of their existing disease, so that after being diagnosed with a severe disease they do not have to
work out alone what to do next. The web application processes medical reports to make
predictions, and based on the results the best hospitals can be selected for treatment, so that
more lives can be saved. After logging in or registering, the patient enters certain values from
their medical diagnosis report, and the application displays whether the patient has the
particular disease, shown as positive or negative. After the prediction, the patient has the option
to get recommendations of nearby hospitals that are best for treating their disease. In this way
the app can help save lives before it is too late to get treatment.
SCREENSHOTS OF RESULT
Fig 5.4: Performance analysis of logistic regression for diabetes disease prediction.
Fig 5.5: Performance analysis of Random Forest classifier for heart disease prediction.
Fig 5.6: Performance analysis of random forest classifier for breast cancer prediction.
CHAPTER 6
In earlier days, hospitals lacked technology for testing and issuing reports. Lab work to
diagnose disease was carried out manually and could take a day or more before a report was
issued, and it also lacked efficiency and accuracy. Nowadays we have ample data showing that
certain factors can lead to a disease (although exceptions may occur), so with the help of
machine learning we have implemented a system to predict the diseases stated above, which
are among the most common today. The application focuses on three of the deadliest diseases:
heart disease, breast cancer and diabetes. We have implemented an effective way to reduce
dimensionality by eliminating irrelevant data, which increases accuracy. After the prediction, a
positive or negative report is displayed, according to which patients can get recommendations of
the best nearby hospitals. This makes it easier for patients to find hospitals with good-quality
care from doctors. Overall, we are implementing our ideas to benefit people who are suffering
from health issues, who can use this application to find all good options in one place. The
opinions people give about hospitals and doctors play an important role and help them decide
easily. The goal was to use such opinions to create a patient-satisfaction-based recommendation
system for hospitals.
Future implementation: hospitals will be recommended based on user reviews with the
Collaborative Filtering algorithm.
Collaborative Filtering (CF): the aim of a CF algorithm is to estimate how useful a particular
item will be to a selected user, based on that user's past experience and the opinions of other
users.
For authenticity and simplicity, CF assumes that users who have agreed in the past tend to share
similar preferences, so the preferences of two users are compared on the basis of their past
agreement. All CF techniques share the ability to predict ratings for, or recommend new items
to, individual users based on the preferences of past users. The central idea is to connect users
or items, where the connection is defined by how similar they are. The two most common CF
approaches are user-based and item-based, and CF techniques as a whole are divided into two
groups: memory-based and model-based.
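User-based, memory-based CF can be sketched on a tiny rating matrix. This is a minimal illustration, not the project's recommendation module, which the report lists as future work. It makes several assumptions: ratings on a 1 to 5 scale, cosine similarity between users, and 0 meaning "not rated". Treating missing ratings as zeros in the similarity is a simplification. The hospitals, users and ratings are all hypothetical.

```python
import numpy as np

# Hypothetical ratings (1-5) of four hospitals (columns) by four users (rows); 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


def predict_rating(R, user, item):
    """User-based CF: similarity-weighted average of other users' ratings for the item."""
    others = [u for u in range(R.shape[0]) if u != user and R[u, item] > 0]
    sims = np.array([cosine(R[user], R[u]) for u in others])
    return float(sims @ R[others, item] / sims.sum())


# User 0 has not rated hospital 2; predict the rating from similar users.
print(round(predict_rating(ratings, user=0, item=2), 2))
```

User 0 rates much like user 1, who gave hospital 2 a low rating. The prediction for user 0 is therefore pulled towards that low value (121/58, about 2.09), so hospital 2 would not be recommended to this user.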
A CF system requires a lot of information processing, as in applications such as e-commerce
and web hosting.
Over recent years, CF has advanced and has become one of the most widely used methods of
making recommendations. Today, computers and the Internet let us take into account the
opinions of a large number of people. People can benefit from online communities, acquiring
information from other users and learning about a range of items. This data can also help users
form their own opinions or discover relevant items. In particular, CF methods are used to help
users find new items they might like, get advice on specific items, and connect with other users
who have similar interests.
REFERENCES
[1] M. Denil, D. Matheson, and N. De Freitas, "Narrowing the Gap: Random Forests In …," 2014.
[3] Watson, F. Marir, "Using retrospect, they concluded that non-Spanish whites on average
tend to go to hospitals that offer a better patient experience for all patients compared to
hospitals commonly used by African American, Hispanic, Asian/Pacific Islander, or
multiracial patients," 1994.
[4] B. A. Thakkar, M. I. Hasan, and M. A. Desai, "Healthcare decision support system for
swine flu prediction using naïve Bayes classifier," IEEE, pp. 101-105, 2010.
[9] "Disease prediction and doctor recommendation system," International Research Journal
of Engineering and Technology (IRJET).
APPENDICES
A. SAMPLE CODE
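# Importing packages.
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score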
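# Load the dataset (placeholder path: the file name is not given in this excerpt)
data = pd.read_csv("dataset.csv")
# Correlation matrix of the features, drawn as a heatmap
corr = data.corr()
cmap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr, cmap=cmap, vmax=.3, square=True, linewidths=6, cbar_kws={"shrink": .5})
colormap = plt.cm.viridis
plt.figure(figsize=(12, 12))
plt.title('Pearson Correlation of Features', y=1.05, size=15)
sns.heatmap(data.corr(), linewidths=0.1, vmax=1.0, square=True, cmap=colormap,
            linecolor='white', annot=True)
# Features are all columns except the last; the last column is the label
x = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Fit the scaler on the training data only, then apply the same scaling to the test data
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Train and evaluate a logistic regression classifier
log = LogisticRegression()
log.fit(x_train, y_train)
y_pred = log.predict(x_test)
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))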
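The listing above fits the scaler and the classifier as separate steps, and imports joblib without using it in this excerpt. One common way to keep the scaling identical at training and prediction time, and to save the trained model for the application to reuse, is to wrap both steps in a scikit-learn Pipeline and persist it with joblib. This is a sketch under assumptions: it uses synthetic data from make_classification instead of the project's dataset, and the file name model.joblib is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's dataset (8 numeric features, binary label).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The pipeline fits the scaler on the training data only and reuses it when predicting.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")

# Persist the whole pipeline (scaler + classifier) and load it back.
path = os.path.join(tempfile.gettempdir(), "model.joblib")  # hypothetical file name
joblib.dump(model, path)
restored = joblib.load(path)
print((restored.predict(X_test) == model.predict(X_test)).all())
```

Saving the pipeline rather than the bare classifier means raw, unscaled inputs can be passed straight to `restored.predict`.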
2. HEART DISEASE PREDICTION SOURCE CODE
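import numpy as np
import pandas as pd
import joblib
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
# Load the heart disease dataset (placeholder path: the file name is not given in this excerpt)
hd = pd.read_csv("heart.csv")
# Inspect the columns, data types and pairwise Pearson correlations
print(hd.columns)
print(hd.info())
corr = hd.corr(method="pearson")
print(corr)
# Features are all columns except the last; the last column is the target
x = hd.iloc[:, 0:-1]
y = hd.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)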
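# Fit the scaler on the training data only, then apply the same scaling to the test data
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Random forest with 10 trees, using entropy (information gain) to choose splits
rfc = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
rfc.fit(x_train, y_train)
y_pred = rfc.predict(x_test)
# Mean accuracy on the test set
score2 = rfc.score(x_test, y_test)
print(score2)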
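The heart disease listing reports accuracy from a single 80/20 split, which can vary noticeably with the random split. As an illustration only, the sketch below estimates the accuracy of the same classifier configuration (10 trees, entropy criterion) with 5-fold cross-validation and lists the features the forest relied on most. It uses synthetic data from make_classification in place of the heart disease dataset, so its printed numbers say nothing about the project's results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the feature count is chosen for illustration only.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5, random_state=0)

# Same configuration as the listing: 10 trees, entropy split criterion.
rfc = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)

# 5-fold cross-validated accuracy: one score per held-out fold.
scores = cross_val_score(rfc, X, y, cv=5, scoring="accuracy")
print(f"accuracy per fold: {np.round(scores, 3)}")
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")

# Fit on all data to inspect which features the forest relied on most.
rfc.fit(X, y)
top = np.argsort(rfc.feature_importances_)[::-1][:3]
print("top 3 feature indices:", top.tolist())
```

Tree splits depend only on the ordering of feature values, so a random forest does not need StandardScaler; the sketch omits it for that reason.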
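import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# %matplotlib inline  (Jupyter-only magic; uncomment when running in a notebook)
import joblib
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
data = pd.read_csv("data.csv")
print(data)
# Encode the M/B labels in the first column as integers
s = LabelEncoder()
data[data.columns[0]] = s.fit_transform(data.iloc[:, 0].values)
# 1=M, 0=B
print(data.iloc[:, 0])
print(data.corr())
plt.figure(figsize=(10, 10))
sns.heatmap(data.iloc[:, :12].corr(), annot=True, fmt='.0%')
# Features are the columns between the label and the last column; the label is column 0
x = data.iloc[:, 1:-1].values
y = data.iloc[:, 0].values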
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# Fit the scaler on the training data only, then apply the same scaling to the test data
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
# Random forest classifier
rfc = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
rfc.fit(x_train, y_train)
y_pred = rfc.predict(x_test)
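The comment `# 1=M, 0=B` in the listing follows from how LabelEncoder assigns codes: it sorts the distinct labels and numbers them in that order, so "B" becomes 0 and "M" becomes 1. The short check below illustrates this on a handful of made-up labels; it is not taken from the project's data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up labels in the same M/B format as the listing's first column.
labels = np.array(["M", "B", "B", "M", "B"])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ holds the sorted distinct labels; a label's code is its index here.
print(encoder.classes_)  # ['B' 'M']
print(codes)             # [1 0 0 1 0]

# inverse_transform maps integer codes (e.g. model predictions) back to labels.
print(encoder.inverse_transform([0, 1]))  # ['B' 'M']
```

Keeping the fitted encoder (for example, saving it alongside the model) lets the application turn the classifier's 0/1 output back into the original labels.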
<?php
// Database connection settings
$servername="localhost";
$user="root";
$password="";
$dbname = "dia123";
// Read the submitted form values when the form is posted
if ($_SERVER["REQUEST_METHOD"] == "POST") {
$n1= $_POST["n1"];
$age= $_POST["age"];
$pr = $_POST["pr"];
$gl = $_POST["gl"];
$bp = $_POST["bp"];
$st = $_POST["st"];
$isn = $_POST["isn"];
$bmi = $_POST["bmi"];
$dpf = $_POST["dpf"];
}
?>
<!DOCTYPE html>
<html>
<head>
<style>
#body-color
{
background-color:#fff;
}
#student1
{
color: black;
margin-top:150px;
margin-bottom:150px;
margin-right:150px;
margin-left:150px;
border:3px solid #a1a1a1;
padding:30px 35px;
background:#E6E6FA;
width:400px;
border-radius:20px;
/* box-shadow: 7px 7px 6px; */
}
#submit{
border-radius:10px; width:100px;
height:40px;
background:#;
font-weight:bold;
font-size:20px;
}
#reset{
border-radius:10px;
width:100px;
height:40px;
background:#fff;
font-weight:bold;
font-size:20px;
}
</style>
<title>Diagno-Care Diabetes_page</title>
</head>
<!-- <link rel="stylesheet" type="text/css" href="style-component.css"> -->
<body>
<nav class="navigation">
<div class="nav-brand">Diagno-Care</div>
<ul class="list-non-bullet nav-pills">
<li class="list-item-inline">
<a class="link " href="welcome.php">Dashboard</a>
</li>
<li class="list-item-inline">
<a class="link" href="logout.php">logout</a>
</li>
</ul>
</nav>
<div id="student1">
<p class="login-text" style="font-size: 2rem; font-weight: 800;">Diabetes Prediction</p>
<form method="POST" action="http://localhost/Diagnocare/diabetes_page.php">
Name <br> <input id="n1" name="n1"><br><br>
Age <br> <input id="age" name="age"><br><br>
Pregnancies <br> <input id="pr" name="pr"><br><br>
Glucose <br> <input id="gl" name="gl"><br><br>
Blood Pressure <br> <input id="bp" name="bp"><br><br>
Skin Thickness <br> <input id="st" name="st"><br><br>
Insulin <br> <input id="isn" name="isn"><br><br>
BMI <br> <input id="bmi" name="bmi"><br><br>
Diabetes Pedigree Function <br> <input id="dpf" name="dpf">
<br><br>
<div class="input-group">
<input type="submit" id="submit" value="Submit">
<input type="reset" id="reset" value="Reset">
</div>
</form>
</div>
</body>
</html>
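The PHP handler reads eight numeric fields (age, pr, gl, bp, st, isn, bmi, dpf) from the form, but this excerpt does not show how they reach the trained model. As a hypothetical sketch only, a Python helper on the model side could validate those values and arrange them in the column order the model was trained on. The order below is an assumption modelled on the common Pima diabetes column layout (Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age); it must match the actual training file, which this report does not show. The function name and sample values are illustrative.

```python
# Hypothetical mapping from the form's field names to an assumed training column order.
FEATURE_ORDER = ["pr", "gl", "bp", "st", "isn", "bmi", "dpf", "age"]


def form_to_features(form: dict[str, str]) -> list[float]:
    """Convert submitted form strings to floats in FEATURE_ORDER.

    Raises ValueError if a field is missing, not numeric, or negative.
    """
    features = []
    for field in FEATURE_ORDER:
        raw = form.get(field, "").strip()
        if raw == "":
            raise ValueError(f"missing value for '{field}'")
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(f"'{field}' must be a number, got {raw!r}") from None
        if value < 0:
            raise ValueError(f"'{field}' must not be negative")
        features.append(value)
    return features


if __name__ == "__main__":
    sample = {"n1": "Test", "age": "50", "pr": "6", "gl": "148", "bp": "72",
              "st": "35", "isn": "0", "bmi": "33.6", "dpf": "0.627"}
    # The name field "n1" is not a model feature and is ignored.
    print(form_to_features(sample))
```

Rejecting bad input before prediction keeps a blank or mistyped field from silently producing a misleading result.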
B. SCREENSHOTS
Screenshot 1: Application Homepage (options for the user to register, log in, and learn about
the application)
Screenshot 2: Application About Us page (describes the purpose of the application and what it
does)
Screenshot 3: Application Register page (new users register with an ID and password to
safeguard their reports and reviews)
Screenshot 4: Application Login page (registered users log in with their user ID and password
to use the application)
Screenshot 5: Application Dashboard page (users choose among the available disease
predictions and get recommendations)
Screenshot 6: Report entry page (users enter the values from their diagnostic report)
Screenshot 7: Result page (users see the prediction of whether they are suffering from the
disease)