Web Application
Scratch
The data science lifecycle is designed for big data issues and data science projects. Generally, a data
science project consists of seven steps: problem definition, data collection, data preparation,
data exploration, data modeling, model evaluation, and model deployment. This article goes through the
data science lifecycle in order to build a web application for heart disease classification.
If you would like to look at a specific step in the lifecycle, you can read it without looking deeply at the
other steps.
Problem Definition
Clinical decisions are often based on doctors’ experience and intuition rather than on the rich
knowledge hidden in the data. This leads to errors and extra costs that affect the quality of medical services.
Analytic tools and data modeling can help improve clinical decisions. Thus, the goal here is
to build a web application that helps doctors diagnose heart disease. The full code is available in my
GitHub repository.
Data Collection
I collected the heart disease dataset from UCI ML. The dataset has the following 14 attributes:
At first glance, the dataset contains 14 columns: 5 contain numerical values and 9 contain
categorical values.
The dataset is clean and contains all the information needed for each variable. Using the info(),
describe(), and isnull() functions, no errors, missing values, or inconsistent values were detected.
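As a rough illustration of these checks, the sketch below runs the three functions on a tiny stand-in table. The rows are made up for the example and are not taken from the real dataset, which would normally be loaded from a file.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data would be loaded from the dataset file
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "cp": [3, 2, 1, 1],
    "thalch": [150, 187, 172, 178],
    "target": [1, 1, 1, 0],
})

df.info()                               # column dtypes and non-null counts
summary = df.describe()                 # count, mean, std, min, quartiles, max per column
missing_per_column = df.isnull().sum()  # number of missing values in each column

print(summary)
print(missing_per_column)
```

A column with a non-zero entry in missing_per_column would need cleaning before modeling.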
Checking the percentage of persons with and without heart disease shows that 56% of the
persons in the dataset have heart disease. So, the dataset is relatively balanced.
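One possible way to compute such a class share is with value_counts(normalize=True). The labels below are hypothetical and only show the mechanics; they do not reproduce the 56% figure.

```python
import pandas as pd

# Hypothetical labels (1 and 0 are the two classes of the target column)
target = pd.Series([1, 1, 0, 1, 0, 1, 0, 1, 0], name="target")

# Share of each class as a percentage of all rows
class_share = target.value_counts(normalize=True) * 100
print(class_share.round(1))
```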
Attributes Correlation
This heatmap shows the correlations between the dataset attributes, and how the attributes interact with
each other. From the heatmap, we can observe that the chest pain type (cp), exercise-induced angina
(exang), ST depression induced by exercise relative to rest (oldpeak), the slope of the peak exercise ST
segment (slope), number of major vessels (0–3) colored by fluoroscopy (ca), and thalassemia (thal) are
highly correlated with heart disease (target). We also observe an inverse relationship
between heart disease and maximum heart rate (thalch).
Moreover, we can see that age is correlated with the number of major vessels (0–3) colored by fluoroscopy
(ca) and maximum heart rate (thalch). There is also a relation between ST depression induced by exercise
relative to rest (oldpeak) and the slope of the peak exercise ST segment (slope), and a
relation between chest pain type (cp) and exercise-induced angina (exang).
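The numbers behind a heatmap like this come from a correlation matrix. The sketch below builds one with pandas on synthetic data, where two columns are deliberately constructed to move with or against the target. The values are assumptions for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)

# Synthetic columns: the relationships are built in on purpose for the example
df = pd.DataFrame({
    "oldpeak": rng.normal(1.0, 0.5, size=n) + 0.8 * target,  # rises with target
    "thalch": rng.normal(150, 15, size=n) - 20 * target,     # falls with target
    "chol": rng.normal(240, 40, size=n),                     # unrelated to target
    "target": target,
})

corr = df.corr()  # pairwise Pearson correlations
target_corr = corr["target"].drop("target").sort_values()
print(target_corr)
```

Passing corr to a plotting function such as seaborn's heatmap would draw the matrix as a colored grid.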
2. Chest Pain
There are four types of chest pain: typical angina, atypical angina, non-anginal pain, and asymptomatic.
Most of the heart disease patients are found to have asymptomatic chest pain.
3. Thalassemia
Most of the heart disease patients are old and have one or more major vessels colored by fluoroscopy.
Data Modeling
Let’s create the machine learning model. We are trying to predict whether a person has heart disease. We
will use the ‘target’ column as the class, and all the other columns as features for the model.
– Data Splitting
We will divide the data into a training set and test set. 80% of the data will be for training and 20% for
testing.
    from sklearn.model_selection import train_test_split

    # Split the data into training set and testing set
    X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0)
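To see what the split produces, here is a self-contained sketch on placeholder arrays. The 100 rows and 13 feature columns are assumptions chosen to mirror the shape of the problem, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 hypothetical rows with 13 features each
features = np.arange(100 * 13).reshape(100, 13)
target = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible. Passing stratify=target is an option when the class proportions should be kept equal in both sets.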
– Machine Learning Model
Here, we will try the machine learning algorithms below and then select the best one based on its
classification report.
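    from sklearn.metrics import classification_report, confusion_matrix

    def fit_eval_model(model, train_features, y_train, test_features, y_test):
        results = {}
        # Train the model
        model.fit(train_features, y_train)
        # Test the model
        train_predicted = model.predict(train_features)
        test_predicted = model.predict(test_features)
        # Classification report and Confusion Matrix
        # (the dictionary keys below are assumed; the listing breaks off at this point)
        results['classification_report'] = classification_report(y_test, test_predicted)
        results['confusion_matrix'] = confusion_matrix(y_test, test_predicted)
        return results
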
    # sv, rf, ab, gb are the classifier instances being compared
    results = {}
    for cls in [sv, rf, ab, gb]:
        cls_name = cls.__class__.__name__
        results[cls_name] = fit_eval_model(cls, X_train, y_train, X_test, y_test)

    # Print classifiers results
    for result in results:
        print(result)
        print()
        for i in results[result]:
            print(i, ':')
            print(results[result][i])
            print()
        print('-----')
        print()
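The listing does not show how sv, rf, ab, and gb are created. Judging by the names, they are presumably a support vector classifier, a random forest, AdaBoost, and gradient boosting, so the sketch below assumes those definitions. It runs on synthetic data from make_classification and ranks the models by test accuracy, which is a simpler stand-in for reading the full classification report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the heart disease data (13 features, binary class)
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Assumed definitions of the four classifiers
sv = SVC(random_state=0)
rf = RandomForestClassifier(random_state=0)
ab = AdaBoostClassifier(random_state=0)
gb = GradientBoostingClassifier(random_state=0)

# Fit each model and record its accuracy on the held-out test set
test_accuracy = {}
for cls in [sv, rf, ab, gb]:
    cls.fit(X_train, y_train)
    test_accuracy[cls.__class__.__name__] = accuracy_score(y_test, cls.predict(X_test))

best_name = max(test_accuracy, key=test_accuracy.get)
print(test_accuracy)
print("Best on this synthetic data:", best_name)
```

The ranking on synthetic data says nothing about the ranking on the real dataset. It only shows the comparison loop end to end.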
From the above results, the best model is Gradient Boosting. So, I will save this model to use it in the web
application.
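Saving the model could look like the sketch below, which uses pickle since the web app later loads a model.pkl file. The classifier here is trained on synthetic data and written to a temporary directory. Both are assumptions made to keep the example self-contained.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in model trained on synthetic data
X, y = make_classification(n_samples=200, n_features=13, random_state=0)
gb = GradientBoostingClassifier(random_state=0).fit(X, y)

# Serialize the fitted model to model.pkl (temporary directory for the example)
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(gb, f)

# Read it back and check that it predicts the same labels
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print((restored.predict(X) == gb.predict(X)).all())
```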
Model Deployment
It is time to start deploying and building the web application using the Flask web application framework. For
the web app, we have to create:
1. Web app Python code (API) to load the model, get user input from the HTML template, make the
prediction, and return the result.
2. An HTML template for the front end to allow the user to input the patient’s heart disease symptoms and
display whether the patient has heart disease or not.
/
├── model.pkl
├── heart_disease_app.py
└── templates/
    └── Heart Disease Classifier.html
You can find the full code of the web app here.
    import numpy as np
    import pickle
    from flask import Flask, request, render_template
After that, we need to load the saved model model.pkl in the app.
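A minimal sketch of this step is shown below. So that it runs on its own, it first writes a stand-in model (a scikit-learn DummyClassifier, which is an assumption and not the trained classifier) to a temporary model.pkl. In the real app only the Flask(...) and pickle.load(...) lines would be needed, with the path pointing at the saved model.

```python
import os
import pickle
import tempfile

from flask import Flask
from sklearn.dummy import DummyClassifier

# Stand-in for the trained model, written to a temporary model.pkl
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
stand_in = DummyClassifier(strategy="most_frequent").fit(
    [[0.0] * 13, [1.0] * 13, [2.0] * 13], [0, 1, 1]
)
with open(model_path, "wb") as f:
    pickle.dump(stand_in, f)

# Create the Flask application and load the model once at start-up
app = Flask(__name__)
with open(model_path, "rb") as f:
    model = pickle.load(f)

print(type(model).__name__)
```

Loading the model at module level means it is read once rather than on every request. Note that pickle.load should only be used on files from a trusted source.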
The home() function is called when the root endpoint ‘/’ is hit. The function renders the home
page, Heart Disease Classifier.html, of the website.
    # Bind home function to URL
    @app.route('/')
    def home():
        return render_template('Heart Disease Classifier.html')
Now, create the predict() function for the endpoint ‘/predict’. The function is bound to this endpoint
with the POST method. When the user submits the form, the API receives a POST request and extracts all
the data from the form using flask.request.form. Then, the API uses the model to predict the
result. Finally, the function renders the Heart Disease Classifier.html template and returns the
result.
    # Bind predict function to URL
    @app.route('/predict', methods=['POST'])
    def predict():
        # Put all form entries values in a list
        features = [float(i) for i in request.form.values()]
        # Convert features to array
        array_features = [np.array(features)]
        # Predict features
        prediction = model.predict(array_features)
        output = prediction[0]
        # Check the output values and retrieve the result with html tag based on the value
        if output == 1:
            return render_template('Heart Disease Classifier.html',
                                   result='The patient is not likely to have heart disease!')
        else:
            return render_template('Heart Disease Classifier.html',
                                   result='The patient is likely to have heart disease!')
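The request flow described above can be tried without a browser using Flask's built-in test client. The sketch below is a simplified stand-in for the real app. The inline template, the field names f0 to f12, and the DummyClassifier that always predicts class 1 are all assumptions made for the example.

```python
import numpy as np
from flask import Flask, render_template_string, request
from sklearn.dummy import DummyClassifier

# Hypothetical inline template standing in for the HTML file
PAGE = """
<form action="/predict" method="post">
  {% for i in range(13) %}<input name="f{{ i }}">{% endfor %}
  <button type="submit">Predict</button>
</form>
<strong style="color:red">{{ result }}</strong>
"""

# Stand-in model that always predicts class 1
model = DummyClassifier(strategy="constant", constant=1).fit(np.zeros((2, 13)), [0, 1])

app = Flask(__name__)

@app.route("/")
def home():
    return render_template_string(PAGE)

@app.route("/predict", methods=["POST"])
def predict():
    # Read the 13 submitted values in form order and convert them to floats
    features = [float(v) for v in request.form.values()]
    output = model.predict([np.array(features)])[0]
    return render_template_string(PAGE, result=f"Predicted class: {output}")

# Simulate a form submission without starting a server
client = app.test_client()
form_data = {f"f{i}": "1.0" for i in range(13)}
response = client.post("/predict", data=form_data)
print(response.status_code)
print("Predicted class: 1" in response.get_data(as_text=True))
```

Because the features are read with request.form.values(), the order of the inputs in the form has to match the order of the columns the model was trained on.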
Finally, start the Flask server and run the web page locally by calling app.run(), then
open http://localhost:5000 in the browser.
HTML Template
The following figure presents the HTML form. You can find the code here.
The form has 13 inputs for the 13 features and a button. The button sends a POST request to the /predict
endpoint with the input data. In the form tag, the action attribute points to the /predict endpoint, so the
predict function is called when the form is submitted.
<strong style="color:red">{{result}}</strong>
Summary
In this article, you learned how to create a web application for prediction from scratch. First, we started
with the problem definition and data collection. Then, we worked on data preparation, data exploration,
data modeling, and model evaluation. Finally, we deployed the model using Flask.
Now, it is time to practice and apply what you learned in this article. Define a problem, search for a dataset
on the Internet, and then go through the other steps of the data science lifecycle.
Nada Alay
I am working in the data analytics field and passionate about data science, machine learning, and scientific
research.