Deploy Machine Learning Model using Flask

Last Updated : 12 Jul, 2025

In this article, we will build and deploy a Machine Learning model using Flask. We will train a Decision Tree Classifier on the Adult Income Dataset, preprocess the data, and evaluate model accuracy. After training, we’ll save the model and create a Flask web application where users can input data and get real-time predictions about income classification. This will demonstrate how to integrate ML models into web applications using Flask.

Installation and Setup

To create a basic Flask app, refer to Create Flask App.

After creating and activating a virtual environment, install Flask and the other libraries required for this project using these commands:

pip install flask
pip install pandas
pip install numpy
pip install scikit-learn

File Structure

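After completing the project, our file structure should look similar to this: app.py, preprocessing.py, the saved model file model.pkl, and a templates folder containing index.html and result.html.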

Dataset and Model Selection

We are using the Adult Income Dataset from the UCI Machine Learning Repository. This dataset contains information about individuals, including age, education, occupation, and marital status, with the goal of predicting whether their income exceeds $50K per year.

To download the dataset click here.

We are going to use the Decision Tree Classifier, a popular supervised learning algorithm. It is easy to interpret, flexible, and works well with both numerical and categorical data. The model learns patterns from historical data and predicts whether a person’s income is above or below $50K based on their attributes.
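
To illustrate the interpretability point, here is a minimal sketch that fits a small tree and prints its learned rules with scikit-learn's `export_text`. The toy data and the feature names `age` and `hours_per_week` are made up for illustration and are not taken from the Adult Income Dataset.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [age, hours_per_week] -> 1 if income > 50K, else 0
X_toy = [[25, 20], [30, 40], [45, 50], [50, 60], [23, 35], [52, 45]]
y_toy = [0, 0, 1, 1, 0, 1]

toy_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
toy_tree.fit(X_toy, y_toy)

# Print the decision rules as indented plain text
rules = export_text(toy_tree, feature_names=["age", "hours_per_week"])
print(rules)
```

Each printed line is a threshold test on one feature, ending in the predicted class, which is what makes a shallow tree easy to read.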

Preprocessing Dataset

The dataset consists of 14 attributes and a class label telling whether the income of the individual is less than or more than 50K a year. Before training our machine learning model, we need to clean and preprocess the dataset to ensure better accuracy and efficiency. Create a file named "preprocessing.py"; it will contain the code to preprocess the dataset. Here’s how we prepare the data:

Handling Missing Values:

The dataset may contain missing values represented by "?". These are replaced with NaN, and then filled using the mode (most frequent value) of each column.

Python

# Filling missing values
df.replace("?", np.nan, inplace=True)
df.fillna(df.mode().iloc[0], inplace=True)  # Fill missing values with the mode
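
The effect of these two lines can be seen on a small example. This is a sketch on a hypothetical four-row sample, not the real dataset; the column values are chosen only to show the mode being filled in.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with "?" marking missing entries
sample = pd.DataFrame({
    "workclass": ["Private", "?", "Private", "State-gov"],
    "occupation": ["Sales", "Sales", "?", "Tech-support"],
})

# Replace the "?" placeholder with NaN so pandas treats it as missing
sample = sample.replace("?", np.nan)

# mode() returns the most frequent value(s) per column; row 0 holds the first mode
most_frequent = sample.mode().iloc[0]
sample = sample.fillna(most_frequent)

print(sample)
```

Here the missing workclass becomes "Private" and the missing occupation becomes "Sales", the most frequent value in each column.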

Simplifying Categorical Data:

The marital status column is simplified by grouping its seven original values into three categories: "divorced", "married" and "not married".

Python

# Discretization (simplifying marital status)
df.replace(['Divorced', 'Married-AF-spouse', 'Married-civ-spouse', 
            'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'],
           ['divorced', 'married', 'married', 'married', 
            'not married', 'not married', 'not married'], inplace=True)
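
The call above searches every column of the DataFrame. As an alternative sketch, the same grouping can be written as an explicit lookup and applied to the marital-status column only. The sample rows below are hypothetical.

```python
import pandas as pd

# Hypothetical sample of the marital-status column
sample = pd.DataFrame({
    "marital-status": ["Divorced", "Married-civ-spouse", "Never-married", "Widowed"],
})

# Same grouping as above, written as an explicit old -> new lookup
marital_groups = {
    "Divorced": "divorced",
    "Married-AF-spouse": "married",
    "Married-civ-spouse": "married",
    "Married-spouse-absent": "married",
    "Never-married": "not married",
    "Separated": "not married",
    "Widowed": "not married",
}

# Apply the lookup to this one column only
sample["marital-status"] = sample["marital-status"].map(marital_groups)

print(sample["marital-status"].tolist())
```

Note that `map` turns any value missing from the lookup into NaN, so this variant assumes the lookup covers every value present in the column.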

Encoding Categorical Variables:

Machine learning models work best with numerical data, so we apply Label Encoding to convert categorical columns like workclass, education, occupation, etc., into numerical values.
A mapping dictionary is created to keep track of the original values and their encoded form, and the redundant columns are then dropped.

Python

# Label Encoding
category_col = ['workclass', 'race', 'education', 'marital-status', 'occupation',
                'relationship', 'gender', 'native-country', 'income']
label_encoder = preprocessing.LabelEncoder()

# Creating a mapping dictionary
mapping_dict = {}
for col in category_col:
    df[col] = label_encoder.fit_transform(df[col])
    mapping_dict[col] = dict(enumerate(label_encoder.classes_))  # Improved mapping

print(mapping_dict)

# Dropping redundant columns
df.drop(['fnlwgt', 'educational-num'], axis=1, inplace=True)
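
To see how the codes are assigned, here is a small sketch on a hypothetical gender sample. `LabelEncoder` stores its classes in sorted order, so the codes follow alphabetical order; the reverse lookup `label_to_code` is an illustrative helper, not part of the script above.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the gender column
genders = ["Male", "Female", "Female", "Male"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

# classes_ is sorted, so "Female" -> 0 and "Male" -> 1
code_to_label = dict(enumerate(encoder.classes_))

# Reverse lookup, useful for turning a label back into its code
label_to_code = {label: code for code, label in code_to_label.items()}

print(codes.tolist())
print(label_to_code["Male"])
```

The same sorted ordering is why the option values in the input form have to follow the alphabetical order of each category.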

Splitting Features and Target:

The dataset is split into features (X) and target labels (Y), where the target column represents income classification (≤50K or >50K).

Python

# Splitting features and target
X = df.iloc[:, :-1].values  # All columns except last
Y = df.iloc[:, -1].values  # Only last column
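
Positional slicing relies on income being the last column. A sketch of an equivalent split that selects the target by name is shown below; the three-row sample is hypothetical and already encoded.

```python
import pandas as pd

# Hypothetical, already-encoded sample with the target in the last column
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "gender": [1, 1, 0],
    "income": [0, 1, 0],
})

# Select the target by name so the split does not depend on column order
X_sample = sample.drop(columns=["income"]).values
y_sample = sample["income"].values

print(X_sample.shape, y_sample.shape)
```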

Training and Saving Model

Now that we have preprocessed our dataset, we can train a machine learning model on it and save the result. The dataset is divided into 70% training data and 30% testing data to evaluate the model’s performance, and we use the pickle library to save the trained model locally.

Python

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

# Initialize and Train Decision Tree Classifier
dt_clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=5, min_samples_leaf=5)
dt_clf_gini.fit(X_train, y_train)

# Save Model Using Pickle
with open("model.pkl", "wb") as model_file:
    pickle.dump(dt_clf_gini, model_file)
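
Before wiring the saved file into a web app, it can be useful to check that it loads back correctly. The sketch below does a save-and-load round trip using the same classifier settings, but on synthetic data from `make_classification` with 12 features rather than the Adult Income Dataset, and it writes to a temporary directory instead of the project folder.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 12 features (not the real dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=12, random_state=100)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=100)

clf = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=5, min_samples_leaf=5)
clf.fit(X_tr, y_tr)

# Save to a temporary file, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"
    with open(model_path, "wb") as model_file:
        pickle.dump(clf, model_file)
    with open(model_path, "rb") as model_file:
        restored = pickle.load(model_file)

# The restored model should reproduce the original predictions
same_predictions = bool((restored.predict(X_te) == clf.predict(X_te)).all())
print("Round trip OK:", same_predictions)
```

A pickled model should generally be loaded with the same scikit-learn version that created it, and pickle files from untrusted sources should not be loaded at all.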

Creating app.py

Create a file named "app.py"; it will contain the code of our main Flask app.

Python

# Importing libraries
import pickle

import numpy as np
from flask import Flask, render_template, request

# Creating an instance of the Flask class
app = Flask(__name__)

# Tell Flask which URLs should trigger the index() function
@app.route('/')
@app.route('/index')
def index():
    return render_template('index.html')

# Prediction function
def ValuePredictor(to_predict_list):
    # The model expects a single row with 12 feature values
    to_predict = np.array(to_predict_list).reshape(1, 12)
    # Open the saved model in a with-block so the file is closed after loading
    with open(r"path_to_the_saved_model", "rb") as model_file:
        loaded_model = pickle.load(model_file)
    result = loaded_model.predict(to_predict)
    return result[0]


@app.route('/result', methods=['POST'])
def result():
    # Form values arrive as strings in form-field order; convert them to integers
    to_predict_list = list(map(int, request.form.to_dict().values()))
    result = ValuePredictor(to_predict_list)

    if int(result) == 1:
        prediction = 'Income more than 50K'
    else:
        prediction = 'Income less than 50K'

    return render_template("result.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)

Code Breakdown:

Loads and serves a pre-trained ML model (model.pkl).
Accepts user input via a web form and processes it.
Makes predictions and displays results on result.html.
Runs in debug mode for easy testing.
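
The form-processing step converts every submitted value with `int` and relies on the fields arriving in training order. A hedged sketch of a stricter version is shown below: it reads the fields by name, using the names from the form in index.html, and raises a clear error for blank or non-numeric input. `FIELD_ORDER` and `parse_form` are illustrative names, not part of the app above.

```python
# Field names as used in index.html, in the column order used for training
FIELD_ORDER = [
    "age", "w_class", "edu", "martial_stat", "occup", "relation",
    "race", "gender", "c_gain", "c_loss", "hours_per_week", "native-country",
]


def parse_form(form_data):
    """Return the 12 feature values as integers, in training order.

    Raises ValueError if a field is missing or is not a whole number.
    """
    values = []
    for field in FIELD_ORDER:
        raw = form_data.get(field, "").strip()
        if not raw:
            raise ValueError(f"Missing value for '{field}'")
        try:
            values.append(int(raw))
        except ValueError:
            raise ValueError(f"'{field}' must be a whole number, got {raw!r}") from None
    return values


# Example with a plain dict standing in for request.form
example_form = {name: "1" for name in FIELD_ORDER}
example_form["age"] = "39"
print(parse_form(example_form))
```

Inside a Flask view, one option would be to catch the `ValueError` and show the message to the user instead of letting the request fail with a server error.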

Creating Template files

Flask looks for HTML templates in a folder named templates, so we create all the HTML files there. Here are the HTML files we need to create for this app:

index.html

This page contains a form that takes input from the user and sends it to the "/result" route in app.py, which processes it and predicts the output using the saved model.

HTML

<html>
<body>
    <h3>Income Prediction Form</h3>

<div>
  <form action="/result" method="POST">
    <label for="age">Age</label>
    <input type="text" id="age" name="age">
    <br>
    <label for="w_class">Working Class</label>
    <select id="w_class" name="w_class">
      <option value="0">Federal-gov</option>
      <option value="1">Local-gov</option>
      <option value="2">Never-worked</option>
      <option value="3">Private</option>
      <option value="4">Self-emp-inc</option>
      <option value="5">Self-emp-not-inc</option>
      <option value="6">State-gov</option>
      <option value="7">Without-pay</option>
    </select>
    <br>
    <label for="edu">Education</label>
    <select id="edu" name="edu">
      <option value="0">10th</option>
      <option value="1">11th</option>
      <option value="2">12th</option>
      <option value="3">1st-4th</option>
      <option value="4">5th-6th</option>
      <option value="5">7th-8th</option>
      <option value="6">9th</option>
      <option value="7">Assoc-acdm</option>
      <option value="8">Assoc-voc</option>
      <option value="9">Bachelors</option>
      <option value="10">Doctorate</option>
      <option value="11">HS-grad</option>
      <option value="12">Masters</option>
      <option value="13">Preschool</option>
      <option value="14">Prof-school</option>
      <option value="15">Some-college</option>
    </select>
    <br>
    <label for="martial_stat">Marital Status</label>
    <select id="martial_stat" name="martial_stat">
      <option value="0">divorced</option>
      <option value="1">married</option>
      <option value="2">not married</option>
    </select>
    <br>
    <label for="occup">Occupation</label>
    <select id="occup" name="occup">
      <option value="0">Adm-clerical</option>
      <option value="1">Armed-Forces</option>
      <option value="2">Craft-repair</option>
      <option value="3">Exec-managerial</option>
      <option value="4">Farming-fishing</option>
      <option value="5">Handlers-cleaners</option>
      <option value="6">Machine-op-inspect</option>
      <option value="7">Other-service</option>
      <option value="8">Priv-house-serv</option>
      <option value="9">Prof-specialty</option>
      <option value="10">Protective-serv</option>
      <option value="11">Sales</option>
      <option value="12">Tech-support</option>
      <option value="13">Transport-moving</option>
    </select>
    <br>
    <label for="relation">Relationship</label>
    <select id="relation" name="relation">
      <option value="0">Husband</option>
      <option value="1">Not-in-family</option>
      <option value="2">Other-relative</option>
      <option value="3">Own-child</option>
      <option value="4">Unmarried</option>
      <option value="5">Wife</option>
    </select>
    <br>
    <label for="race">Race</label>
    <select id="race" name="race">
      <option value="0">Amer Indian Eskimo</option>
      <option value="1">Asian Pac Islander</option>
      <option value="2">Black</option>
      <option value="3">Other</option>
      <option value="4">White</option>
    </select>
    <br>
    <label for="gender">Gender</label>
    <select id="gender" name="gender">
      <option value="0">Female</option>
      <option value="1">Male</option>
    </select>
    <br>
    <label for="c_gain">Capital Gain </label>
    <input type="text" id="c_gain" name="c_gain">btw:[0-99999]
    <br>
    <label for="c_loss">Capital Loss </label>
    <input type="text" id="c_loss" name="c_loss">btw:[0-4356]
    <br>
    <label for="hours_per_week">Hours per Week </label>
    <input type="text" id="hours_per_week" name="hours_per_week">btw:[1-99]
    <br>
    <label for="native-country">Native Country</label>
    <select id="native-country" name="native-country">
      <option value="0">Cambodia</option>
      <option value="1">Canada</option>
      <option value="2">China</option>
      <option value="3">Columbia</option>
      <option value="4">Cuba</option>
      <option value="5">Dominican Republic</option>
      <option value="6">Ecuador</option>
      <option value="7">El Salvador</option>
      <option value="8">England</option>
      <option value="9">France</option>
      <option value="10">Germany</option>
      <option value="11">Greece</option>
      <option value="12">Guatemala</option>
      <option value="13">Haiti</option>
      <option value="14">Netherlands</option>
      <option value="15">Honduras</option>
      <option value="16">HongKong</option>
      <option value="17">Hungary</option>
      <option value="18">India</option>
      <option value="19">Iran</option>
      <option value="20">Ireland</option>
      <option value="21">Italy</option>
      <option value="22">Jamaica</option>
      <option value="23">Japan</option>
      <option value="24">Laos</option>
      <option value="25">Mexico</option>
      <option value="26">Nicaragua</option>
      <option value="27">Outlying-US(Guam-USVI-etc)</option>
      <option value="28">Peru</option>
      <option value="29">Philippines</option>
      <option value="30">Poland</option>
      <option value="31">Portugal</option>
      <option value="32">Puerto-Rico</option>
      <option value="33">Scotland</option>
      <option value="34">South</option>
      <option value="35">Taiwan</option>
      <option value="36">Thailand</option>
      <option value="37">Trinadad&amp;Tobago</option>
      <option value="38">United States</option>
      <option value="39">Vietnam</option>
      <option value="40">Yugoslavia</option>
    </select>
    <br>
    <input type="submit" value="Submit">
  </form>
</div>
</body>
</html>

result.html

A simple page that renders the predicted output.

HTML

<!doctype html>
<html>
   <body>
       <h1> {{ prediction }}</h1>
   </body>
</html>

Complete preprocessing.py Code

Python

import pandas as pd  # Use 'pd' for Pandas (standard practice)
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pickle

# Load dataset (replace the placeholder with the path to adult.csv)
df = pd.read_csv(r"path_to_the_dataset")

# Filling missing values
df.replace("?", np.nan, inplace=True)
df.fillna(df.mode().iloc[0], inplace=True)  # Fill missing values with the mode

# Discretization (simplifying marital status)
df.replace(['Divorced', 'Married-AF-spouse', 'Married-civ-spouse', 
            'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'],
           ['divorced', 'married', 'married', 'married', 
            'not married', 'not married', 'not married'], inplace=True)

# Label Encoding
category_col = ['workclass', 'race', 'education', 'marital-status', 'occupation',
                'relationship', 'gender', 'native-country', 'income']
label_encoder = preprocessing.LabelEncoder()

# Creating a mapping dictionary
mapping_dict = {}
for col in category_col:
    df[col] = label_encoder.fit_transform(df[col])
    mapping_dict[col] = dict(enumerate(label_encoder.classes_))  # Improved mapping

print(mapping_dict)

# Dropping redundant columns
df.drop(['fnlwgt', 'educational-num'], axis=1, inplace=True)


# Splitting features and target
X = df.iloc[:, :-1].values  # All columns except last
Y = df.iloc[:, -1].values  # Only last column

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

# Initialize and Train Decision Tree Classifier
dt_clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=5, min_samples_leaf=5)
dt_clf_gini.fit(X_train, y_train)

# Predictions
y_pred_gini = dt_clf_gini.predict(X_test)

# Accuracy Score
print("Decision Tree using Gini Index\nAccuracy:", accuracy_score(y_test, y_pred_gini) * 100)

# Save Model Using Pickle
with open("model.pkl", "wb") as model_file:
    pickle.dump(dt_clf_gini, model_file)

Running the Application

To run the application, use the command "python app.py" in the terminal and visit the development URL "http://127.0.0.1:5000". Below is a snapshot of the output and testing.
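
The "/result" route can also be exercised without a browser by using Flask's built-in test client. The sketch below is self-contained and makes several assumptions: `StubModel` is a hypothetical stand-in for the pickled classifier, the form field names `field_0` to `field_11` are placeholders, and the template is inlined with `render_template_string` only to keep everything in one file.

```python
from flask import Flask, render_template_string, request

demo_app = Flask(__name__)


class StubModel:
    """Hypothetical stand-in for the pickled classifier."""

    def predict(self, rows):
        # Pretend anyone working more than 40 hours a week earns over 50K
        return [1 if row[10] > 40 else 0 for row in rows]


# Created once at import time rather than on every request
model = StubModel()


@demo_app.route("/result", methods=["POST"])
def result():
    # Read the 12 placeholder fields by name, in a fixed order
    features = [int(request.form[f"field_{i}"]) for i in range(12)]
    label = model.predict([features])[0]
    prediction = "Income more than 50K" if label == 1 else "Income less than 50K"
    return render_template_string("<h1>{{ prediction }}</h1>", prediction=prediction)


# Exercise the route without starting a server
form_data = {f"field_{i}": "0" for i in range(12)}
form_data["field_10"] = "45"  # hours per week, the 11th feature
with demo_app.test_client() as client:
    response = client.post("/result", data=form_data)
    status = response.status_code
    body = response.get_data(as_text=True)

print(status)
print(body)
```

The same pattern should carry over to the real app by importing `app` from app.py and posting the actual form field names, provided the model path in app.py points at a saved model.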

karanjekarhoshang
