A Project Report Submitted in Partial Fulfilment of the Requirements for the Award of the Degree of
MASTER OF SCIENCE (COMPUTER SCIENCE)
Submitted By
VIJAY KUMAR R V
(2213102078120)
BONAFIDE CERTIFICATE
This is to certify that this project work entitled "Food Order & Delivery Time
Prediction Using Random Forest Classifier & Long Short Term Memory" is a
bonafide record of work done by Mr. VIJAY KUMAR R V (Reg.
No: 2213102078120) in partial fulfilment of the requirements for the award of the degree of Master of
Science (Computer Science) under our guidance and supervision during the
academic year 2022-2024.
ACKNOWLEDGEMENT
I take this opportunity to express my sincere thanks to everyone who guided me in completing this
project. I thank the Almighty for the blessings showered upon me to complete the
project successfully.
I express my sincere thanks to Dr. S. Santhosh Baboo, M.Sc., Ph.D., Principal, Dwaraka
Doss Goverdhan Doss Vaishnav College, for his help and valuable guidance towards the successful
completion of the project.
I would also like to convey my gratitude to my guide, Dr. P. Suganya, MCA., M.Phil., Ph.D.,
SET, Head of the Department of Computer Science (UG & PG), for her continued support in
providing me an opportunity to do my project work on "Food Order & Delivery Time
Prediction Using Random Forest Classifier & Long Short Term Memory". Her willingness to
motivate me contributed tremendously to the success of the project.
Besides, I would like to thank all the faculty members of the Department of Computer Science (UG
& PG) for their support and encouragement towards the successful completion of the project.
I also thank my friends for providing moral support and timely help to finish the project.
VIJAY KUMAR R V
ABSTRACT
With the vast amount of data available today, companies like Swiggy, Zomato, Uber Eats, etc., use
their huge volumes of order data to analyse the orders they receive, improve their delivery times,
and give recommendations to users based on the type, quantity and cost of the order. Based on
historic data, such systems can predict whether a customer will order food or not, as well as
whether the customer will order the same type of food or choose a different cuisine, and they also
help organizations estimate the delivery time from historic records of completed deliveries. These
prediction systems use user information such as age, type of food, type of order, ratings, etc., to
predict the order confirmation and the delivery time of the order. Such systems have gained
importance with the rapid growth of the e-commerce industry. The motivation for this project comes
from the eagerness to gain a deep understanding of prediction systems. A prediction model has been
developed using techniques and libraries such as the Random Forest Classifier, LSTM from Keras,
Matplotlib, Plotly, Seaborn, Pandas and NumPy.
TABLE OF CONTENTS
1 INTRODUCTION
1.1 ORGANIZATION PROFILE
2 SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
3 SYSTEM CONFIGURATION
3.1 HARDWARE ENVIRONMENT
3.3.2 ANACONDA
4 SYSTEM DESIGN
4.1 SYSTEM ARCHITECTURE
6 TESTING
6.1 OBJECTIVES
7 SOURCE CODE
8 RESULTS
9 CONCLUSION
10 FUTURE ENHANCEMENT
11 BIBLIOGRAPHY
1.1 PROJECT OVERVIEW:
The exponential growth of the online food ordering industry has provided users with a
wide variety of cuisines to choose from. Online food ordering platforms are modern-day
restaurants offering a wide range of foods across different cuisines. Prediction systems play the
role of the customer-facing assistant and the delivery person, understanding a customer's requirements
and suggesting the right food to order. Today almost all online food ordering platforms use
recommendation and prediction systems to give their users an enhanced experience by providing
suggestions the user might like and an estimate of the delivery time. The success of any prediction
system depends on whether it can continuously provide the organization with the correct and precise
predictions it expects. With advances in data acquisition and the reduced cost of data storage,
various types of user activity and feedback can be recorded easily. However, the performance of any
prediction system depends on the quality of the feedback or user opinions it uses for generating
predictions. It is generally observed that present-day prediction systems are able to process large
quantities of data, but the quality of the predictions still has large scope for improvement. Typical
prediction systems use user information to predict the result, but some users are careless while
entering their information, and this leads to poor predictions. With advances in text mining and
sentiment analysis, reviews have become valuable user feedback that can be processed and used for
multiple purposes. This report proposes an improved user-information and review-based prediction
system that uses both information and reviews as input to obtain more exact predictions. The results
section verifies the importance of the proposed method, as a significant improvement in predictions
from user input can be observed.
2. SYSTEM ANALYSIS
Advantages:
A large dataset has been collected
A greater number of predictions can be generated from user input
High accuracy
Increased speed
3. SYSTEM CONFIGURATION
Language : Python
Software : Anaconda 3(64 bit)
3.3 ABOUT THE SOFTWARE
3.3.1 PYTHON CONFIGURATION:
Python is a widely used general-purpose, high-level programming language. It was created
by Guido van Rossum in 1991 and further developed by the Python Software Foundation. It was
designed with an emphasis on code readability, and its syntax allows programmers to express their
concepts in fewer lines of code. Python is a programming language that lets you work quickly and
integrate systems more efficiently. The major Python versions are Python 2 and Python 3, and the two
are quite different.
Python is a general-purpose language, which means it can be used to build just about
anything, a task made easier with the right tools and libraries. Professionally, Python is great
for backend web development, data analysis, artificial intelligence, and scientific computing.
FEATURES:
USES:
Python uses interchangeable code modules instead of the single long list of instructions
that was standard in older procedural languages. Python does not convert its code directly into
machine code, the form that hardware can understand. It actually converts it into something
called byte code. So compilation does happen within Python, it is just not into machine language;
it is into byte code, and this byte code cannot be executed directly by the CPU. We therefore need
an interpreter called the Python Virtual Machine, which executes the byte code.
Step 1: The interpreter reads a Python statement or instruction and verifies that the instruction
is well formatted, i.e. it checks the syntax of each line. If it encounters any error, it immediately
halts the translation and shows an error message.
Step 2: If there is no error, i.e. if the Python instruction or code is well formatted, the
interpreter translates it into an equivalent form in an intermediate language called "byte code". Thus,
after this step the Python script or code has been completely translated into byte code.
Step 3: The byte code is sent to the Python Virtual Machine (PVM), where it is executed. If an
error occurs during this execution, the execution is halted with an error message.
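The compile-to-byte-code behaviour described above can be inspected directly with Python's built-in dis module. A minimal sketch (the add function exists only for illustration):

import dis

def add(a, b):
    # A trivial function used only to illustrate byte code
    return a + b

# dis.dis() prints the byte code instructions the compiler produced for add();
# this is what the Python Virtual Machine actually executes.
dis.dis(add)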
ADVANTAGES:
Presence of Third Party Modules
Extensive Support Libraries
Open Source and Community Development
Learning Ease and Support Available
User-friendly Data Structures
Productivity and Speed
Portable and Interactive
Python has been gaining popularity and is considered one of the most popular and flexible
server-side programming languages. You can install Python on your Windows server or local
machine in just a few easy steps:
If you opted to install an older version of Python, it is possible that it did not come with
Pip preinstalled. Pip is a powerful package management system for Python software
packages. Thus, make sure that you have it installed.
If you are installing a different version of Python, you can expect a similar process.
Choose your version carefully, make sure that you have Pip installed, and use virtual
environments when developing multiple projects on a single system.
PYTHON APPLICATIONS
3.3.2 ANACONDA:
Anaconda distribution comes with over 250 packages automatically installed, and over 7,500
additional open-source packages can be installed from PyPI as well as the conda package and virtual
environment manager. It also includes a GUI, Anaconda Navigator, as a graphical alternative to the
command line interface (CLI).
The big difference between conda and the pip package manager is in how package dependencies
are managed, which is a significant challenge for Python data science and the reason conda exists.
When pip installs a package, it automatically installs any dependent Python packages without
checking if these conflict with previously installed packages. It will install a package and any of its
dependencies regardless of the state of the existing installation. Because of this, a user with a working
installation of, for example, Google TensorFlow can find that it stops working after using pip to install
a different package that requires a different version of the dependent NumPy library than the one used by
TensorFlow. In some cases, the package may appear to work but produce different results in detail.
In contrast, conda analyses the current environment, including everything currently installed, and,
together with any version limitations specified (e.g. the user may wish to have TensorFlow version 2.0 or
higher), works out how to install a compatible set of dependencies, and shows a warning if this cannot
be done.
Open source packages can be individually installed from the Anaconda repository, Anaconda
Cloud (anaconda.org), or the user's own private repository or mirror, using the conda install command.
Anaconda, Inc. compiles and builds the packages available in the Anaconda repository itself, and
provides binaries for Windows 32/64 bit, Linux 64 bit and MacOS 64-bit. Anything available on
PyPI may be installed into a conda environment using pip, and conda will keep track of what it has
installed itself and what pip has installed.
Custom packages can be made using the conda build command, and can be shared with others
by uploading them to Anaconda Cloud, PyPI or other repositories.
The default installation of Anaconda2 includes Python 2.7 and Anaconda3 includes Python 3.7.
However, it is possible to create new environments that include any version of Python packaged with
conda.
Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface (GUI) included in Anaconda distribution
that allows users to launch applications and manage conda packages, environments and channels without
using command-line commands. Navigator can search for packages on Anaconda Cloud or in a local
Anaconda Repository, install them in an environment, run the packages and update them. It is available
for Windows, macOS and Linux. The following applications are available by default in Navigator:
JupyterLab
Jupyter Notebook
QtConsole
Spyder
Glue
Orange
RStudio
JUPYTER NOTEBOOK
Jupyter Notebook can connect to many kernels to allow programming in different languages. By
default, Jupyter Notebook ships with the IPython kernel. As of the 2.3 release (October 2014), there
were 49 Jupyter-compatible kernels for many programming languages, including Python,
R, Julia and Haskell.
The Notebook interface was added to IPython in the 0.12 release (December 2011) and renamed
to Jupyter Notebook in 2015 (IPython 4.0 – Jupyter 1.0). Jupyter Notebook is similar to the notebook
interface of other programs such as Maple, Mathematica, and SageMath, a computational interface style
that originated with Mathematica in the 1980s. According to The Atlantic, Jupyter interest overtook the
popularity of the Mathematica notebook interface in early 2018.
Install Office 2010
Click Start > Computer, right-click the disc drive into which the Office 2010 installation
disc was inserted, and select Open.
Open the x64 folder in the installation root, and then double-click setup.exe.
After setup completes, continue by following the default installation instructions and entering
the product key.
OPENING CSV
3. Navigate to the CSV file you wish to open and click “Import”.
5. Check the box next to the type of delimiter – in most cases this is either a semicolon or a comma.
Then click “Next”.
6. Click “Finish”.
VISUAL STUDIO CODE:
Visual Studio Code is a lightweight but powerful source code editor which runs on your desktop
and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript
and Node.js and has a rich ecosystem of extensions for other languages and runtimes (such as C++, C#,
Java, Python, PHP, Go, .NET).
Features include support for
Debugging,
Syntax highlighting,
Intelligent code completion,
Snippets,
Code refactoring,
Embedded Git.
STEP 2: Once it is downloaded, run the installer (VSCodeUserSetup-{version}.exe). This will only take a
minute.
4. SYSTEM DESIGN
PURPOSE:
The purpose of the System Architecture process is to generate system architecture alternatives,
to select one or more alternatives that frame stakeholder concerns and meet system requirements,
and to express this in a set of consistent views.
It should be noted that the architecture activities below overlap with both system definition
and concept definition activities. In particular, key aspects of the operational and business
context, and hence certain stakeholder needs, strongly influence the approach taken to architecture
development and description. The architecture activities will also drive the selection of, and fit
within, whatever approach to solution synthesis has been selected.
4.2 ARCHITECTURE DIAGRAM
[Architecture diagram: Data Source → Extraction of Dataset → Basic Operations → LSTM (to predict the delivery time)]
4.3 DATA FLOW DIAGRAM
[Data flow diagram: Data Source → Analysis, Preprocessing & Visualization → Output Prediction]
4.4 MODULE DIAGRAM:
1. To Pre-process the data
[Module diagram: Dataset → Convert strings into numerical values]
2. To Visualize the data
[Module diagram: Pre-processed Data → Data Visualization → Pie chart, to show data as a percentage of a whole]
3. To find whether the user will order food or not
4. To find the food order delivery time
[Module diagram: LSTM → To find out the delivery time]
5. SYSTEM IMPLEMENTATION
5.1 SOFTWARE DESCRIPTION:
INTRODUCTION TO MACHINE LEARNING:
Machine learning (ML) is the study of computer algorithms that improve automatically
through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms
build a mathematical model based on sample data, known as "training data", in order to make
predictions or decisions without being explicitly programmed to do so. Machine learning
algorithms are used in a wide variety of applications, such as email filtering and computer vision,
where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict new
output values. Recommendation engines are a common use case for machine learning. Other
popular uses include fraud detection, spam filtering, malware threat detection, business process
automation (BPA) and Predictive maintenance.
Machine learning is important because it gives enterprises a view of trends in
customer behavior and business operational patterns, as well as supports the development of new
products. Many of today's leading companies, such as Facebook, Google and Uber, make
machine learning a central part of their operations. Machine learning has become a significant
competitive differentiator for many companies.
METHOD:
Supervised learning:
Supervised learning is a type of machine learning paradigm where the algorithm learns
a mapping between input data and corresponding output labels from labeled training data. The
term "supervised" indicates that the algorithm is guided by a supervisor or teacher who provides
the correct answers during training. This learning approach is widely used in various fields,
including classification, regression, and forecasting.
Data Collection: The first step in supervised learning is to gather a dataset that
consists of input-output pairs. The input data can be represented as features, while
the output labels represent the target variable or the desired prediction.
Data Preprocessing: Before feeding the data into the model, preprocessing steps
are often performed to clean, normalize, and transform the data. This may include
handling missing values, feature scaling, encoding categorical variables, and
splitting the dataset into training and testing sets.
Model Selection: Choosing an appropriate model architecture is crucial for the
success of supervised learning. Common types of supervised learning algorithms
include:
- Classification: Used when the output variable is a category,
such as binary classification (e.g., spam detection) or multiclass classification
(e.g., image recognition).
- Regression: Employed when the output variable is a
continuous value, such as predicting house prices or stock prices.
- Time Series Forecasting: Deals with predicting future
values based on past observations, commonly used in financial forecasting,
weather prediction, and demand forecasting.
Training the Model: In supervised learning, the model is trained using
the labeled training data. During training, the algorithm adjusts its
parameters iteratively to minimize the discrepancy between the
predicted output and the actual output. This process typically involves
an optimization algorithm such as gradient descent.
Evaluation: After training the model, it is essential to evaluate its
performance using a separate validation dataset or through cross-
validation techniques. Common evaluation metrics depend on the
problem type and may include accuracy, precision, recall, F1 score for
classification, or mean squared error (MSE), mean absolute error
(MAE) for regression.
Model Tuning: Based on the evaluation results, the model may need to
be fine-tuned by adjusting hyperparameters or trying different
algorithms. This process aims to improve the model's performance and
generalization ability.
Deployment: Once the model achieves satisfactory performance, it can
be deployed in real-world applications to make predictions on new,
unseen data. Deployment may involve integrating the model into
existing systems or developing standalone applications.
Monitoring and Maintenance: After deployment, the model's
performance should be continuously monitored to ensure it remains
effective and accurate over time. Periodic retraining may be necessary
to adapt to changing data distributions or patterns.
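As a minimal sketch of the workflow listed above, assuming scikit-learn is installed and using a synthetic dataset generated by make_classification rather than the project's food-order data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic labelled data standing in for real input-output pairs
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model on the labelled training data
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

The same fit/evaluate loop is what the later tuning and deployment steps iterate on.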
[Flowchart: Start → Data Collection → Data Preprocessing → Model Selection → Evaluation → Model Tuning → Deployment → Monitoring & Maintenance → End]
CLASSIFICATION:
Classification is a fundamental task in supervised learning that involves categorizing input
data into predefined classes or categories based on their features. It is a vital tool across various
domains, including finance, healthcare, natural language processing, image recognition, and
many others. At its core, classification aims to discover patterns and relationships within the data
that distinguish one class from another, enabling the algorithm to make accurate predictions on
unseen instances. The process of classification typically begins with the collection of labeled
training data, where each data point is associated with a class label. These labels serve as the
ground truth or the correct answers that guide the learning process. The input data, often referred
to as features or attributes, can take various forms depending on the problem domain. For
example, in text classification tasks, features may include word frequencies, while in image
classification, they may comprise pixel values or image descriptors. Once the training data is
collected, preprocessing steps are often applied to clean, normalize, and transform the data to
make it suitable for modeling. This may involve handling missing values, removing noise, scaling
features, and encoding categorical variables. Preprocessing ensures that the data is in a suitable
format for training the classification model and helps improve its performance. The selection of
an appropriate classification algorithm is crucial in achieving accurate predictions. There are
several algorithms available for classification, each with its strengths and weaknesses.
Commonly used algorithms include logistic regression, decision trees, support vector machines
(SVM), k-nearest neighbors (KNN), random forests, and neural networks. The choice of
algorithm depends on factors such as the nature of the data, the size of the dataset, computational
resources, and the interpretability of the model. Once the algorithm is selected, the model is
trained using the labeled training data. During training, the algorithm learns to map input features
to their corresponding class labels by adjusting its internal parameters iteratively. This process
involves optimizing a predefined objective function, often referred to as the loss function, which
measures the disparity between the predicted class labels and the true labels in the training data.
Gradient-based optimization techniques such as gradient descent are commonly used to minimize
the loss function and update the model parameters.
After training, the performance of the classification model is evaluated using a separate
validation dataset or through cross-validation techniques. Various evaluation metrics can be used
to assess the model's performance, depending on the problem domain and the nature of the
classes. For binary classification tasks, common metrics include accuracy, precision, recall, F1
score, and receiver operating characteristic (ROC) curve analysis. For multiclass classification
tasks, metrics such as confusion matrix, precision-recall curve, and multi-class ROC analysis
may be used. Model evaluation provides insights into the model's strengths and weaknesses and
helps identify areas for improvement. It also facilitates comparisons between different models
and algorithms to determine the most suitable approach for the given problem. In cases where
the model's performance is unsatisfactory, fine-tuning techniques such as hyperparameter
optimization, feature engineering, or ensemble methods may be employed to enhance
performance. Once the classification model achieves satisfactory performance on the validation
data, it can be deployed in real-world applications to make predictions on new, unseen instances.
Deployment may involve integrating the model into existing systems or developing standalone
applications. It is essential to monitor the model's performance in production and periodically
update it as needed to adapt to changes in the data distribution or the problem domain.
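The evaluation metrics discussed above can be computed with scikit-learn's metrics module; a small sketch using made-up binary labels and predictions:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# Illustrative true labels and predictions for a binary problem
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))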
REGRESSION:
Regression in supervised learning is a fundamental concept that plays a crucial role in
modeling and predicting continuous outcomes from data. In essence, regression involves
establishing a relationship between input variables (predictors) and a continuous output variable
(response). Unlike classification, where the output is categorical, regression aims to predict
numeric values within a range. This predictive modeling technique finds extensive application in
various fields, including finance, economics, healthcare, engineering, and environmental science,
among others. At its core, regression analysis seeks to model the relationship between one or
more independent variables and a dependent variable by fitting a mathematical function to the
observed data. The overarching goal is to understand and quantify how changes in the
independent variables affect the outcome. The simplest form of regression is linear regression,
where the relationship between the input variables and the output variable is assumed to be linear.
In linear regression, the model seeks to find the best-fitting line that minimizes the difference
between the observed values and the values predicted by the model. However, real-world data
often exhibit more complex relationships that cannot be adequately captured by a linear model.
This is where nonlinear regression techniques come into play. Nonlinear regression models can
capture more intricate patterns by allowing the relationship between the variables to be nonlinear.
Polynomial regression, for instance, extends linear regression by introducing polynomial terms
of the independent variables. Other nonlinear regression methods include logistic regression,
which is used for binary classification tasks, and support vector regression (SVR), which employs
support vector machines to perform regression. In addition to capturing nonlinear relationships,
regression analysis also addresses challenges such as overfitting and underfitting. Overfitting
occurs when the model learns to memorize the training data rather than generalize to new, unseen
data. On the other hand, underfitting arises when the model is too simple to capture the underlying
structure of the data.
To mitigate these issues, various techniques such as regularization, cross-validation, and
model selection are employed. Regularization methods like ridge regression and Lasso regression
add a penalty term to the loss function, discouraging overly complex models. Cross-validation
helps assess a model's performance on unseen data by splitting the dataset into training and testing
subsets. Model selection involves comparing different regression models and selecting the one
that strikes the right balance between simplicity and predictive accuracy. Moreover, regression
analysis provides valuable insights into the relationships between variables and enables data-
driven decision-making. For instance, in finance, regression models can be used to predict stock
prices, analyze the impact of interest rates on economic indicators, or forecast demand for
financial products. In healthcare, regression analysis can help identify risk factors for diseases,
predict patient outcomes, or estimate the effectiveness of medical treatments. Environmental
scientists use regression models to study the relationship between environmental factors (such as
temperature, precipitation, and pollution levels) and phenomena like climate change, species
distribution, and air quality. Furthermore, regression analysis can be extended to handle more
complex scenarios involving multiple predictors and interactions between variables. Multiple
regression allows for the inclusion of multiple independent variables in the model, enabling the
analysis of their combined effects on the outcome variable. Interaction terms can be incorporated
to capture synergistic or antagonistic relationships between predictors. Additionally, advanced
regression techniques such as time series analysis and spatial regression account for temporal or
spatial dependencies in the data, respectively.
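A minimal sketch of these regression ideas, assuming scikit-learn and NumPy; the synthetic data and the Ridge penalty value are illustrative assumptions, and Ridge stands in for the regularized variants mentioned above:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Synthetic data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.5 * X.ravel() + rng.normal(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0)):   # Ridge adds an L2 penalty
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(type(model).__name__,
          "MSE:", mean_squared_error(y_test, pred),
          "MAE:", mean_absolute_error(y_test, pred))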
TIME SERIES FORECASTING:
Time series forecasting is a branch of supervised learning that focuses on predicting future values
based on historical data. It is a critical component in various fields such as finance, economics,
weather forecasting, energy demand forecasting, and more. The fundamental concept behind time
series forecasting lies in analyzing sequential data points collected over regular intervals of time.
These data points represent a variable's behavior over time, and the objective is to develop a model
that can capture the underlying patterns and trends in the data to make accurate predictions about
its future values. The process of time series forecasting typically involves several stages, starting
with data collection and preprocessing. In this stage, historical time series data is gathered, which
may include observations of a single variable or multiple variables over time. Data preprocessing
steps are then applied to clean the data, handle missing values, and address any anomalies or
outliers that might affect the forecasting model's performance. Additionally, the data may be
transformed or normalized to ensure that it meets the assumptions of the chosen forecasting
algorithm. Once the data is prepared, the next step involves selecting an appropriate forecasting
model. There are various algorithms and techniques available for time series forecasting, ranging
from simple statistical methods to more complex machine learning approaches. Some of the
commonly used techniques include autoregressive integrated moving average (ARIMA),
exponential smoothing methods (such as Holt-Winters), and machine learning algorithms like
support vector machines (SVM), random forests, or deep learning models such as recurrent neural
networks (RNNs) and Long Short-Term Memory networks (LSTMs). The choice of model
depends on factors such as the characteristics of the data, the forecasting horizon, and the desired
level of accuracy. Once a model is selected, it is trained using historical data. In supervised
learning, this involves splitting the dataset into training and testing sets, with the training set used
to train the model and the testing set used to evaluate its performance. During training, the model
learns the patterns and relationships present in the historical data, which it can then use to make
predictions about future values.
The training process typically involves adjusting the model's parameters iteratively to minimize
the difference between the predicted values and the actual values in the training set. After the
model is trained, it is evaluated using the testing set to assess its performance and generalization
ability. Various metrics can be used to evaluate the accuracy of the forecasts, including mean
absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and others.
These metrics provide insight into how well the model is performing and can help identify areas
for improvement. Once the model's performance is deemed satisfactory, it can be deployed to
make predictions on new, unseen data. In a production environment, the model may need to be
updated periodically to adapt to changes in the underlying data patterns or to incorporate new
information. This process of model maintenance and monitoring is essential to ensure that the
forecasts remain accurate and reliable over time. Time series forecasting is a powerful tool for
predicting future values based on historical data, with applications ranging from financial
forecasting and inventory management to weather prediction and resource planning. By leveraging
the principles of supervised learning and the wealth of data available in time series datasets,
organizations can gain valuable insights into future trends and make informed decisions to
optimize their operations and drive business success.
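Since the project title names an LSTM for delivery-time prediction, a minimal sketch of an LSTM forecaster in Keras is shown below; the sine-wave series, window length and layer sizes are illustrative assumptions, not the report's actual configuration:

import numpy as np
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

# Toy univariate series: a noisy sine wave standing in for real historical data
series = np.sin(np.arange(0, 100, 0.1)) + np.random.normal(0, 0.1, 1000)

# Turn the series into (window of past values -> next value) supervised samples
window = 20
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((X.shape[0], window, 1))   # (samples, timesteps, features)

model = Sequential([
    Input(shape=(window, 1)),
    LSTM(32),
    Dense(1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Forecast the value following the last observed window
print(model.predict(series[-window:].reshape(1, window, 1)))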
Applications of supervised learning
• Spam Detection: Email service providers utilize supervised learning to detect spam emails by
training classifiers on labeled datasets containing examples of both spam and legitimate emails.
This enables users to have a cleaner inbox by automatically filtering out unwanted messages.
• Medical Diagnosis: Supervised learning plays a vital role in medical imaging and diagnosis,
assisting healthcare professionals in accurately identifying diseases and conditions from medical
images like X-rays, MRIs, and CT scans. Models trained on labeled medical data can aid in early
detection and treatment planning for various illnesses.
• Sentiment Analysis: Companies leverage supervised learning techniques to analyze text data
from social media, customer reviews, and surveys to understand public opinion and sentiment
towards their products or services. This enables businesses to gather valuable insights for
improving customer satisfaction and brand reputation.
• Financial Forecasting: Supervised learning models are utilized in finance for predicting stock
prices, currency exchange rates, and market trends. By analyzing historical financial data, these
models help investors and financial institutions make informed decisions regarding investment
strategies and risk management.
• Autonomous Vehicles: Supervised learning algorithms are integral to the development of
autonomous vehicles, enabling them to perceive their surroundings and make real-time decisions
based on sensor data. This includes tasks like object detection, lane detection, and traffic sign
recognition, contributing to safer and more efficient transportation systems.
• Fraud Detection: Financial institutions utilize supervised learning to detect fraudulent activities
by analyzing transaction data and identifying patterns indicative of fraudulent behavior. This helps
prevent financial losses and protects customers from fraudulent transactions.
Advantages of Supervised learning:
Supervised learning offers several advantages that make it a widely-used and powerful approach
in machine learning:
1. Predictive Accuracy: Supervised learning algorithms can achieve high levels of predictive
accuracy when trained on labeled data. By learning from examples with known outcomes, these
algorithms can make informed predictions on new, unseen data, which is crucial for tasks such as
classification, regression, and forecasting.
2. Interpretability: Supervised learning models are often interpretable, meaning that the
relationship between input features and output labels can be understood and explained. This
transparency is valuable in domains where understanding the factors influencing predictions is
essential, such as healthcare and finance.
4. Versatility: Supervised learning can be applied to a wide range of tasks and domains, including
classification, regression, time series forecasting, natural language processing, computer vision,
and more. This versatility makes it suitable for addressing various real-world problems across
different industries.
5. Feature Engineering: Supervised learning often involves feature engineering, where domain-
specific knowledge is used to create informative input features that enhance the model's
performance. This process allows practitioners to extract relevant information from raw data and
improve the model's ability to generalize to new data.
6. Availability of Labeled Data: In many domains, labeled data is readily available or can be
generated through manual annotation or crowdsourcing. This abundance of labeled data facilitates
the training of supervised learning models and enables them to achieve higher levels of
performance compared to unsupervised or semi-supervised approaches.
7. Feedback Loop: Supervised learning models can be continuously updated and improved over
time as new labeled data becomes available. This feedback loop allows models to adapt to
changing conditions, incorporate new knowledge, and maintain relevance and accuracy in
dynamic environments.
8. Evaluation Metrics: Supervised learning provides clear evaluation metrics for assessing model
performance, such as accuracy, precision, recall, F1 score for classification, and mean squared
error (MSE), mean absolute error (MAE) for regression. These metrics provide quantifiable
measures of the model's effectiveness and guide the iterative improvement process.
Challenges of supervised learning
While supervised learning offers many advantages, it also comes with certain limitations and
disadvantages that should be considered:
1. Dependency on Labeled Data: Supervised learning requires a large amount of labeled data for
training, which can be time-consuming and expensive to acquire, especially for tasks where
obtaining labels is challenging or requires domain expertise. The quality of the labeled data also
directly impacts the performance of the model, and errors in labeling can propagate to the trained
model, leading to biased or inaccurate predictions.
2. Limited Generalization: Supervised learning models are prone to overfitting, where the model
learns to memorize the training data instead of capturing underlying patterns and relationships.
Overfitting occurs when the model is too complex relative to the amount of training data, leading
to poor generalization performance on unseen data. Conversely, if the model is too simplistic, it
may underfit the data and fail to capture important patterns, resulting in suboptimal performance.
3. Difficulty with Unstructured Data: Supervised learning techniques may struggle with
unstructured or high-dimensional data, such as text, images, and audio, which often require
specialized preprocessing and feature engineering to extract meaningful information. Designing
effective features manually can be challenging and may not capture all relevant information
present in the data, limiting the performance of the model.
4. Bias and Fairness Concerns: Supervised learning models can inherit biases present in the
training data, leading to unfair or discriminatory outcomes, particularly in sensitive domains such
as hiring, lending, and criminal justice. Biases in the data, such as underrepresentation or
misrepresentation of certain groups, can result in biased predictions and perpetuate existing
inequalities, posing ethical and social risks.
5. Difficulty with Imbalanced Data: In classification tasks with imbalanced class distributions,
where one class is significantly more prevalent than others, supervised learning models may
exhibit poor performance, as they tend to favor the majority class and overlook minority classes.
Imbalanced data can lead to biased evaluations and misleading conclusions, especially if the
minority class is of interest.
6. Lack of Interpretability: Some supervised learning models, such as deep neural networks, are
highly complex and black-box in nature, making it challenging to interpret their internal
workings and understand the factors influencing their predictions. Lack of interpretability can
hinder trust and acceptance of the model by users and stakeholders, particularly in critical
applications where transparency is essential.
7. Data Privacy and Security Risks: Supervised learning models trained on sensitive or
proprietary data may raise concerns about data privacy and security, particularly if the models
are vulnerable to adversarial attacks or unauthorized access. Exposing sensitive information
through the model's predictions or model inversion attacks can lead to privacy breaches and legal
implications, necessitating robust security measures and compliance with regulations.
5.2 MODULE SPECIFICATION:
Data Collection:
A dataset (or data set) is a collection of data, usually presented in tabular form.
Each column represents a particular variable. Each row corresponds to a given member of the
dataset in question. It lists values for each of the variables, such as height and weight of an object.
Each value is known as a datum.
We have chosen to use a publicly available dataset which contains a
relatively small number of inputs and cases. The data is arranged in such a way that practitioners
can easily draw parallels between familiar statistical and novel ML techniques. Additionally, the
compact dataset enables short computational times on almost all modern computers. The datasets were
collected from the Kaggle open-source website; they include product details and user details.
To improve the performance of the product, I have collected the following data:
Data Preprocessing:
The sklearn.preprocessing package provides several common utility functions
and transformer classes to change raw feature vectors into a representation that is more suitable
for the downstream estimators.
In general, learning algorithms benefit from standardization of the data set. If some outliers
are present in the set, robust scalers or transformers are more appropriate. The behaviors of the
different scalers, transformers, and normalizers on a dataset containing marginal outliers are
highlighted in the scikit-learn example "Compare the effect of different scalers on data with outliers".
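As a minimal sketch of the kind of transformers this refers to (assuming scikit-learn; the small feature matrix is made up), StandardScaler standardizes each feature, while RobustScaler is the sort of robust scaler mentioned for data containing outliers:

import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# A small feature matrix with one obvious outlier in the second column
X = np.array([[1.0, 200.0],
              [2.0, 210.0],
              [3.0, 190.0],
              [4.0, 5000.0]])   # outlier

# StandardScaler removes the mean and scales each feature to unit variance
print(StandardScaler().fit_transform(X))

# RobustScaler uses the median and interquartile range, so the outlier has less influence
print(RobustScaler().fit_transform(X))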
Correlation:
In Python, correlation refers to the statistical measure of the relationship between
two variables. It quantifies the extent to which changes in one variable correspond
to changes in another. The correlation coefficient typically ranges from -1 to 1,
where -1 indicates a perfect negative correlation, 1 indicates a perfect positive
correlation, and 0 indicates no correlation. The corr() function in libraries like
Pandas or NumPy calculates the correlation coefficient between variables, aiding in
understanding the strength and direction of their relationship in datasets.
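As a small illustration (the DataFrame and its column names below are made up for this sketch, not taken from the project's dataset):

import pandas as pd

df = pd.DataFrame({
    "Age":        [22, 25, 31, 35, 40],
    "Distance":   [2.0, 3.5, 5.0, 6.5, 8.0],
    "Time_taken": [15, 20, 27, 33, 41],
})

# Pairwise Pearson correlation coefficients between the numeric columns
print(df.corr())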
Scaling features to a range
In practice we often ignore the shape of the distribution and just transform the data to center
it by removing the mean value of each feature, then scale it by dividing non-constant features by
their standard deviation.
For instance, many elements used in the objective function of a learning algorithm (such as
the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models)
assume that all features are centered around zero and have variance in the same order. If a feature
has a variance that is orders of magnitude larger than others, it might dominate the objective
function and make the estimator unable to learn from other features correctly as expected.
An alternative standardization is scaling features to lie between a given minimum and
maximum value, often between zero and one, or so that the maximum absolute value of each
feature is scaled to unit size. This can be achieved using MinMaxScaler or MaxAbsScaler,
respectively.
The motivations for using this scaling include robustness to very small standard deviations
of features and the preservation of zero entries in sparse data. MaxAbsScaler works in a very similar
fashion, but scales the data so that the training data lies within the range [-1, 1] by dividing through
the largest maximum absolute value in each feature. It is meant for data that is already centered at zero
or for sparse data.
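A brief sketch of the two scalers named above, assuming scikit-learn and a made-up feature matrix:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

X = np.array([[-1.0,  2.0],
              [-0.5,  6.0],
              [ 0.0, 10.0],
              [ 1.0, 18.0]])

# MinMaxScaler maps each feature to the [0, 1] range by default
print(MinMaxScaler().fit_transform(X))

# MaxAbsScaler divides by the maximum absolute value, giving values in [-1, 1]
# and preserving zero entries (useful for sparse data)
print(MaxAbsScaler().fit_transform(X))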
Normalization
Normalization is the process of scaling individual samples to have unit norm. This process can be
useful if you plan to use a quadratic form such as the dot product or any other kernel to quantify
the similarity of any pair of samples. This assumption is the basis of the Vector Space Model often
used in text classification and clustering contexts.
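Sample-wise normalization can be sketched with scikit-learn's normalize function (the Normalizer transformer is equivalent); the matrix below is illustrative:

import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[4.0, 1.0, 2.0, 2.0],
              [1.0, 3.0, 9.0, 3.0],
              [5.0, 7.0, 5.0, 1.0]])

# Each row (sample) is scaled to unit L2 norm, so dot products between rows
# behave like cosine similarities
X_normalized = normalize(X, norm="l2")
print(X_normalized)
print(np.linalg.norm(X_normalized, axis=1))   # each row norm is now 1.0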
Data Visualization:
Data visualization is very critical to market research where both numerical and categorical data
can be visualized, which helps in an increase in the impact of insights and also helps in reducing
the risk of analysis paralysis.
Temporal Visualization: Temporal visualization focuses on visualizing data over time.
This category includes time series plots, timeline charts, calendar heatmaps, and animated
visualizations that show changes in data over different time intervals. Temporal
visualization is commonly used in financial analysis, weather forecasting, and tracking
trends in various fields over time (a short example is sketched after this list).
Spatial Visualization: Spatial visualization involves representing data on maps or
geographical layouts. This category includes choropleth maps, point maps, heatmaps, and
cartograms. Spatial visualization is useful for visualizing geographical patterns, spatial
relationships, and distribution of data across regions. It is widely used in fields such as
geography, urban planning, and environmental science.
Hierarchical Visualization: Hierarchical visualization is used to represent data organized
in hierarchical structures, such as trees or networks. Examples include dendrogram trees,
sunburst charts, treemaps, and network graphs. Hierarchical visualization helps in
visualizing relationships and dependencies between different levels of data hierarchy, such
as organizational structures, file directories, or genealogical trees.
Multidimensional Visualization: Multidimensional visualization deals with visualizing
data with multiple dimensions or attributes. This category includes scatter plots, parallel
coordinate plots, radar charts, and 3D plots. Multidimensional visualization enables
exploration of relationships and patterns among multiple variables simultaneously, aiding
in data analysis and decision-making in fields such as marketing, social sciences, and
engineering.
Statistical Visualization: Statistical visualization focuses on visualizing statistical
distributions, relationships, and summary statistics of data. This category includes
histograms, box plots, violin plots, scatter plots with trend lines, and error bars. Statistical
visualization helps in understanding the central tendency, dispersion, and variability of
data, facilitating hypothesis testing and inference in scientific research and data analysis.
Textual Visualization: Textual visualization involves representing textual data visually to
extract insights and patterns. This category includes word clouds, text networks, sentiment
analysis plots, and topic models. Textual visualization helps in summarizing large volumes
of text, identifying key themes and sentiment, and exploring relationships between textual
elements in fields such as natural language processing, social media analysis, and content
analysis.
Interactive Visualization: Interactive visualization allows users to interact with the data
visualization dynamically, exploring different aspects of the data, filtering, zooming, and
drilling down into details. This category includes interactive dashboards, sliders, tooltips,
and zoomable plots. Interactive visualization enhances engagement and enables users to
gain deeper insights into the data, supporting data exploration and decision-making in
various domains.
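As a small illustration of the temporal category, a time series line plot with Matplotlib; the daily order counts below are synthetic, not the project's delivery data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily order counts over three months
dates = pd.date_range("2024-01-01", periods=90, freq="D")
orders = 100 + 10 * np.sin(np.arange(90) / 7) + np.random.normal(0, 5, 90)

plt.figure(figsize=(10, 4))
plt.plot(dates, orders)
plt.title("Daily Online Food Orders (synthetic data)")
plt.xlabel("Date")
plt.ylabel("Number of orders")
plt.tight_layout()
plt.show()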
Advantages of Data Visualization
Better Comparison: In business it often happens that we need to compare the performance of two
products or two situations. The conventional approach is to go through the massive data for both
circumstances and then analyse it, which clearly takes a great deal of time.
A Superior Method: Data visualization tackles this difficulty by placing the information for both
perspectives into pictorial form. This gives a far better understanding of the situation. For
instance, Google Trends helps us understand data related to top searches or queries in pictorial or
graphical form.
Simple Sharing of Data: With visual representation of information, organizations gain a new channel
of communication. Rather than sharing cumbersome raw data, sharing visual information engages the
audience and conveys the message in a form that is easier to absorb.
Sales Analysis: With the help of data visualization, a salesperson can easily understand the sales
graph of products. With visualization tools like heat maps, they can understand the causes that are
pushing the sales numbers up as well as the reasons that are bringing them down. Data visualization
also helps in understanding trends as well as other factors such as the types of customers interested
in buying, repeat customers, the effect of geography, and so on.
Histogram:
A histogram plot in Python is a graphical representation of the distribution of numerical data, where
data is divided into bins, and the frequency or count of data points falling into each bin is represented
by the height of corresponding bars along a single axis.
The following histogram is used to identify and determine the highest preference of an age which can be
represented in terms of yes or no.
Scatter Plot:
A scatter plot in Python is a graphical representation of data points where each point represents the
value of two variables, typically plotted on the x-axis and y-axis, to visualize the relationship or
correlation between them.
The following scatter plot is used to identify and determine the relationship between the age and time
taken based on the distance.
Pie Chart:
A pie chart in Python is a circular statistical visualization that represents data as slices of a pie, with
each slice's size proportional to the data it represents.
The following Pie Chart is used to identify and determine the distribution of orders based on their
income.
Box Plot:
A box plot in Python is a graphical representation of the distribution of numerical data through
quartiles, highlighting the median, outliers, and range of the data set.
The following Box Plot is used to identify and determine the relationship between the type of vehicle
and time taken based on the type of order.
6. TESTING:
System testing is the stage of implementation aimed at ensuring that the system works
accurately and efficiently before live operation commences. Testing is the process of executing a
program with the intent of finding an error. A good test case is one that has a high probability of
finding an error. A successful test is one that uncovers an as-yet-undiscovered error.
Testing is vital to the success of the system. System testing makes a logical assumption
that if all parts of the system are correct, the goal will be successfully achieved. A series of tests are
performed before the system is ready for user acceptance testing. Any engineered product can be
tested in one of the following ways: knowing the specified function that a product has been designed
to perform, tests can be conducted to demonstrate that each function is fully operational. The purpose of
testing is to discover errors. Testing is the process of trying to discover every conceivable fault or
weakness in a work product and of confirming that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests, and each
test type addresses a specific testing requirement.
OBJECTIVES:
Software Testing has different goals and objectives. The major objectives of Software testing
are as follows:
Finding defects which may get created by the programmer while developing the software.
Gaining confidence in and providing information about the level of quality.
To prevent defects.
To make sure that the end result meets the business and user requirements.
To ensure that it satisfies the BRS that is Business Requirement Specification and SRS that
is System Requirement Specifications.
To gain the confidence of the customers by providing them a quality product
6.2 TYPES OF TESTING
6.2.1 UNIT TESTING:
Unit testing involves the design of test cases that validate that the internal program logic
is functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application; it is done after the completion of an individual unit and before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform
basic tests at component level and test a specific business process, application, and/or system
configuration. Unit tests ensure that each unique path of a business process performs accurately to
the documented specifications and contains clearly defined inputs and expected results.
6.2.2 INTEGRATION TESTING:
Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic outcome of
screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.
6.2.3 FUNCTIONAL TESTING:
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Organization and preparation of functional tests is focused on requirements, key functions or
special test cases. In addition, systematic coverage pertaining to identifying business process flows,
data fields, predefined processes and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective value of current tests
is determined.
System testing ensures that the entire integrated software system meets requirements.
It tests a configuration to ensure known and predictable results. An example of system testing is
the configuration oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
The performance test ensures that the output is produced within the defined time limits, and
measures the time taken by the system for compiling, responding to users, and handling requests
sent to the system to retrieve results.
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.
BLACK BOX TESTING:
Black box testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of tests, must
be written from a definitive source document, such as a specification or requirements document. It is
testing in which the software under test is treated as a black box: you cannot see into it. The test
provides inputs and responds to outputs without considering how the software works.
Any project can be divided into units that can be tested in detail, and a testing strategy is
then carried out for each of these units. Unit testing helps to identify possible bugs in the
individual components, so the component that has bugs can be identified and rectified.
7. SOURCE CODE:
HTML CODE:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>TEST YOURSELF</title>
<style>
body {
font-family: Arial, sans-serif;
background-color: #f2f2f2;
}
.game {
max-width: 400px;
margin: 50px auto;
background-color: #fff;
padding: 20px;
border-radius: 10px;
box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
}
.game h2 {
text-align: center;
margin-bottom: 20px;
}
.btn-submit {
width: 6%;
padding: 10px;
border: none;
background-color: #007bff;
color: #fff;
border-radius: 5px;
cursor: pointer;
transition: background-color 0.3s;
}
.btn-submit:hover {
background-color: #0056b3;
}
</style>
</head>
<body><center>
<div id="game" class="game"> <!-- class="game" added so the .game styles defined above apply -->
<h1>TEST YOURSELF & GET INTO THE PROJECT</h1>
<div id="question"></div><br/>
<input type="text" id="answerInput"><br/><br/><br/>
<button class="btn-submit" onclick="checkAnswer()">Submit</button>
</div>
<script>
const game = document.getElementById('game');
const questionDiv = document.getElementById('question');
const answerInput = document.getElementById('answerInput');
let correctAnswer = "holdout";
function checkAnswer() {
const userAnswer = answerInput.value.trim().toLowerCase();
if (userAnswer === correctAnswer) {
alert("Excellent, You Have Entered Correct Answer!!!!!!");
window.open("http://localhost:8888/notebooks/Music/PROJECTS/PYTHON/Food-deliver-time-prediction/FOOD%20ORDER%20%26%20DELIVERY%20TIME%20PREDICTION.ipynb", "_blank");
} else {
alert("Incorrect answer. Please try again.");
}
}
</script>
</center>
</body>
</html>
PYTHON CODE:
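The snippets below rely on the libraries named in the abstract (Pandas, NumPy, Matplotlib, Seaborn and Plotly); a typical import cell for them would be:

# Common imports assumed by the code below
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go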
v.dropna(inplace = True)
# Finding Correlation between attributes
v.corr()
v.corr().plot()
v.corr().plot(kind = 'hist')
v.corr().plot(kind = 'bar')
v.corr().plot(kind = 'barh')
# Histogram with Matplotlib & Seaborn on Age vs Output
plt.figure(figsize=(15, 10))
plt.title("Online Food Order Decisions Based on the Age of the Customer")
sns.histplot(x="Age", hue="Output", data=data1)
plt.show()
gender = buying_again_data["Gender"].value_counts()
label = gender.index
counts = gender.values
colors = ['gold', 'lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Who Orders Food Online More: Male Vs. Female')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
marital = buying_again_data["Marital Status"].value_counts()
label = marital.index
counts = marital.values
colors = ['gold', 'lightgreen', 'red']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Who Orders Food Online More: Married Vs. Singles')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
# Preprocessing first dataset
data1["Gender"] = data1["Gender"].map({"Male": 1, "Female": 0})
data1["Marital Status"] = data1["Marital Status"].map({"Married": 2, "Single": 1, "Prefer not to say": 0})
# The remaining feature columns used below (Occupation, Monthly Income, Educational
# Qualifications, Feedback) are assumed to be encoded to numeric codes in the same way
# elsewhere in the notebook, since RandomForestClassifier cannot be fitted on raw text values.

# splitting data
from sklearn.model_selection import train_test_split
x = np.array(data1[["Age", "Gender", "Marital Status", "Occupation",
                    "Monthly Income", "Educational Qualifications",
                    "Family size", "Pin code", "Feedback"]])
y1 = np.array(data1[["Output"]])
y = y1.ravel()
# training a machine learning model
from sklearn.ensemble import RandomForestClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = RandomForestClassifier()
model.fit(xtrain, ytrain)
print((model.score(xtest, ytest)) * 100)
# Reading second dataset
data2 = pd.read_csv("deliverytime.txt")
print(data2.head())
# Set Earth's radius (in kilometres) and a helper to convert degrees to radians
# (these definitions are assumed from an earlier cell of the notebook)
R = 6371

def deg_to_rad(degrees):
    return degrees * (np.pi / 180)

# Function to calculate the distance between two points using the haversine formula
def distcalculate(lat1, lon1, lat2, lon2):
    d_lat = deg_to_rad(lat2 - lat1)
    d_lon = deg_to_rad(lon2 - lon1)
    a = np.sin(d_lat / 2)**2 + np.cos(deg_to_rad(lat1)) * np.cos(deg_to_rad(lat2)) * np.sin(d_lon / 2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return R * c

# Calculate the distance between each restaurant and its delivery location
for i in range(len(data2)):
    data2.loc[i, 'distance'] = distcalculate(data2.loc[i, 'Restaurant_latitude'],
                                             data2.loc[i, 'Restaurant_longitude'],
                                             data2.loc[i, 'Delivery_location_latitude'],
                                             data2.loc[i, 'Delivery_location_longitude'])
print(data2.head())
figure = px.scatter(data_frame=data2,
                    x="Delivery_person_Age",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color="distance",
                    trendline="ols",
                    title="Relationship Between Time Taken and Age")
figure.show()

figure = px.scatter(data_frame=data2,
                    x="Delivery_person_Ratings",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color="distance",
                    trendline="ols",
                    title="Relationship Between Time Taken and Ratings")
figure.show()

fig = px.box(data2,
             x="Type_of_vehicle",
             y="Time_taken(min)",
             color="Type_of_order")
fig.show()
# Preprocessing second dataset
# splitting data
x1 = np.array(data2[["Delivery_person_Age",
                     "Delivery_person_Ratings",
                     "distance"]])
y2 = np.array(data2[["Time_taken(min)"]])
xtrain, xtest, ytrain, ytest = train_test_split(x1, y2, test_size=0.10, random_state=42)
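# The report states that an LSTM network (Keras) is used to predict delivery time,
# but the definition of model1 does not appear in this listing. The following is a
# minimal sketch of what that step could look like; the architecture and training
# settings are assumptions, not the author's exact configuration.
from keras.models import Sequential
from keras.layers import LSTM, Dense

# LSTM layers expect 3-D input of shape (samples, timesteps, features)
xtrain_lstm = xtrain.reshape((xtrain.shape[0], xtrain.shape[1], 1))

model1 = Sequential()
model1.add(LSTM(128, return_sequences=True, input_shape=(xtrain_lstm.shape[1], 1)))
model1.add(LSTM(64, return_sequences=False))
model1.add(Dense(25, activation="relu"))
model1.add(Dense(1))
model1.compile(optimizer="adam", loss="mean_squared_error")
model1.fit(xtrain_lstm, ytrain, batch_size=1, epochs=9)
# Inputs passed to model1.predict() later should be reshaped the same way,
# e.g. features1.reshape((1, 3, 1)).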
# Predictions
print("Enter Customer Details to Predict If the Customer Will Order Again & Food Delivery Time Prediction")
a = int(input("Enter the Age of the Customer: "))
b = int(input("Enter the Gender of the Customer (Male = 1, Female = 0): "))
c = int(input("Marital Status of the Customer (Single = 1, Married = 2, Prefer not to say = 0): "))
d = int(input("Occupation of the Customer (Student = 1, Employee = 2, Self Employed = 3, Housewife = 4): "))
e = int(input("Monthly Income (No Income = 0, Below Rs.10000 = 1, 10001 to 25000 = 2, 25001 to 50000 = 3, More than 50000 = 4): "))
f = int(input("Educational Qualification (Graduate = 1, Post Graduate = 2, Ph.D = 3, School = 4, Uneducated = 5): "))
g = int(input("Family Size (No. of Members): "))
h = int(input("Pin Code: "))
i = int(input("Review of the Last Order (Positive = 1, Negative = 0): "))
features = np.array([[a, b, c, d, e, f, g, h, i]])
a1 = int(input("Age of Delivery Partner: "))
b1 = float(input("Ratings of Previous Deliveries: "))
c1 = int(input("Total Distance: "))
features1 = np.array([[a1, b1, c1]])
print("Finding if the customer will order again: ", model.predict(features))
print("Predicted Delivery Time in Minutes: ", model1.predict(features1))
8. RESULTS:
Data Collection for the Prediction System:
To build the prediction models, I have collected the following data in CSV format. The CSV
format allows users to glance at the file and immediately diagnose problems with the data,
change the delimiter, text qualifier, etc. All this is possible because a CSV document is plain
text, and an average user or even a novice can easily understand it without any learning curve.
Upload the CSV file in the Jupyter Notebook:
The csv module implements classes to read and write tabular data in CSV format. It allows
programmers to say, “write this data in the format preferred by Excel,” or “read data from this
file which was generated by Excel,” without knowing the precise details of the CSV format
used by Excel.
Importing the data as a CSV (comma-separated values) file saves time and prevents errors.
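As a brief illustration, the two datasets can be loaded into the notebook with pandas as shown
below; the file name of the first dataset is an assumption based on the dataset link in the
bibliography, while the second file name is the one used in the source-code section.

import pandas as pd

# First dataset: customer and order details (file name assumed from Dataset_2 in the bibliography)
data1 = pd.read_csv("onlinefoods.csv")
print(data1.head())

# Second dataset: historical delivery records (file name as used in Section 7)
data2 = pd.read_csv("deliverytime.txt")
print(data2.head())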
Basic Elements of Data Mining:
Extract, transform, and load transaction data onto the data warehouse system
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data by application software.
Present the data in a useful format, such as a graph or table.
Data Preprocessing:
Data preprocessing is essential before the data is actually used. Data preprocessing is the process of
turning raw data into a clean data set. The dataset is preprocessed in order to handle missing values,
noisy data, and other inconsistencies before it is passed to the algorithm.
It can improve the accuracy and quality of a dataset, making it more reliable.
It makes data consistent.
Data preprocessing can be done in the following manner on the basis of our needs.
Converting lists into NumPy arrays:
Before applying the Random Forest Classifier, the categorical columns (such as Gender and
Marital Status) are mapped to numeric codes and the selected feature columns are converted into
NumPy arrays. Each row of the resulting array represents one customer record and each column
represents one feature, as sketched below.
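The following is a condensed sketch of those steps, using the column names from the source-code
section (it assumes data1 has already been loaded).

import numpy as np

# Encode categorical columns as numeric codes
data1["Gender"] = data1["Gender"].map({"Male": 1, "Female": 0})
data1["Marital Status"] = data1["Marital Status"].map({"Married": 2, "Single": 1, "Prefer not to say": 0})

# Convert the selected feature columns and the target into NumPy arrays
x = np.array(data1[["Age", "Gender", "Marital Status", "Occupation",
                    "Monthly Income", "Educational Qualifications",
                    "Family size", "Pin code", "Feedback"]])
y = np.array(data1[["Output"]]).ravel()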
Data Visualization:
Histogram:
A histogram plot in Python is a graphical representation of the distribution of numerical data, where
data is divided into bins, and the frequency or count of data points falling into each bin is represented
by the height of corresponding bars along a single axis.
The following histogram is used to identify which age groups show the highest preference for
ordering food online, with the outcome represented in terms of yes or no.
Scatter Plot:
A scatter plot in Python is a graphical representation of data points where each point represents the
value of two variables, typically plotted on the x-axis and y-axis, to visualize the relationship or
correlation between them.
The following scatter plot is used to identify and determine the relationship between the age and time
taken based on the distance.
Pie Chart:
A pie chart in Python is a circular statistical visualization that represents data as slices of a pie, with
each slice's size proportional to the data it represents.
The following Pie Chart is used to identify and determine the distribution of orders based on their
income.
Box Plot:
A box plot in Python is a graphical representation of the distribution of numerical data through
quartiles, highlighting the median, outliers, and range of the data set.
The following Box Plot is used to identify and determine the relationship between the type of vehicle
and time taken based on the type of order.
Correlation :
The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation.
It is a number between –1 and 1 that measures the strength and direction of the relationship between
two variables.
A Pearson's correlation is used when the two statistics we want to analyze are both quantitative. This
means we will be comparing quantitative variables to find a linear relationship.
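For instance, using the column names from the second dataset in the source-code section, the
Pearson correlation between delivery-partner ratings and delivery time can be computed directly
with pandas:

import pandas as pd

# Load the delivery dataset (file name as used in Section 7)
data2 = pd.read_csv("deliverytime.txt")

# Pearson correlation between two quantitative columns
r = data2["Delivery_person_Ratings"].corr(data2["Time_taken(min)"], method="pearson")
print("Pearson r:", r)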
Prediction:
Prediction in machine learning refers to the process of using a trained model to make estimates or
forecasts about unseen or future data points based on patterns learned from historical data.
This process involves feeding input data into the model and obtaining output predictions or
classifications.
At its core, prediction relies on algorithms that have learned to identify and generalize patterns from
the provided training data.
These patterns could be complex relationships between various features or attributes within the data.
The model then applies these learned patterns to new data instances to make predictions.
The quality of predictions depends on several factors, including the quality and quantity of training
data, the complexity of the model, and the appropriateness of the chosen algorithm for the task at
hand. Evaluation metrics such as accuracy, precision, recall, or F1-score are often used to assess the
performance of prediction models.
In practical applications, prediction plays a crucial role across various domains, including finance,
healthcare, marketing, and more, enabling informed decision-making and facilitating automation of
tasks.
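As an illustration of the evaluation metrics mentioned above, they can be computed with
scikit-learn as sketched below; this assumes the fitted Random Forest model and the held-out test
split of the first dataset from the source-code section.

from sklearn.metrics import accuracy_score, classification_report

# model, xtest and ytest refer to the first-dataset split from Section 7
ypred = model.predict(xtest)
print("Accuracy:", accuracy_score(ytest, ypred))
# classification_report summarises precision, recall and F1-score per class
print(classification_report(ytest, ypred))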
9. CONCLUSION:
Thus the Prediction system was successfully implemented. It predicts whether a user will order
food or not and also estimates the delivery time. The combination of the Random Forest Classifier
and LSTM performed best, as its accuracy was higher compared to the other methods considered.
For working on a large dataset, it was a practical approach to implement the algorithm and make it
a web-based Prediction System. This is similar to the approach that Zomato uses on its website to
predict the next move of its business. It was a challenge for me to implement a web-based
Prediction system on this scale of data. Prediction systems have become ubiquitous. People use
them to find future trends, key concepts, key elements to develop business, and more. Nearly every
online food ordering platform has a prediction system to predict the future. Sustaining these online
platforms is a vibrant research community, with creative interaction ideas, powerful new
algorithms, and careful experiments.
10. FUTURE ENHANCEMENT
As technology continues to evolve and data generation becomes even more prolific, companies
like Swiggy, Zomato, and Uber Eats are constantly seeking ways to leverage this abundance of
data to enhance their services further. While the current predictive models based on historic data
have proven to be valuable, there are several avenues for future enhancements that can
revolutionize the user experience and operational efficiency of these food delivery platforms. One
promising direction for enhancement involves the integration of real-time data streams into the
predictive modeling process. Currently, prediction systems rely predominantly on historical data
to make forecasts about future orders and delivery times. While historical data provides valuable
insights, incorporating real-time data streams can offer a more dynamic and responsive approach
to prediction. By continuously monitoring factors such as traffic conditions, weather patterns,
restaurant availability, and user behavior in real-time, these platforms can adapt their predictions
and recommendations on the fly, leading to more accurate and personalized outcomes. Moreover,
the future enhancement of these prediction systems can focus on implementing advanced
personalization algorithms. While existing models consider factors such as order type, quantity,
and cost, incorporating more granular user preferences and behaviors can significantly improve
the accuracy of predictions. For instance, by analyzing past order history, browsing patterns, and
feedback ratings, these platforms can tailor recommendations to individual users' tastes and
preferences. Personalized recommendations not only enhance the user experience by suggesting
relevant and appealing options but also increase user engagement and loyalty to the platform.
Another area for future enhancement lies in the refinement of delivery time estimation algorithms.
While current systems utilize historical delivery data to estimate delivery times, they often do not
account for real-time factors that can impact delivery efficiency, such as traffic congestion, road
closures, or unexpected delays. By integrating real-time tracking technologies and machine
learning algorithms, these platforms can provide more precise and dynamic estimates of delivery
times, keeping users informed and reducing the likelihood of dissatisfaction due to delays.
Furthermore, the future enhancement of prediction systems could involve the incorporation of
additional contextual information to improve decision-making.
For example, integrating geospatial data can enable platforms to optimize delivery routes and
allocate resources more efficiently, reducing delivery times and operational costs. Similarly,
leveraging social media data and sentiment analysis techniques can provide valuable insights into
emerging food trends, allowing platforms to adapt their offerings and marketing strategies
accordingly. In terms of the technology stack, future enhancements may involve exploring newer
machine learning techniques and algorithms beyond the ones currently employed. While Random
Forest Classifier and LSTM have demonstrated effectiveness, emerging techniques such as deep
learning, reinforcement learning, and ensemble methods hold promise for further improving
prediction accuracy and scalability. Additionally, advancements in data visualization tools and
techniques can enhance the interpretability and usability of predictive models, enabling
stakeholders to gain deeper insights and make more informed decisions. The future enhancement
of prediction systems for food delivery platforms represents an exciting frontier in data science
and machine learning. By embracing real-time data streams, personalized algorithms, and
advanced modeling techniques, companies can elevate their predictive capabilities, enhance the
user experience, and stay ahead in the competitive landscape of the e-commerce industry.
11. BIBLIOGRAPHY
1. Abhishek Singh, Adithya R, Vaishnav Kanade, Prof. Salma Pathan. “Online food
ordering system using android smart phone and tablets,” International Research
Journal of Engineering and Technology (IRJET-2018).
2. Zhou He, Guanghua Han, T.C.E. Cheng, Bo Fan, Jichang Dong. “Evolutionary food
quality and location strategies for restaurants in competitive online to-offline food
ordering and delivery markets,” International Journal of Production Economics
(PROECO 7037).
3. Trupthi B, Rakshitha Raj R, J B Akshaya, Srilaxmi C P. “Online Food Ordering
System that has been designed for Fast Food restaurant (Food Industry),” International
Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-
8, Issue-2S3, July 2019.
4. KU. Vaishnavi Chimote, Prof. Sheetal Dhole. “Review Paper on Food Ordering and
Payment System using GPS and Android,” Department of Computer Science
& Engineering, DRGIT&R, AMT, India. © 2017 IJESC.
5. Mayur Kumar Patel. “Online Food Order System for Restaurants,” Computer
Information Systems, Grand Valley State University, ScholarWorks@GVSU,
December 2015.
6. Anitta Abraham. “A Study on the effectiveness Of Online Food Applications on
Registered Restaurants,” International Journal of Creative Research Thoughts
(IJCRT) ISSN: 2320-2882, Volume 9, Issue 1 January 2021.
7. Prof. Upendra More, Prof. Ria Patnaik, Reema Shah. “A Study on Online Food Delivery
Services during COVID-19 in Mumbai,” Thakur Global Business School & Thakur Institute of
Management Studies & Research, PalArch’s Journal of Archaeology of Egypt/Egyptology
(PJAEE), 18 (7) (2021).
8. Dr. Mitali Gupta. “A Study on Impact of Online Food Delivery App on Restaurant Business
with Special Reference to Zomato and Swiggy,” DAIMSR, International Journal of Research
and Analytical Reviews (IJRAR), Volume 6, March 2019.
9. Aunpriya Saxena. “An Analysis of Online Food Ordering Applications in India: Zomato and
Swiggy,” Amity University, ABS, Lucknow, Uttar Pradesh, India. Volume 9, Special Issue,
April 2019, 4th International Conference on Recent Trends in Humanities, Technology,
Management & Social Development (RTHTMS 2K19); KIET School of Management,
Ghaziabad, UP, India.
10. Awojide, Simon, I. M. Omogbhemhe, O. S. Awe, and T. S. Babatope, “Towards the
digitalization of Restaurant Business Process for Food Ordering in Nigeria Private
University: The Design Perspective. A Study of Samuel Adegboyega University Edo
State Nigeria,” Int. J. Sci. Res. Publ., vol. 8, no. 5, pp. 46–54, 2018.
11. O. I. Mike and A. Simon, “Towards the Digitalization of Hotel Business in Nigeria:
The Design Perspective,” vol. 8, no. 2, pp. 1175–1178, 2017.
12. Adithya. R., A. Singh, S. Pathan, and V. Kanade, “Online Food Ordering System,”
Int. J. Comput. Appl., vol. 180, no. 6, pp. 22–24, 2017.
13. Varsha Chavan, Priya Jadhav, Snehal Korade, Priyanka Teli. “Implementing Customizable
Online Food Ordering System Using Web Based Application,” International Journal of
Innovative Science, Engineering & Technology (IJISET), 2015.
14. Patel, Mayurkumar, "Online Food Order System for Restaurants" (2015). Technical
Library. Paper 219.
15. php code [online] available at www.w3schools.com
16. mysql code [online] available at www.stackoverflow.com
17. Kirti Bhandge, Tejas Shinde, Dheeraj Ingale, Neeraj Solanki, Reshma Totare. “A Proposed
System for Touchpad Based Food Ordering System Using Android Application,” International
Journal of Advanced Research in Computer Science & Technology.
18. Resham Shinde, Priyanka Thakare, Neha Dhomne, Sushmita Sarkar, ”Design and
Implementation of Digital dining in Restaurants using Android”, International Journal
of Advance Research in Computer Science and Management Studies 2014.
19. Ashutosh Bhargave, Niranjan Jadhav, Apurva Joshi, Prachi Oke, S. R. Lahane. “Digital
Ordering System for Restaurant Using Android,” International Journal of Scientific and
Research Publications, 2013.
20. Khairunnisa K., Ayob J., Mohd. Helmy A. Wahab, M. Erdi Ayob, M. Izwan Ayob,
M. Afif Ayob. “The Application of Wireless Food Ordering System,” MASAUM Journal of
Computing, 2009.
21. Noor Azah Samsudin, Shamsul Kamal Ahmad Khalid, Mohd Fikry Akmal Mohd Kohar,
Zulkifli Senin, Mohd Nor Ihkasan. “A customizable wireless food ordering system with real
time customer feedback,” IEEE Symposium on Wireless Technology and Applications
(ISWTA), 2011.
22. Serhat Murat Alagoz, Haluk Hekimoglu. “A study on TAM: analysis of customer attitudes
in online food ordering system,” Elsevier Ltd., 2012.
23. Patel Krishna, Patel Palak, Raj Nirali, Patel Lalit,” Automated Food Ordering
System”, International Journal of Engineering Research and Development (IJERD)
2015
24. Dataset_1: https://statso.io/food-delivery-time-prediction-case-study
25. Dataset_2: https://raw.githubusercontent.com/amankharwal/Website-data/master/onlinefoods.csv