Email Spam Filtering Using Logistic Regression With Artificial Bee Colony

Group Members:
Aman Kumar (2020IMT-007)
Milind Yadav (2020IMT-057)
Sahil Punia (2020IMT-083)
Yashraj Patil (2020IMT-117)
Deependra Yadav (2019IMT-031)
Siddharth Kumar Gautam (2018IMT-099)
Introduction
Email is used in practically every field today, from business to
education.

Email spam, also known as junk or unsolicited email, can harm users by
wasting their time and computing resources and by stealing critical data.

Spam volume is rising quickly. Several machine learning and deep learning
techniques have been applied to the problem, e.g., Naive Bayes, decision
trees, neural networks, and random forests.
Current spam detection methods typically have low detection
rates and struggle to handle high-dimensional data.

To deal with this problem, we can combine the Artificial Bee Colony
(ABC) algorithm with Logistic Regression: ABC's balance of
exploitation and exploration helps handle high-dimensional
data more efficiently.
Literature Review
Email is an electronic means of communication, and messages are categorized
as either spam or ham (legitimate) email.

In email filtering, content-based filtering is most effective. The content-based
approach relies on machine learning algorithms that use features of the message
to differentiate between ham and spam.

The complete dataset is divided into a training set and a testing set. Machine
learning algorithms are trained on emails that are already labeled as ham or
spam, and the testing set is used to analyze the efficiency of the technique.

Naive Bayes is commonly used in spam filtering because of its simplicity, quick
convergence, linear computational complexity, and ease of interpretation.

Logistic Regression minimizes the error associated with the output calculated by
a logistic activation function.

It has also been applied to email classification and demonstrated good
performance in spam filtering.
What is Logistic regression?
Logistic regression is a statistical method used to model the
relationship between a binary response variable and one or more
predictor variables.

It is widely used in machine learning and statistical modeling,
particularly in classification problems where the goal is to predict
the probability of an event occurring.
Logistic regression
The logistic regression model applies a sigmoid function to a weighted sum of
the predictor variables to model the binary response variable.
The sigmoid function ensures that the output of the model lies between 0
and 1, which represents the probability of the event occurring.
The model is trained by maximizing the likelihood of the observed data,
and the parameters of the model are estimated using iterative algorithms
such as gradient descent.
Logistic regression is a powerful tool for modeling binary data and has
several advantages, including its simplicity, ease of interpretation, and
ability to handle a large number of predictor variables.
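
As a minimal sketch of the model described above (not the group's own implementation), the following Python code computes the sigmoid output and fits the parameters by plain gradient descent on the mean negative log-likelihood. The toy data, the learning rate, and the function names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Probability that x belongs to the positive class (e.g. spam)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def train(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by gradient descent on the mean negative log-likelihood."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi  # gradient of the loss w.r.t. z
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy data (hypothetical): one feature, label 1 when the feature is large.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train(X, y)
probs = [predict_proba(weights, bias, x) for x in X]
```

Each entry of probs is a value strictly between 0 and 1; thresholding it at 0.5 turns the probability into a class label.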
What is Artificial Bee Colony?
The Artificial Bee Colony (ABC) algorithm is a nature-inspired
optimization algorithm that was first proposed in 2005.

It is based on the behavior of honey bees in a colony, where
bees work together to find the best food sources.
Artificial Bee Colony
In the ABC algorithm, the problem to be solved is defined as an objective function that the
bees try to optimize.

The algorithm starts with a population of solutions (food sources), and the bees explore
the search space by flying to different solutions and evaluating their quality using the
objective function.

The bees communicate with each other by sharing information about the best solutions
found so far, and they use this information to adjust their search behavior.

The ABC algorithm has been shown to be effective at solving a wide range of optimization
problems, including continuous, discrete, and combinatorial optimization problems.
It has also been used in many applications, such as image processing, machine learning,
and engineering design.

The ABC algorithm is a simple and robust optimization algorithm that is easy to implement
and can often find high-quality solutions with relatively few function evaluations.
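
The loop described above can be sketched in a few lines of Python. This is a simplified illustration of a common ABC formulation for minimization, not necessarily the exact variant used in this project; the parameter names (n_sources, limit, max_iter), their default values, and the sphere test function are assumptions.

```python
import random

def abc_minimize(objective, dim, bounds, n_sources=10, limit=20, max_iter=200, seed=0):
    """Minimize objective over [lo, hi]^dim with a basic Artificial Bee Colony loop."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [lo + rng.random() * (hi - lo) for _ in range(dim)]

    def fitness(cost):
        # Common ABC fitness for minimization: higher is better.
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = list(sources[costs.index(best_cost)])

    def try_neighbor(i):
        # Perturb one coordinate relative to a randomly chosen other source.
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        candidate[j] = min(hi, max(lo, candidate[j]))
        cost = objective(candidate)
        if cost < costs[i]:  # greedy selection keeps the better of the two
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_sources):          # employed bees: one try per source
            try_neighbor(i)
        fits = [fitness(c) for c in costs]  # onlookers favor fitter sources
        for _ in range(n_sources):
            try_neighbor(rng.choices(range(n_sources), weights=fits, k=1)[0])
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:       # memorize the best solution so far
            best_cost, best = costs[i_best], list(sources[i_best])
        worn = max(range(n_sources), key=trials.__getitem__)
        if trials[worn] > limit:            # scout: abandon an exhausted source
            sources[worn] = random_source()
            costs[worn] = objective(sources[worn])
            trials[worn] = 0
    return best, best_cost

def sphere(x):
    """Toy objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

best, best_cost = abc_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

Here the onlookers pick sources by fitness-weighted sampling, which is one of several ways to implement the information sharing between bees.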
LR classifier based on the ABC algorithm
In the Logistic Regression classifier based on Artificial Bee Colony,
the ABC algorithm is run on the training dataset to find the
optimal weight vector.

Each food source generated in the initial step represents a
candidate weight vector for the logistic regression model.

In the first step, n random solutions are generated using eq . Then the fitness value
associated with each solution is calculated at the start of the Employed Bees Phase.

For each solution, a nearby neighboring solution is generated using the eq . Then the fitness
of this newly developed solution is calculated, and greedy selection occurs between the
newly generated solution and the existing solution.
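
The equations referenced above are not reproduced in these slides, so the sketch below assumes the forms commonly used in ABC: uniform random initialization within a range, fitness = 1 / (1 + cost), and a neighbor that perturbs one coordinate relative to another randomly chosen solution. The log-loss cost, the range [-1, 1], and all function names are illustrative assumptions.

```python
import math
import random

rng = random.Random(42)

def log_loss(solution, X, y):
    """Mean logistic loss; solution = feature weights followed by a bias term."""
    *weights, bias = solution
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(w * v for w, v in zip(weights, xi)) + bias
        margin = z if yi == 1 else -z
        # log(1 + exp(-margin)), with a linear approximation for very negative margins
        total += math.log1p(math.exp(-margin)) if margin > -30 else -margin
    return total / len(X)

def fitness(cost):
    """Assumed ABC fitness for a non-negative cost: higher is better."""
    return 1.0 / (1.0 + cost)

def init_sources(n, dim, lo=-1.0, hi=1.0):
    """Generate n random solutions uniformly inside [lo, hi]^dim."""
    return [[lo + rng.random() * (hi - lo) for _ in range(dim)] for _ in range(n)]

def employed_step(sources, fits, trials, i, X, y):
    """Generate a neighbor of solution i and keep the fitter of the two."""
    k = rng.choice([s for s in range(len(sources)) if s != i])
    j = rng.randrange(len(sources[i]))
    candidate = list(sources[i])
    candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
    cand_fit = fitness(log_loss(candidate, X, y))
    if cand_fit > fits[i]:  # greedy selection
        sources[i], fits[i], trials[i] = candidate, cand_fit, 0
    else:
        trials[i] += 1      # counts failed attempts for the Scout Bees Phase

# Toy training set (hypothetical): two features per email, label 1 = spam.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 0.5], [3.0, 0.2]]
y = [0, 0, 1, 1]
sources = init_sources(n=6, dim=3)  # two weights plus one bias
fits = [fitness(log_loss(s, X, y)) for s in sources]
trials = [0] * len(sources)
before = list(fits)
for i in range(len(sources)):
    employed_step(sources, fits, trials, i, X, y)
```

Because of the greedy selection, no solution's fitness can decrease during the Employed Bees Phase.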

After that, for each solution, a selection probability is generated using the eq. In the
Onlooker Bees Phase, a random number is generated at every iteration, and the existing
solutions are iterated over.

If the random number is less than a solution's selection probability, that solution is
selected, and a corresponding neighboring solution is generated. Of the two, the solution
with the better fitness is kept. Then we break out of the loop.
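
The selection equation is not shown in these slides. A minimal sketch, assuming the usual ABC convention that a solution's probability is its fitness divided by the total fitness and that a solution is picked when a uniform random draw falls below that probability, might look like this. The fitness values and function names are hypothetical.

```python
import random

def selection_probabilities(fits):
    """Assumed form: each solution's share of the total fitness."""
    total = sum(fits)
    return [f / total for f in fits]

def pick_source(probs, rng):
    """Scan the solutions until a fresh random draw falls below one's probability."""
    i = 0
    while True:
        if rng.random() < probs[i]:
            return i  # corresponds to breaking out of the loop
        i = (i + 1) % len(probs)

rng = random.Random(7)
fits = [0.9, 0.5, 0.1]  # hypothetical fitness values
probs = selection_probabilities(fits)
counts = [0, 0, 0]
for _ in range(3000):   # 3000 onlooker decisions
    counts[pick_source(probs, rng)] += 1
```

Fitter solutions are chosen more often, so onlookers concentrate on promising regions. Restarting the scan at the first solution for every onlooker additionally favors solutions early in the list; some implementations keep the scan position between onlookers to reduce that bias.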

This process is carried out for each of the Onlooker Bees. The solution with the best fitness
value achieved so far is memorized so that it is not lost in the process.

In the Scout Bees Phase, if the trial count of a solution exceeds the limit, a new solution is
generated to replace the old one. This process repeats until the maximum number of iterations
is reached.

After that, the best weight vector is applied to the LR model, and an output is calculated using
the weights and bias value in the solution vector.
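
As an illustration of this last step, the sketch below splits a solution vector into weights and bias and computes the model output. The layout (bias stored as the last element), the 0.5 threshold, and the example values are assumptions, not details taken from the slides.

```python
import math

def predict(solution, x, threshold=0.5):
    """Classify one email's feature vector x using a solution vector found by ABC."""
    *weights, bias = solution  # assumed layout: weights first, bias last
    z = sum(w * v for w, v in zip(weights, x)) + bias
    # Numerically stable sigmoid
    if z >= 0:
        prob_spam = 1.0 / (1.0 + math.exp(-z))
    else:
        prob_spam = math.exp(z) / (1.0 + math.exp(z))
    label = "spam" if prob_spam >= threshold else "ham"
    return label, prob_spam

# Hypothetical best solution: two feature weights followed by the bias.
best_solution = [2.0, -1.0, -0.5]
label_a, p_a = predict(best_solution, [1.0, 0.0])  # z = 1.5
label_b, p_b = predict(best_solution, [0.0, 1.0])  # z = -1.5
```

The threshold can be raised if misclassifying ham as spam is costlier than letting some spam through.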
Understanding the Model
Model Architecture
