Email Spam Filtering Using Logistic Regression With Artificial Bee Colony
Group Members:
Aman Kumar (2020IMT-007)
Milind Yadav (2020IMT-057)
Sahil Punia (2020IMT-083)
Yashraj Patil (2020IMT-117)
Deependra Yadav (2019IMT-031)
Siddharth Kumar Gautam (2018IMT-099)
Introduction
Emails are used in practically every industry today, from business to
education.
The volume of spam email is rising quickly. Several machine learning
and deep learning techniques have been applied to spam detection, e.g., Naive Bayes,
decision trees, neural networks, and random forests.
Current spam detection methods typically have low detection
rates and struggle to handle high-dimensional data.
In email filtering, content-based filtering is the most effective approach. It relies
on machine learning algorithms that use features extracted from the message content
to distinguish legitimate email (ham) from spam.
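As a minimal sketch of content-based features, the snippet below turns an email into a bag-of-words count vector. The vocabulary and the function name extract_features are hypothetical illustrations, not the project's actual feature set.

```python
import re

# Hypothetical vocabulary for illustration; a real filter would build it
# from the words that occur in the training emails.
VOCABULARY = ["free", "winner", "money", "meeting", "project"]


def extract_features(email_text, vocabulary=VOCABULARY):
    """Return how many times each vocabulary word occurs in the email."""
    tokens = re.findall(r"[a-z]+", email_text.lower())
    return [tokens.count(word) for word in vocabulary]


print(extract_features("FREE money!!! Claim your free prize, winner!"))
# [2, 1, 1, 0, 0]
```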
The complete dataset, in which emails are already labeled as ham or spam, is divided
into a training set and a testing set, and the machine learning algorithm is trained
on the training set.
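A train/test split can be sketched as follows. The 80/20 ratio, the fixed seed, and the helper name split_dataset are hypothetical choices (this is a hand-written helper, not a library function), not the project's settings.

```python
import random


def split_dataset(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle labeled emails, then split them into training and testing sets.

    The 80/20 ratio and the seed are illustrative defaults, not project settings.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


emails = [f"email {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]  # toy labels: 1 = spam, 0 = ham
X_train, y_train, X_test, y_test = split_dataset(emails, labels)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before splitting keeps the ham/spam mix roughly similar in both sets when the original file is ordered by class.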
The testing set is used to evaluate the effectiveness of the technique. Naive
Bayes is commonly used in spam filtering because of its simplicity, quick
convergence, linear computational complexity, and ease of interpretation.
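Logistic Regression (LR) computes its output by applying a logistic (sigmoid) activation
function to a weighted sum of the input features plus a bias, and it is trained by
minimizing the error between this output and the true labels.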
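The LR output and its error can be sketched as below. The slides do not name the error function, so binary cross-entropy (log loss) is an assumption here, as is the convention 1 = spam, 0 = ham.

```python
import math


def sigmoid(z):
    """Logistic activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """LR output: estimated probability that feature vector x is spam."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(weights, bias, X, y, eps=1e-12):
    """Mean binary cross-entropy between LR outputs and 0/1 labels (assumed error)."""
    total = 0.0
    for x, label in zip(X, y):
        p = min(max(predict_proba(weights, bias, x), eps), 1.0 - eps)  # avoid log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(X)


# With all-zero parameters every email gets probability 0.5, so the loss is ln 2.
print(round(log_loss([0.0, 0.0], 0.0, [[1, 0], [0, 1]], [1, 0]), 4))  # 0.6931
```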
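The Artificial Bee Colony (ABC) algorithm is a swarm-intelligence optimization method inspired
by the foraging behavior of honey bees. It starts with a population of candidate solutions
(food sources), and the bees explore the search space by flying to different solutions and
evaluating their quality using the objective function.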
The bees communicate with each other by sharing information about the best solutions
found so far, and they use this information to adjust their search behavior.
The ABC algorithm has been shown to be effective at solving a wide range of optimization
problems, including continuous, discrete, and combinatorial optimization problems.
It has also been used in many applications, such as image processing, machine learning,
and engineering design.
The ABC algorithm is a simple and robust optimization algorithm that is easy to implement
and can often find high-quality solutions with relatively few function evaluations.
LR classifier based on the ABC algorithm
In the Logistic Regression classifier based on Artificial Bee Colony, the ABC algorithm
is applied to the training dataset to find the optimal weight vector for the LR model.
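A sketch of how ABC can represent LR parameters, under assumptions the slides do not state: each food source is a vector with the feature weights first and the bias last, and the LR error is turned into fitness with the transform commonly used in ABC implementations, fit = 1 / (1 + cost) for a non-negative cost. The helper names decode and fitness are hypothetical.

```python
def decode(solution):
    """Split a food source into LR weights and bias (assumed layout: bias last)."""
    return solution[:-1], solution[-1]


def fitness(cost):
    """ABC fitness transform: a lower cost (LR error) gives a higher fitness."""
    return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)


weights, bias = decode([0.8, -1.2, 0.3, 0.05])
print(weights, bias)  # [0.8, -1.2, 0.3] 0.05
print(fitness(0.25))  # 0.8
```

With this encoding, a dataset with d features gives ABC a (d + 1)-dimensional search space.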
In the first step, n random solutions are generated using the initialization equation. Then,
at the start of the Employed Bees Phase, the fitness value of each solution is calculated.
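The initialization step can be sketched with the standard ABC formula x_ij = x_min + rand(0, 1) * (x_max - x_min), which is assumed (not confirmed by the slides) to be the equation referenced above. The bounds [-1, 1] and the name init_population are hypothetical.

```python
import random


def init_population(n, dim, lower=-1.0, upper=1.0, rng=None):
    """Create n random food sources of length dim.

    Standard ABC initialization, component by component:
        x_ij = lower + rand(0, 1) * (upper - lower)
    The bounds [-1, 1] are a hypothetical default for LR weights and bias.
    """
    rng = rng or random.Random()
    return [[lower + rng.random() * (upper - lower) for _ in range(dim)]
            for _ in range(n)]


population = init_population(n=10, dim=6, rng=random.Random(0))
trials = [0] * len(population)  # failed-improvement counters, used by the scout phase
```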
For each solution, a nearby neighboring solution is generated using the neighbor-search
equation. The fitness of this new solution is calculated, and greedy selection keeps the
better of the newly generated solution and the existing one.
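The Employed Bees Phase can be sketched with the standard ABC neighbor rule v_ij = x_ij + phi * (x_ij - x_kj), where one dimension j, a partner k != i, and phi in [-1, 1] are drawn at random. That this is the slides' equation is an assumption, and the sum-of-squares cost below is only a toy stand-in for the LR error.

```python
import random


def neighbor(population, i, rng):
    """Standard ABC neighbor: v_ij = x_ij + phi * (x_ij - x_kj) in one random dimension."""
    x = population[i]
    j = rng.randrange(len(x))
    k = rng.choice([idx for idx in range(len(population)) if idx != i])
    v = list(x)
    v[j] = x[j] + rng.uniform(-1.0, 1.0) * (x[j] - population[k][j])
    return v


def employed_bees_phase(population, costs, trials, cost_fn, rng):
    """One neighbor per food source, then greedy selection.

    Comparing costs is equivalent to comparing fitness, because
    fitness = 1 / (1 + cost) decreases as a non-negative cost grows.
    """
    for i in range(len(population)):
        candidate = neighbor(population, i, rng)
        candidate_cost = cost_fn(candidate)
        if candidate_cost < costs[i]:
            population[i], costs[i], trials[i] = candidate, candidate_cost, 0
        else:
            trials[i] += 1  # no improvement: count a failed trial


def sphere(v):  # toy stand-in for the LR error
    return sum(c * c for c in v)


rng = random.Random(1)
population = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
costs = [sphere(p) for p in population]
trials = [0, 0, 0]
employed_bees_phase(population, costs, trials, sphere, rng)
```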
After that, a selection probability is computed for each solution from its fitness. In the
Onlooker Bees Phase, a random number is generated at every iteration, and the existing
solutions are iterated over.
If the random number is less than a solution's selection probability, that solution is
selected and a corresponding neighboring solution is generated. Of the two, the solution
with better fitness is kept, and we then break out of the loop.
This process is carried out for each Onlooker Bee. The solution with the best fitness value
found so far is memorized so that it is not lost during the search.
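The Onlooker Bees Phase can be sketched with the standard ABC formulation: the selection probability is p_i = fit_i / sum_n fit_n, a fresh random number is drawn at each step while cycling through the solutions, and solution i is chosen when that number is below p_i, so fitter solutions are picked more often. Treating this as the slides' exact procedure is an assumption, as is the sum-of-squares toy cost.

```python
import random


def fitness(cost):
    return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)


def neighbor(population, i, rng):
    # Same neighbor rule as the employed phase: v_ij = x_ij + phi * (x_ij - x_kj).
    x = population[i]
    j = rng.randrange(len(x))
    k = rng.choice([idx for idx in range(len(population)) if idx != i])
    v = list(x)
    v[j] = x[j] + rng.uniform(-1.0, 1.0) * (x[j] - population[k][j])
    return v


def onlooker_bees_phase(population, costs, trials, cost_fn, rng, n_onlookers=None):
    """Fitness-proportional exploitation of promising food sources."""
    n = len(population)
    n_onlookers = n_onlookers or n
    fits = [fitness(c) for c in costs]
    total = sum(fits)
    probs = [f / total for f in fits]  # p_i = fit_i / sum_n fit_n
    placed, i = 0, 0
    while placed < n_onlookers:
        if rng.random() < probs[i]:  # fresh random number at every step
            candidate = neighbor(population, i, rng)
            candidate_cost = cost_fn(candidate)
            if candidate_cost < costs[i]:  # greedy selection
                population[i], costs[i], trials[i] = candidate, candidate_cost, 0
            else:
                trials[i] += 1
            placed += 1
        i = (i + 1) % n  # cycle through the existing solutions
    # Return this cycle's best solution so the caller can memorize the overall best.
    best = min(range(n), key=lambda idx: costs[idx])
    return list(population[best]), costs[best]


def sphere(v):  # toy stand-in for the LR error
    return sum(c * c for c in v)


rng = random.Random(2)
population = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
costs = [sphere(p) for p in population]
trials = [0, 0, 0]
best, best_cost = onlooker_bees_phase(population, costs, trials, sphere, rng)
```

The caller would compare best_cost with the best value memorized from earlier cycles and keep the smaller one.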
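In the Scout Bees Phase, if the trial count of a solution (the number of consecutive attempts
that failed to improve it) exceeds the limit, the solution is abandoned and replaced by a newly
generated random solution. The three phases repeat until the maximum number of iterations is reached.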
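The Scout Bees Phase can be sketched as follows. The limit value, the bounds [-1, 1], and the function name are hypothetical; the replacement uses the same random initialization formula as the first step.

```python
import random


def scout_bees_phase(population, costs, trials, cost_fn, limit,
                     lower=-1.0, upper=1.0, rng=None):
    """Abandon food sources whose trial counter exceeds `limit`.

    An abandoned source is replaced by a fresh random solution, evaluated,
    and its counter reset. The bounds are hypothetical defaults.
    """
    rng = rng or random.Random()
    for i in range(len(population)):
        if trials[i] > limit:
            population[i] = [lower + rng.random() * (upper - lower)
                             for _ in population[i]]
            costs[i] = cost_fn(population[i])
            trials[i] = 0


population = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
costs = [0.05, 0.25, 0.61]
trials = [0, 7, 2]
scout_bees_phase(population, costs, trials,
                 lambda v: sum(c * c for c in v), limit=5, rng=random.Random(3))
```

Only the second source, with 7 failed trials, exceeds the limit of 5 and is re-seeded; the other two are left unchanged.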
Finally, the best weight vector is applied to the LR model, and the output is calculated using the
weights and the bias value stored in the solution vector.
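Applying the final solution vector can be sketched as below, assuming the weights-then-bias layout, a 0.5 decision threshold, and 1 = spam; these, the helper names, and the example vector are hypothetical, not the project's reported model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(solution, x, threshold=0.5):
    """Apply the LR model encoded in `solution` (weights first, bias last).

    Returns 1 (spam) when the logistic output reaches the threshold, else 0 (ham).
    """
    weights, bias = solution[:-1], solution[-1]
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= threshold else 0


def accuracy(solution, X_test, y_test):
    """Fraction of test emails whose predicted label matches the true label."""
    correct = sum(classify(solution, x) == y for x, y in zip(X_test, y_test))
    return correct / len(y_test)


best_solution = [1.5, -2.0, 0.1]  # hypothetical ABC result: two weights + bias
print(accuracy(best_solution, [[2, 0], [0, 2], [1, 1]], [1, 0, 0]))  # 1.0
```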
Understanding the Model
Model Architecture