Email Spam Detection PPT Github
The primary goal of this project is to build a robust email spam detection
classifier that can accurately distinguish between spam and legitimate emails.
EXISTING SYSTEM DRAWBACKS
• Email spam classifiers based on machine learning techniques have been built using SVM, KNN,
Naive Bayes, Decision Tree algorithms, etc.
• SVM had an average accuracy of 99.6%.
• It had good accuracy compared with the other algorithms in the proposed system.
ABSTRACT:
Nowadays, most people communicate official information through
emails. Spam mail is a major issue on the internet, because it is easy for
spammers to send an email containing a spam message. Spam fills our inboxes
with irrelevant emails, and spammers can steal sensitive information
from our devices, such as files and contacts. Even with the latest technology,
detecting spam emails is challenging. This paper proposes a Term
Frequency-Inverse Document Frequency (TF-IDF) approach implemented with
the Support Vector Machine (SVM) algorithm. The results are compared in terms of
the confusion matrix, accuracy, and precision. The TF-IDF based SVM system
achieves an accuracy of 99.9% on training data and 98.2% on testing data.
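The TF-IDF weighting named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definition (term frequency multiplied by the natural log of N / document frequency); the slides do not state which TF-IDF variant or library was used, so the function names and the toy emails below are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def tfidf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(N / df), where df is the number of documents containing the term
    (assumed variant; libraries often add smoothing and normalization)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Count, for each term, how many documents contain it at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
emails = [
    "win a free prize now",
    "meeting agenda for monday",
    "claim your free prize",
]
vectors = tfidf(emails)
```

Terms that appear in fewer emails receive larger weights, so in the first toy email "win" outweighs "free". Vectors like these would then be passed to a classifier such as an SVM.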
GOALS:
1. Data Collection: Gather a dataset comprising both spam and
non-spam emails. This dataset will be the foundation for training
and evaluating our machine learning models.
2. Data Preprocessing: Clean and preprocess the email data to
ensure consistency and remove irrelevant information.
3. Model Selection: Explore machine learning algorithms suitable
for text classification, such as Naive Bayes, Support Vector
Machines (SVM), and Random Forests.
4. Model Training: Train the selected machine learning models
using the preprocessed email dataset.
5. Evaluation Metrics: Assess the performance of our models using a
range of evaluation metrics, including accuracy, precision, recall,
F1-score, and ROC-AUC (Receiver Operating Characteristic - Area Under
Curve). Cross-validation techniques will be employed to ensure
robustness.
6. Testing and Validation: The final step involves testing the email spam classifier
using real-world email samples. This validation process ensures that the classifier is
practical and effective in real-world scenarios.
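As an illustration of the evaluation step, the sketch below derives accuracy, precision, recall, and F1-score from confusion-matrix counts. It assumes "spam" is the positive class; the function names and the sample labels are hypothetical and are not results from this project.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive:
            fp += 1  # legitimate email wrongly flagged as spam
        elif truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate email correctly passed
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1, guarding against zero division."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
scores = evaluate(y_true, y_pred)
```

Precision matters most when a legitimate email flagged as spam is costly, while recall matters most when missed spam is costly; reporting both alongside accuracy avoids a misleading picture on imbalanced data.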
Future Scope
1) Achieving precise classification, with zero percent (0%) misclassification of ham SMS as spam
and spam SMS as ham.
2) The work would be extended to counter phishing SMS, which carry phishing
attacks and are an increasing matter of concern. The framework we
are building will run only on Windows.
Software Requirements
Unsupervised Learning:
• Models themselves find the hidden patterns and insights from the given data.
Machine Learning:
• Machine Learning is an application of Artificial Intelligence (AI) which enables
a program (software) to learn from experience and improve at a
task without being explicitly programmed.
Python:
• Python is an interactive and object-oriented scripting language.
Data Ethics
• Many ethical and legal issues can complicate the design of such
models.
• Customer data must be protected from both intentional and inadvertent disclosure,
and from misuse.
• A company can miss an important piece of information if a user's legitimate email is
marked as spam.
Deployment
• A tool using a browser plugin or API can be built for companies running their own email server.
• It can also be used in conjunction with existing email service providers.
Outcomes