0% found this document useful (0 votes)
39 views

sentiment analysis

The document is a project report for a spam classifier developed by Mr. K. Kiran as part of his Bachelor of Science in Computer Science at Bharathiar University. The project utilizes machine learning algorithms, specifically K-Nearest Neighbors and Decision Trees, to evaluate and classify spam emails using the Spambase dataset, providing a GUI for user interaction. It emphasizes the importance of continuous model evaluation and adaptation to combat evolving spam tactics in email security.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views

sentiment analysis

The document is a project report for a spam classifier developed by Mr. K. Kiran as part of his Bachelor of Science in Computer Science at Bharathiar University. The project utilizes machine learning algorithms, specifically K-Nearest Neighbors and Decision Trees, to evaluate and classify spam emails using the Spambase dataset, providing a GUI for user interaction. It emphasizes the importance of continuous model evaluation and adaptation to combat evolving spam tactics in email security.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 77

SCAS SPAM CLASSIFIER

SPAM CLASSFIER

Project report submitted in partial fulfillment of the requirements for the


award of Bachelor of Science (Computer Science)
of Bharathiar University, Coimbatore-46.

Submitted by

Mr. K. Kiran
(Reg. No. 2122K1649)

Under the guidance of

Ms. B. Suganya, MCA., M.Phil., (Ph.D.)


Assistant Professor
Department of Computer Science

SUGUNA COLLEGE OF ARTS AND SCIENCE


(Affiliated to Bharathiar University)
Nehru Nagar, Kalapatti Road,
Coimbatore-641014, Tamil Nadu.
E.Mail: [email protected]
Website: sugunacas.ac.in
SCAS SPAM CLASSIFIER

March 2024
PROJECT WORK

SPAM CLASSFIER

Bonafide Work Done by

Mr.K.Kiran
Reg. No. 2122K1649

Project report submitted in partial fulfillment of the requirements for the


award of Bachelor of Science (Computer Science)
of Bharathiar University, Coimbatore-46.

Signature of the Guide Signature of the HOD

Submitted for the Viva-voce Examination held on ________________

Internal Examiner External Examiner


SCAS SPAM CLASSIFIER

CERTIFICATE
SCAS SPAM CLASSIFIER

March 2024
SUGUNA COLLEGE OF ARTS AND SCIENCE
(Affiliated to Bharathiar University, Coimbatore)
Nehru Nagar, Kalapatti Road, Civil Aerodrome (PO),
Coimbatore-641014.

CERTIFICATE

This is to certify that the project, entitled “SPAM CLASSFIER” submitted to the Bharathiar

University, in partial fulfillment of the requirements for the award of the Degree of Bachelor

of Science (Computer Sceicne) is a record of original research work done by Mr.K.Kiran

(Reg. No. 2122K1649) during the period of 2021–2024 of his study in the Department of

Computer Science at Suguna College Arts and Science, Coimbatore, under my supervision

and guidance and the project has not formed the basis for the award of any Degree /

Diploma / Associateship / Fellowship or other similar title to any candidate of any University.

Head of the Department Signature of the Guide

Signature of the Principal


SCAS SPAM CLASSIFIER

DECLARATION
SCAS SPAM CLASSIFIER

DECLARATION

I, Mr.K.Kiran, hereby declare that the project entitled “SPAM CLASSFIER” submitted to

the Bharathiar University, Coimbatore in partial fulfillment of the requirements for the award

of the Degree of Bachelor of Science (Computer Science) is a record of original project

work done by me during the period of 2021-2024 under the supervision and guidance of

Ms.B.Suganya, Assistant Professor, Department of Computer Science, Suguna College

of Arts and Science, Coimbatore and it has not formed on the basis for the award of

any Degree/ Diploma/ Associate ship/ Fellowship or other similar title to any candidate of

any University.

Date: Signature of the Candidate

Place: Coimbatore Mr. K. Kiran


(Reg. No. 2122K1649)
SCAS SPAM CLASSIFIER

ACKNOWLEDGEMENT
SCAS SPAM CLASSIFIER

ACKNOWLEDGEMENT

I take great pleasure in acknowledging the noble hearts who lent their helping hands for the
successful completion of my project. I also extend my gratitude to all those who contributed
directly and indirectly to the success of this project.

I am profoundly thankful to Shri. V. Lakshminarayanaswamy, Chairman, Suguna Group of


Institutions, Coimbatore for providing me with the opportunity to undertake and successfully
complete this project work.

Deep thanks are extended to Smt. L. Suguna, President, Suguna Group of Institutions,
Coimbatore who also offered me the chance to undertake and successfully complete this
project work.

Sincere appreciation goes to Dr. Srikanth Kannan, Secretary, Suguna Group of Institutions,
Coimbatore for his full support and cooperation, enabling the success of this project and
granting permission to utilize various facilities.

I express my heartfelt thanks to Dr. V. Sekar, Director, Suguna College of Arts and Science,
Coimbatore for encouraging me throughout the project work.

I take this opportunity to thank Dr. R. Rajkumar, M.Com., MBA., M.Phil., Ph.D.,
Principal Suguna College of Arts and Science, Coimbatore for his unwavering support and
assistance in completing this project.

Wholehearted thanks go to Dr. N. Kamalraj, MCA, M.Phil., Ph.D., Associate Professor &
Head, Department of Computer Science, Suguna College of Arts and Science, Coimbatore
for his full support and assistance in completing this project.
SCAS SPAM CLASSIFIER

I express my deep gratitude and respect to my guide, Ms. B. Suganya, MCA., M.Phil.,
(Ph.D.), Assistant Professor, Department of Computer Science, Suguna College of Arts and
Science, Coimbatore, for her valuable suggestions and timely guidance, which played a
pivotal role in streamlining the successful completion of this project.
SCAS SPAM CLASSIFIER

SYNOPSIS
SCAS SPAM CLASSIFIER

SYNOPSIS
This Python project serves as a comprehensive tool for evaluating spam classification
models, offering insights into the performance of machine learning algorithms in identifying
and distinguishing spam emails from legitimate ones. By leveraging the Spambase dataset,
which encompasses a diverse range of attributes extracted from emails, including textual
content, metadata, and header information, the project provides a robust foundation for
training and testing machine learning models.

Through the utilization of K-Nearest Neighbors (KNN) and Decision Tree algorithms, users
gain access to a comparative analysis of model performance, allowing them to assess the
strengths and weaknesses of each approach in accurately classifying spam.

Moreover, the incorporation of feature scaling techniques enhances the reliability and
generalizability of the models, ensuring consistent performance across varying datasets. With
its intuitive graphical user interface (GUI), which displays essential statistics such as the
number of spam and non-spam messages and presents evaluation results in an organized
manner, this project facilitates seamless interaction and interpretation of results for users
across different expertise levels.

As email security remains a critical concern in today's digital landscape, this project
underscores the significance of machine learning in combating spam and underscores its
potential for fostering a safer and more secure online environment.

Furthermore, the project emphasizes the significance of continuous model evaluation and
refinement in the context of evolving spam tactics and email security threats. By providing
users with a platform to assess model performance using real-world data, the project
encourages ongoing experimentation and optimization of spam classification algorithms.
Additionally, it fosters a deeper understanding of the intricate nuances involved in email
filtering, including the detection of subtle patterns and anomalies indicative of spam
behavior.

Through the transparent presentation of evaluation metrics and confusion matrices, users can
gain insights into the efficacy of different feature sets and algorithmic approaches, paving the
way for informed decision-making in model selection and deployment. Moreover, the project
SCAS SPAM CLASSIFIER

serves as a catalyst for interdisciplinary collaboration, bridging the gap between machine
learning expertise and domain-specific knowledge in cybersecurity.

As email continues to be a primary communication channel for individuals and organizations


worldwide, the project underscores the importance of leveraging advanced technologies to
safeguard against spam and uphold user trust and security in digital communications.
SCAS SPAM CLASSIFIER

TABLE OF CONTENTS
SCAS SPAM CLASSIFIER

TABLE OF CONTENTS
S. No. Description Page No.
ACKNOWLEDGEMENT
SYNOPSIS
CONTENTS

1 INTRODUCTION 1
1.1 OVERVIEW OF THE PROJECT
1.2 SYSTEM SPECIFICATION
1.2.1 HARDWARE CONFIGURATION
1.2.2 SOFTWARE SPECIFICATION

2 SYSTEM STUDY 12 – 16
2.1 EXISTING SYSTEM
2.1.1 DRAWBACKS OF EXISTING SYSTEM
2.2 PROPOSED SYSTEM
2.2.1 FEATURES OF PROPOSED SYSTEM

3 SYSTEM DESIGN AND DEVELOPMENT 17 – 28


3.1 DATA FLOW DIAGRAM
3.2 INPUT DESIGN
3.3 OUTPUT DESIGN
3.4 DATABASE DESIGN
3.5 SYSTEM DEVELOPMENT
3.5.1 DESCRIPTION OF MODULES

4 TESTING AND IMPLEMENTATION 29-32

5 CONCLUSION 33-62

5.1 BIBLIOGRAPHY
SCAS SPAM CLASSIFIER

5.2 APPENDICES
A. DATA FLOW DIAGRAM
B. TABLE STRUCTURE
C. SAMPLE CODING
D. SPAM DATASET
SCAS SPAM CLASSIFIER

INTRODUCTION
SCAS SPAM CLASSIFIER

1. INTRODUCTION
This Python project aims to evaluate the performance of machine learning models in
classifying spam emails using the Spambase dataset. With the prevalence of spam emails
posing a significant threat to users' security and productivity, the project provides a practical
solution by employing two popular classification algorithms: K-Nearest Neighbors (KNN)
and Decision Trees.

The dataset, comprising attributes extracted from emails along with corresponding labels,
undergoes preprocessing steps including feature scaling and splitting into training and testing
sets. Subsequently, both models are trained on the training data and evaluated using key
metrics such as accuracy, precision, recall, and F1-score, as well as confusion matrices for
visual representation of classification results.

To enhance user interaction and interpretation of results, the project features a graphical user
interface (GUI) built using Tkinter, displaying essential statistics like the number of spam and
non-spam messages, and presenting evaluation outcomes in a user-friendly format. Overall,
this project serves as a valuable tool for assessing the efficacy of machine learning
approaches in spam detection, contributing to the ongoing efforts in enhancing email security
and user experience.

In addition to evaluating the performance of machine learning models, this project offers
insights into the broader landscape of email security and the challenges posed by spam. By
delving into the intricacies of spam classification, users gain a deeper understanding of the
various techniques and strategies employed by malicious actors to bypass email filters and
deceive recipients.

Moreover, the project highlights the importance of continuous monitoring and adaptation in
the face of evolving spam tactics, underscoring the need for robust and adaptive spam
filtering mechanisms. Furthermore, by providing a platform for experimentation and

1
SCAS SPAM CLASSIFIER

comparison of different classification algorithms, the project encourages innovation and


exploration in the field of email security. It serves as a springboard for further research and
development efforts aimed at enhancing the effectiveness and efficiency of spam detection
systems.

Ultimately, this project not only addresses the immediate need for spam classification but
also contributes to the advancement of email security technologies, fostering a safer and more
secure online environment for users worldwide.

Additionally, this project can be extended to explore more advanced machine learning
techniques and feature engineering strategies for improving spam classification accuracy.
Advanced algorithms such as Support Vector Machines (SVM), Random Forests, or Gradient
Boosting could be incorporated and compared against the KNN and Decision Tree models to
assess their performance.

Moreover, feature selection methods such as recursive feature elimination or principal


component analysis could be implemented to identify the most informative features for spam
classification, thereby enhancing model efficiency and interpretability. Furthermore, the
project could be augmented with a feedback mechanism where users can manually label
misclassified emails, allowing the models to adapt and improve over time through iterative
training.

Additionally, the GUI could be enhanced with visualization tools to provide intuitive insights
into the distribution of features and the decision boundaries of the classification models. By
integrating these additional components, the project can offer a more comprehensive and
interactive platform for exploring and advancing spam classification techniques, ultimately
contributing to the ongoing efforts in combating email spam and enhancing cybersecurity.

In today's digital age, where communication is predominantly facilitated through electronic


platforms, the omnipresence of spam has become an inevitable nuisance. Imagine waking up
to an inbox flooded with unsolicited emails promoting dubious products, or scrolling through
endless streams of fake social media accounts peddling counterfeit goods. Spam, in its
myriad forms, has permeated every corner of the internet, posing significant challenges to
individuals, businesses, and cybersecurity professionals alike.

2
SCAS SPAM CLASSIFIER

To comprehend the essence of spam filtering, it's essential to define the term "spam" within
the realm of electronic communication. Spam encompasses a wide array of unwanted
messages, ranging from email solicitations and phishing attempts to fraudulent
advertisements and malicious links. The proliferation of spam can be attributed to the advent
of digital communication channels, which have provided spammers with unprecedented
avenues to disseminate their messages indiscriminately.

Moreover, the financial incentives driving spam operations have fueled its exponential
growth, making it a lucrative enterprise for cybercriminals seeking to exploit unsuspecting
individuals and organizations. Given the pervasive nature of spam and its detrimental
consequences, the implementation of robust spam filtering mechanisms has emerged as a
critical necessity. Spam filtering serves as a formidable barrier against unwanted messages,
employing a diverse range of techniques to identify and intercept spam before it reaches its
intended recipients.

By leveraging advanced algorithms, heuristic analysis, and machine learning models, spam
filters can distinguish between legitimate communications and unsolicited content, thereby
safeguarding users from potential threats and preserving the integrity of communication
platforms. Central to the efficacy of spam filtering is the rigorous evaluation of its techniques
and methodologies. Various metrics, such as accuracy, precision, recall, and F1-score, are
employed to assess the performance of spam filtering algorithms.

Machine learning, in particular, has revolutionized the landscape of spam filtering, offering
powerful tools for pattern recognition and classification. Through supervised learning
algorithms like K-Nearest Neighbors (KNN) and Decision Trees, spam filters can analyze
vast datasets, identify underlying patterns, and make informed decisions regarding the
classification of incoming messages. To illustrate the practical application of spam filtering
techniques, we delve into a comprehensive case study focusing on the evaluation of two
prominent models: K-Nearest Neighbors (KNN) and Decision Trees. Leveraging a dataset
sourced from the UCI Machine Learning Repository, we embark on a journey to train, test,
and evaluate these models using real-world spam data.

By employing industry-standard evaluation metrics and methodologies, we aim to assess the


performance of each model in terms of accuracy, precision, recall, and overall effectiveness

3
SCAS SPAM CLASSIFIER

in mitigating the onslaught of spam messages. As we conclude our exploration of spam


filtering, it becomes evident that the battle against spam is an ongoing endeavor fraught with
challenges and complexities. However, through the relentless pursuit of innovation, research,
and collaboration, we can fortify our defenses against spam and pave the way for a safer,
more secure digital ecosystem.

By harnessing the power of machine learning, data analytics, and interdisciplinary


cooperation, we can empower individuals, businesses, and internet users worldwide to
combat the scourge of spam and preserve the integrity of online communication channels. As
we embark on this collective journey, let us remain vigilant, adaptive, and committed to the
pursuit of a spam-free future.

4
SCAS SPAM CLASSIFIER

1.1 OVERVIEW OF THE PROJECT

This Python project serves as a robust spam filter evaluation tool, leveraging machine
learning algorithms to classify emails as spam or non-spam. It begins by loading and
preprocessing the Spambase dataset, encompassing various attributes extracted from emails.
Splitting the data into training and testing sets, it applies feature scaling to ensure uniformity
in feature magnitudes. Two classification models, K-Nearest Neighbors (KNN) and Decision
Tree, are trained on the preprocessed data and evaluated using standard classification metrics
such as accuracy, precision, recall, and F1-score, along with confusion matrices to visualize
classification performance.

The project also includes a graphical user interface (GUI) built using Tkinter, providing
users with an intuitive platform to view the number of spam and non-spam messages in the
dataset and examine the evaluation results. Overall, this project facilitates comprehensive
assessment and comparison of machine learning models for spam classification, contributing
to advancements in email security and filtering techniques.

In addition to facilitating spam filter evaluation through machine learning algorithms, this
project serves as a versatile tool for analyzing email datasets and refining classification
techniques. By harnessing the power of K-Nearest Neighbors (KNN) and Decision Trees,
users gain valuable insights into the effectiveness of different approaches in discerning spam
from legitimate emails.

Furthermore, the project fosters exploration and experimentation by providing a


customizable interface for adjusting parameters and exploring alternative models. Its
integration with Tkinter not only enhances user interaction but also allows for seamless
integration with other Python libraries and tools for further analysis and visualization.
Moreover, the project's emphasis on evaluation metrics and visualizations empowers users to
make informed decisions regarding model selection and optimization strategies.

Overall, this project not only addresses the immediate need for spam classification but also
serves as a foundation for ongoing research and development in email security and machine
learning applications.

5
SCAS SPAM CLASSIFIER

In this project, we embark on a comprehensive exploration of spam filtering techniques using


machine learning algorithms, focusing on the evaluation of K-Nearest Neighbors (KNN) and
Decision Tree models. With the ubiquity of electronic communication channels, the
proliferation of spam has become an inevitable nuisance, posing significant challenges to
individuals, businesses, and the cybersecurity landscape at large. To address this pressing
issue, we leverage the power of machine learning to develop and evaluate effective spam
filtering solutions.

The foundation of our project lies in the analysis and preprocessing of the Spambase dataset
sourced from the UCI Machine Learning Repository. This dataset comprises a collection of
attributes extracted from email messages, along with corresponding labels indicating whether
the message is spam or not. We meticulously preprocess the dataset, splitting it into features
(X) and labels (y), and further divide it into training and testing sets using the
`train_test_split` function from the scikit-learn library. Moreover, we apply feature scaling
using the `StandardScaler` to ensure uniformity and optimal performance of our models.

With the dataset prepared, we proceed to train two distinct machine learning models: K-
Nearest Neighbors (KNN) and Decision Trees. The KNN model, a versatile and intuitive
algorithm, operates by classifying data points based on the majority class among their nearest
neighbors. Conversely, the Decision Tree model employs a hierarchical structure of decision
nodes to recursively partition the feature space, ultimately assigning class labels to data
points based on the terminal nodes' majority vote. By fitting these models to the training data,
we enable them to learn underlying patterns and relationships, thereby facilitating the
classification of incoming messages as spam or non-spam.

Following model training, we embark on the crucial phase of model evaluation to assess their
performance and efficacy in classifying spam messages. Leveraging industry-standard
evaluation metrics, including accuracy, precision, recall, and F1-score, we meticulously
evaluate the performance of both KNN and Decision Tree models on the testing dataset.

6
SCAS SPAM CLASSIFIER

Additionally, we compute the confusion matrix for each model, providing a detailed
breakdown of true positive, true negative, false positive, and false negative predictions.
Through rigorous evaluation, we gain insights into the strengths and limitations of each
model, thereby informing our subsequent analyses and recommendations.

With our models trained and evaluated, we transition to the implementation phase, wherein
we develop an interactive graphical user interface (GUI) using the Tkinter library in Python.
The GUI provides users with an intuitive platform to enter custom messages for classification
and view the corresponding predictions and evaluation metrics. Upon entering a message and
clicking the "Classify" button, the GUI triggers the preprocessing and classification of the
message using both the KNN and Decision Tree models. Subsequently, the classification
results, including predicted labels and evaluation metrics, are displayed to the user via
message boxes, enabling them to assess the models' performance in real-time.

Throughout the project, we emphasize the importance of continuous refinement and


optimization of spam filtering techniques to adapt to evolving spamming tactics and maintain
robust cybersecurity defenses. By leveraging machine learning algorithms and incorporating
user-friendly interfaces, we empower individuals and organizations to combat the scourge of
spam effectively, thereby preserving the integrity and security of electronic communication
channels. As we navigate the dynamic landscape of cybersecurity, our project serves as a
testament to the power of innovation and collaboration in mitigating digital threats and
fostering a safer online environment for all.

7
SCAS SPAM CLASSIFIER

1.2 SYSTEM SPECIFICATION

The system specifications for the spam filter evaluation tool developed using Python and
Tkinter encompass compatibility with major operating systems such as Windows, macOS,
and Linux, ensuring broad accessibility. The project relies on Python 3.x as the programming
language, with dependencies including tkinter for building the graphical user interface and
scikit-learn for machine learning functionalities like model training and evaluation.

Additionally, pandas and numpy are utilized for data manipulation and handling, while an
internet connection is necessary during execution to fetch the Spambase dataset from the
provided URL. By adhering to these system specifications, users can seamlessly execute and
interact with the spam filter evaluation tool, contributing to email security and machine
learning exploration.

Furthermore, the project's system specifications prioritize simplicity and accessibility,


allowing users to easily install and run the spam filter evaluation tool on their preferred
platform without complex hardware or software requirements. This approach fosters
inclusivity, enabling a diverse range of users, including those with limited technical expertise
or resources, to benefit from the tool's functionality.

Additionally, by leveraging widely-used libraries such as scikit-learn and tkinter, the project
ensures compatibility with existing Python environments and minimizes the need for
additional setup or configuration. This user-centric design philosophy underscores the
project's commitment to democratizing access to email security tools and empowering users
to take proactive measures against spam threats in a straightforward and intuitive manner.

Moreover, the project's system specifications prioritize scalability and extensibility, allowing
for future enhancements and customizations to meet evolving user needs and technological
advancements. By adhering to industry-standard practices and leveraging open-source
technologies, the project fosters a collaborative environment where contributions from the
community can drive innovation and improvement.

8
SCAS SPAM CLASSIFIER

1.2.1 HARDWARE CONFIGURATION


The hardware configuration for running the spam filter evaluation tool developed using
Python and Tkinter is relatively modest, ensuring compatibility with a wide range of systems.
Here's a basic hardware configuration:

 Processor : A modern processor with at least a dual-core architecture is recommended for


smooth execution of the application. While specific processor models are not mandated,
processors from Intel Core i3 or AMD Ryzen series, or equivalent, are sufficient.

 RAM : A minimum of 4GB of RAM is recommended to handle dataset manipulation,


model training, and GUI rendering effectively. However, for improved performance,
especially with larger datasets or complex models, 8GB or more of RAM is preferable.

 Storage : Adequate storage space is necessary to accommodate the Python environment,


libraries, dataset, and any generated intermediate files. While there are no strict
requirements, a standard HDD (Hard Disk Drive) or SSD (Solid State Drive) with at least
100GB of available space is sufficient for most use cases.

 Display : A monitor with a minimum resolution of 1024x768 pixels is recommended to


ensure proper rendering of the graphical user interface (GUI). Higher resolutions can provide
a better user experience, especially when working with detailed visualizations or larger
datasets.

 Input Devices : Standard input devices such as a keyboard and mouse (or touchpad) are
required for interacting with the application. Additionally, a pointing device (e.g., mouse)
facilitates precise navigation and selection within the GUI.

Overall, the spam filter evaluation tool is designed to be lightweight and resource-efficient,
making it accessible to users with a wide range of hardware configurations. However, users
may experience improved performance with higher-end hardware specifications, particularly
when working with large datasets or complex machine learning models.

9
SCAS SPAM CLASSIFIER

1.2.2 SOFTWARE CONFIGURATION

• Operating System: Windows 10 Home or Windows 10 Pro (64-bit)

• Python Version: Python 3.7 or later

• Integrated Development Environment (IDE): Any Python-compatible IDE can be used for
development and execution of the script. Popular choices include Visual Studio Code,
PyCharm, Jupyter Notebook, and Spyder.

Python Libraries:

 scikit-learn: Machine learning library for model training and evaluation


 accuracy_score: Computes the accuracy of a classification model, which is the
fraction of correctly classified samples.
 precision_score: Calculates the precision of a classification model, which is the ratio
of true positive predictions to the total number of positive predictions.
 recall_score: Computes the recall of a classification model, which is the ratio of true
positive predictions to the total number of actual positive instances.
 f1_score: Calculates the F1 score of a classification model, which is the harmonic
mean of precision and recall. It provides a balance between precision and recall.
 confusion_matrix: Generates a confusion matrix, which is a table that summarizes the
performance of a classification model by comparing actual and predicted class labels.
 train_test_split: This function splits a dataset into two subsets: a training set and a
testing set.
 StandardScaler: This class standardizes features by removing the mean and scaling
them to unit variance.
 KNeighborsClassifier: This class implements the k-nearest neighbors algorithm for
classification.
 DecisionTreeClassifier: This class implements a decision tree classifier, which is a
non-parametric supervised learning method used for classification.
 pandas: Data manipulation and analysis library
 tkinter: GUI toolkit for building the graphical user interface

ScrolledText - This module provides a widget called ScrolledText, which is a text widget that
automatically adds scrollbars when the text content exceeds the visible area.

10
SCAS SPAM CLASSIFIER

Web Browser: Google Chrome, Mozilla Firefox, or Microsoft Edge for accessing online
resources and documentation.

Video Conferencing Software: Zoom, Microsoft Teams, or Skype for online meetings and
collaboration.

11
SCAS SPAM CLASSIFIER

SYSTEM STUDY

12
SCAS SPAM CLASSIFIER

2. SYSTEM STUDY
The system study section in a project content provides an overview of the existing system, its
limitations, and the need for the proposed solution. . Here's a simplified system study for the
spam classification system:

2.1 EXISTING SYSTEM


Traditional spam classification methods encompass a variety of techniques aimed at
identifying and filtering out unwanted or unsolicited emails. One prevalent approach involves
rule-based systems, where predefined criteria and rules are utilized to flag emails as spam.
These rules may include identifying specific keywords or phrases commonly associated with
spam, such as "free" or "discount," as well as analyzing email headers for suspicious patterns.

Additionally, heuristic methods are employed, which leverage statistical or machine learning
techniques to assess email content and attributes. These methods often involve analyzing
word frequency, header information, and content characteristics to determine the likelihood
of an email being spam. Bayesian filtering, a statistical approach, calculates the probability of
an email being spam based on observed word occurrences and is trained on labeled email
datasets. Furthermore, the use of whitelists and blacklists helps classify emails based on
trusted or known spamming sources.

While traditional spam classification methods have been effective to a certain extent, they
may struggle to adapt to new spamming techniques and may result in higher false positive
rates compared to more advanced machine learning-based approaches.

2.1.1 DRAWBACKS OF EXISTING SYSTEM

Limited Adaptability: Traditional systems often rely on predefined rules or heuristics, making
them less adaptable to evolving spamming techniques. As spammers continually refine their
tactics, these systems may struggle to keep up with new spamming methods.

13
SCAS SPAM CLASSIFIER

High False Positive Rates: Rule-based systems and heuristic methods may inadvertently flag
legitimate emails as spam, leading to false positives. This can result in important messages
being incorrectly filtered out, potentially causing users to miss critical information.

Limited Feature Extraction: Traditional methods may have limited capabilities in extracting
and leveraging complex features from email content. They often focus on basic features such
as keyword frequency or header analysis, which may not capture the nuances of modern
spam emails.

Scalability Issues: As the volume of email traffic increases, traditional spam classification
systems may face scalability challenges. Processing large volumes of emails in real-time can
strain system resources and impact performance.

Dependency on Manual Updates: Rule-based systems and whitelists/blacklists require


manual updates to stay effective. Maintaining and updating these systems can be time-
consuming and resource-intensive, especially in dynamic email environments.

Susceptibility to Evasion Techniques: Spammers continuously devise new evasion techniques


to bypass traditional spam filters. This includes techniques like obfuscating text, using image-
based spam, and employing social engineering tactics, which traditional systems may
struggle to detect.

Lack of Personalization: Traditional systems often apply uniform filtering rules to all users,
regardless of individual preferences or behaviors. This one-size-fits-all approach may not
adequately address the unique spam filtering needs of different users or organizations.

14
SCAS SPAM CLASSIFIER

2.2 PROPOSED SYSTEM:


The proposed system aims to enhance spam classification using machine learning techniques.
Unlike traditional rule-based systems, which often struggle to adapt to evolving spamming
tactics, the proposed system leverages supervised machine learning algorithms to
automatically learn and classify spam emails based on their content and characteristics.

The system begins by loading and preprocessing a dataset containing attributes extracted
from spam and non-spam emails. These attributes include features such as word frequency,
character frequency, and other relevant characteristics of the email content. The dataset is
then split into training and testing sets to train and evaluate the machine learning models.

Two popular classification algorithms, K-Nearest Neighbors (KNN) and Decision Trees, are
utilized in the proposed system. These models are trained on the training data after applying
feature scaling to ensure consistent and accurate predictions. Following training, the models
are evaluated using various performance metrics such as accuracy, precision, recall, F1-score,
and confusion matrix.

The graphical user interface (GUI) built using Tkinter provides a user-friendly interface for
users to interact with the system. It displays essential information such as the number of spam
and non-spam messages in the dataset and the evaluation results of the machine learning
models. Users can easily interpret the classification performance and gain insights into the
effectiveness of the spam filter.

Overall, the proposed system offers a robust and adaptive approach to spam classification,
leveraging the power of machine learning to effectively identify and filter out spam emails
while minimizing false positives and improving overall email security.

In addition to the core functionality described, the proposed system can be further enhanced
with various features to improve its performance and usability. Firstly, feature engineering

15
SCAS SPAM CLASSIFIER

techniques can be explored to extract additional information from email metadata, such as
sender addresses, header details, and timestamps. This enriched feature set can enhance the
accuracy of spam classification. Moreover, the system can benefit from experimenting with
different classification algorithms beyond K-Nearest Neighbors and Decision Trees, such as
Support Vector Machines or Random Forests, along with fine-tuning their hyperparameters to
optimize performance.

Ensemble methods like bagging and boosting can also be implemented to combine multiple
classifiers and improve classification accuracy. Additionally, incorporating k-fold cross-
validation ensures robustness in evaluating model performance. Real-time classification
capabilities can be added to the system to automatically filter incoming emails, integrating
seamlessly with email clients or servers. Moreover, introducing user feedback mechanisms
allows users to provide input on misclassified emails, aiding in model refinement over time.

Integration with popular email services or clients, advanced data visualization in the GUI for
insights into email trends and model performance, and optimization for scalability and
performance are crucial aspects to consider. Furthermore, security measures must be
implemented to protect user data and ensure compliance with privacy regulations. By
incorporating these features, the proposed system can offer a comprehensive solution for
spam classification, meeting the evolving needs of users while providing robust protection
against spam emails.

16
SCAS SPAM CLASSIFIER

2.2.1 FEATURES OF PROPOSED SYSTEM:

Advanced Feature Engineering: Utilizing feature engineering techniques to extract valuable


information from email metadata, including sender details, header information, and
timestamps. These additional features can improve the accuracy of spam classification..

Multiple Classification Algorithms: The system integrates various machine learning


algorithms beyond K-Nearest Neighbors and Decision Trees, such as Support Vector
Machines, Random Forests, and Naive Bayes. This allows for experimentation and selection
of the most suitable algorithm based on performance metrics.

Ensemble Methods: Implementing ensemble methods like bagging and boosting to combine
the predictions of multiple classifiers, further enhancing classification accuracy and
robustness.

Cross-Validation: Employing k-fold cross-validation to evaluate model performance and


ensure generalizability across different subsets of the dataset.

Real-Time Classification: Providing real-time spam classification capabilities to


automatically filter incoming emails. This feature seamlessly integrates with email clients or
servers, offering users immediate protection against spam.

User Feedback Mechanisms: Allowing users to provide feedback on misclassified emails,


facilitating model refinement and improvement over time. This iterative process enhances the
system's effectiveness in identifying spam.

17
SCAS SPAM CLASSIFIER

Integration with Email Services: Integrating with popular email services or clients to
streamline the user experience and ensure seamless operation within existing email
workflows.

Data Visualization: Incorporating advanced data visualization techniques in the graphical


user interface (GUI) to provide users with insights into email trends, model performance, and
classification results.

Scalability and Performance Optimization: Optimizing the system for scalability and
performance to handle large volumes of emails efficiently while maintaining high
classification accuracy.

Security Measures: Implementing robust security measures to protect user data and ensure
compliance with privacy regulations, safeguarding sensitive information from unauthorized
access or misuse.

18
SCAS SPAM CLASSIFIER

SYSTEM DESIGN AND DEVELOPMENT

19
SCAS SPAM CLASSIFIER

3. SYSTEM DESIGN AND DEVELOPMENT

The system design and development process for the spam filter project entails several critical
phases to ensure the creation of a robust and efficient solution. Initially, the project
requirements are thoroughly analyzed to understand the desired features, performance
metrics, and user expectations. Following this, relevant datasets containing email samples are
acquired and preprocessed to handle missing values, normalize features, and encode
categorical variables.

Subsequently, various machine learning algorithms suitable for text classification tasks, such
as K-Nearest Neighbors (KNN) and Decision Trees, are evaluated and the most appropriate
models are selected based on their performance on training and validation datasets. These
selected models are then trained using the preprocessed data, with hyperparameters tuned to
optimize performance. Evaluation of the trained models is conducted using performance
metrics like accuracy, precision, recall, and F1-score, along with confusion matrices to
visualize their classification performance.

A graphical user interface (GUI) is designed using Tkinter to provide a user-friendly


platform for interacting with the spam filter application. This GUI integrates the trained
models, allowing users to input email data and receive real-time classification results.
Extensive testing ensures the system functions as intended, following which the application is
deployed for user access. Regular maintenance and updates are essential post-deployment to
address any issues and incorporate improvements based on user feedback and evolving spam
detection techniques. Through these systematic steps, the spam filter application is designed
and developed to effectively classify emails and meet user requirements.

In addition to the core system design and development process, several other aspects
contribute to the successful implementation of the spam filter project. One crucial component
is the selection and preprocessing of the dataset, which involves cleaning the data, handling
outliers, and ensuring data integrity to prevent biases in the model. Moreover, feature

20
SCAS SPAM CLASSIFIER

selection and engineering play a vital role in improving model performance by identifying the
most relevant features and transforming them to better represent the underlying patterns in
the data. Furthermore, the choice of evaluation metrics is critical to accurately assess the
performance of the trained models and compare them against each other.

Alongside model evaluation, techniques such as cross-validation help ensure the reliability
of the results by testing the models on multiple subsets of the data. Additionally, the
scalability and efficiency of the system are important considerations, especially when dealing
with large volumes of email data. Implementing optimization techniques and leveraging
parallel processing capabilities can help enhance the system's speed and scalability.

Lastly, robust error handling and logging mechanisms are essential to identify and address
any issues that may arise during system operation, ensuring the stability and reliability of the
spam filter application. By incorporating these elements into the system design and
development process, the spam filter project can deliver a reliable, efficient, and user-friendly
solution for classifying spam and non-spam emails.

3.1 DATA FLOW DIAGRAM

21
SCAS SPAM CLASSIFIER

22
SCAS SPAM CLASSIFIER

3.2 INPUT DESIGN

The input design for the spam filter project encompasses several crucial steps aimed at
facilitating efficient data processing and model training. Initially, the selection of an
appropriate dataset is paramount, ensuring it contains email attributes alongside
corresponding labels denoting their classification as spam or non-spam. For this project, the
dataset of choice is sourced from the UCI Machine Learning Repository, specifically the
Spambase dataset. Following dataset selection, thorough data preprocessing tasks are
undertaken, encompassing handling missing values, encoding categorical variables (if
applicable), and scaling numerical features to ensure the data is suitably formatted for model
training.

Additionally, feature selection is conducted to identify pertinent features that contribute to


email classification, enhancing model efficiency and accuracy. Subsequently, the dataset is
split into training and testing sets using scikit-learn's `train_test_split` function, enabling the
model to be trained on a portion of the data and evaluated on unseen samples to gauge its
performance. Feature scaling techniques like StandardScaler are then applied to standardize
feature ranges, particularly beneficial for algorithms like K-Nearest Neighbors (KNN)
sensitive to feature scaling.

Following this, machine learning models, including KNN and Decision Tree, are trained on
the data, learning patterns from input features to classify emails as spam or non-spam.
Evaluation of these models is crucial, employing metrics such as accuracy, precision, recall,
and F1-score to assess their effectiveness in generalizing to unseen data and accurately
classifying emails. Finally, a user interface is developed using Tkinter, allowing users to
interact with the spam filter system, visualize input features, model predictions, and
evaluation results in real-time.

Through meticulous attention to these input design steps, the spam filter project can
efficiently process input data, train models effectively, and provide valuable insights into

23
SCAS SPAM CLASSIFIER

email classification. In addition to the core input design steps outlined above, several other
aspects contribute to the overall effectiveness and robustness of the spam filter project. These
include:

Data Exploration and Analysis: Before proceeding with input design, it's essential to conduct
exploratory data analysis (EDA) to gain insights into the dataset's characteristics. This
involves visualizing feature distributions, identifying correlations between features, and
understanding the imbalance between spam and non-spam classes. EDA helps in making
informed decisions during preprocessing and feature selection.

Handling Imbalanced Data: Imbalanced datasets, where one class (e.g., spam) significantly
outweighs the other, are common in spam classification tasks. Techniques such as
oversampling, undersampling, or using algorithms like Synthetic Minority Over-sampling
Technique (SMOTE) can address class imbalances, ensuring the model learns from both
classes effectively.

Feature Engineering: Feature engineering involves creating new features from existing ones
or transforming features to improve model performance. For email classification, potential
features could include word frequencies, presence of specific keywords, email length, sender
information, and more. Effective feature engineering enhances the model's ability to capture
relevant information for classification.

Hyperparameter Tuning: Model performance heavily depends on hyperparameters, which


control aspects like model complexity, regularization, and tree depth. Conducting
hyperparameter tuning using techniques like grid search or randomized search helps optimize
model performance by finding the best combination of hyperparameters.

Model Selection: While KNN and Decision Tree models are used in this project, exploring
other algorithms such as Random Forests, Support Vector Machines (SVM), or neural

24
SCAS SPAM CLASSIFIER

networks could potentially yield better performance. Comparing multiple models and
selecting the most suitable one based on performance metrics is crucial for achieving high
accuracy in spam classification.

Cross-Validation: Evaluating model performance using cross-validation techniques such as k-


fold cross-validation ensures that the model's performance estimates are robust and not overly
influenced by the particular training-test split. Cross-validation provides a more accurate
estimate of how well the model generalizes to unseen data.

Error Analysis: Understanding the types of errors made by the model (e.g., false positives,
false negatives) through error analysis helps in identifying areas for improvement. Analyzing
misclassified examples can provide valuable insights into the limitations of the model and
potential avenues for further refinement.

By incorporating these additional considerations into the input design process, the spam filter
project can enhance its effectiveness, reliability, and adaptability, ultimately leading to more
accurate email classification and improved user experience.

25
SCAS SPAM CLASSIFIER

3.3 OUTPUT DESIGN

Window Title and Layout: The Tkinter window is designed with an intuitive layout to
enhance user experience. The title "Spam Filter Evaluation" is prominently displayed at the
top of the window, indicating the purpose of the application. Below the title, the layout
includes labels and a text box for displaying the evaluation results.

Labels for Dataset Information: Two labels are included to provide essential information
about the dataset being evaluated. The "Number of Spam Messages" label displays the total
count of spam messages in the dataset, while the "Number of Non-Spam Messages" label
shows the count of non-spam messages. These labels help users understand the composition
of the dataset and its distribution between spam and non-spam categories.

Text Box for Evaluation Results: A scrolled text widget is incorporated to present the
evaluation results in a structured format. The text box dynamically updates with the
evaluation metrics for each classification model (KNN and Decision Tree). The results
include:

Accuracy: The percentage of correctly classified instances out of the total instances.

Precision: The ratio of true positive instances to the sum of true positive and false positive
instances, indicating the model's ability to correctly classify positive instances.

Recall: The ratio of true positive instances to the sum of true positive and false negative
instances, representing the model's ability to identify all relevant instances.

F1-score: The harmonic mean of precision and recall, providing a balance between the two
metrics.

Confusion Matrix: A table displaying the counts of true positive, true negative, false positive,
and false negative instances, facilitating a deeper understanding of the model's performance.

26
SCAS SPAM CLASSIFIER

3.4 DATABASE DESIGN

For the spam filter evaluation project, the focus is primarily on evaluating machine learning
models rather than database management. However, a database design could still be relevant
if you plan to incorporate features like data logging, user authentication, or storing evaluation
results for future reference. Here's a basic outline of a database design tailored to support
such functionalities:

1. User Authentication (Optional):

- User Table: This table stores information about registered users, including their username,
password (hashed for security), email address, and any other relevant user details.

2. Data Logging (Optional):

- Log Table: If you want to log information about model evaluations or user interactions
with the application, you can create a log table. It may include fields like timestamp, user ID
(if applicable), action performed, and any additional metadata.

3. Storage of Evaluation Results:

- Evaluation Results Table: This table stores the evaluation results obtained from running
machine learning models on different datasets. It may include fields like dataset name, model
name, evaluation metrics (accuracy, precision, recall, F1-score), confusion matrix, timestamp,
and any other relevant information.

4. Dataset Management (Optional):

- Dataset Table: If you plan to manage multiple datasets within the application, you can
create a dataset table to store information about each dataset. Fields may include dataset
name, description, source URL, upload date, and any other metadata.

27
SCAS SPAM CLASSIFIER

5. Model Management (Optional):

- Model Table: Similarly, if you want to manage multiple machine learning models, you
can create a model table. Fields may include model name, description, algorithm used,
hyperparameters, training duration, and any other relevant information.

6. Relationships:

- If necessary, establish relationships between tables using foreign keys to maintain data
integrity. For example, the Evaluation Results Table may have foreign keys referencing the
User Table (if user-specific evaluations are logged) and the Dataset Table (to associate
evaluation results with specific datasets).

7. Indexes and Constraints:

- Define indexes on frequently queried fields for faster retrieval of data.

- Implement constraints (e.g., unique constraints, foreign key constraints) to enforce data
integrity and consistency.

8. Data Backup and Recovery:

- Implement mechanisms for regular data backup to prevent data loss in case of system
failures or accidental deletions.

It's essential to tailor the database design to the specific requirements and functionalities of
your project. Consider factors such as data volume, access patterns, and security
considerations when designing the database schema. Additionally, ensure compliance with
relevant privacy regulations when handling sensitive user data.

28
SCAS SPAM CLASSIFIER

3.5 SYSTEM DEVELOPMENT

System development for the spam filter evaluation project encompasses several key phases
aimed at creating a robust and effective solution. Initially, the process begins with a thorough
requirement analysis to understand the project's scope and user needs. This involves
gathering both functional and non-functional requirements to guide the subsequent
development stages. Following this, the design phase entails structuring the system
architecture and creating detailed specifications for each module or component. Designing
the user interface is also crucial during this phase to ensure an intuitive and user-friendly
experience.

Once the design is finalized, the development phase kicks off, involving the actual
implementation of the system components. This includes writing scripts or modules for data
preprocessing, feature extraction, and model training, utilizing libraries like scikit-learn for
machine learning model implementation. Integrating these models into the user interface
allows for seamless interaction and display of evaluation results. Throughout development,
rigorous testing is conducted, including unit testing, integration testing, and system testing, to
ensure the system's correctness and compliance with requirements.

Upon successful testing, the deployment phase involves preparing the system for deployment
on the intended platform, whether it be a web server or a desktop application. Configuration
of server-side components and databases, along with final testing in the production
environment, ensures smooth deployment and operation. Post-deployment, ongoing
maintenance and support are crucial for monitoring the system's performance, addressing any
issues promptly, and implementing updates or enhancements based on user feedback and
evolving requirements. Adherence to best practices in software engineering, such as version
control and documentation, ensures the system's reliability, scalability, and security
throughout its lifecycle.

29
SCAS SPAM CLASSIFIER

In addition to the core phases of system development, several other critical aspects contribute
to the success of the spam filter evaluation project. Firstly, acquiring and preprocessing the
spam email dataset is essential, involving tasks like data cleaning and feature extraction.
Feature engineering plays a crucial role in identifying relevant features that distinguish
between spam and non-spam emails, employing techniques such as TF-IDF for text data.
Model selection and tuning involve experimenting with various machine learning algorithms
and hyperparameter optimization to identify the most effective model for classification.
Evaluation metrics like accuracy, precision, recall, and F1-score are essential for assessing
model performance accurately.

Designing an intuitive user interface facilitates interaction with the system, while scalability
and performance optimization ensure efficient handling of large datasets and user demands.
Security measures, including data encryption and user authentication, protect sensitive
information. Comprehensive documentation and training materials aid users and developers
in understanding and utilizing the system effectively. Addressing these aspects ensures the
development of a robust, reliable, and user-friendly spam filter evaluation system.

In addition to the core development phases, there are several supplementary components vital
to the success of the spam filter evaluation project. One crucial aspect is the continuous
monitoring and updating of the system to adapt to evolving spamming techniques and
patterns. Regular updates to the dataset used for training and testing the models ensure that
the system remains effective against new spam threats. Moreover, implementing feedback
mechanisms allows users to report misclassified emails, contributing to the refinement of the
classification models over time.

Furthermore, integration with external APIs or services for email handling and analysis can
enhance the system's capabilities, such as real-time email classification and automatic spam
filtering. Additionally, incorporating advanced features like natural language processing
(NLP) for semantic analysis of email content can improve the accuracy of spam detection.

30
SCAS SPAM CLASSIFIER

3.5.1 DESCRIPTION OF MODULES

The spam filter evaluation project encompasses multiple interconnected modules, each
playing a crucial role in the system's functionality and effectiveness.

Data Loading and Preprocessing Module:

At the core of the system lies the data loading and preprocessing module. This module is
responsible for fetching the spam dataset from the UCI Machine Learning Repository using a
specified URL. Upon retrieval, the data is structured into a DataFrame using the Pandas
library, where appropriate column names are assigned. Additionally, this module handles any
missing values and performs essential preprocessing steps, such as encoding categorical
variables and scaling numerical features, to ensure the dataset is suitable for model training.

Model Training and Evaluation Module:

Following data preprocessing, the system proceeds to train and evaluate two classification
models: K-Nearest Neighbors (KNN) and Decision Tree. The scikit-learn library facilitates
model training by providing implementations of these algorithms. The dataset is split into
training and testing sets using the train_test_split function, with a specified test size and
random seed for reproducibility. Each model is then trained on the training data and
evaluated using various performance metrics, including accuracy, precision, recall, F1-score,
and confusion matrix. These metrics provide insights into the models' effectiveness in
distinguishing between spam and non-spam messages.

Feature Scaling Module:

Prior to model training, the dataset undergoes feature scaling to normalize the numerical
features' values. The StandardScaler class from scikit-learn is employed to scale the features
to a mean of 0 and a standard deviation of 1. This preprocessing step prevents features with
larger magnitudes from dominating the model's learning process, ensuring fair and unbiased
model training.

31
SCAS SPAM CLASSIFIER

User Interface Module:

The user interface (UI) module employs the Tkinter library to create a graphical user
interface (GUI) for the spam filter evaluation system. The UI provides an intuitive platform
for users to interact with the system, displaying essential information such as the number of
spam and non-spam messages in the dataset. Additionally, a scrolled text box presents the
evaluation results of the trained models, allowing users to assess their performance
comprehensively.

Data Analysis and Visualization Module:

Although not explicitly depicted in the provided code snippet, a data analysis and
visualization module could be incorporated to further explore the dataset's characteristics and
visualize the model's performance. This module may include tasks such as feature
distribution analysis, correlation examination, and generation of visualizations such as
histograms, scatter plots, or ROC curves to aid in model interpretation and decision-making.

Together, these modules form a robust spam filter evaluation system capable of loading,
preprocessing, training, and evaluating classification models to assess their effectiveness in
identifying spam messages. The user-friendly interface enhances usability, while the
incorporation of feature scaling ensures fair model training. Additionally, the system's
modular design allows for flexibility and scalability, facilitating future enhancements and
modifications to accommodate evolving requirements.

32
SCAS SPAM CLASSIFIER

TESTING AND IMPLEMENTATION

33
SCAS SPAM CLASSIFIER

4. TESTING AND IMPLEMENTATION


Testing:

The testing and implementation phase of the spam filter evaluation project involves rigorous
validation of the system's functionality and performance, followed by its deployment for
practical use.

Unit Testing: The system undergoes comprehensive unit testing to verify the correctness of
individual modules and functions. Test cases are designed to cover various scenarios,
including edge cases and typical user interactions. Automated testing frameworks such as
pytest or unittest are employed to streamline the testing process and ensure robustness.

Integration Testing: Once individual modules are validated, integration testing is conducted
to assess the interoperability and compatibility of different components. Integration tests
verify that modules communicate effectively and function as expected when integrated into
the system. This phase also includes testing user interface interactions and data flow between
modules.

Validation and Performance Testing: The trained models undergo validation testing to
evaluate their accuracy, precision, recall, F1-score, and other performance metrics. Validation
datasets, distinct from the training and testing sets, are used to assess the models'
generalization capabilities and identify any overfitting or underfitting issues. Performance
testing involves stress testing the system under various load conditions to ensure it can handle
multiple user interactions concurrently without performance degradation.

User Acceptance Testing (UAT): UAT involves inviting end-users or stakeholders to interact
with the system and provide feedback on its usability, intuitiveness, and effectiveness.
Testers simulate real-world usage scenarios to validate whether the system meets their
requirements and expectations. Any issues or suggestions raised during UAT are addressed
and incorporated into system refinements.

34
SCAS SPAM CLASSIFIER

Deployment and Rollout: Upon successful testing and validation, the spam filter evaluation
system is deployed for practical use. Deployment may involve hosting the system on a server
or cloud platform accessible to users via web or desktop interfaces. The rollout process
includes notifying users of the system's availability, providing necessary training and
documentation, and ensuring smooth transition from previous tools or processes.

Monitoring and Maintenance: Post-deployment, the system is continuously monitored to


detect and address any issues or performance bottlenecks. Monitoring tools track system
usage, performance metrics, and user feedback, allowing for proactive maintenance and
optimization. Regular updates and enhancements based on user feedback and evolving
requirements ensure the system remains effective and up-to-date.

Implementation:

Data Acquisition and Preprocessing:

Obtain the spam email dataset from a reliable source, such as the UCI Machine Learning
Repository.

Preprocess the dataset to handle missing values, encode categorical variables, and normalize
numerical features.

Model Training and Evaluation:

Implement machine learning models such as K-Nearest Neighbors (KNN) and Decision
Trees for spam classification using libraries like scikit-learn.

Split the dataset into training and testing sets to train the models and evaluate their
performance.

35
SCAS SPAM CLASSIFIER

Use performance metrics like accuracy, precision, recall, and F1-score to assess the models'
effectiveness.

User Interface Development:

Develop a graphical user interface (GUI) using the tkinter library in Python to provide an
interactive platform for users.

Design intuitive input fields for users to input email attributes and select classification
models.

Include output panels to display classification results, performance metrics, and confusion
matrices.

Integration of Models with GUI:

Integrate the trained machine learning models with the GUI to enable users to select models
and classify emails in real-time.

Implement functionality to preprocess user input, scale features, and pass them to the selected
model for prediction.

Display classification results and performance metrics in the GUI output panels for user
interpretation.

Testing and Debugging:

Conduct thorough testing of the integrated system to identify and fix any bugs or errors.

Test the system's functionality under different scenarios, including varying input data and
model selections.

Debug any issues related to data processing, model prediction, or GUI interaction to ensure
the system operates smoothly.

36
SCAS SPAM CLASSIFIER

Documentation and User Guides:

Document the implementation details, including data preprocessing steps, model training
parameters, and GUI design considerations.

Create user guides and documentation to help users understand how to interact with the
system, input email attributes, and interpret classification results.

Provide instructions for troubleshooting common issues and contacting support for
assistance.

Deployment and Maintenance:

Deploy the implemented system on a suitable platform, such as a local machine or a web
server accessible to users.

Monitor the system's performance and user feedback to identify areas for improvement and
implement updates as needed.

Provide ongoing maintenance and support to ensure the system remains functional and
effective in classifying spam emails.

37
SCAS SPAM CLASSIFIER

CONCLUSION

38
SCAS SPAM CLASSIFIER

5. CONCLUSION

In conclusion, the spam filter evaluation project presents a comprehensive solution to classify
emails as spam or non-spam using machine learning techniques. Through the implementation
of K-Nearest Neighbors (KNN) and Decision Tree models, coupled with a user-friendly
graphical interface developed with tkinter, users can interactively input email attributes and
receive real-time classification results.

The project's implementation involved various stages, including data acquisition and
preprocessing, model training and evaluation, GUI development, integration of models with
the GUI, testing and debugging, documentation creation, and deployment. Each step was
meticulously executed to ensure the system's functionality, accuracy, and user-friendliness.

Through thorough testing and validation, the implemented system demonstrates robust
performance in accurately classifying spam and non-spam emails. Users can rely on the
system to efficiently filter out unwanted spam emails, thereby enhancing productivity and
reducing the risk of falling victim to phishing scams or malicious content.

Overall, the spam filter evaluation project not only serves as a practical tool for email
classification but also showcases the power of machine learning in addressing real-world
problems. With continued maintenance and support, the project stands ready to serve users in
their email filtering needs, contributing to a safer and more efficient digital communication
environment.

39
SCAS SPAM CLASSIFIER

BIBLIOGRAPHY

40
SCAS SPAM CLASSIFIER

5.1 BIBLIOGRAPHY
 Graham, P., Robinson, R., & Hickey, T. (2003). SpamAssassin. Retrieved
from http://spamassassin.apache.org/
 Scikit-learn: Machine Learning in Python. (n.d.). Retrieved from https://scikit-
learn.org/stable/index.html
 UCI Machine Learning Repository: Spambase Data Set. (n.d.). Retrieved
from https://archive.ics.uci.edu/ml/datasets/spambase
 Tkinter GUI toolkit documentation. (n.d.). Retrieved from
https://docs.python.org/3/library/tkinter.html
 Python Standard Library. (n.d.). Retrieved from https://docs.python.org/3/library/
 "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien
Géron: This book covers machine learning concepts and practical implementations using
popular Python libraries like Scikit-Learn.
 "Python Data Science Handbook" by Jake VanderPlas: It's a comprehensive guide to data
science and machine learning in Python, covering essential libraries such as NumPy, Pandas,
Matplotlib, Scikit-Learn, and more.

41
SCAS SPAM CLASSIFIER

DATA FLOW DIAGRAM

42
SCAS SPAM CLASSIFIER

5.2 APPENDICES
A. DATA FLOW DIAGRAM

43
SCAS SPAM CLASSIFIER

44
SCAS SPAM CLASSIFIER

TABLE STRUCTURE

45
SCAS SPAM CLASSIFIER

TABLE STRUCTURE :

The table structure of the program could be organized as follows:

1. Data Preprocessing:

- Load dataset

- Split dataset into features (X) and target variable (y)

- Split data into training and testing sets

- Feature scaling using StandardScaler

2. Model Building:

- Initialize KNN and Decision Tree models

- Train the models using the training data

3. Model Evaluation:

- Use trained models to predict on the test data

- Calculate evaluation metrics (accuracy, precision, recall, F1-score, confusion matrix) for
each model

4. Display Results:

- Display the number of spam and non-spam messages in the dataset

- Display evaluation results (metrics and confusion matrix) for each model

46
SCAS SPAM CLASSIFIER

5. Graphical User Interface (GUI):

- Create a Tkinter window

- Add labels and text boxes to display spam and non-spam message counts

- Add a scrolled text box to display evaluation results

6. Main Functionality:

- Run the Tkinter event loop to start the GUI

- Incorporate the functionality to execute the model evaluation and display the results in the
GUI

This table structure outlines the main components and functionalities of the program, helping
to organize the code and ensure clarity and readability.

47
SCAS SPAM CLASSIFIER

SAMPLE CODING

48
SCAS SPAM CLASSIFIER

SAMPLE CODING:
import tkinter as tk
from tkinter import scrolledtext
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score,
confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load and preprocess the dataset


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/
spambase.data"
column_names = [f"attribute_{i}" for i in range(1, 58)] + ["label"]
data = pd.read_csv(url, header=None, names=column_names)

# Function to display the number of spam and non-spam messages


def display_spam_stats(data):
spam_count = data['label'].sum()
ham_count = len(data) - spam_count
return spam_count, ham_count

# Split the dataset into training and testing sets


X = data.iloc[:, :-1]
y = data["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Apply feature scaling


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# KNN Model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Decision Tree Model


tree = DecisionTreeClassifier(criterion='entropy', max_depth=10)
tree.fit(X_train, y_train)

# Model Evaluation
models = {"KNN": knn, "Decision Tree": tree}

output_text = ""

for name, model in models.items():


y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

49
SCAS SPAM CLASSIFIER

precision = precision_score(y_test, y_pred)


recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
output_text += f"{name} Results:\n"
output_text += f"Accuracy: {accuracy:.4f}\n"
output_text += f"Precision: {precision:.4f}\n"
output_text += f"Recall: {recall:.4f}\n"
output_text += f"F1-score: {f1:.4f}\n"
output_text += f"Confusion Matrix:\n{cm}\n\n"

# Display the number of spam and non-spam messages


spam_count, ham_count = display_spam_stats(data)

# Create a Tkinter window


root = tk.Tk()
root.title("Spam Filter Evaluation")

# Add labels and text boxes to the window


spam_label = tk.Label(root, text=f"Number of Spam Messages: {spam_count}")
spam_label.pack()

ham_label = tk.Label(root, text=f"Number of Non-Spam Messages: {ham_count}")


ham_label.pack()

output_textbox = scrolledtext.ScrolledText(root, width=60, height=20)


output_textbox.insert(tk.END, output_text)
output_textbox.pack()

# Start the Tkinter event loop


root.mainloop()

50
SCAS SPAM CLASSIFIER

OUTPUT:

51
SCAS SPAM CLASSIFIER

SPAM DATASET:
ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine
there got amore wat...
ham Ok lar... Joking wif u oni...
spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to
87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
ham U dun say so early hor... U c already then say...
ham Nah I don't think he goes to usf, he lives around here though
spam FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some
fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv
ham Even my brother is not like to speak with me. They treat me like aids patent.
ham As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been
set as your callertune for all Callers. Press *9 to copy your friends Callertune
spam WINNER!! As a valued network customer you have been selected to receivea £900
prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.
spam Had your mobile 11 months or more? U R entitled to Update to the latest colour
mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030
ham I'm gonna be home soon and i don't want to talk about this stuff anymore tonight, k?
I've cried enough today.
spam SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to
87575. Cost 150p/day, 6days, 16+ TsandCs apply Reply HL 4 info
spam URGENT! You have won a 1 week FREE membership in our £100,000 Prize
Jackpot! Txt the word: CLAIM to No: 81010 T&C www.dbuk.net LCCLTD POBOX
4403LDNW1A7RW18
ham I've been searching for the right words to thank you for this breather. I promise i wont
take your help for granted and will fulfil my promise. You have been wonderful and a
blessing at all times.
ham I HAVE A DATE ON SUNDAY WITH WILL!!
spam XXXMobileMovieClub: To use your credit, click the WAP link in the next txt
message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL
ham Oh k...i'm watching here:)
ham Eh u remember how 2 spell his name... Yes i did. He v naughty make until i v wet.
ham Fine if that’s the way u feel. That’s the way its gota b

52
SCAS SPAM CLASSIFIER

spam England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077
eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/ú1.20 POBOXox36504W45WQ
16+
ham Is that seriously how you spell his name?
ham I‘m going to try for 2 months ha ha only joking
ham So ü pay first lar... Then when is da stock comin...
ham Aft i finish my lunch then i go str down lor. Ard 3 smth lor. U finish ur lunch already?
ham Ffffffffff. Alright no way I can meet up with you sooner?
ham Just forced myself to eat a slice. I'm really not hungry tho. This sucks. Mark is getting
worried. He knows I'm sick when I turn down pizza. Lol
ham Lol your always so convincing.
ham Did you catch the bus ? Are you frying an egg ? Did you make a tea? Are you eating
your mom's left over dinner ? Do you feel my Love ?
ham I'm back & we're packing the car now, I'll let you know if there's room
ham Ahhh. Work. I vaguely remember that! What does it feel like? Lol
ham Wait that's still not all that clear, were you not sure about me being sarcastic or that
that's why x doesn't want to live with us
ham Yeah he got in at 2 and was v apologetic. n had fallen out and she was actin like spoilt
child and he got caught up in that. Till 2! But we won't go there! Not doing too badly cheers.
You?
ham K tell me anything about you.
ham For fear of fainting with the of all that housework you just did? Quick have a cuppa
spam Thanks for your subscription to Ringtone UK your mobile will be charged £5/month
Please confirm by replying YES or NO. If you reply NO you will not be charged
ham Yup... Ok i go home look at the timings then i msg ü again... Xuhui going to learn on
2nd may too but her lesson is at 8am
ham Oops, I'll let you know when my roommate's done
ham I see the letter B on my car
ham Anything lor... U decide...
ham Hello! How's you and how did saturday go? I was just texting to see if you'd decided
to do anything tomo. Not that i'm trying to invite myself or anything!
ham Pls go ahead with watts. I just wanted to be sure. Do have a great weekend. Abiola
ham Did I forget to tell you ? I want you , I need you, I crave you ... But most of all ... I
love you my sweet Arabian steed ... Mmmmmm ... Yummy

53
SCAS SPAM CLASSIFIER

spam 07732584351 - Rodger Burns - MSG = We tried to call you re your reply to our sms
for a free nokia mobile + free camcorder. Please call now 08000930705 for delivery
tomorrow
ham WHO ARE YOU SEEING?
ham Great! I hope you like your man well endowed. I am <#> inches...
ham No calls..messages..missed calls
ham Didn't you get hep b immunisation in nigeria.
ham Fair enough, anything going on?
ham Yeah hopefully, if tyler can't do it I could maybe ask around a bit
ham U don't know how stubborn I am. I didn't even want to go to the hospital. I kept
telling Mark I'm not a weak sucker. Hospitals are for weak suckers.
ham What you thinked about me. First time you saw me in class.
ham A gram usually runs like <#> , a half eighth is smarter though and gets you
almost a whole second gram for <#>
ham K fyi x has a ride early tomorrow morning but he's crashing at our place tonight
ham Wow. I never realized that you were so embarassed by your accomodations. I thought
you liked it, since i was doing the best i could and you always seemed so happy about "the
cave". I'm sorry I didn't and don't have more to give. I'm sorry i offered. I'm sorry your room
was so embarassing.
spam SMS. ac Sptv: The New Jersey Devils and the Detroit Red Wings play Ice Hockey.
Correct or Incorrect? End? Reply END SPTV
ham Do you know what Mallika Sherawat did yesterday? Find out now @ <URL>
spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C
Suprman V, Matrix3, StarWars3, etc all 4 FREE! bx420-ip4-5we. 150pm. Dont miss out!
ham Sorry, I'll call later in meeting.
ham Tell where you reached
ham Yes..gauti and sehwag out of odi series.
ham Your gonna have to pick up a $1 burger for yourself on your way home. I can't even
move. Pain is killing me.
ham Ha ha ha good joke. Girls are situation seekers.
ham Its a part of checking IQ
ham Sorry my roommates took forever, it ok if I come by now?
ham Ok lar i double check wif da hair dresser already he said wun cut v short. He said will
cut until i look nice.

54
SCAS SPAM CLASSIFIER

spam As a valued customer, I am pleased to advise you that following recent review of your
Mob No. you are awarded with a £1500 Bonus Prize, call 09066364589
ham Today is "song dedicated day.." Which song will u dedicate for me? Send this to all ur
valuable frnds but first rply me...
spam Urgent UR awarded a complimentary trip to EuroDisinc Trav, Aco&Entry41 Or
£1000. To claim txt DIS to 87121 18+6*£1.50(moreFrmMob. ShrAcomOrSglSuplt)10, LS1
3AJ
spam Did you hear about the new "Divorce Barbie"? It comes with all of Ken's stuff!
ham I plane to give on this month end.
ham Wah lucky man... Then can save money... Hee...
ham Finished class where are you.
ham HI BABE IM AT HOME NOW WANNA DO SOMETHING? XX
ham K..k:)where are you?how did you performed?
ham U can call me now...
ham I am waiting machan. Call me once you free.
ham Thats cool. i am a gentleman and will treat you with dignity and respect.
ham I like you peoples very much:) but am very shy pa.
ham Does not operate after <#> or what
ham Its not the same here. Still looking for a job. How much do Ta's earn there.
ham Sorry, I'll call later
ham K. Did you call me just now ah?
ham Ok i am on the way to home hi hi
ham You will be in the place of that man
ham Yup next stop.
ham I call you later, don't have network. If urgnt, sms me.
ham For real when u getting on yo? I only need 2 more tickets and one more jacket and I'm
done. I already used all my multis.
ham Yes I started to send requests to make it but pain came back so I'm back in bed.
Double coins at the factory too. I gotta cash in all my nitros.
ham I'm really not up to it still tonight babe
ham Ela kano.,il download, come wen ur free..
ham Yeah do! Don‘t stand to close tho- you‘ll catch something!

55
SCAS SPAM CLASSIFIER

ham Sorry to be a pain. Is it ok if we meet another night? I spent late afternoon in casualty
and that means i haven't done any of y stuff42moro and that includes all my time sheets and
that. Sorry.
ham Smile in Pleasure Smile in Pain Smile when trouble pours like Rain Smile when sum1
Hurts U Smile becoz SOMEONE still Loves to see u Smiling!!
spam Please call our customer service representative on 0800 169 6031 between 10am-9pm
as you have WON a guaranteed £1000 cash or £5000 prize!
ham Havent planning to buy later. I check already lido only got 530 show in e afternoon. U
finish work already?
spam Your free ringtone is waiting to be collected. Simply text the password "MIX" to
85069 to verify. Get Usher and Britney. FML, PO Box 5249, MK17 92H. 450Ppw 16
ham Watching telugu movie..wat abt u?
ham i see. When we finish we have loads of loans to pay
ham Hi. Wk been ok - on hols now! Yes on for a bit of a run. Forgot that i have
hairdressers appointment at four so need to get home n shower beforehand. Does that cause
prob for u?"
ham I see a cup of coffee animation
ham Please don't text me anymore. I have nothing else to say.
ham Okay name ur price as long as its legal! Wen can I pick them up? Y u ave x ams xx
ham I'm still looking for a car to buy. And have not gone 4the driving test yet.
ham As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been
set as your callertune for all Callers. Press *9 to copy your friends Callertune
ham wow. You're right! I didn't mean to do that. I guess once i gave up on boston men and
changed my search location to nyc, something changed. Cuz on my signin page it still says
boston.
ham Umma my life and vava umma love you lot dear
ham Thanks a lot for your wishes on my birthday. Thanks you for making my birthday
truly memorable.
ham Aight, I'll hit you up when I get some cash
ham How would my ip address test that considering my computer isn't a minecraft server
ham I know! Grumpy old people. My mom was like you better not be lying. Then again I
am always the one to play jokes...
ham Dont worry. I guess he's busy.
ham What is the plural of the noun research?

56
SCAS SPAM CLASSIFIER

ham Going for dinner.msg you after.


ham I'm ok wif it cos i like 2 try new things. But i scared u dun like mah. Cos u said not
too loud.
spam GENT! We are trying to contact you. Last weekends draw shows that you won a
£1000 prize GUARANTEED. Call 09064012160. Claim Code K52. Valid 12hrs only.
150ppm
ham Wa, ur openin sentence very formal... Anyway, i'm fine too, juz tt i'm eatin too much
n puttin on weight...Haha... So anythin special happened?
ham As I entered my cabin my PA said, '' Happy B'day Boss !!''. I felt special. She askd me
4 lunch. After lunch she invited me to her apartment. We went there.
spam You are a winner U have been specially selected 2 receive £1000 or a 4* holiday
(flights inc) speak to a live operator 2 claim 0871277810910p/min (18+)
ham Goodo! Yes we must speak friday - egg-potato ratio for tortilla needed!
ham Hmm...my uncle just informed me that he's paying the school directly. So pls buy
food.
spam PRIVATE! Your 2004 Account Statement for 07742676969 shows 786 unredeemed
Bonus Points. To claim call 08719180248 Identifier Code: 45239 Expires
spam URGENT! Your Mobile No. was awarded £2000 Bonus Caller Prize on 5/9/03 This is
our final try to contact U! Call from Landline 09064019788 BOX42WR29C, 150PPM
ham here is my new address -apples&pairs&all that malarky
spam Todays Voda numbers ending 7548 are selected to receive a $350 award. If you have
a match please call 08712300220 quoting claim code 4041 standard rates app
ham I am going to sao mu today. Will be done only at 12
ham Ü predict wat time ü'll finish buying?
ham Good stuff, will do.
ham Just so that you know,yetunde hasn't sent money yet. I just sent her a text not to
bother sending. So its over, you dont have to involve yourself in anything. I shouldn't have
imposed anything on you in the first place so for that, i apologise.
ham Are you there in room.
ham HEY GIRL. HOW R U? HOPE U R WELL ME AN DEL R BAK! AGAIN LONG
TIME NO C! GIVE ME A CALL SUM TIME FROM LUCYxx
ham K..k:)how much does it cost?
ham I'm home.
ham Dear, will call Tmorrow.pls accomodate.

57
SCAS SPAM CLASSIFIER

ham First answer my question.


spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country the
Algarve is in? Txt ansr to 82277. £1.50 SP:Tyrone
spam Want 2 get laid tonight? Want real Dogging locations sent direct 2 ur mob? Join the
UK's largest Dogging Network bt Txting GRAVEL to 69888! Nt. ec2a. 31p.msg@150p
ham I only haf msn. It's [email protected]
ham He is there. You call and meet him
ham No no. I will check all rooms befor activities
spam You'll not rcv any more msgs from the chat svc. For FREE Hardcore services text GO
to: 69988 If u get nothing u must Age Verify with yr network & try again
ham Got c... I lazy to type... I forgot ü in lect... I saw a pouch but like not v nice...
ham K, text me when you're on the way
ham Sir, Waiting for your mail.
ham A swt thought: "Nver get tired of doing little things 4 lovable persons.."
Coz..somtimes those little things occupy d biggest part in their Hearts.. Gud ni8
ham I know you are. Can you pls open the back?
ham Yes see ya not on the dot
ham Whats the staff name who is taking class for us?
spam FreeMsg Why haven't you replied to my text? I'm Randy, sexy, female and live local.
Luv to hear from u. Netcollex Ltd 08700621170150p per msg reply Stop to end
ham Ummma.will call after check in.our life will begin from qatar so pls pray very hard.
ham K..i deleted my contact that why?
ham Sindu got job in birla soft ..
ham The wine is flowing and i'm i have nevering..
ham Yup i thk cine is better cos no need 2 go down 2 plaza mah.
ham Ok... Ur typical reply...
ham As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been
set as your callertune for all Callers. Press *9 to copy your friends Callertune
ham You are everywhere dirt, on the floor, the windows, even on my shirt. And sometimes
when i open my mouth, you are all that comes flowing out. I dream of my world without you,
then half my chores are out too. A time of joy for me, lots of tv shows i.ll see. But i guess like
all things you just must exist, like rain, hail and mist, and when my time here is done, you
and i become one.
ham Aaooooright are you at work?

58
SCAS SPAM CLASSIFIER

ham I'm leaving my house now...


ham Hello, my love. What are you doing? Did you get to that interview today? Are you
you happy? Are you being a good boy? Do you think of me?Are you missing me ?
spam Customer service annoncement. You have a New Years delivery waiting for you.
Please call 07046744435 now to arrange delivery
spam You are a winner U have been specially selected 2 receive £1000 cash or a 4* holiday
(flights inc) speak to a live operator 2 claim 0871277810810
ham Keep yourself safe for me because I need you and I miss you already and I envy
everyone that see's you in real life
ham New car and house for my parents.:)i have only new job in hand:)
ham I'm so in love with you. I'm excited each day i spend with you. You make me so
happy.
spam -PLS STOP bootydelious (32/F) is inviting you to be her friend. Reply YES-434 or
NO-434 See her: www.SMS.ac/u/bootydelious STOP? Send STOP FRND to 62468
spam BangBabes Ur order is on the way. U SHOULD receive a Service Msg 2 download
UR content. If U do not, GoTo wap. bangb. tv on UR mobile internet/service menu
ham I place all ur points on e cultures module already.
spam URGENT! We are trying to contact you. Last weekends draw shows that you have
won a £900 prize GUARANTEED. Call 09061701939. Claim code S89. Valid 12hrs only
ham Hi frnd, which is best way to avoid missunderstding wit our beloved one's?
ham Great escape. I fancy the bridge but needs her lager. See you tomo
ham Yes :)it completely in out of form:)clark also utter waste.
ham Sir, I need AXIS BANK account no and bank address.
ham Hmmm.. Thk sure got time to hop ard... Ya, can go 4 free abt... Muz call u to discuss
liao...
ham What time you coming down later?
ham Bloody hell, cant believe you forgot my surname Mr . Ill give u a clue, its spanish and
begins with m...
ham Well, i'm gonna finish my bath now. Have a good...fine night.
ham Let me know when you've got the money so carlos can make the call
ham U still going to the mall?
ham Turns out my friends are staying for the whole show and won't be back til ~ <#>
, so feel free to go ahead and smoke that $ <#> worth
ham Text her. If she doesnt reply let me know so i can have her log in

59
SCAS SPAM CLASSIFIER

ham Hi! You just spoke to MANEESHA V. We'd like to know if you were satisfied with
the experience. Reply Toll Free with Yes or No.
ham You lifted my hopes with the offer of money. I am in need. Especially when the end
of the month approaches and it hurts my studying. Anyways have a gr8 weekend
ham Lol no. U can trust me.
ham ok. I am a gentleman and will treat you with dignity and respect.
ham He will, you guys close?
ham Going on nothing great.bye
ham Hello handsome ! Are you finding that job ? Not being lazy ? Working towards
getting back that net for mummy ? Where's my boytoy now ? Does he miss me ?
ham Haha awesome, be there in a minute
spam Please call our customer service representative on FREEPHONE 0808 145 4742
between 9am-11pm as you have WON a guaranteed £1000 cash or £5000 prize!
ham Have you got Xmas radio times. If not i will get it now
ham I jus reached home. I go bathe first. But my sis using net tell u when she finishes k...
spam Are you unique enough? Find out from 30th August. www.areyouunique.co.uk
ham I'm sorry. I've joined the league of people that dont keep in touch. You mean a great
deal to me. You have been a friend at all times even at great personal cost. Do have a great
week.|
ham Hi :)finally i completed the course:)
ham It will stop on itself. I however suggest she stays with someone that will be able to
give ors for every stool.
ham How are you doing? Hope you've settled in for the new school year. Just wishin you a
gr8 day
ham Gud mrng dear hav a nice day
ham Did u got that persons story
ham is your hamster dead? Hey so tmr i meet you at 1pm orchard mrt?
ham Hi its Kate how is your evening? I hope i can see you tomorrow for a bit but i have to
bloody babyjontet! Txt back if u can. :) xxx
ham Found it, ENC <#> , where you at?
ham I sent you <#> bucks
ham Hello darlin ive finished college now so txt me when u finish if u can love Kate xxx

60
SCAS SPAM CLASSIFIER

ham Your account has been refilled successfully by INR <DECIMAL> . Your
KeralaCircle prepaid account balance is Rs <DECIMAL> . Your Transaction ID is KR
<#> .
ham Goodmorning sleeping ga.
ham U call me alter at 11 ok.
ham Ü say until like dat i dun buy ericsson oso cannot oredi lar...
ham As I entered my cabin my PA said, '' Happy B'day Boss !!''. I felt special. She askd me
4 lunch. After lunch she invited me to her apartment. We went there.

61

You might also like