sentiment analysis
SPAM CLASSIFIER
Submitted by
Mr. K. Kiran
(Reg. No. 2122K1649)
March 2024
PROJECT WORK
SPAM CLASSIFIER
Mr.K.Kiran
Reg. No. 2122K1649
SCAS SPAM CLASSIFIER
SUGUNA COLLEGE OF ARTS AND SCIENCE
(Affiliated to Bharathiar University, Coimbatore)
Nehru Nagar, Kalapatti Road, Civil Aerodrome (PO),
Coimbatore-641014.
CERTIFICATE
This is to certify that the project, entitled “SPAM CLASSIFIER” submitted to the Bharathiar
University, in partial fulfillment of the requirements for the award of the Degree of Bachelor
(Reg. No. 2122K1649) during the period of 2021–2024 of his study in the Department of
Computer Science at Suguna College of Arts and Science, Coimbatore, under my supervision
and guidance and the project has not formed the basis for the award of any Degree /
Diploma / Associateship / Fellowship or other similar title to any candidate of any University.
DECLARATION
I, Mr. K. Kiran, hereby declare that the project entitled “SPAM CLASSIFIER” submitted to
the Bharathiar University, Coimbatore in partial fulfillment of the requirements for the award
work done by me during the period of 2021-2024 under the supervision and guidance of
of Arts and Science, Coimbatore and it has not formed the basis for the award of
any Degree / Diploma / Associateship / Fellowship or other similar title to any candidate of
any University.
ACKNOWLEDGEMENT
I take great pleasure in acknowledging the noble hearts who lent their helping hands for the
successful completion of my project. I also extend my gratitude to all those who contributed
directly and indirectly to the success of this project.
Deep thanks are extended to Smt. L. Suguna, President, Suguna Group of Institutions,
Coimbatore who also offered me the chance to undertake and successfully complete this
project work.
Sincere appreciation goes to Dr. Srikanth Kannan, Secretary, Suguna Group of Institutions,
Coimbatore for his full support and cooperation, enabling the success of this project and
granting permission to utilize various facilities.
I express my heartfelt thanks to Dr. V. Sekar, Director, Suguna College of Arts and Science,
Coimbatore for encouraging me throughout the project work.
I take this opportunity to thank Dr. R. Rajkumar, M.Com., MBA., M.Phil., Ph.D.,
Principal, Suguna College of Arts and Science, Coimbatore for his unwavering support and
assistance in completing this project.
Wholehearted thanks go to Dr. N. Kamalraj, MCA, M.Phil., Ph.D., Associate Professor &
Head, Department of Computer Science, Suguna College of Arts and Science, Coimbatore
for his full support and assistance in completing this project.
I express my deep gratitude and respect to my guide, Ms. B. Suganya, MCA., M.Phil.,
(Ph.D.), Assistant Professor, Department of Computer Science, Suguna College of Arts and
Science, Coimbatore, for her valuable suggestions and timely guidance, which played a
pivotal role in streamlining the successful completion of this project.
SYNOPSIS
This Python project serves as a tool for evaluating spam classification
models, offering insights into the performance of machine learning algorithms in identifying
and distinguishing spam emails from legitimate ones. It uses the Spambase dataset, whose
attributes are numeric features extracted from email text (word frequencies, character
frequencies, and statistics on runs of capital letters), as the foundation for
training and testing machine learning models.
Through the utilization of K-Nearest Neighbors (KNN) and Decision Tree algorithms, users
gain access to a comparative analysis of model performance, allowing them to assess the
strengths and weaknesses of each approach in accurately classifying spam.
Moreover, feature scaling places all attributes on a comparable scale, which matters most for
distance-based models such as KNN, where large-valued features would otherwise dominate. With
its intuitive graphical user interface (GUI), which displays essential statistics such as the
number of spam and non-spam messages and presents evaluation results in an organized
manner, this project facilitates seamless interaction and interpretation of results for users
across different expertise levels.
As email security remains a critical concern in today's digital landscape, this project
underscores the significance of machine learning in combating spam and highlights its
potential for fostering a safer and more secure online environment.
Furthermore, the project emphasizes the significance of continuous model evaluation and
refinement in the context of evolving spam tactics and email security threats. By providing
users with a platform to assess model performance using real-world data, the project
encourages ongoing experimentation and optimization of spam classification algorithms.
Additionally, it fosters a deeper understanding of the intricate nuances involved in email
filtering, including the detection of subtle patterns and anomalies indicative of spam
behavior.
Through the transparent presentation of evaluation metrics and confusion matrices, users can
gain insights into the efficacy of different feature sets and algorithmic approaches, paving the
way for informed decision-making in model selection and deployment. Moreover, the project
serves as a catalyst for interdisciplinary collaboration, bridging the gap between machine
learning expertise and domain-specific knowledge in cybersecurity.
TABLE OF CONTENTS
S. No. Description
ACKNOWLEDGEMENT
SYNOPSIS
CONTENTS
1 INTRODUCTION
1.1 OVERVIEW OF THE PROJECT
1.2 SYSTEM SPECIFICATION
1.2.1 HARDWARE CONFIGURATION
1.2.2 SOFTWARE SPECIFICATION
2 SYSTEM STUDY
2.1 EXISTING SYSTEM
2.1.1 DRAWBACKS OF EXISTING SYSTEM
2.2 PROPOSED SYSTEM
2.2.1 FEATURES OF PROPOSED SYSTEM
5 CONCLUSION
5.1 BIBLIOGRAPHY
5.2 APPENDICES
A. DATA FLOW DIAGRAM
B. TABLE STRUCTURE
C. SAMPLE CODING
D. SPAM DATASET
1. INTRODUCTION
This Python project aims to evaluate the performance of machine learning models in
classifying spam emails using the Spambase dataset. With the prevalence of spam emails
posing a significant threat to users' security and productivity, the project provides a practical
solution by employing two popular classification algorithms: K-Nearest Neighbors (KNN)
and Decision Trees.
The dataset, comprising attributes extracted from emails along with corresponding labels,
undergoes preprocessing steps including feature scaling and splitting into training and testing
sets. Subsequently, both models are trained on the training data and evaluated using key
metrics such as accuracy, precision, recall, and F1-score, as well as confusion matrices for
visual representation of classification results.
To enhance user interaction and interpretation of results, the project features a graphical user
interface (GUI) built using Tkinter, displaying essential statistics like the number of spam and
non-spam messages, and presenting evaluation outcomes in a user-friendly format. Overall,
this project serves as a valuable tool for assessing the efficacy of machine learning
approaches in spam detection, contributing to the ongoing efforts in enhancing email security
and user experience.
In addition to evaluating the performance of machine learning models, this project offers
insights into the broader landscape of email security and the challenges posed by spam. By
delving into the intricacies of spam classification, users gain a deeper understanding of the
various techniques and strategies employed by malicious actors to bypass email filters and
deceive recipients.
Moreover, the project highlights the importance of continuous monitoring and adaptation in
the face of evolving spam tactics, underscoring the need for robust and adaptive spam
filtering mechanisms. Furthermore, by providing a platform for experimentation and
Ultimately, this project not only addresses the immediate need for spam classification but
also contributes to the advancement of email security technologies, fostering a safer and more
secure online environment for users worldwide.
Additionally, this project can be extended to explore more advanced machine learning
techniques and feature engineering strategies for improving spam classification accuracy.
Advanced algorithms such as Support Vector Machines (SVM), Random Forests, or Gradient
Boosting could be incorporated and compared against the KNN and Decision Tree models to
assess their performance.
The GUI could also be enhanced with visualization tools to provide intuitive insights
into the distribution of features and the decision boundaries of the classification models. By
integrating these additional components, the project can offer a more comprehensive and
interactive platform for exploring and advancing spam classification techniques, ultimately
contributing to the ongoing efforts in combating email spam and enhancing cybersecurity.
To comprehend the essence of spam filtering, it's essential to define the term "spam" within
the realm of electronic communication. Spam encompasses a wide array of unwanted
messages, ranging from email solicitations and phishing attempts to fraudulent
advertisements and malicious links. The proliferation of spam can be attributed to the advent
of digital communication channels, which have provided spammers with unprecedented
avenues to disseminate their messages indiscriminately.
Moreover, the financial incentives driving spam operations have fueled its exponential
growth, making it a lucrative enterprise for cybercriminals seeking to exploit unsuspecting
individuals and organizations. Given the pervasive nature of spam and its detrimental
consequences, the implementation of robust spam filtering mechanisms has emerged as a
critical necessity. Spam filtering serves as a formidable barrier against unwanted messages,
employing a diverse range of techniques to identify and intercept spam before it reaches its
intended recipients.
By leveraging advanced algorithms, heuristic analysis, and machine learning models, spam
filters can distinguish between legitimate communications and unsolicited content, thereby
safeguarding users from potential threats and preserving the integrity of communication
platforms. Central to the efficacy of spam filtering is the rigorous evaluation of its techniques
and methodologies. Various metrics, such as accuracy, precision, recall, and F1-score, are
employed to assess the performance of spam filtering algorithms.
Machine learning, in particular, has revolutionized the landscape of spam filtering, offering
powerful tools for pattern recognition and classification. Through supervised learning
algorithms like K-Nearest Neighbors (KNN) and Decision Trees, spam filters can analyze
vast datasets, identify underlying patterns, and make informed decisions regarding the
classification of incoming messages. To illustrate the practical application of spam filtering
techniques, we delve into a comprehensive case study focusing on the evaluation of two
prominent models: K-Nearest Neighbors (KNN) and Decision Trees. Leveraging a dataset
sourced from the UCI Machine Learning Repository, we embark on a journey to train, test,
and evaluate these models using real-world spam data.
This Python project serves as a robust spam filter evaluation tool, leveraging machine
learning algorithms to classify emails as spam or non-spam. It begins by loading and
preprocessing the Spambase dataset, encompassing various attributes extracted from emails.
Splitting the data into training and testing sets, it applies feature scaling to ensure uniformity
in feature magnitudes. Two classification models, K-Nearest Neighbors (KNN) and Decision
Tree, are trained on the preprocessed data and evaluated using standard classification metrics
such as accuracy, precision, recall, and F1-score, along with confusion matrices to visualize
classification performance.
The project also includes a graphical user interface (GUI) built using Tkinter, providing
users with an intuitive platform to view the number of spam and non-spam messages in the
dataset and examine the evaluation results. Overall, this project facilitates comprehensive
assessment and comparison of machine learning models for spam classification, contributing
to advancements in email security and filtering techniques.
In addition to facilitating spam filter evaluation through machine learning algorithms, this
project serves as a versatile tool for analyzing email datasets and refining classification
techniques. By harnessing the power of K-Nearest Neighbors (KNN) and Decision Trees,
users gain valuable insights into the effectiveness of different approaches in discerning spam
from legitimate emails.
Overall, this project not only addresses the immediate need for spam classification but also
serves as a foundation for ongoing research and development in email security and machine
learning applications.
The foundation of our project lies in the analysis and preprocessing of the Spambase dataset
sourced from the UCI Machine Learning Repository. This dataset comprises a collection of
attributes extracted from email messages, along with corresponding labels indicating whether
the message is spam or not. We meticulously preprocess the dataset, splitting it into features
(X) and labels (y), and further divide it into training and testing sets using the
`train_test_split` function from the scikit-learn library. Moreover, we apply feature scaling
using the `StandardScaler` to ensure uniformity and optimal performance of our models.
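
The split-and-scale step described above might look like the following minimal sketch. It assumes scikit-learn and NumPy are installed, and it uses a small synthetic array in place of the Spambase features so that it runs offline. The 80/20 split, the `random_state` value, and the use of `stratify` are illustrative choices, not settings taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Spambase features (assumption: the real
# project loads 57 numeric attributes per email instead).
rng = np.random.default_rng(0)
X = rng.random((200, 5)) * [1, 10, 100, 1000, 5]
y = (X[:, 0] > 0.6).astype(int)  # 1 = spam, 0 = non-spam (made-up rule)

# Hold out 20% of the rows for testing; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then reuse its statistics
# on the test rows so no test information leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training portion alone keeps the test set a fair stand-in for unseen emails.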
With the dataset prepared, we proceed to train two distinct machine learning models:
K-Nearest Neighbors (KNN) and Decision Trees. The KNN model, a versatile and intuitive
algorithm, operates by classifying data points based on the majority class among their nearest
neighbors. Conversely, the Decision Tree model employs a hierarchical structure of decision
nodes to recursively partition the feature space, ultimately assigning each data point the
majority class of the training samples in the terminal (leaf) node it reaches. By fitting these models to the training data,
we enable them to learn underlying patterns and relationships, thereby facilitating the
classification of incoming messages as spam or non-spam.
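
To make the majority-vote idea behind KNN concrete, here is a small from-scratch sketch using only the standard library. The function name `knn_predict`, the toy feature vectors, and the choice of Euclidean distance with k = 3 are assumptions for illustration; the project itself relies on scikit-learn's implementation.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_X, train_y)
    ]
    # Keep the k closest points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, already-scaled feature vectors (hypothetical values).
train_X = [(0.1, 0.2), (0.0, 0.4), (0.9, 0.8), (1.0, 1.1), (0.8, 1.0)]
train_y = ["non-spam", "non-spam", "spam", "spam", "spam"]

print(knn_predict(train_X, train_y, (0.85, 0.9), k=3))
```

Because the vote depends on raw distances, features with large numeric ranges would dominate without the scaling step described earlier.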
Following model training, we embark on the crucial phase of model evaluation to assess their
performance and efficacy in classifying spam messages. Leveraging industry-standard
evaluation metrics, including accuracy, precision, recall, and F1-score, we meticulously
evaluate the performance of both KNN and Decision Tree models on the testing dataset.
Additionally, we compute the confusion matrix for each model, providing a detailed
breakdown of true positive, true negative, false positive, and false negative predictions.
Through rigorous evaluation, we gain insights into the strengths and limitations of each
model, thereby informing our subsequent analyses and recommendations.
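
The evaluation described above can be sketched without any library, which makes the relationship between the confusion matrix and the four metrics explicit. The helper names and the sample labels below are hypothetical; spam is treated as the positive class (1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn), treating `positive` as the spam class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels for illustration: 1 = spam, 0 = non-spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(classification_metrics(*counts))
```

In a spam filter a false positive (a legitimate email flagged as spam) is usually the costlier error, so precision deserves particular attention when comparing the two models.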
With our models trained and evaluated, we transition to the implementation phase, wherein
we develop an interactive graphical user interface (GUI) using the Tkinter library in Python.
The GUI provides users with an intuitive platform to enter custom messages for classification
and view the corresponding predictions and evaluation metrics. Upon entering a message and
clicking the "Classify" button, the GUI triggers the preprocessing and classification of the
message using both the KNN and Decision Tree models. Subsequently, the classification
results, including predicted labels and evaluation metrics, are displayed to the user via
message boxes, enabling them to assess the models' performance in real-time.
The system specifications for the spam filter evaluation tool developed using Python and
Tkinter encompass compatibility with major operating systems such as Windows, macOS,
and Linux, ensuring broad accessibility. The project relies on Python 3.x as the programming
language, with dependencies including tkinter for building the graphical user interface and
scikit-learn for machine learning functionalities like model training and evaluation.
Additionally, pandas and numpy are utilized for data manipulation and handling, while an
internet connection is necessary during execution to fetch the Spambase dataset from the
provided URL. By adhering to these system specifications, users can seamlessly execute and
interact with the spam filter evaluation tool, contributing to email security and machine
learning exploration.
Additionally, by leveraging widely-used libraries such as scikit-learn and tkinter, the project
ensures compatibility with existing Python environments and minimizes the need for
additional setup or configuration. This user-centric design philosophy underscores the
project's commitment to democratizing access to email security tools and empowering users
to take proactive measures against spam threats in a straightforward and intuitive manner.
Moreover, the project's system specifications prioritize scalability and extensibility, allowing
for future enhancements and customizations to meet evolving user needs and technological
advancements. By adhering to industry-standard practices and leveraging open-source
technologies, the project fosters a collaborative environment where contributions from the
community can drive innovation and improvement.
Input Devices: A keyboard and a pointing device (mouse or touchpad) are required for
interacting with the application and for navigating and selecting items within the GUI.
Overall, the spam filter evaluation tool is designed to be lightweight and resource-efficient,
making it accessible to users with a wide range of hardware configurations. However, users
may experience improved performance with higher-end hardware specifications, particularly
when working with large datasets or complex machine learning models.
• Integrated Development Environment (IDE): Any Python-compatible IDE can be used for
development and execution of the script. Popular choices include Visual Studio Code,
PyCharm, Jupyter Notebook, and Spyder.
Python Libraries:
ScrolledText - The tkinter.scrolledtext module provides a widget called ScrolledText, a text
widget with a vertical scroll bar attached, so that long content remains readable.
Web Browser: Google Chrome, Mozilla Firefox, or Microsoft Edge for accessing online
resources and documentation.
Video Conferencing Software: Zoom, Microsoft Teams, or Skype for online meetings and
collaboration.
2. SYSTEM STUDY
The system study provides an overview of the existing system, its limitations, and the
need for the proposed solution. A simplified system study for the spam classification
system follows:
Additionally, heuristic methods are employed, which leverage statistical or machine learning
techniques to assess email content and attributes. These methods often involve analyzing
word frequency, header information, and content characteristics to determine the likelihood
of an email being spam. Bayesian filtering, a statistical approach, calculates the probability of
an email being spam based on observed word occurrences and is trained on labeled email
datasets. Furthermore, the use of whitelists and blacklists helps classify emails based on
trusted or known spamming sources.
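
As an illustration of the Bayesian filtering approach mentioned above, the following sketch estimates the probability that a message is spam from word counts in labeled examples. It is a simplified naive Bayes with add-one (Laplace) smoothing; the function names and the four training messages are invented for the example and are not part of the project.

```python
import math
from collections import Counter

CLASSES = ("spam", "non-spam")

def train_naive_bayes(messages, labels):
    """Count word occurrences per class from labeled messages."""
    word_counts = {label: Counter() for label in CLASSES}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set().union(*word_counts.values())
    return word_counts, class_counts, vocabulary

def spam_probability(text, model):
    """Return P(spam | words) using Bayes' rule with Laplace smoothing."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    log_scores = {}
    for label in CLASSES:
        # Start from the log prior, then add one log likelihood per word.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Convert the two log scores back into a normalized probability.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores["spam"] / sum(exp_scores.values())

# Tiny hand-written training set (hypothetical messages).
messages = ["win free money now", "free prize claim now",
            "meeting agenda for monday", "lunch on monday"]
labels = ["spam", "spam", "non-spam", "non-spam"]
model = train_naive_bayes(messages, labels)

print(round(spam_probability("free money", model), 3))
print(round(spam_probability("monday meeting", model), 3))
```

Working in log space avoids numeric underflow when a message contains many words.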
While traditional spam classification methods have been effective to a certain extent, they
may struggle to adapt to new spamming techniques and may result in higher false positive
rates compared to more advanced machine learning-based approaches.
Limited Adaptability: Traditional systems often rely on predefined rules or heuristics, making
them less adaptable to evolving spamming techniques. As spammers continually refine their
tactics, these systems may struggle to keep up with new spamming methods.
High False Positive Rates: Rule-based systems and heuristic methods may inadvertently flag
legitimate emails as spam, leading to false positives. This can result in important messages
being incorrectly filtered out, potentially causing users to miss critical information.
Limited Feature Extraction: Traditional methods may have limited capabilities in extracting
and leveraging complex features from email content. They often focus on basic features such
as keyword frequency or header analysis, which may not capture the nuances of modern
spam emails.
Scalability Issues: As the volume of email traffic increases, traditional spam classification
systems may face scalability challenges. Processing large volumes of emails in real-time can
strain system resources and impact performance.
Lack of Personalization: Traditional systems often apply uniform filtering rules to all users,
regardless of individual preferences or behaviors. This one-size-fits-all approach may not
adequately address the unique spam filtering needs of different users or organizations.
The system begins by loading and preprocessing a dataset containing attributes extracted
from spam and non-spam emails. These attributes include features such as word frequency,
character frequency, and other relevant characteristics of the email content. The dataset is
then split into training and testing sets to train and evaluate the machine learning models.
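
The loading and splitting steps can be sketched with the standard library alone. The sample below imitates the Spambase file layout (comma-separated numeric attributes with the class label in the last column), but the rows themselves are made up and truncated to three attributes. `load_rows` and `split_rows` are hypothetical helper names, and the 20% test fraction is an assumption.

```python
import csv
import io
import random

# Five made-up rows in the Spambase layout: numeric attributes followed
# by the label in the last column (1 = spam, 0 = non-spam). The real file
# has 57 attributes per row; only 3 are shown to keep the sample short.
SAMPLE = """0.21,0.00,3.5,1
0.00,0.10,1.2,0
0.64,0.32,9.8,1
0.00,0.00,1.0,0
0.05,0.00,2.1,0
"""

def load_rows(handle):
    """Parse CSV text into (features, label) pairs."""
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip blank lines
        *features, label = record
        rows.append(([float(v) for v in features], int(label)))
    return rows

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = load_rows(io.StringIO(SAMPLE))
n_spam = sum(label for _, label in rows)
n_non_spam = len(rows) - n_spam
train, test = split_rows(rows)
print(n_spam, n_non_spam, len(train), len(test))
```

The same two counts, `n_spam` and `n_non_spam`, are the figures the GUI reports for the dataset.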
Two popular classification algorithms, K-Nearest Neighbors (KNN) and Decision Trees, are
utilized in the proposed system. These models are trained on the training data after applying
feature scaling to ensure consistent and accurate predictions. Following training, the models
are evaluated using various performance metrics such as accuracy, precision, recall, F1-score,
and confusion matrix.
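
To show how a decision tree chooses a single split, the sketch below searches one feature for the threshold that minimizes weighted Gini impurity. Gini is the default criterion of scikit-learn's `DecisionTreeClassifier`, but whether the project kept that default is an assumption. The function names and the feature values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p_spam = sum(labels) / len(labels)
    return 1.0 - p_spam ** 2 - (1.0 - p_spam) ** 2

def best_threshold(values, labels):
    """Find the split `value <= t` with the lowest weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    # Candidate thresholds: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical values of one feature (e.g. frequency of the word "free")
# with labels 1 = spam, 0 = non-spam.
freq = [0.0, 0.1, 0.2, 0.8, 0.9, 1.2]
label = [0, 0, 0, 1, 1, 1]
print(best_threshold(freq, label))
```

A full tree repeats this search over every feature and then recurses into each side of the chosen split. Because only the ordering of values matters, this model is largely unaffected by feature scaling.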
The graphical user interface (GUI) built using Tkinter provides a user-friendly interface for
users to interact with the system. It displays essential information such as the number of spam
and non-spam messages in the dataset and the evaluation results of the machine learning
models. Users can easily interpret the classification performance and gain insights into the
effectiveness of the spam filter.
Overall, the proposed system offers a robust and adaptive approach to spam classification,
leveraging the power of machine learning to effectively identify and filter out spam emails
while minimizing false positives and improving overall email security.
In addition to the core functionality described, the proposed system can be further enhanced
with various features to improve its performance and usability. Firstly, feature engineering
techniques can be explored to extract additional information from email metadata, such as
sender addresses, header details, and timestamps. This enriched feature set can enhance the
accuracy of spam classification. Moreover, the system can benefit from experimenting with
different classification algorithms beyond K-Nearest Neighbors and Decision Trees, such as
Support Vector Machines or Random Forests, along with fine-tuning their hyperparameters to
optimize performance.
Ensemble methods like bagging and boosting can also be implemented to combine multiple
classifiers and improve classification accuracy. Additionally, incorporating k-fold
cross-validation ensures robustness in evaluating model performance. Real-time classification
capabilities can be added to the system to automatically filter incoming emails, integrating
seamlessly with email clients or servers. Moreover, introducing user feedback mechanisms
allows users to provide input on misclassified emails, aiding in model refinement over time.
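
The k-fold cross-validation mentioned above can be sketched as follows. The generator produces index lists for each fold, and `cross_validate` averages whatever score a caller-supplied function returns. Both names are hypothetical, and the sketch does not shuffle, so real data would normally be shuffled (or stratified) first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def cross_validate(score_fn, n_samples, k=5):
    """Average the score returned by `score_fn(train, test)` over k folds."""
    scores = [score_fn(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

for train, test in k_fold_indices(10, k=3):
    print(len(train), test)
```

Every sample lands in exactly one test fold, so each model is scored on all of the data without ever being tested on rows it was trained on.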
Integration with popular email services or clients, advanced data visualization in the GUI for
insights into email trends and model performance, and optimization for scalability and
performance are crucial aspects to consider. Furthermore, security measures must be
implemented to protect user data and ensure compliance with privacy regulations. By
incorporating these features, the proposed system can offer a comprehensive solution for
spam classification, meeting the evolving needs of users while providing robust protection
against spam emails.
Ensemble Methods: Implementing ensemble methods like bagging and boosting to combine
the predictions of multiple classifiers, further enhancing classification accuracy and
robustness.
Integration with Email Services: Integrating with popular email services or clients to
streamline the user experience and ensure seamless operation within existing email
workflows.
Scalability and Performance Optimization: Optimizing the system for scalability and
performance to handle large volumes of emails efficiently while maintaining high
classification accuracy.
Security Measures: Implementing robust security measures to protect user data and ensure
compliance with privacy regulations, safeguarding sensitive information from unauthorized
access or misuse.
The system design and development process for the spam filter project entails several critical
phases to ensure the creation of a robust and efficient solution. Initially, the project
requirements are thoroughly analyzed to understand the desired features, performance
metrics, and user expectations. Following this, relevant datasets containing email samples are
acquired and preprocessed to handle missing values, normalize features, and encode
categorical variables.
Subsequently, various machine learning algorithms suitable for text classification tasks, such
as K-Nearest Neighbors (KNN) and Decision Trees, are evaluated and the most appropriate
models are selected based on their performance on training and validation datasets. These
selected models are then trained using the preprocessed data, with hyperparameters tuned to
optimize performance. Evaluation of the trained models is conducted using performance
metrics like accuracy, precision, recall, and F1-score, along with confusion matrices to
visualize their classification performance.
In addition to the core system design and development process, several other aspects
contribute to the successful implementation of the spam filter project. One crucial component
is the selection and preprocessing of the dataset, which involves cleaning the data, handling
outliers, and ensuring data integrity to prevent biases in the model. Moreover, feature
selection and engineering play a vital role in improving model performance by identifying the
most relevant features and transforming them to better represent the underlying patterns in
the data. Furthermore, the choice of evaluation metrics is critical to accurately assess the
performance of the trained models and compare them against each other.
Alongside model evaluation, techniques such as cross-validation help ensure the reliability
of the results by testing the models on multiple subsets of the data. Additionally, the
scalability and efficiency of the system are important considerations, especially when dealing
with large volumes of email data. Implementing optimization techniques and leveraging
parallel processing capabilities can help enhance the system's speed and scalability.
Lastly, robust error handling and logging mechanisms are essential to identify and address
any issues that may arise during system operation, ensuring the stability and reliability of the
spam filter application. By incorporating these elements into the system design and
development process, the spam filter project can deliver a reliable, efficient, and user-friendly
solution for classifying spam and non-spam emails.
The input design for the spam filter project encompasses several crucial steps aimed at
facilitating efficient data processing and model training. Initially, the selection of an
appropriate dataset is paramount, ensuring it contains email attributes alongside
corresponding labels denoting their classification as spam or non-spam. For this project, the
dataset of choice is sourced from the UCI Machine Learning Repository, specifically the
Spambase dataset. Following dataset selection, thorough data preprocessing tasks are
undertaken, encompassing handling missing values, encoding categorical variables (if
applicable), and scaling numerical features to ensure the data is suitably formatted for model
training.
Following this, machine learning models, including KNN and Decision Tree, are trained on
the data, learning patterns from input features to classify emails as spam or non-spam.
Evaluation of these models is crucial, employing metrics such as accuracy, precision, recall,
and F1-score to assess their effectiveness in generalizing to unseen data and accurately
classifying emails. Finally, a user interface is developed using Tkinter, allowing users to
interact with the spam filter system, visualize input features, model predictions, and
evaluation results in real-time.
Through meticulous attention to these input design steps, the spam filter project can
efficiently process input data, train models effectively, and provide valuable insights into
email classification. In addition to the core input design steps outlined above, several other
aspects contribute to the overall effectiveness and robustness of the spam filter project. These
include:
Data Exploration and Analysis: Before proceeding with input design, it's essential to conduct
exploratory data analysis (EDA) to gain insights into the dataset's characteristics. This
involves visualizing feature distributions, identifying correlations between features, and
understanding the imbalance between spam and non-spam classes. EDA helps in making
informed decisions during preprocessing and feature selection.
Handling Imbalanced Data: Imbalanced datasets, where one class significantly
outnumbers the other, are common in spam classification tasks. Techniques such as
oversampling, undersampling, or using algorithms like Synthetic Minority Over-sampling
Technique (SMOTE) can address class imbalances, ensuring the model learns from both
classes effectively.
Feature Engineering: Feature engineering involves creating new features from existing ones
or transforming features to improve model performance. For email classification, potential
features could include word frequencies, presence of specific keywords, email length, sender
information, and more. Effective feature engineering enhances the model's ability to capture
relevant information for classification.
Model Selection: While KNN and Decision Tree models are used in this project, exploring
other algorithms such as Random Forests, Support Vector Machines (SVM), or neural
networks could potentially yield better performance. Comparing multiple models and
selecting the most suitable one based on performance metrics is crucial for achieving high
accuracy in spam classification.
Error Analysis: Understanding the types of errors made by the model (e.g., false positives,
false negatives) through error analysis helps in identifying areas for improvement. Analyzing
misclassified examples can provide valuable insights into the limitations of the model and
potential avenues for further refinement.
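
Of the considerations listed above, class imbalance is the easiest to illustrate in a few lines. The sketch below performs plain random oversampling, duplicating minority-class rows until the classes are the same size. It is not SMOTE, which synthesizes new points instead of copying existing ones. The function name and sample rows are hypothetical, and such resampling would normally be applied to the training set only.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == cls]
        # Draw with replacement to make up the shortfall for this class.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical training rows: 6 non-spam (0) and 2 spam (1).
rows = [[0.1], [0.2], [0.0], [0.3], [0.1], [0.2], [0.9], [0.8]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```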
By incorporating these additional considerations into the input design process, the spam filter
project can enhance its effectiveness, reliability, and adaptability, ultimately leading to more
accurate email classification and improved user experience.
Window Title and Layout: The Tkinter window is designed with an intuitive layout to
enhance user experience. The title "Spam Filter Evaluation" is prominently displayed at the
top of the window, indicating the purpose of the application. Below the title, the layout
includes labels and a text box for displaying the evaluation results.
Labels for Dataset Information: Two labels are included to provide essential information
about the dataset being evaluated. The "Number of Spam Messages" label displays the total
count of spam messages in the dataset, while the "Number of Non-Spam Messages" label
shows the count of non-spam messages. These labels help users understand the composition
of the dataset and its distribution between spam and non-spam categories.
Text Box for Evaluation Results: A scrolled text widget is incorporated to present the
evaluation results in a structured format. The text box dynamically updates with the
evaluation metrics for each classification model (KNN and Decision Tree). The results
include:
Accuracy: The percentage of correctly classified instances out of the total instances.
Precision: The ratio of true positive instances to the sum of true positive and false positive
instances, indicating what share of the messages flagged as spam are actually spam.
Recall: The ratio of true positive instances to the sum of true positive and false negative
instances, indicating what share of the actual spam messages the model detects.
F1-score: The harmonic mean of precision and recall, providing a balance between the two
metrics.
Confusion Matrix: A table displaying the counts of true positive, true negative, false positive,
and false negative instances, facilitating a deeper understanding of the model's performance.
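The definitions above can be checked with a short worked example that computes each metric directly from the four confusion-matrix counts. The counts are made-up values for illustration, with spam treated as the positive class.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 40 spam caught, 45 ham passed, 5 ham flagged, 10 spam missed.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 85/100, precision is 40/45, recall is 40/50, and F1 is 80/95.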
For the spam filter evaluation project, the focus is primarily on evaluating machine learning
models rather than database management. However, a database design could still be relevant
if you plan to incorporate features like data logging, user authentication, or storing evaluation
results for future reference. Here's a basic outline of a database design tailored to support
such functionalities:
- User Table: This table stores information about registered users, including their username,
password (hashed for security), email address, and any other relevant user details.
- Log Table: If you want to log information about model evaluations or user interactions
with the application, you can create a log table. It may include fields like timestamp, user ID
(if applicable), action performed, and any additional metadata.
- Evaluation Results Table: This table stores the evaluation results obtained from running
machine learning models on different datasets. It may include fields like dataset name, model
name, evaluation metrics (accuracy, precision, recall, F1-score), confusion matrix, timestamp,
and any other relevant information.
- Dataset Table: If you plan to manage multiple datasets within the application, you can
create a dataset table to store information about each dataset. Fields may include dataset
name, description, source URL, upload date, and any other metadata.
- Model Table: Similarly, if you want to manage multiple machine learning models, you
can create a model table. Fields may include model name, description, algorithm used,
hyperparameters, training duration, and any other relevant information.
6. Relationships:
- If necessary, establish relationships between tables using foreign keys to maintain data
integrity. For example, the Evaluation Results Table may have foreign keys referencing the
User Table (if user-specific evaluations are logged) and the Dataset Table (to associate
evaluation results with specific datasets).
- Implement constraints (e.g., unique constraints, foreign key constraints) to enforce data
integrity and consistency.
- Implement mechanisms for regular data backup to prevent data loss in case of system
failures or accidental deletions.
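The outline above could be sketched with SQLite through Python's built-in `sqlite3` module. All table and column names below are assumptions made for illustration, and the log and model tables are left out for brevity; the inserted row uses a made-up accuracy value.

```python
import sqlite3

# Hypothetical schema covering three of the tables described above.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    source_url TEXT,
    uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE evaluation_results (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    model_name TEXT NOT NULL,
    accuracy REAL,
    precision_score REAL,
    recall_score REAL,
    f1_score REAL,
    confusion_matrix TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute("INSERT INTO datasets (name) VALUES (?)", ("spambase",))
conn.execute(
    "INSERT INTO evaluation_results (dataset_id, model_name, accuracy) VALUES (?, ?, ?)",
    (1, "KNN", 0.9),  # made-up accuracy value
)
print(conn.execute("SELECT model_name, accuracy FROM evaluation_results").fetchall())
```

The confusion matrix is stored as text (for example, serialized JSON) to keep the sketch simple; a separate table with one row per cell would be an alternative.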
It's essential to tailor the database design to the specific requirements and functionalities of
your project. Consider factors such as data volume, access patterns, and security
considerations when designing the database schema. Additionally, ensure compliance with
relevant privacy regulations when handling sensitive user data.
System development for the spam filter evaluation project encompasses several key phases
aimed at creating a robust and effective solution. Initially, the process begins with a thorough
requirement analysis to understand the project's scope and user needs. This involves
gathering both functional and non-functional requirements to guide the subsequent
development stages. Following this, the design phase entails structuring the system
architecture and creating detailed specifications for each module or component. Designing
the user interface is also crucial during this phase to ensure an intuitive and user-friendly
experience.
Once the design is finalized, the development phase kicks off, involving the actual
implementation of the system components. This includes writing scripts or modules for data
preprocessing, feature extraction, and model training, utilizing libraries like scikit-learn for
machine learning model implementation. Integrating these models into the user interface
allows for seamless interaction and display of evaluation results. Throughout development,
rigorous testing is conducted, including unit testing, integration testing, and system testing, to
ensure the system's correctness and compliance with requirements.
Upon successful testing, the deployment phase involves preparing the system for deployment
on the intended platform, whether it be a web server or a desktop application. Configuration
of server-side components and databases, along with final testing in the production
environment, ensures smooth deployment and operation. Post-deployment, ongoing
maintenance and support are crucial for monitoring the system's performance, addressing any
issues promptly, and implementing updates or enhancements based on user feedback and
evolving requirements. Adherence to best practices in software engineering, such as version
control and documentation, ensures the system's reliability, scalability, and security
throughout its lifecycle.
In addition to the core phases of system development, several other critical aspects contribute
to the success of the spam filter evaluation project. Firstly, acquiring and preprocessing the
spam email dataset is essential, involving tasks like data cleaning and feature extraction.
Feature engineering plays a crucial role in identifying relevant features that distinguish
between spam and non-spam emails, employing techniques such as TF-IDF for text data.
Model selection and tuning involve experimenting with various machine learning algorithms
and hyperparameter optimization to identify the most effective model for classification.
Evaluation metrics like accuracy, precision, recall, and F1-score are essential for assessing
model performance accurately.
Designing an intuitive user interface facilitates interaction with the system, while scalability
and performance optimization ensure efficient handling of large datasets and user demands.
Security measures, including data encryption and user authentication, protect sensitive
information. Comprehensive documentation and training materials aid users and developers
in understanding and utilizing the system effectively. Addressing these aspects ensures the
development of a robust, reliable, and user-friendly spam filter evaluation system.
Beyond these aspects, continuous monitoring and updating of the system is needed so that it
adapts to evolving spamming techniques and patterns. Regular updates to the dataset used for
training and testing the models ensure that
the system remains effective against new spam threats. Moreover, implementing feedback
mechanisms allows users to report misclassified emails, contributing to the refinement of the
classification models over time.
Furthermore, integration with external APIs or services for email handling and analysis can
enhance the system's capabilities, such as real-time email classification and automatic spam
filtering. Additionally, incorporating advanced features like natural language processing
(NLP) for semantic analysis of email content can improve the accuracy of spam detection.
The spam filter evaluation project encompasses multiple interconnected modules, each
playing a crucial role in the system's functionality and effectiveness.
At the core of the system lies the data loading and preprocessing module. This module is
responsible for fetching the spam dataset from the UCI Machine Learning Repository using a
specified URL. Upon retrieval, the data is structured into a DataFrame using the Pandas
library, where appropriate column names are assigned. Additionally, this module handles any
missing values and performs essential preprocessing steps, such as encoding categorical
variables and scaling numerical features, to ensure the dataset is suitable for model training.
Following data preprocessing, the system proceeds to train and evaluate two classification
models: K-Nearest Neighbors (KNN) and Decision Tree. The scikit-learn library facilitates
model training by providing implementations of these algorithms. The dataset is split into
training and testing sets using the train_test_split function, with a specified test size and
random seed for reproducibility. Each model is then trained on the training data and
evaluated using various performance metrics, including accuracy, precision, recall, F1-score,
and confusion matrix. These metrics provide insights into the models' effectiveness in
distinguishing between spam and non-spam messages.
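The training and evaluation flow described above can be sketched as follows. Synthetic data from `make_classification` stands in for the spam dataset so the example runs offline, and the test size, random seed, and `stratify` option are assumed values.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spam features and labels.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, metrics in report.items():
    print(name, {k: round(v, 3) for k, v in metrics.items() if k != "confusion_matrix"})
    print(metrics["confusion_matrix"])
```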
Prior to model training, the dataset undergoes feature scaling to normalize the numerical
features' values. The StandardScaler class from scikit-learn is employed to scale each feature
to a mean of 0 and a standard deviation of 1. This preprocessing step prevents features with
larger magnitudes from dominating the distance calculations used by KNN. The Decision Tree
is largely unaffected by feature scale, so the step mainly benefits the KNN model.
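A short sketch of the scaling step is given below, using two made-up features on very different scales. It assumes the scaler is fitted on the training split only and then reused for the test split, which is the usual way to avoid leaking test-set statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two made-up features: one small-valued, one large-valued.
X_train = np.column_stack([rng.normal(0.1, 0.05, 200), rng.normal(300, 120, 200)])
X_test = np.column_stack([rng.normal(0.1, 0.05, 50), rng.normal(300, 120, 50)])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print("Train means after scaling:", X_train_scaled.mean(axis=0).round(6))
print("Train stds after scaling:", X_train_scaled.std(axis=0).round(6))
```

The test split will generally not have exactly zero mean and unit variance after this transformation, which is expected.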
The user interface (UI) module employs the Tkinter library to create a graphical user
interface (GUI) for the spam filter evaluation system. The UI provides an intuitive platform
for users to interact with the system, displaying essential information such as the number of
spam and non-spam messages in the dataset. Additionally, a scrolled text box presents the
evaluation results of the trained models, allowing users to assess their performance
comprehensively.
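A minimal sketch of such a window is shown below. The function names `format_results` and `build_window` are hypothetical, and the metric values in `sample_results` are made up; only the window title and the two label captions are taken from the design described in this report.

```python
def format_results(results: dict) -> str:
    """Render per-model metrics as plain text for the results box."""
    lines = []
    for model_name, metrics in results.items():
        lines.append(f"{model_name}:")
        for metric_name, value in metrics.items():
            lines.append(f"  {metric_name}: {value:.4f}")
        lines.append("")  # blank line between models
    return "\n".join(lines)


def build_window(spam_count: int, ham_count: int, results: dict):
    """Create the evaluation window and return its root widget."""
    # Imported here so format_results can be used on machines without a display.
    import tkinter as tk
    from tkinter import scrolledtext

    root = tk.Tk()
    root.title("Spam Filter Evaluation")
    tk.Label(root, text=f"Number of Spam Messages: {spam_count}").pack(anchor="w")
    tk.Label(root, text=f"Number of Non-Spam Messages: {ham_count}").pack(anchor="w")
    box = scrolledtext.ScrolledText(root, width=60, height=20)
    box.insert(tk.END, format_results(results))
    box.configure(state="disabled")  # make the results read-only
    box.pack(fill="both", expand=True)
    return root


# Made-up metric values, for illustration only.
sample_results = {"KNN": {"accuracy": 0.9, "f1": 0.85}}
print(format_results(sample_results))
```

To display the window, the returned root would be started with `build_window(spam_count, ham_count, results).mainloop()`, where the three arguments come from the evaluation step.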
Although not explicitly depicted in the provided code snippet, a data analysis and
visualization module could be incorporated to further explore the dataset's characteristics and
visualize the model's performance. This module may include tasks such as feature
distribution analysis, correlation examination, and generation of visualizations such as
histograms, scatter plots, or ROC curves to aid in model interpretation and decision-making.
Together, these modules form a robust spam filter evaluation system capable of loading,
preprocessing, training, and evaluating classification models to assess their effectiveness in
identifying spam messages. The user-friendly interface enhances usability, while the
incorporation of feature scaling ensures fair model training. Additionally, the system's
modular design allows for flexibility and scalability, facilitating future enhancements and
modifications to accommodate evolving requirements.
The testing and implementation phase of the spam filter evaluation project involves rigorous
validation of the system's functionality and performance, followed by its deployment for
practical use.
Unit Testing: The system undergoes comprehensive unit testing to verify the correctness of
individual modules and functions. Test cases are designed to cover various scenarios,
including edge cases and typical user interactions. Automated testing frameworks such as
pytest or unittest are employed to streamline the testing process and ensure robustness.
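A unit test for one small function might look like the sketch below, written with the standard `unittest` module. The helper `label_counts` is hypothetical; it stands in for whatever function produces the spam and non-spam counts shown in the interface.

```python
import unittest


def label_counts(labels):
    """Count spam (1) and non-spam (0) labels; hypothetical helper for the GUI labels."""
    spam = sum(1 for label in labels if label == 1)
    return {"spam": spam, "non_spam": len(labels) - spam}


class LabelCountsTest(unittest.TestCase):
    def test_typical_input(self):
        self.assertEqual(label_counts([1, 0, 0, 1, 0]), {"spam": 2, "non_spam": 3})

    def test_empty_input(self):
        # Edge case: an empty dataset should give zero counts, not an error.
        self.assertEqual(label_counts([]), {"spam": 0, "non_spam": 0})


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LabelCountsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print("All tests passed:", result.wasSuccessful())
```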
Integration Testing: Once individual modules are validated, integration testing is conducted
to assess the interoperability and compatibility of different components. Integration tests
verify that modules communicate effectively and function as expected when integrated into
the system. This phase also includes testing user interface interactions and data flow between
modules.
Validation and Performance Testing: The trained models undergo validation testing to
evaluate their accuracy, precision, recall, F1-score, and other performance metrics. Validation
datasets, distinct from the training and testing sets, are used to assess the models'
generalization capabilities and identify any overfitting or underfitting issues. Performance
testing involves stress testing the system under various load conditions to ensure it can handle
multiple user interactions concurrently without performance degradation.
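One way to obtain a validation set that is distinct from the training and testing sets is to split the data twice. The sketch below assumes a 60/20/20 split and uses synthetic data; comparing training and validation accuracy across tree depths is one simple way to spot overfitting or underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spam features and labels.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)

# First split off 40% of the data, then halve it into validation and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest
)

scores = {}
for depth in (2, 5, None):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = {
        "train": tree.score(X_train, y_train),
        "val": tree.score(X_val, y_val),
    }

for depth, result in scores.items():
    print(f"max_depth={depth}: train={result['train']:.3f}, val={result['val']:.3f}")
```

A large gap between training and validation accuracy suggests overfitting, while low accuracy on both suggests underfitting. The test set is left untouched until the final model has been chosen.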
User Acceptance Testing (UAT): UAT involves inviting end-users or stakeholders to interact
with the system and provide feedback on its usability, intuitiveness, and effectiveness.
Testers simulate real-world usage scenarios to validate whether the system meets their
requirements and expectations. Any issues or suggestions raised during UAT are addressed
and incorporated into system refinements.
Deployment and Rollout: Upon successful testing and validation, the spam filter evaluation
system is deployed for practical use. Deployment may involve hosting the system on a server
or cloud platform accessible to users via web or desktop interfaces. The rollout process
includes notifying users of the system's availability, providing necessary training and
documentation, and ensuring smooth transition from previous tools or processes.
Implementation:
Obtain the spam email dataset from a reliable source, such as the UCI Machine Learning
Repository.
Preprocess the dataset to handle missing values, encode categorical variables, and normalize
numerical features.
Implement machine learning models such as K-Nearest Neighbors (KNN) and Decision
Trees for spam classification using libraries like scikit-learn.
Split the dataset into training and testing sets to train the models and evaluate their
performance.
Use performance metrics like accuracy, precision, recall, and F1-score to assess the models'
effectiveness.
Develop a graphical user interface (GUI) using the tkinter library in Python to provide an
interactive platform for users.
Design intuitive input fields for users to input email attributes and select classification
models.
Include output panels to display classification results, performance metrics, and confusion
matrices.
Integrate the trained machine learning models with the GUI to enable users to select models
and classify emails in real-time.
Implement functionality to preprocess user input, scale features, and pass them to the selected
model for prediction.
Display classification results and performance metrics in the GUI output panels for user
interpretation.
Conduct thorough testing of the integrated system to identify and fix any bugs or errors.
Test the system's functionality under different scenarios, including varying input data and
model selections.
Debug any issues related to data processing, model prediction, or GUI interaction to ensure
the system operates smoothly.
Document the implementation details, including data preprocessing steps, model training
parameters, and GUI design considerations.
Create user guides and documentation to help users understand how to interact with the
system, input email attributes, and interpret classification results.
Provide instructions for troubleshooting common issues and contacting support for
assistance.
Deploy the implemented system on a suitable platform, such as a local machine or a web
server accessible to users.
Monitor the system's performance and user feedback to identify areas for improvement and
implement updates as needed.
Provide ongoing maintenance and support to ensure the system remains functional and
effective in classifying spam emails.
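The steps above that pass user input to a selected model can be sketched with a scikit-learn `Pipeline`, which keeps the scaler and the classifier together so that a single feature vector is scaled the same way as the training data. The `classify` function and the `MODELS` dictionary are hypothetical names, and synthetic data stands in for the spam dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spam features and labels (1 = spam, 0 = non-spam).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

MODELS = {
    "KNN": Pipeline([
        ("scale", StandardScaler()),
        ("clf", KNeighborsClassifier(n_neighbors=5)),
    ]),
    "Decision Tree": Pipeline([
        ("scale", StandardScaler()),
        ("clf", DecisionTreeClassifier(random_state=0)),
    ]),
}
for pipeline in MODELS.values():
    pipeline.fit(X, y)


def classify(features, model_name: str) -> str:
    """Scale one feature vector and return 'spam' or 'non-spam' from the chosen model."""
    row = np.asarray(features, dtype=float).reshape(1, -1)  # one sample, many features
    prediction = MODELS[model_name].predict(row)[0]
    return "spam" if prediction == 1 else "non-spam"


label = classify(X[0], "KNN")
print("KNN prediction for the first row:", label)
```

In the GUI, the values typed into the input fields would be collected into the `features` sequence and the model name would come from the model selector.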
5. CONCLUSION
In conclusion, the spam filter evaluation project presents a comprehensive solution to classify
emails as spam or non-spam using machine learning techniques. Through the implementation
of K-Nearest Neighbors (KNN) and Decision Tree models, coupled with a user-friendly
graphical interface developed with tkinter, users can interactively input email attributes and
receive real-time classification results.
The project's implementation involved various stages, including data acquisition and
preprocessing, model training and evaluation, GUI development, integration of models with
the GUI, testing and debugging, documentation creation, and deployment. Each step was
meticulously executed to ensure the system's functionality, accuracy, and user-friendliness.
Through thorough testing and validation, the implemented system demonstrates robust
performance in accurately classifying spam and non-spam emails. Users can rely on the
system to efficiently filter out unwanted spam emails, thereby enhancing productivity and
reducing the risk of falling victim to phishing scams or malicious content.
Overall, the spam filter evaluation project not only serves as a practical tool for email
classification but also showcases the power of machine learning in addressing real-world
problems. With continued maintenance and support, the project stands ready to serve users in
their email filtering needs, contributing to a safer and more efficient digital communication
environment.
5.1 BIBLIOGRAPHY
Graham, P., Robinson, R., & Hickey, T. (2003). SpamAssassin. Retrieved
from http://spamassassin.apache.org/
Scikit-learn: Machine Learning in Python. (n.d.). Retrieved from
https://scikit-learn.org/stable/index.html
UCI Machine Learning Repository: Spambase Data Set. (n.d.). Retrieved
from https://archive.ics.uci.edu/ml/datasets/spambase
Tkinter GUI toolkit documentation. (n.d.). Retrieved from
https://docs.python.org/3/library/tkinter.html
Python Standard Library. (n.d.). Retrieved from https://docs.python.org/3/library/
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien
Géron: This book covers machine learning concepts and practical implementations using
popular Python libraries like Scikit-Learn.
"Python Data Science Handbook" by Jake VanderPlas: It's a comprehensive guide to data
science and machine learning in Python, covering essential libraries such as NumPy, Pandas,
Matplotlib, Scikit-Learn, and more.
5.2 APPENDICES
A. DATA FLOW DIAGRAM
TABLE STRUCTURE:
1. Data Preprocessing:
- Load dataset
2. Model Building:
3. Model Evaluation:
- Calculate evaluation metrics (accuracy, precision, recall, F1-score, confusion matrix) for
each model
4. Display Results:
- Display evaluation results (metrics and confusion matrix) for each model
- Add labels and text boxes to display spam and non-spam message counts
6. Main Functionality:
- Incorporate the functionality to execute the model evaluation and display the results in the
GUI
This table structure outlines the main components and functionalities of the program, helping
to organize the code and ensure clarity and readability.
SAMPLE CODING:
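import tkinter as tk
from tkinter import scrolledtext
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
# Data loading, splitting and scaling are missing from the original listing.
# The steps below are an assumed reconstruction from the module description;
# DATA_URL, test_size and random_state are assumed values, not the author's.
DATA_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(DATA_URL, header=None)
X, y = data.iloc[:, :-1], data.iloc[:, -1]  # last column: 1 = spam, 0 = non-spam
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training split only
X_test = scaler.transform(X_test)
# KNN Model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Decision Tree Model (assumed; "tree" is used below but never defined in the listing)
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
# Model Evaluation
models = {"KNN": knn, "Decision Tree": tree}
output_text = ""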
OUTPUT:
SPAM DATASET:
ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine
there got amore wat...
ham Ok lar... Joking wif u oni...
spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to
87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
ham U dun say so early hor... U c already then say...
ham Nah I don't think he goes to usf, he lives around here though
spam FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some
fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv
ham Even my brother is not like to speak with me. They treat me like aids patent.
ham As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been
set as your callertune for all Callers. Press *9 to copy your friends Callertune
spam WINNER!! As a valued network customer you have been selected to receivea £900
prize reward! To claim call 09061701461. Claim code KL341. Valid 12 hours only.
spam Had your mobile 11 months or more? U R entitled to Update to the latest colour
mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030
ham I'm gonna be home soon and i don't want to talk about this stuff anymore tonight, k?
I've cried enough today.
spam SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to
87575. Cost 150p/day, 6days, 16+ TsandCs apply Reply HL 4 info
spam URGENT! You have won a 1 week FREE membership in our £100,000 Prize
Jackpot! Txt the word: CLAIM to No: 81010 T&C www.dbuk.net LCCLTD POBOX
4403LDNW1A7RW18
ham I've been searching for the right words to thank you for this breather. I promise i wont
take your help for granted and will fulfil my promise. You have been wonderful and a
blessing at all times.
ham I HAVE A DATE ON SUNDAY WITH WILL!!
spam XXXMobileMovieClub: To use your credit, click the WAP link in the next txt
message or click here>> http://wap. xxxmobilemovieclub.com?n=QJKGIGHJJGCBL
ham Oh k...i'm watching here:)
ham Eh u remember how 2 spell his name... Yes i did. He v naughty make until i v wet.
ham Fine if that’s the way u feel. That’s the way its gota b
spam England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077
eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/ú1.20 POBOXox36504W45WQ
16+
ham Is that seriously how you spell his name?
ham I‘m going to try for 2 months ha ha only joking
ham So ü pay first lar... Then when is da stock comin...
ham Aft i finish my lunch then i go str down lor. Ard 3 smth lor. U finish ur lunch already?
ham Ffffffffff. Alright no way I can meet up with you sooner?
ham Just forced myself to eat a slice. I'm really not hungry tho. This sucks. Mark is getting
worried. He knows I'm sick when I turn down pizza. Lol
ham Lol your always so convincing.
ham Did you catch the bus ? Are you frying an egg ? Did you make a tea? Are you eating
your mom's left over dinner ? Do you feel my Love ?
ham I'm back & we're packing the car now, I'll let you know if there's room
ham Ahhh. Work. I vaguely remember that! What does it feel like? Lol
ham Wait that's still not all that clear, were you not sure about me being sarcastic or that
that's why x doesn't want to live with us
ham Yeah he got in at 2 and was v apologetic. n had fallen out and she was actin like spoilt
child and he got caught up in that. Till 2! But we won't go there! Not doing too badly cheers.
You?
ham K tell me anything about you.
ham For fear of fainting with the of all that housework you just did? Quick have a cuppa
spam Thanks for your subscription to Ringtone UK your mobile will be charged £5/month
Please confirm by replying YES or NO. If you reply NO you will not be charged
ham Yup... Ok i go home look at the timings then i msg ü again... Xuhui going to learn on
2nd may too but her lesson is at 8am
ham Oops, I'll let you know when my roommate's done
ham I see the letter B on my car
ham Anything lor... U decide...
ham Hello! How's you and how did saturday go? I was just texting to see if you'd decided
to do anything tomo. Not that i'm trying to invite myself or anything!
ham Pls go ahead with watts. I just wanted to be sure. Do have a great weekend. Abiola
ham Did I forget to tell you ? I want you , I need you, I crave you ... But most of all ... I
love you my sweet Arabian steed ... Mmmmmm ... Yummy
spam 07732584351 - Rodger Burns - MSG = We tried to call you re your reply to our sms
for a free nokia mobile + free camcorder. Please call now 08000930705 for delivery
tomorrow
ham WHO ARE YOU SEEING?
ham Great! I hope you like your man well endowed. I am <#> inches...
ham No calls..messages..missed calls
ham Didn't you get hep b immunisation in nigeria.
ham Fair enough, anything going on?
ham Yeah hopefully, if tyler can't do it I could maybe ask around a bit
ham U don't know how stubborn I am. I didn't even want to go to the hospital. I kept
telling Mark I'm not a weak sucker. Hospitals are for weak suckers.
ham What you thinked about me. First time you saw me in class.
ham A gram usually runs like <#> , a half eighth is smarter though and gets you
almost a whole second gram for <#>
ham K fyi x has a ride early tomorrow morning but he's crashing at our place tonight
ham Wow. I never realized that you were so embarassed by your accomodations. I thought
you liked it, since i was doing the best i could and you always seemed so happy about "the
cave". I'm sorry I didn't and don't have more to give. I'm sorry i offered. I'm sorry your room
was so embarassing.
spam SMS. ac Sptv: The New Jersey Devils and the Detroit Red Wings play Ice Hockey.
Correct or Incorrect? End? Reply END SPTV
ham Do you know what Mallika Sherawat did yesterday? Find out now @ <URL>
spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C
Suprman V, Matrix3, StarWars3, etc all 4 FREE! bx420-ip4-5we. 150pm. Dont miss out!
ham Sorry, I'll call later in meeting.
ham Tell where you reached
ham Yes..gauti and sehwag out of odi series.
ham Your gonna have to pick up a $1 burger for yourself on your way home. I can't even
move. Pain is killing me.
ham Ha ha ha good joke. Girls are situation seekers.
ham Its a part of checking IQ
ham Sorry my roommates took forever, it ok if I come by now?
ham Ok lar i double check wif da hair dresser already he said wun cut v short. He said will
cut until i look nice.
spam As a valued customer, I am pleased to advise you that following recent review of your
Mob No. you are awarded with a £1500 Bonus Prize, call 09066364589
ham Today is "song dedicated day.." Which song will u dedicate for me? Send this to all ur
valuable frnds but first rply me...
spam Urgent UR awarded a complimentary trip to EuroDisinc Trav, Aco&Entry41 Or
£1000. To claim txt DIS to 87121 18+6*£1.50(moreFrmMob. ShrAcomOrSglSuplt)10, LS1
3AJ
spam Did you hear about the new "Divorce Barbie"? It comes with all of Ken's stuff!
ham I plane to give on this month end.
ham Wah lucky man... Then can save money... Hee...
ham Finished class where are you.
ham HI BABE IM AT HOME NOW WANNA DO SOMETHING? XX
ham K..k:)where are you?how did you performed?
ham U can call me now...
ham I am waiting machan. Call me once you free.
ham Thats cool. i am a gentleman and will treat you with dignity and respect.
ham I like you peoples very much:) but am very shy pa.
ham Does not operate after <#> or what
ham Its not the same here. Still looking for a job. How much do Ta's earn there.
ham Sorry, I'll call later
ham K. Did you call me just now ah?
ham Ok i am on the way to home hi hi
ham You will be in the place of that man
ham Yup next stop.
ham I call you later, don't have network. If urgnt, sms me.
ham For real when u getting on yo? I only need 2 more tickets and one more jacket and I'm
done. I already used all my multis.
ham Yes I started to send requests to make it but pain came back so I'm back in bed.
Double coins at the factory too. I gotta cash in all my nitros.
ham I'm really not up to it still tonight babe
ham Ela kano.,il download, come wen ur free..
ham Yeah do! Don‘t stand to close tho- you‘ll catch something!
ham Sorry to be a pain. Is it ok if we meet another night? I spent late afternoon in casualty
and that means i haven't done any of y stuff42moro and that includes all my time sheets and
that. Sorry.
ham Smile in Pleasure Smile in Pain Smile when trouble pours like Rain Smile when sum1
Hurts U Smile becoz SOMEONE still Loves to see u Smiling!!
spam Please call our customer service representative on 0800 169 6031 between 10am-9pm
as you have WON a guaranteed £1000 cash or £5000 prize!
ham Havent planning to buy later. I check already lido only got 530 show in e afternoon. U
finish work already?
spam Your free ringtone is waiting to be collected. Simply text the password "MIX" to
85069 to verify. Get Usher and Britney. FML, PO Box 5249, MK17 92H. 450Ppw 16
ham Watching telugu movie..wat abt u?
ham i see. When we finish we have loads of loans to pay
ham Hi. Wk been ok - on hols now! Yes on for a bit of a run. Forgot that i have
hairdressers appointment at four so need to get home n shower beforehand. Does that cause
prob for u?"
ham I see a cup of coffee animation
ham Please don't text me anymore. I have nothing else to say.
ham Okay name ur price as long as its legal! Wen can I pick them up? Y u ave x ams xx
ham I'm still looking for a car to buy. And have not gone 4the driving test yet.
ham As per your request 'Melle Melle (Oru Minnaminunginte Nurungu Vettam)' has been
set as your callertune for all Callers. Press *9 to copy your friends Callertune
ham wow. You're right! I didn't mean to do that. I guess once i gave up on boston men and
changed my search location to nyc, something changed. Cuz on my signin page it still says
boston.
ham Umma my life and vava umma love you lot dear
ham Thanks a lot for your wishes on my birthday. Thanks you for making my birthday
truly memorable.
ham Aight, I'll hit you up when I get some cash
ham How would my ip address test that considering my computer isn't a minecraft server
ham I know! Grumpy old people. My mom was like you better not be lying. Then again I
am always the one to play jokes...
ham Dont worry. I guess he's busy.
ham What is the plural of the noun research?
ham Hi! You just spoke to MANEESHA V. We'd like to know if you were satisfied with
the experience. Reply Toll Free with Yes or No.
ham You lifted my hopes with the offer of money. I am in need. Especially when the end
of the month approaches and it hurts my studying. Anyways have a gr8 weekend
ham Lol no. U can trust me.
ham ok. I am a gentleman and will treat you with dignity and respect.
ham He will, you guys close?
ham Going on nothing great.bye
ham Hello handsome ! Are you finding that job ? Not being lazy ? Working towards
getting back that net for mummy ? Where's my boytoy now ? Does he miss me ?
ham Haha awesome, be there in a minute
spam Please call our customer service representative on FREEPHONE 0808 145 4742
between 9am-11pm as you have WON a guaranteed £1000 cash or £5000 prize!
ham Have you got Xmas radio times. If not i will get it now
ham I jus reached home. I go bathe first. But my sis using net tell u when she finishes k...
spam Are you unique enough? Find out from 30th August. www.areyouunique.co.uk
ham I'm sorry. I've joined the league of people that dont keep in touch. You mean a great
deal to me. You have been a friend at all times even at great personal cost. Do have a great
week.|
ham Hi :)finally i completed the course:)
ham It will stop on itself. I however suggest she stays with someone that will be able to
give ors for every stool.
ham How are you doing? Hope you've settled in for the new school year. Just wishin you a
gr8 day
ham Gud mrng dear hav a nice day
ham Did u got that persons story
ham is your hamster dead? Hey so tmr i meet you at 1pm orchard mrt?
ham Hi its Kate how is your evening? I hope i can see you tomorrow for a bit but i have to
bloody babyjontet! Txt back if u can. :) xxx
ham Found it, ENC <#> , where you at?
ham I sent you <#> bucks
ham Hello darlin ive finished college now so txt me when u finish if u can love Kate xxx
ham Your account has been refilled successfully by INR <DECIMAL> . Your
KeralaCircle prepaid account balance is Rs <DECIMAL> . Your Transaction ID is KR
<#> .
ham Goodmorning sleeping ga.
ham U call me alter at 11 ok.
ham Ü say until like dat i dun buy ericsson oso cannot oredi lar...
ham As I entered my cabin my PA said, '' Happy B'day Boss !!''. I felt special. She askd me
4 lunch. After lunch she invited me to her apartment. We went there.