Project Report Template AICTE Internship 2025

The document presents a project report on an SMS Spam Detection System utilizing Natural Language Processing (NLP) and machine learning techniques to classify messages as spam or legitimate. The system employs various algorithms, including Naive Bayes and Support Vector Machines, and emphasizes the importance of preprocessing and feature extraction for effective spam detection. Future enhancements may involve deep learning techniques and real-time deployment to improve scalability and performance.


SMS Spam Detection System Using NLP

A Project Report

submitted in partial fulfillment of the requirements of the

AICTE Internship on AI: Transformative Learning

with
TechSaksham – A joint CSR initiative of Microsoft & SAP

Sandesh Kumar, [email protected]

Under the Guidance of

Abdul Aziz Md
Master trainer, Edunet Foundation
ACKNOWLEDGEMENT

We would like to extend our heartfelt gratitude to everyone who contributed, directly or
indirectly, to the successful completion of this project. First and foremost, we express our sincere
thanks to our supervisor, Abdul Aziz Md, for his exceptional mentorship and invaluable guidance.
His advice, encouragement, and constructive feedback have been a constant source of inspiration
and innovation throughout this project. The trust he placed in us greatly motivated and
empowered us to succeed.

Working with him over the past year has been an honor. His unwavering support not only
enriched our project but also provided insights that enhanced our understanding of the
program as a whole. His guidance has not only shaped this work but has also played a
significant role in helping us grow into better professionals and individuals.
ABSTRACT
The SMS Spam Detection System using Natural Language Processing (NLP) tackles the
persistent issue of spam messages, which disrupt user communication and pose potential
security risks. The project aims to develop an efficient and reliable system capable of
accurately classifying SMS messages as either spam or legitimate (ham). By leveraging
NLP techniques and machine learning models, the system addresses the challenges of text-
based spam detection, such as diverse language patterns, informal text, and contextual
ambiguity.

The methodology involves a structured pipeline, starting with the collection of a labeled
dataset containing both spam and ham SMS messages. The raw data undergoes
preprocessing steps, including case normalization, removal of stop words, special
characters, and irrelevant text, as well as tokenization and stemming. Feature extraction is
performed using Term Frequency-Inverse Document Frequency (TF-IDF) to transform text
into numerical representations suitable for machine learning models. Several classification
algorithms, including Naive Bayes, Logistic Regression, and Support Vector Machines
(SVM), are implemented and evaluated based on performance metrics such as accuracy,
precision, recall, and F1-score.

Experimental results demonstrate that the system achieves a high level of accuracy in
detecting spam messages, with the Naive Bayes classifier performing the best due to its
simplicity and effectiveness in text classification tasks. The project highlights the
importance of thorough preprocessing and appropriate feature engineering in improving
the performance of text-based machine learning models.

In conclusion, the SMS Spam Detection System provides a practical and effective
solution for mitigating the impact of spam messages, thereby enhancing user
communication and security. The system's robustness and high accuracy demonstrate its
potential for real-world applications. Future improvements could include the incorporation
of advanced deep learning techniques, such as recurrent neural networks (RNNs) or
transformers, to handle more complex text structures and improve scalability. Additionally,
real-time deployment could further extend the system's utility in preventing spam across
various communication platforms.
TABLE OF CONTENTS

Abstract

Chapter 1. Introduction
1.1 Problem Statement
1.2 Motivation
1.3 Objectives
1.4 Scope of the Project
Chapter 2. Literature Survey
Chapter 3. Proposed Methodology
Chapter 4. Implementation and Results
Chapter 5. Discussion and Conclusion
References
CHAPTER 1
Introduction

1.1 Problem Statement:
The problem addressed by this project is the pervasive issue of spam messages in
SMS communication. Spam messages are unsolicited, irrelevant, or fraudulent
messages sent to users, often with malicious intent, such as phishing scams,
deceptive advertisements, or attempts to spread malware. These messages disrupt
communication, waste user time, and can lead to significant financial and personal
losses if users fall victim to fraudulent schemes.
Significance of the Problem
The widespread use of SMS for personal, professional, and transactional
communication makes it a critical medium for information exchange. However, the
increasing volume of spam messages undermines its reliability and trustworthiness.
According to studies, spam messages account for a significant portion of global
SMS traffic, posing several challenges:
1. User Experience: Spam messages clutter inboxes, leading to frustration and
reduced productivity for users who must manually filter and delete unwanted
messages.
2. Security Risks: Many spam messages contain malicious links or fraudulent
requests designed to deceive users, exposing them to identity theft, financial fraud,
and data breaches.
3. Economic Impact: Organizations face financial losses due to phishing attacks and
additional costs associated with mitigating spam-related threats.
4. Scalability Challenges: With the growing adoption of SMS services in banking, e-
commerce, and other industries, the need for scalable and reliable spam detection
systems has become increasingly critical.

1.2 Motivation:
This project was chosen due to the increasing prevalence of spam messages in SMS
communication and the challenges they pose to individuals, businesses, and
society. With SMS being a widely used medium for exchanging personal,
transactional, and promotional information, the growing volume of spam messages
undermines its reliability, causing inconvenience and security risks. By leveraging
advancements in Natural Language Processing (NLP) and machine learning, this
project offers a valuable opportunity to address a real-world problem while gaining
practical insights into text analytics and classification tasks.
Furthermore, spam detection is a fundamental problem in the field of cybersecurity
and data science. The project allows exploration of key concepts such as data
preprocessing, feature extraction, and algorithm selection while contributing to
developing a solution with practical implications.
Potential Applications
1. Telecommunication Providers: Integration of the spam detection system into
SMS gateways can help telecom companies filter spam messages before they reach
users.
2. Mobile Applications: Messaging apps and mobile operating systems can use the
system to automatically classify and filter SMS messages, enhancing user
experience.
3. Banking and E-commerce: Businesses in these sectors can utilize the system to
protect users from phishing and fraudulent messages.
4. Regulatory Compliance: The system can assist organizations in adhering to anti-
spam regulations and maintaining customer trust.
5. Research and Development: The project can serve as a foundation for future
studies in text classification, NLP, and advanced spam detection techniques using
deep learning.

1.3 Objectives:

• Develop a Robust Classification System
To design and implement an SMS spam detection system capable of accurately
classifying messages as spam or legitimate (ham) using Natural Language Processing
(NLP) and machine learning techniques.
• Improve Accuracy and Efficiency
To achieve high accuracy, precision, and recall in detecting spam messages while
ensuring the system is computationally efficient and scalable.
• Utilize NLP Techniques
To apply effective NLP techniques such as text preprocessing, tokenization, stemming,
and feature extraction (e.g., TF-IDF) to handle diverse and noisy SMS data.
• Evaluate Machine Learning Models
To compare the performance of different machine learning algorithms, including Naive
Bayes, Logistic Regression, and Support Vector Machines, and identify the most
effective model for spam detection.
• Enhance Communication Security
To mitigate the risks associated with spam messages, such as phishing, fraud, and
malware, by providing a reliable filtering mechanism.
• Scalability for Real-World Applications
To develop a system that can be integrated into real-world applications, such as SMS
gateways, messaging apps, and mobile operating systems, ensuring robust spam
filtering for end users.
• Lay the Foundation for Future Work
To establish a baseline for further advancements, including the incorporation of deep
learning techniques and real-time detection capabilities.

1.4 Scope of the Project:

1. Spam Detection for SMS Messages
o The system is specifically designed to classify SMS messages into two
categories: spam and legitimate (ham).
o It focuses on text-based analysis and is applicable to datasets containing
short message formats.
2. Natural Language Processing (NLP) Techniques
o Utilizes NLP methods for text preprocessing (e.g., tokenization, stemming,
and stop word removal) and feature extraction (e.g., Term Frequency-
Inverse Document Frequency or TF-IDF).
o Focuses on improving the quality of input data to enhance model
performance.
3. Machine Learning Models
o Implements and evaluates traditional machine learning algorithms such as
Naive Bayes, Logistic Regression, and Support Vector Machines.
o Provides comparative insights into model performance to identify the most
suitable approach for the given problem.
4. Performance Metrics
o Evaluates models based on accuracy, precision, recall, and F1-score to
ensure a balanced assessment of spam detection capabilities.
5. Potential Applications
o The system can be integrated into mobile applications, SMS gateways, and
communication platforms to filter spam and improve user experience.

Limitations of the Project

1. Focus on SMS Messages Only

o The system is tailored for SMS spam detection and may not generalize well
to other forms of communication, such as emails or social media messages,
without further adaptation.
2. Static Dataset
o The system is trained and evaluated on a specific dataset. Variations in
language, regional slang, and message patterns in real-world scenarios may
affect its accuracy.
3. Dependence on Preprocessing
o The effectiveness of the system heavily relies on text preprocessing steps,
which may require adjustments for different datasets or languages.
4. Limited Exploration of Algorithms
o While traditional machine learning algorithms are used, advanced deep
learning models like transformers or recurrent neural networks are not
explored, potentially limiting the system’s ability to handle highly complex
patterns.
5. Scalability and Real-Time Detection
o The current system is not designed for real-time deployment or large-scale
processing, which may limit its application in environments requiring
immediate spam filtering.
6. Lack of Multilingual Support
o The project primarily focuses on messages in English and may not perform
well on datasets containing messages in other languages without additional
preprocessing or training.

CHAPTER 2
Literature Survey

2.1 Review of Relevant Literature and Previous Work

The development of SMS spam detection systems has garnered significant attention
due to the increasing prevalence of spam and its impact on communication channels.
Research in this domain has focused on various approaches, from traditional rule-based
systems to modern machine learning and NLP techniques. Key contributions and
insights from previous work are outlined below:

Rule-Based Systems
Early spam detection systems primarily relied on manually crafted rules to identify
patterns indicative of spam, such as the presence of certain keywords, phrases, or
formatting (e.g., excessive use of capital letters or exclamation marks). While effective
to some extent, these systems were limited by their inability to adapt to evolving spam
tactics.
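
To make the rule-based idea concrete, the following is a minimal sketch of such a filter. The keyword set reuses the kind of trigger words mentioned in this report; the thresholds and the function name are hypothetical choices made only for illustration, not part of any cited system.

```python
import re

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"win", "free", "offer"}


def rule_based_is_spam(message: str,
                       max_caps_ratio: float = 0.6,
                       max_exclamations: int = 3) -> bool:
    """Flag a message as spam if it trips any hand-written rule."""
    # Rule 1: the message contains a known spam keyword.
    words = re.findall(r"[a-z]+", message.lower())
    if any(word in SPAM_KEYWORDS for word in words):
        return True

    # Rule 2: most of the letters are capitals.
    letters = [ch for ch in message if ch.isalpha()]
    if letters:
        caps_ratio = sum(ch.isupper() for ch in letters) / len(letters)
        if caps_ratio > max_caps_ratio:
            return True

    # Rule 3: excessive exclamation marks.
    return message.count("!") > max_exclamations


print(rule_based_is_spam("WIN a FREE phone now"))   # keyword rule fires
print(rule_based_is_spam("Fr3e pr1ze inside"))      # obfuscation slips through
```

The second call shows the weakness described above: a small change in spelling is enough to evade fixed rules, and the rules must then be updated by hand.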

Machine Learning Approaches

Machine learning has revolutionized spam detection by enabling systems to learn from
data and improve their performance over time. Common algorithms used in SMS spam
detection include:

Naive Bayes Classifier

Popular for text classification due to its simplicity and efficiency.

Research (e.g., Almeida et al., 2013) demonstrates that Naive Bayes performs well for
spam detection, given its ability to handle noisy and sparse datasets.
Support Vector Machines (SVM)

Effective for high-dimensional text data. Studies have shown that SVM achieves good
accuracy in SMS spam detection but may require significant computational resources.
Logistic Regression

Widely used for binary classification tasks, with a balance of interpretability and
performance.
Random Forest and Decision Trees

Ensemble methods such as Random Forest improve robustness and handle complex
data patterns.
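
A comparison of these classifiers could be set up along the following lines. This is only a sketch using scikit-learn: the twelve messages are made up for illustration and are not the dataset used in this project, so the scores it prints say nothing about real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up messages (1 = spam, 0 = ham); illustration only.
messages = [
    "Win a free prize now", "Free entry in a weekly competition",
    "Claim your cash reward today", "Urgent: your account has won a prize",
    "Free ringtone, reply now to claim", "Cheap loans approved, call now",
    "Are we still meeting for lunch", "I will call you after work",
    "Can you send me the notes", "Happy birthday, see you tonight",
    "The meeting is moved to Monday", "Thanks, I got the package",
]
labels = [1] * 6 + [0] * 6


def compare_models(texts, y, folds=3):
    """Return the mean cross-validated accuracy of each candidate model."""
    candidates = {
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svm": LinearSVC(),
    }
    scores = {}
    for name, model in candidates.items():
        # The vectorizer is inside the pipeline so it is refit on each fold.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        fold_scores = cross_val_score(pipeline, texts, y, cv=folds, scoring="accuracy")
        scores[name] = fold_scores.mean()
    return scores


for name, score in compare_models(messages, labels).items():
    print(f"{name}: {score:.2f}")
```

Keeping the vectorizer inside the pipeline avoids fitting TF-IDF vocabulary on the held-out fold, which would otherwise leak information into the evaluation.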

NLP Techniques
Text preprocessing and feature engineering are critical in SMS spam detection.
Techniques such as tokenization, stemming, lemmatization, and Term Frequency-
Inverse Document Frequency (TF-IDF) have been widely adopted to transform
unstructured text data into meaningful numerical representations.
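
As a small worked illustration of TF-IDF, the sketch below computes the plain textbook variant, where the weight of a term is its relative frequency in a message multiplied by log(N / df). The function name and the three sample messages are assumptions made for this example. Library implementations differ in detail; scikit-learn's TfidfVectorizer, for instance, smooths the IDF term and normalizes each vector by default, so its numbers will not match these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["free prize call now", "call me now", "lunch now"]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", {term: round(value, 3) for term, value in w.items()})
```

In this toy corpus the word "now" appears in every message, so its weight is zero, while "free" appears in only one message and receives the largest weight in it. This is the property that makes TF-IDF useful for separating distinctive spam vocabulary from common words.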

Deep Learning Approaches

Recent studies have explored deep learning models like Recurrent Neural Networks
(RNNs), Convolutional Neural Networks (CNNs), and transformers (e.g., BERT).
These models excel in capturing contextual and sequential information in text but often
require substantial computational resources and large datasets.

2.2 Existing Models, Techniques, and Methodologies
Several models, techniques, and methodologies have been developed for SMS spam
detection, leveraging advancements in machine learning and Natural Language
Processing (NLP). Key approaches include:

1. Rule-Based Systems
Early spam detection systems relied on predefined rules, such as filtering messages
with specific keywords (e.g., "WIN", "FREE", "OFFER") or patterns like excessive
punctuation or capital letters.
While straightforward, these systems lack flexibility and adaptability to evolving spam
tactics.
2. Traditional Machine Learning Models
Naive Bayes Classifier: Widely used for text classification due to its simplicity and
efficiency in handling sparse data.
Support Vector Machines (SVM): Effective for high-dimensional data, including text,
achieving good performance in binary classification tasks like spam detection.
Logistic Regression: Common for binary classification, offering a balance between
simplicity and predictive power.
K-Nearest Neighbors (KNN) and Random Forests: Occasionally used for spam
detection but less common due to scalability concerns for larger datasets.
3. NLP-Based Techniques
Text Preprocessing: Tokenization, stop-word removal, stemming, lemmatization, and
case normalization are common preprocessing steps to clean and standardize SMS data.
Feature Extraction: Techniques like Bag of Words (BoW) and Term Frequency-Inverse
Document Frequency (TF-IDF) are used to convert text into numerical representations
for model input.
4. Deep Learning Models
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks:
Effective in capturing sequential and contextual information in text but require
significant computational resources.
Convolutional Neural Networks (CNNs): Used for extracting features from text with
promising results in classification tasks.
Transformers (e.g., BERT): Advanced models capable of understanding context and
semantics in text, achieving state-of-the-art results in many NLP tasks, including spam
detection.
5. Hybrid Models
Combinations of machine learning and deep learning methods have been explored to
leverage the strengths of both approaches, such as using TF-IDF for feature extraction
combined with deep learning models for classification.
2.3 Gaps or Limitations in Existing Solutions and How the Project Addresses Them
1. Limited Adaptability to Real-World Variability
Limitation: Many existing solutions are trained on static datasets and struggle to adapt
to diverse spam patterns, informal language, and evolving spam tactics.
Proposed Solution: The project emphasizes robust preprocessing and feature extraction
to handle noisy and diverse SMS data. A comparative analysis of models ensures the
selection of the most adaptable approach.
2. Lack of Scalability
Limitation: Some machine learning models, such as KNN or Random Forest, are less
scalable for large datasets or real-time applications.
Proposed Solution: The system focuses on lightweight models like Naive Bayes and
Logistic Regression, which are computationally efficient and suitable for real-time
deployment.
3. Insufficient Exploration of NLP Techniques
Limitation: Many solutions rely on basic feature extraction techniques, overlooking the
potential of advanced NLP methods.
Proposed Solution: This project employs techniques such as TF-IDF and explores n-
grams for capturing contextual information, improving the system’s performance.
4. High Computational Requirements of Deep Learning
Limitation: Deep learning models, while effective, are resource-intensive and often
impractical for deployment in low-resource environments.
Proposed Solution: By focusing on traditional machine learning techniques, the project
ensures an optimal balance between accuracy and computational efficiency, making it
feasible for resource-constrained scenarios.
5. Limited Focus on Multilingual or Multidomain Detection
Limitation: Existing models often focus on English-only datasets and may not
generalize to other languages or domains.
Proposed Solution: While this project primarily targets English SMS spam, it
establishes a framework that can be extended to support multilingual datasets with
minimal modifications in preprocessing and training.
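
The n-gram idea mentioned under point 3 can be illustrated with a short sketch. The helper below is hypothetical and written only for this example; in scikit-learn the same effect is obtained by passing ngram_range=(1, 2) to TfidfVectorizer.

```python
def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, shortest first."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(word_ngrams("Claim your free prize"))
# ['claim', 'your', 'free', 'prize', 'claim your', 'your free', 'free prize']
```

Bigrams such as "free prize" keep a little of the word order that single-word features discard, at the cost of a larger and sparser feature space.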

2.4 Strengths and Weaknesses of Existing Approaches

A variety of models and methodologies have been applied to SMS spam detection,
leveraging advancements in Natural Language Processing (NLP) and machine learning.
Some notable approaches include:

1. Naive Bayes Classifier

A probabilistic algorithm widely used for text classification tasks, including spam
detection.
Strengths: Simple, fast, and effective for datasets with limited size.
Weaknesses: Assumes feature independence, which may not hold for all SMS
messages.
2. Support Vector Machines (SVM)
Effective in high-dimensional text classification problems.
Strengths: Works well with sparse data and can handle non-linear classification using
kernels.
Weaknesses: Computationally expensive for large datasets.
3. Logistic Regression
A linear model used for binary classification tasks, including spam vs. ham
categorization.
Strengths: Easy to interpret and effective for moderately complex patterns.
Weaknesses: Limited when dealing with non-linear relationships.
4. Random Forest and Decision Trees
Decision Tree-based algorithms that perform well for spam detection.
Strengths: Robust to overfitting (in ensemble methods like Random Forest).
Weaknesses: Slower compared to simpler models for text data.
5. Deep Learning Models
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM): Capture
sequential dependencies in text.
Convolutional Neural Networks (CNNs): Extract local patterns, such as short word or character sequences, from text.
Transformers (e.g., BERT): Handle complex language patterns using contextual
embeddings.
Strengths: Exceptional accuracy with large datasets.
Weaknesses: Require significant computational resources and data preprocessing.
6. NLP Techniques
Text preprocessing (e.g., tokenization, stemming, lemmatization).
Feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF),
Bag-of-Words (BoW), and word embeddings (e.g., Word2Vec, GloVe).
7. Hybrid Approaches
Combinations of NLP and machine learning or ensemble methods to improve
performance.
Example: Combining Naive Bayes and SVM to leverage complementary strengths.
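
One possible way to combine Naive Bayes and SVM, as in the hybrid example above, is majority voting. The sketch below uses scikit-learn's VotingClassifier; the training messages are invented for illustration, and this is one way such a hybrid could be built rather than the configuration of any particular study.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up training data; illustration only.
train_texts = [
    "Win a free prize now", "Claim your cash reward today",
    "Free entry in a weekly competition", "Cheap loans approved, call now",
    "Are we still meeting for lunch", "I will call you after work",
    "Can you send me the notes", "The meeting is moved to Monday",
]
train_labels = ["spam"] * 4 + ["ham"] * 4

# Hard voting: each model casts one vote per message.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("nb", MultinomialNB()), ("svm", LinearSVC())],
        voting="hard",
    ),
)
ensemble.fit(train_texts, train_labels)

predictions = ensemble.predict(
    ["free prize waiting, claim now", "are we still meeting today"]
)
print(list(predictions))
```

With only two voters a tie is possible; scikit-learn then picks the class that comes first in sorted label order. Adding a third model, such as Logistic Regression, would avoid ties.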
2.5 Further Gaps or Limitations in Existing Solutions and How This Project Addresses Them

Identified Gaps and Limitations

Handling Evolving Spam Patterns

Many existing systems struggle with detecting spam messages that use obfuscation
(e.g., deliberate misspellings) or new tactics.
Real-Time Detection

While effective, some models like SVM or deep learning frameworks are
computationally intensive, making real-time deployment challenging.
Multilingual and Diverse Data

Many studies focus on English datasets, leaving non-English or mixed-language
messages underrepresented.
Overfitting on Small Datasets

Deep learning models, while accurate, often require large datasets to avoid overfitting.
Many existing spam datasets are small or static.
Interpretability

Complex models like deep learning lack transparency, making it difficult to understand
why a message is classified as spam.
Deployment Challenges

Few studies address the practical integration of spam detection systems into SMS
gateways or mobile platforms.
How This Project Addresses the Gaps
Adaptive Preprocessing

Employs text preprocessing techniques intended to handle obfuscation and evolving
spam patterns.
Efficient Models for Real-Time Use

Focuses on lightweight models like Naive Bayes and Logistic Regression, ensuring
computational efficiency while maintaining high accuracy.
Dataset Augmentation

Uses augmentation techniques to simulate diverse spam patterns, improving model
robustness.
Focus on Scalability and Deployment

Favors a lightweight design that can be adapted for real-time detection and integration
into SMS gateways or mobile applications. As noted in the limitations in Section 1.4, the
current system is not itself built for real-time deployment.
Multilingual Capability

Keeps preprocessing and feature extraction modular so that they can be extended to
non-English messages. The current system targets English messages.
Balancing Accuracy and Interpretability

Utilizes interpretable models alongside feature importance analysis to provide
transparency in classification decisions.
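
As an illustration of preprocessing aimed at obfuscation, the sketch below undoes a few common character substitutions and shortens exaggerated character runs. The substitution table and the function name are hypothetical and chosen only for this example; the report does not specify the exact rules, and a production system would need a more careful design.

```python
import re

# Hypothetical substitution table for common look-alike characters.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize_obfuscation(text: str) -> str:
    """Map look-alike characters back to letters and shorten long runs."""
    normalized = []
    for token in text.lower().split():
        # Only rewrite tokens that contain a letter, so that plain
        # numbers such as phone numbers are left untouched.
        if any(ch.isalpha() for ch in token):
            token = token.translate(LEET_MAP)
        # Collapse runs of three or more identical characters to two.
        token = re.sub(r"(.)\1{2,}", r"\1\1", token)
        normalized.append(token)
    return " ".join(normalized)


print(normalize_obfuscation("FR33 pr1ze!!! call 0800"))
# free prize!! call 0800
```

The mapping is deliberately simple and will mis-handle some legitimate tokens (for example "4u" becomes "au"), so it should be treated as a starting point to be tuned against real messages.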

CHAPTER 3
Proposed Methodology

3.1 System Design

1 Input SMS Data:
1.1 The system starts with a dataset of SMS messages, which includes
both spam and ham (non-spam) messages.
1.2 This dataset is typically labeled, meaning each message is tagged
as either "spam" or "ham."
2 Preprocessing:
2.1 The raw SMS data is preprocessed to make it suitable for NLP
tasks. This step includes:
2.1.1 Tokenization: Splitting the text into individual words or
tokens.
2.1.2 Lemmatization: Reducing words to their base or root form
(e.g., "running" → "run").
2.1.3 Stopword Removal: Removing common words that do not
contribute much to the meaning (e.g., "the," "is," "and").
2.1.4 Lowercasing: Converting all text to lowercase to ensure
uniformity.
3 Feature Extraction:
3.1 After preprocessing, the text data is converted into numerical
features that can be fed into a machine learning model. Common
techniques include:
3.1.1 TF-IDF (Term Frequency-Inverse Document Frequency):
Weighs the importance of words based on their frequency in a
document and across the dataset.
3.1.2 Word Embeddings: Techniques like Word2Vec or GloVe to
represent words in a dense vector space.
3.1.3 Bag of Words (BoW): Represents text as a vector of word
frequencies.
4 Labeled Dataset:
4.1 The preprocessed and feature-extracted data is combined with
labels (spam/ham) to create a labeled dataset.
4.2 This dataset is split into training and testing sets for model
evaluation.
5 Model Training:
5.1 A machine learning model (e.g., Naive Bayes, SVM, Logistic
Regression, or even deep learning models like LSTM) is trained on
the labeled dataset.
5.2 The training process involves learning patterns in the data that
distinguish spam from ham messages.
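
The steps above can be sketched end to end with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the messages are invented, the stopword set is a small hand-picked sample, the preprocess function is a hypothetical helper, and lemmatization is left out because it needs an external resource such as NLTK's WordNet data. Naive Bayes is used because the abstract reports it as the best-performing model.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Small hand-picked stopword set (assumption); NLTK offers a fuller list.
STOPWORDS = {"the", "is", "and", "a", "to", "you", "your", "for", "at", "we", "are"}


def preprocess(message: str) -> str:
    """Lowercase, tokenize on letters and digits, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return " ".join(token for token in tokens if token not in STOPWORDS)


# Step 1: labeled input data (toy, made-up messages).
messages = [
    "Win a free prize now", "Free entry in a weekly competition",
    "Claim your cash reward today", "Urgent: your account has won a prize",
    "Free ringtone, reply now to claim", "Cheap loans approved, call now",
    "Are we still meeting for lunch", "I will call you after work",
    "Can you send me the notes", "Happy birthday, see you tonight",
    "The meeting is moved to Monday", "Thanks, I got the package",
]
labels = ["spam"] * 6 + ["ham"] * 6

# Step 4: split into training and testing sets, keeping the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# Steps 2, 3 and 5: preprocessing, TF-IDF features, and model training.
model = make_pipeline(TfidfVectorizer(preprocessor=preprocess), MultinomialNB())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on {len(X_test)} held-out messages: {test_accuracy:.2f}")
```

Splitting before fitting the vectorizer means the TF-IDF vocabulary is learned from the training messages only, so the test set stays unseen until evaluation.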

3.2 Requirement Specification

1. Programming Language

• Python: One of the most widely used languages for NLP and machine learning tasks due
to its rich ecosystem of libraries and frameworks.

2. Natural Language Processing (NLP) Libraries

• NLTK (Natural Language Toolkit): For tokenization, stemming, lemmatization,
and stopword removal.
• spaCy: For advanced NLP tasks like entity recognition, part-of-speech tagging,
and dependency parsing.
• Gensim: For topic modeling and word embeddings (e.g., Word2Vec).

3. Machine Learning Libraries

• Scikit-learn: For implementing traditional machine learning algorithms (e.g.,
Naive Bayes, SVM, Logistic Regression) and evaluation metrics.
• TensorFlow/Keras: For building and training deep learning models (e.g., LSTM,
GRU).
• PyTorch: An alternative to TensorFlow for deep learning.

4. Feature Extraction Tools

• TF-IDF (Term Frequency-Inverse Document Frequency): Available in Scikit-learn.
• Word Embeddings: Pre-trained embeddings like Word2Vec, GloVe, or FastText.
• Bag of Words (BoW): Available in Scikit-learn.

5. Data Preprocessing and Visualization

• Pandas: For data manipulation and analysis.
• NumPy: For numerical computations.
• Matplotlib/Seaborn: For data visualization and plotting.

6. Model Evaluation and Metrics

• Scikit-learn: Provides tools for calculating accuracy, precision, recall, F1-score, and
confusion matrix.
• Yellowbrick: For visualizing model performance and evaluation metrics.
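
To show what the evaluation metrics listed above measure, the following sketch computes them directly from the confusion-matrix counts, treating spam as the positive class. The function name and the eight example labels are made up for illustration; in practice the equivalent functions in sklearn.metrics (accuracy_score, precision_score, recall_score, f1_score, confusion_matrix) would normally be used.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 spam and 5 ham messages, with two mistakes.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
print(classification_metrics(y_true, y_pred))
```

Here one spam message is missed and one ham message is wrongly flagged, giving an accuracy of 0.75 and a precision, recall, and F1-score of 2/3 each. For spam filtering, precision matters in particular because a false positive hides a legitimate message from the user.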

CHAPTER 5
Discussion and Conclusion

5.1 Future Work:

1 Advanced Models:
1.1 Use deep learning (LSTM, GRU, BERT) or ensemble methods for
better accuracy.
2 Handling Imbalanced Data:
2.1 Apply data augmentation, class weighting, or SMOTE to
address class imbalance.
3 Feature Engineering:
3.1 Add contextual embeddings (e.g., BERT), n-grams, or additional
features like message length.
4 Real-Time Detection:
4.1 Implement real-time spam detection using streaming
frameworks (e.g., Apache Kafka) or edge deployment.
5 Multilingual Support:
5.1 Use multilingual models (e.g., mBERT) and language detection
for global applicability.
6 User Feedback & Active Learning:
6.1 Incorporate user feedback to improve the model and use active
learning for continuous improvement.
7 Explainability:
7.1 Add model interpretability tools (e.g., SHAP, LIME) to explain
predictions.
8 Robustness:
8.1 Test the model against adversarial attacks and improve
preprocessing for noisy data.
9 Deployment:
9.1 Optimize the model for scalability and integrate with
messaging platforms.
10 Ethical Considerations:
10.1 Ensure fairness, transparency, and compliance with data
privacy regulations.
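
Of the options listed for imbalanced data, class weighting is the simplest to illustrate. The sketch below uses scikit-learn's compute_class_weight on a made-up label distribution of 8 ham and 2 spam messages; the numbers are for illustration only. SMOTE is provided by the separate imbalanced-learn package and is not shown here.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up, imbalanced label distribution: 8 ham messages, 2 spam messages.
labels = np.array(["ham"] * 8 + ["spam"] * 2)
classes = np.unique(labels)

# "balanced" weights are n_samples / (n_classes * count_of_class).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weights = dict(zip(classes, weights))

print(class_weights)  # ham is weighted 0.625, spam 2.5
```

Each spam example then counts four times as much as a ham example during training. Passing class_weight="balanced" to a scikit-learn estimator that supports it, such as LogisticRegression, applies the same weighting automatically.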

5.2 Conclusion:
1 Enhanced User Experience:
1.1 Filters spam, saving time and reducing exposure to unwanted
or harmful messages.
2 Improved Security:
2.1 Helps prevent phishing, scams, and fraud, protecting user privacy
and data.
3 NLP and ML Application:
3.1 Demonstrates effective use of NLP techniques and machine
learning models for text classification.
4 Scalability:
4.1 Can be extended toward real-time detection and adapted for
multilingual use.
5 Research Contribution:
5.1 Provides a benchmark for spam detection and encourages
open-source collaboration.
6 Business Benefits:
6.1 Offers a cost-effective solution for organizations to reduce
spam-related risks.

No ratings yet
Digital Signal Processing: Mustansiriyah University College of Engineering Electrical Engineering Department 4 Class
16 pages
Spam Detection Synopsis
No ratings yet
Spam Detection Synopsis
8 pages
Lecture 17 Transfer Learning
No ratings yet
Lecture 17 Transfer Learning
12 pages
Project Report Template AICTE Internship 2025
No ratings yet
Project Report Template AICTE Internship 2025
20 pages
Chapter 1
No ratings yet
Chapter 1
22 pages
Sms Spam Detectionn
No ratings yet
Sms Spam Detectionn
63 pages
B 14 Sms Spam Detection ML Ieee Report
No ratings yet
B 14 Sms Spam Detection ML Ieee Report
5 pages
IJNRD2403165
No ratings yet
IJNRD2403165
5 pages
Sms Spam Detection Project Final
No ratings yet
Sms Spam Detection Project Final
59 pages
Spam SMS (Or) Email Detection and Classification Using Machine Learning
No ratings yet
Spam SMS (Or) Email Detection and Classification Using Machine Learning
5 pages
SMS Spam Detection Using Naïve Bayes Algorithm-5
No ratings yet
SMS Spam Detection Using Naïve Bayes Algorithm-5
6 pages
Spam Message
No ratings yet
Spam Message
12 pages
228w1f0040 Review1
No ratings yet
228w1f0040 Review1
15 pages
Abh 1
No ratings yet
Abh 1
17 pages
SMS Spam Detection Using Machine Learning
No ratings yet
SMS Spam Detection Using Machine Learning
4 pages
Sms Spam Detection
No ratings yet
Sms Spam Detection
51 pages
A Hybrid Machine Learning Approach For Spam and Malware
No ratings yet
A Hybrid Machine Learning Approach For Spam and Malware
14 pages
Email Spam Detection
No ratings yet
Email Spam Detection
8 pages
Table Content 1
No ratings yet
Table Content 1
3 pages
Pruthviraj Micor Foml
No ratings yet
Pruthviraj Micor Foml
26 pages
Vishal FOML Micro Project Vishal & Milan
No ratings yet
Vishal FOML Micro Project Vishal & Milan
26 pages
Aryan Blackbook 1
No ratings yet
Aryan Blackbook 1
29 pages
Spam SMS Filtering Based On Text Features and Supervised Machine Learning Techniques
No ratings yet
Spam SMS Filtering Based On Text Features and Supervised Machine Learning Techniques
19 pages
Email Spam Final
No ratings yet
Email Spam Final
32 pages
Spam Detection NLP Project
No ratings yet
Spam Detection NLP Project
3 pages
Email Spam
No ratings yet
Email Spam
8 pages
Format Termpaper
No ratings yet
Format Termpaper
9 pages
Investigating Evasive Techniques in Sms Spam Filtering A Comparative Analysis of Machine Learning Models Ijariie26436
No ratings yet
Investigating Evasive Techniques in Sms Spam Filtering A Comparative Analysis of Machine Learning Models Ijariie26436
10 pages
Opll
No ratings yet
Opll
20 pages
Email Spam Detection Edited
No ratings yet
Email Spam Detection Edited
30 pages
Functional Document
No ratings yet
Functional Document
3 pages
Final Report Spam Classifier
No ratings yet
Final Report Spam Classifier
24 pages
Mini - Project Report
No ratings yet
Mini - Project Report
21 pages
Second Progress Report
No ratings yet
Second Progress Report
17 pages
PDFF
No ratings yet
PDFF
15 pages
Ijsse 14.01 28
No ratings yet
Ijsse 14.01 28
8 pages
The Power Of SMS
From Everand
The Power Of SMS
Lebuajoang Mahlomola
No ratings yet



