Sms Spam Detection Project Final

The project report details the development of an SMS spam detection system using machine learning techniques, specifically achieving a high accuracy of 98.5% with an LSTM model. The report includes sections on the introduction, literature survey, proposed system, system design, and various methodologies employed for data processing and analysis. The project was conducted by Jainapuram Raghuvarma as part of the requirements for a Master's degree in Computer Applications at Osmania University.

Uploaded by

abhi1619143

A PROJECT REPORT

ON
“SMS SPAM DETECTION USING MACHINE LEARNING”

Project work submitted in partial fulfilment for the award of the degree of

MASTER OF COMPUTER APPLICATION

Submitted by

JAINAPURAM RAGHUVARMA

Enrollment No:101120862022

Department Of Computer Science


University College Of Science Saifabad
(2020-2022)
(CONSTITUENT COLLEGE OF OSMANIA UNIVERSITY)

Under the Guidance of


Mrs. Himabindu
(Project Coordinator, Gnaneshwari)
CERTIFICATE

This is to certify that this project entitled “SMS Spam Detection By Using
Machine Learning With GUI” is a bonafide work carried out by
Jainapuram Raghuvarma bearing Hall Ticket No: 101120862022 in
University College Of Science, Saifabad and submitted to Osmania
University in partial fulfillment of the requirements for the award of Master
of Computer Applications.

Project Guide External Examiner Principal


ACKNOWLEDGMENT

It gives me pleasure to record my deep sense of gratitude to all Faculty
members of Computer Science Department of my college “UNIVERSITY
COLLEGE OF SCIENCE SAIFABAD” who guided and supervised me
throughout my project completion by their constant encouragement and
discussion which helped me to present this dissertation so successfully.

I express my sincere thanks to Mr. T. Aravind, Mrs. Hima Bindhu,
Mrs. Gnaneshwari, Mrs. B. Neeraja, Mrs. Archana, Mr. Raju, Mr. Satyam
& all faculty members for their experimentation throughout my project.

(JAINAPURAM RAGHUVARMA)
SMS SPAM DETECTION
USING
MACHINE LEARNING
ABSTRACT
Project title: SMS Spam Detection By Using Machine Learning.

The number of people who use mobile devices is increasing every day. SMS
(short message service) is a text messaging service that can be used on both
smartphones and regular phones. As a result, SMS traffic skyrocketed. The
number of spam texts has also increased. Spammers attempt to send spam
communications for financial or commercial gain, such as market growth,
lottery ticket information, credit card information, and so on. As a result, spam
classification receives special attention.

We used a combination of machine learning and deep learning techniques to
detect SMS spam in this work. We built a spam detection model using data
from UCI. With an accuracy of 98.5 percent, our LSTM model exceeds
previous models in spam detection. All of the implementations were done in
Python.
INDEX

1. INTRODUCTION
2. LITERATURE SURVEY
3. PROPOSED SYSTEM
4. SYSTEM REQUIREMENT SPECIFICATION DOCUMENT.
a. System Architecture Block Diagram.
b. System Requirements
i) Software Requirements.
ii) Hardware Requirements .
c. Disadvantage
d. Modules Description
5. SYSTEM DESIGN
a. UML Diagrams
6. DESCRIPTION OF TECHNOLOGIES
a. Machine Learning
b. Python
7. CODE
8. TESTING
9. RESULT
10. CONCLUSION
11. REFERENCE
Introduction:-
In just five years, the number of smartphone users has risen from 1 billion
to 3.8 billion [1]. China, India, and the United States have the most mobile
phone users. SMS, or Short Message Service, is a text messaging service that
has been around for a long time. SMS does not require access to the internet,
so it is supported by both smartphones and basic mobile phones. Although
smartphones come with a variety of text messaging apps such as WhatsApp,
those apps work only over the internet. SMS, on the other hand, can be sent
at any time. As a result, SMS traffic is steadily expanding. Spammers send
unsolicited messages, bombarding people with large quantities of them for
the advantage of their organizations or for personal gain. Messages of this
kind are called spam. Despite the availability of numerous SMS spam filtering
solutions [2], more sophisticated strategies are still required to deal with
this problem. Spam messages on mobile devices can be irritating. SMS spam and
email spam are two different types of spam; in this report, the terms "spam"
and "SMS spam" refer to the same thing. Spammers use these messages to
promote their products or businesses, and users may sometimes suffer
financial losses as a result. Machine learning is a technology that allows
machines to learn from past data and make predictions about new data. Machine
learning and deep learning can now be used to tackle real-world problems in a
variety of fields, including health, security, and market analysis. Machine
learning approaches include supervised learning, unsupervised learning,
semi-supervised learning, and others. In supervised learning the dataset has
output labels, whereas unsupervised learning deals with datasets without
labels. We used a labeled UCI dataset and employed multiple supervised
learning techniques to detect SMS spam.
Literature survey:-
Using machine learning and deep learning techniques to detect spam is not
new. A number of researchers have previously used ML approaches to classify
SMS spam. Nilam Nur Amir Sjarif [3] et al. combined the TF-IDF technique
with a random forest classifier and reached 97.5 percent accuracy. The
TF-IDF approach uses two measurements, Term Frequency and Inverse Document
Frequency, to weight the words in a document. For email spam filtering,
A. Lakshmanarao et al. used four machine learning classifiers: Decision
Trees, Naive Bayes, Logistic Regression, and Random Forest, with the random
forest classifier achieving 97 percent accuracy. Pavas Navaney [5] et al.
evaluated several machine learning techniques and obtained 97.4 percent
accuracy with support vector machines. Luo GuangJun [6] et al. used a
variety of shallow machine learning techniques and found that the logistic
regression classifier had a high accuracy rate. For the detection of SMS
spam, Tian Xia [7] et al. presented a Hidden Markov Model. Their model used
information about the order of words, thereby solving issues with low term
frequency.
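
To make the TF-IDF weighting mentioned above concrete, here is a minimal sketch in plain Python. It assumes one common textbook variant (term frequency normalized by message length, multiplied by the natural log of the inverse document frequency); the cited works and library implementations may use smoothed or otherwise different formulas. The function name `tf_idf` and the three sample messages are illustrative only and are not taken from the report's dataset.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document (unsmoothed textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # term frequency * inverse document frequency
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical toy messages, for illustration only.
messages = ["free prize call now", "call me now", "see you now"]
weights = tf_idf(messages)
```

A word such as "now", which occurs in every message, gets weight zero, while a rarer word such as "free" gets a higher weight than "call". This is why TF-IDF features tend to emphasize words that distinguish one message from the rest.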

M. Nivaashini [8] et al. applied a deep neural network for SMS
spam detection and achieved an accuracy of 98%. They also compared DNN
performance with NB, Random Forest, SVM, and KNN. Mehul Gupta [9] et al.
compared various spam detection machine learning models with deep learning
models and showed that deep learning models achieved a high accuracy rate in
SMS spam detection. Gomatham Sai Sravya [10] et al. compared various
machine learning algorithms for SMS spam detection and achieved the best
accuracy with the Naive Bayes classification model. M. Rubin Julis [11] et al.
applied various machine learning classifiers and achieved an accuracy of 97%
with a support vector machine. K. Sree Ram Murthy [12] et al. proposed
Recurrent Neural Networks for SMS spam detection and achieved a good
accuracy rate. S. Sheikh [13] proposed SMS spam detection using feature
selection and the Neural Network model and achieved a good accuracy rate.
Adem Tekerek [14] et al. applied various machine learning classification models
for SMS spam detection and achieved an accuracy of 97% with a support vector
machine classifier.
Proposed System
The prediction method employs three machine learning algorithms: Linear
Regression, Random Forest Regressor, and Decision Tree Regressor.
● Steps of the proposed approach:
Step 1: Initialize the dataset containing the training data (wholesale
price index).
Step 2: Select all rows and column 1 from the dataset as “x”, the
independent variable.
Step 3: Select all rows and column 2 from the dataset as “y”, the
dependent variable.
Step 4: Fit DTR/SVR/LR to the dataset.
Step 5: Predict the new value.
Step 6: Visualize the result and check the accuracy.
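
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it covers the linear regression (LR) case of Step 4 using ordinary least squares written by hand, and the two-column `rows` data is invented for the example rather than taken from the project's dataset. The helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: a toy two-column dataset (invented values following y = 2x + 1).
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

# Steps 2 and 3: column 1 is the independent variable, column 2 the dependent one.
x = [row[0] for row in rows]
y = [row[1] for row in rows]

# Step 4: fit the model. Step 5: predict a new value.
slope, intercept = fit_line(x, y)
prediction = slope * 5.0 + intercept
```

With this toy data the fitted line recovers the slope 2 and intercept 1 used to generate it, so the prediction for x = 5 is 11.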
a. System Architecture Block Diagram.

b. System Requirements:

i) Software Requirements:

1. Anaconda Navigator

2. ML - NLP

ii) Hardware Requirements:

1. Windows 7,8,10 64 bit

2. RAM 4GB

c. Disadvantages:
● Time complexity was more
● Prediction accuracy was not so high
d. System Modules
1. Data Ingestion:
Data ingestion is the transportation of data from assorted sources to a storage
medium where it can be accessed, used, and analyzed by an organization. The
destination is typically a data warehouse, data mart, database, or a document
store. Sources may be almost anything – including SaaS data, in-house apps,
databases, spreadsheets, or even information scraped from the internet. The data
ingestion layer is the backbone of any analytics architecture. Downstream
reporting and analytics systems rely on consistent and accessible data. There are
different ways of ingesting data, and the design of a particular data ingestion
layer can be based on various models or architectures.

2. Data Preprocessing:
Data preprocessing is a data mining technique used to transform raw data
into a useful and efficient format. The data here goes through two stages:
i) Data Cleaning: It is very important for data to be error free and free of
unwanted data, so the data is cleansed before performing the next steps.
Cleansing includes checking for missing values, duplicate records and
invalid formatting, and removing them.
ii) Data Transformation: The datasets are transformed mathematically into
forms suitable for the data mining process. This allows us to understand
the data better by arranging the hundreds of records in an orderly way.
Transformation includes Normalization, Standardization, Attribute Selection.
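
As a rough illustration of the two stages, the sketch below drops rows with missing values and exact duplicates (cleaning) and then applies min-max normalization to a numeric column (transformation). The function names `clean` and `min_max_normalize` and the `raw` records are hypothetical; the report does not specify which transformations were actually applied.

```python
def clean(records):
    """Drop rows containing None and exact duplicate rows, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in records:
        if any(value is None for value in row):
            continue  # missing value
        if row in seen:
            continue  # duplicate record
        seen.add(row)
        cleaned.append(row)
    return cleaned

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lowest) / (highest - lowest) for v in values]

# Invented records: (numeric feature, identifier).
raw = [(10.0, "a"), (None, "b"), (20.0, "c"), (10.0, "a"), (30.0, "d")]
rows = clean(raw)
scaled = min_max_normalize([row[0] for row in rows])
```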

3. Exploratory data analysis:


Exploratory data analysis (EDA) is an approach to understanding datasets
better by means of visual elements such as scatter plots and bar plots. This
allows us to identify the trends in the data more accurately and to perform
analysis accordingly. From the yearly trend graphs it is observed that US
exports depend on and follow the areas planted and harvested annually. A
sudden drop in China's exports in the year 2009 is observed, while its
imports kept increasing over the last 12 years regardless of the global
yield, which implies that China has a large and lasting demand for the
soybean crop but now relies on the global supply to meet its needs.
4. Feature Extraction : Correlations
Finally, let's take a look at the relationships between numeric features and other
numeric features. Correlation is a value between -1 and 1 that represents how
closely values for two separate features move in unison. Positive correlation
means that as one feature increases, the other increases; e.g., a child's age and
her height. Negative correlation means that as one feature increases, the other
decreases; e.g., hours spent studying and number of parties attended. Correlations
near -1 or 1 indicate a strong relationship. Those closer to 0 indicate a weak
relationship. 0 indicates no linear relationship.
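
The correlation described above is usually the Pearson coefficient. A minimal sketch follows, assuming that definition; the sample values for age, height, study hours and parties are invented to mirror the two examples in the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented illustrative data.
ages = [2, 4, 6, 8, 10]
heights_cm = [86, 102, 115, 128, 138]   # grows with age: positive correlation
study_hours = [1, 2, 3, 4, 5]
parties = [9, 7, 6, 3, 1]               # falls as study time rises: negative

age_height = pearson(ages, heights_cm)
study_party = pearson(study_hours, parties)
```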
5. Evaluation Metric
Modelling of data involves creating a data model for the data to be stored in the
database. The process of modeling means training a Machine Learning
Algorithm to predict the labels from the features, tuning it for business need,
and validating it on the hold out data. The output from modeling is a trained
model that can be used for inference, making predictions on new data points.
Modeling is independent of the previous steps in the Machine Learning process
and has standardized inputs which means we can alter the prediction problem
without needing to rewrite all our code. If the business requirements change, we
can generate new label times, build corresponding features, and input them into
the model. Models are implemented and later evaluated for their accuracy
using root mean square error.
Regressors used for prediction:
● Random Forest Regressor: regression method
● Support Vector Regression (SVR): uses kernel functions
● Linear Regression: regression method
● Decision Tree Regression: regression method
Since these models are regressors, we use the following metrics:
● R2 score: The R2 score of a regression is the proportion of the variance
in the test-set target values that is explained by the regressor's
predictions; a score of 1 means a perfect fit.
● Root Mean Square Error: The Root Mean Square Error, the square root of
the mean squared difference between predicted and actual values, is
evaluated for every model.
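
Both metrics can be computed directly from their standard definitions. The sketch below uses those definitions; the `y_true` and `y_pred` values are invented for illustration and are not results from this project.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Invented target values and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

error = rmse(y_true, y_pred)
score = r2_score(y_true, y_pred)
```

A lower RMSE and an R2 score closer to 1 both indicate a better fit; here two of the four predictions are off by one unit, giving an R2 of 0.9.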
UML DIAGRAMS
The Unified Modeling Language (UML) is used to specify, visualize, modify,
construct and document the artifacts of an object-oriented software intensive
system under development. UML offers a standard way to visualize a system's
architectural blueprints, including elements such as:
● actors
● business processes
● (logical) components
● activities
● programming language statements
● database schemas, and
● Reusable software components.

UML combines the best techniques from data modeling (entity relationship
diagrams), business modeling (work flows), object modeling, and component
modeling. It can be used with all processes, throughout the software
development life cycle, and across different implementation technologies. UML
has synthesized the notations of the Booch method, the Object-modeling
technique (OMT) and Object-oriented software engineering (OOSE) by fusing
them into a single, common and widely usable modeling language. UML aims
to be a standard modeling language which can model concurrent and distributed
systems.
i) Sequence Diagram:
Sequence diagrams represent the objects participating in the interaction
horizontally and time vertically. A Use Case is a kind of behavioral classifier
that represents a declaration of an offered behavior. Each use case specifies
some behavior, possibly including variants that the subject can perform in
collaboration with one or more actors. Use cases define the offered behavior of
the subject without reference to its internal structure. These behaviors, involving
interactions between the actor and the subject, may result in changes to the state
of the subject and communications with its environment. A use case can include
possible variations of its basic behavior, including exceptional behavior and
error handling.
ii) Activity Diagrams:
Activity diagrams are graphical representations of Workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the
Unified Modeling Language, activity diagrams can be used to describe the
business and operational step-by-step workflows of components in a system. An
activity diagram shows the overall flow of control.
iii) Usecase diagram:
● UML is a standard language for specifying, visualizing, constructing, and
documenting the artifacts of software systems.
● UML was created by Object Management Group (OMG) and UML 1.0
specification draft was proposed to the OMG in January 1997.
● OMG is continuously putting effort to make a truly industry standard.
● UML stands for Unified Modeling Language.
● UML is a pictorial language used to make software blueprints.
iv) Class diagram
The class diagram is the main building block of object-oriented modeling. It is
used for general conceptual modeling of the structure of the application, and
for detailed modeling translating the models into programming code. Class
diagrams can also be used for data modeling. The classes in a class diagram
represent both the main elements and interactions in the application, and the
classes to be programmed.
In the diagram, classes are represented with boxes that contain three
compartments:
The top compartment contains the name of the class. It is printed in bold and
centered, and the first letter is capitalized.
The middle compartment contains the attributes of the class. They are left-
aligned and the first letter is lowercase.
The bottom compartment contains the operations the class can execute. They
are also left-aligned and the first letter is lowercase.
Domain Specification

MACHINE LEARNING
Machine learning is a system that can learn from examples through
self-improvement, without being explicitly coded by a programmer. The
breakthrough comes with the idea that a machine can learn on its own from
the data (i.e., examples) to produce accurate results.
Machine learning combines data with statistical tools to predict an output. This
output is then used by companies to produce actionable insights. Machine learning
is closely related to data mining and Bayesian predictive modeling. The
machine receives data as input and uses an algorithm to formulate answers.

A typical machine learning task is to provide a recommendation. For those
who have a Netflix account, all recommendations of movies or series are based
on the user's historical data. Tech companies are using unsupervised learning to
improve the user experience by personalizing recommendations.

Machine learning is also used for a variety of tasks such as fraud detection,
predictive maintenance, portfolio optimization, task automation and so on.

Machine Learning vs. Traditional Programming

Traditional programming differs significantly from machine learning. In
traditional programming, a programmer codes all the rules in consultation with
an expert in the industry for which the software is being developed. Each rule is
based on a logical foundation; the machine will produce an output following the
logical statement. When the system grows complex, more rules need to be
written. It can quickly become unsustainable to maintain.

[Figure: Traditional programming, with DATA and RULES as inputs to the COMPUTER]

Machine Learning
How does Machine learning work?
Machine learning is the brain where all the learning takes place. The way the
machine learns is similar to the way a human being learns. Humans learn from
experience. The more we know, the more easily we can predict. By analogy, when
we face an unknown situation, the likelihood of success is lower than in a known
situation. Machines are trained the same way. To make an accurate prediction,
the machine sees examples. When we give the machine a similar example, it can
figure out the outcome. However, like a human, if it is fed a previously unseen
example, the machine has difficulty predicting.
The core objectives of machine learning are learning and inference. First of
all, the machine learns through the discovery of patterns. This discovery is made
thanks to the data. One crucial task of the data scientist is to choose carefully
which data to provide to the machine. The list of attributes used to solve a
problem is called a feature vector. You can think of a feature vector as a subset
of data that is used to tackle a problem.
The machine uses some fancy algorithms to simplify the reality and transform
this discovery into a model. Therefore, the learning stage is used to describe the
data and summarize it into a model.

For instance, the machine is trying to understand the relationship between the
wage of an individual and the likelihood of going to a fancy restaurant. It turns
out the machine finds a positive relationship between wage and going to a
high-end restaurant: this is the model.
Inferring

When the model is built, it is possible to test how powerful it is on
never-seen-before data. The new data are transformed into a feature vector, go
through the model and give a prediction. This is the beautiful part of machine
learning. There is no need to update the rules or retrain the model. You can use
the model previously trained to make inferences on new data.

The life cycle of machine learning programs is straightforward and can be
summarized in the following points:

1. Define a question
2. Collect data
3. Visualize data
4. Train algorithm
5. Test the Algorithm
6. Collect feedback
7. Refine the algorithm
8. Loop 4-7 until the results are satisfying
9. Use the model to make a prediction

Once the algorithm gets good at drawing the right conclusions, it applies that
knowledge to new sets of data.
Machine learning Algorithms and where they are used?

Machine learning can be grouped into two broad learning tasks: Supervised and
Unsupervised. There are also many other algorithms.

Supervised learning

An algorithm uses training data and feedback from humans to learn the
relationship of given inputs to a given output. For instance, a practitioner can
use marketing expense and weather forecast as input data to predict the sales of
cans.
You can use supervised learning when the output data is known. The algorithm
will predict new data.

There are two categories of supervised learning:

● Classification task
● Regression task

Classification
Imagine you want to predict the gender of a customer for a commercial. You
will start gathering data on the height, weight, job, salary, purchasing basket, etc.
from your customer database. You know the gender of each of your customers; in
this example it can only be male or female. The objective of the classifier is to
assign a probability of being male or female (i.e., the label) based on the
information (i.e., the features) you have collected. Once the model has learned
how to recognize male or female, you can use new data to make a prediction. For
instance, you just got new information from an unknown customer, and you want to
know whether the customer is male or female. If the classifier predicts
male = 70%, it means the algorithm is 70% sure that this customer is male, and
30% sure that the customer is female.
The label can be of two or more classes. The above example has only two
classes, but if a classifier needs to predict objects, it has dozens of classes
(e.g., glass, table, shoes, etc.; each object represents a class).
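
To show what "assigning a probability to each label" looks like in code, the sketch below swaps the customer example for the topic of this report: a two-class spam/ham classifier. It is a hand-written multinomial Naive Bayes with add-one (Laplace) smoothing, chosen here only because it is short; it is not the model used in this project, and the four training messages are invented.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict_proba(text, word_counts, label_counts):
    """Return {label: probability} using Naive Bayes with add-one smoothing."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    log_scores = {}
    for label in word_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_messages)  # class prior
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        log_scores[label] = score
    # Convert log scores to probabilities that sum to 1.
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}

# Invented toy training set.
training = [
    ("win a free prize now", "spam"),
    ("free cash offer", "spam"),
    ("are we meeting today", "ham"),
    ("see you at lunch", "ham"),
]
word_counts, label_counts = train(training)
probs = predict_proba("free prize", word_counts, label_counts)
```

The returned dictionary plays the role of the "male = 70%" statement above: each label gets a probability, the probabilities sum to 1, and the predicted class is the label with the highest value.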

Regression
When the output is a continuous value, the task is a regression. For instance, a
financial analyst may need to forecast the value of a stock based on a range of
features such as equity, previous stock performance, and macroeconomic indexes.
The system will be trained to estimate the price of the stocks with the lowest
possible error.

Algorithm Name | Description | Type

Linear regression | Finds a way to correlate each feature to the output to help predict future values. | Regression

Logistic regression | Extension of linear regression that's used for classification tasks. The output variable is binary (e.g., only black or white) rather than continuous (e.g., an infinite list of potential colors). | Classification

Decision tree | Highly interpretable classification or regression model that splits data-feature values into branches at decision nodes (e.g., if a feature is a color, each possible color becomes a new branch) until a final decision output is made. | Regression, Classification

Naive Bayes | The Bayesian method is a classification method that makes use of the Bayesian theorem. The theorem updates the prior knowledge of an event with the independent probability of each feature that can affect the event. | Regression, Classification

Support vector machine | Support Vector Machine, or SVM, is typically used for the classification task. The SVM algorithm finds a hyperplane that optimally divides the classes. It is best used with a non-linear solver. | Regression (not very common), Classification

Random forest | The algorithm is built upon a decision tree to improve the accuracy drastically. Random forest generates many simple decision trees and uses the 'majority vote' method to decide on which label to return. For the classification task, the final prediction will be the one with the most votes; while for the regression task, the average prediction of all the trees is the final prediction. | Regression, Classification

AdaBoost | Classification or regression technique that uses a multitude of models to come up with a decision but weighs them based on their accuracy in predicting the outcome. | Regression, Classification

Gradient-boosting trees | Gradient-boosting trees is a state-of-the-art classification/regression technique. It focuses on the error committed by the previous trees and tries to correct it. | Regression, Classification
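
The way a random forest combines its trees, as described in the table, can be sketched without building any trees at all. The example below assumes the individual tree outputs are already available (the `tree_labels` and `tree_values` lists are invented) and shows only the aggregation step: majority vote for classification and averaging for regression.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def average(predictions):
    """Return the mean of the per-tree numeric predictions."""
    return sum(predictions) / len(predictions)

# Invented outputs of five classification trees and three regression trees.
tree_labels = ["spam", "ham", "spam", "spam", "ham"]
tree_values = [2.0, 3.0, 4.0]

forest_label = majority_vote(tree_labels)
forest_value = average(tree_values)
```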
Unsupervised learning

In unsupervised learning, an algorithm explores input data without being given
an explicit output variable (e.g., explores customer demographic data to identify
patterns).
You can use it when you do not know how to classify the data, and you want the
algorithm to find patterns and classify the data for you.

Algorithm | Description | Type

K-means clustering | Puts data into some groups (k) that each contains data with similar characteristics (as determined by the model, not in advance by humans) | Clustering

Gaussian mixture model | A generalization of k-means clustering that provides more flexibility in the size and shape of groups (clusters) | Clustering

Hierarchical clustering | Splits clusters along a hierarchical tree to form a classification system. Can be used for Cluster loyalty-card customer | Clustering

Recommender system | Help to define the relevant data for making a recommendation. | Clustering

PCA/T-SNE | Mostly used to decrease the dimensionality of the data. The algorithms reduce the number of features to 3 or 4 vectors with the highest variances. | Dimension Reduction
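
As an illustration of the first row of the table, here is a minimal k-means sketch for one-dimensional data. To keep it deterministic it takes the initial centroids as an argument instead of choosing them at random, which is an assumption of this example; the points and starting centroids are invented.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on numbers: assign each point to the nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Invented data with two obvious groups and deliberately poor starting centroids.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

No labels are supplied anywhere: the two groups emerge from the distances between points alone, which is the sense in which the groups are "determined by the model, not in advance by humans".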
Application of Machine learning
Augmentation:

● Machine learning that assists humans with their day-to-day tasks,
personally or commercially, without having complete control of the output.
Such machine learning is used in different ways, such as virtual assistants,
data analysis, and software solutions. The primary use is to reduce errors
due to human bias.

Automation:

● Machine learning that works entirely autonomously in any field
without the need for any human intervention. For example, robots
performing the essential process steps in manufacturing plants.

Finance Industry

● Machine learning is growing in popularity in the finance industry. Banks
are mainly using ML to find patterns inside the data but also to prevent
fraud.

Government organization

● The government makes use of ML to manage public safety and utilities.
Take the example of China with its massive face recognition. The
government uses artificial intelligence to prevent jaywalking.

Healthcare industry

● Healthcare was one of the first industries to use machine learning with
image detection.

Marketing

● AI is broadly used in marketing thanks to abundant access to data.
Before the age of mass data, researchers developed advanced mathematical
tools like Bayesian analysis to estimate the value of a customer. With the
boom of data, marketing departments rely on AI to optimize customer
relationships and marketing campaigns.
Example of application of Machine Learning in Supply Chain
Machine learning gives terrific results for visual pattern recognition, opening up
many potential applications in physical inspection and maintenance across the
entire supply chain network.

Unsupervised learning can quickly search for comparable patterns in a diverse
dataset. In turn, the machine can perform quality inspection throughout the
logistics hub, identifying shipments with damage and wear.
For instance, IBM's Watson platform can determine shipping container damage.
Watson combines visual and systems-based data to track, report and make
recommendations in real-time.
In past years, stock managers relied extensively on basic methods to evaluate
and forecast the inventory. By combining big data and machine learning,
better forecasting techniques have been implemented (an improvement of 20 to
30 % over traditional forecasting tools). In terms of sales, it means an increase of
2 to 3 % due to the potential reduction in inventory costs.

Example of Machine Learning Google Car


For example, everybody knows the Google car. The car has lasers on the
roof which tell it where it is relative to the surrounding area. It has radar
in the front, which informs the car of the speed and motion of all the cars
around it. It uses all of that data to figure out not only how to drive the car but
also to predict what the drivers around the car are going to
do. What's impressive is that the car is processing almost a gigabyte of data
a second.

Deep Learning
Deep learning is computer software that mimics the network of neurons in a
brain. It is a subset of machine learning and is called deep learning because it
makes use of deep neural networks. The machine uses different layers to learn
from the data. The depth of the model is represented by the number of layers in
the model. Deep learning is the new state of the art in terms of AI. In deep
learning, the learning phase is done through a neural network.
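
The idea that data passes through successive layers, and that depth is simply the number of layers, can be sketched with a tiny hand-written forward pass. Everything here is an illustrative assumption: the weights are fixed by hand rather than learned, ReLU is applied after every layer (including the last) for simplicity, and this is not the LSTM architecture used in the project.

```python
def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through every (weights, biases) layer in order."""
    output = inputs
    for weights, biases in layers:
        output = relu(dense(output, weights, biases))
    return output

# Hand-picked toy weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
depth = len(layers)
output = forward([2.0, 1.0], layers)
```

In a real network the weights in `layers` would be adjusted during training; the forward pass itself keeps the same layered structure.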

Reinforcement Learning
Reinforcement learning is a subfield of machine learning in which systems are
trained by receiving virtual "rewards" or "punishments," essentially learning by
trial and error. Google's DeepMind has used reinforcement learning to beat a
human champion at the game of Go. Reinforcement learning is also used in video
games to improve the gaming experience by providing smarter bots.
Some of the most famous algorithms are:

● Q-learning
● Deep Q network
● State-Action-Reward-State-Action (SARSA)
● Deep Deterministic Policy Gradient (DDPG)
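
As a small illustration of learning from rewards, the sketch below implements the standard tabular Q-learning update rule, the first algorithm in the list. The two states, the action names, and the values of the learning rate `alpha` and discount factor `gamma` are invented for the example.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]

# Hypothetical two-state problem; all Q-values start at zero.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 0.0},
}

# A rewarded move in s1 raises its value...
first = q_update(q, "s1", "right", reward=1.0, next_state="s1")
# ...and an unrewarded move from s0 into s1 then inherits part of that value.
second = q_update(q, "s0", "right", reward=0.0, next_state="s1")
```

The second update shows how a reward received later propagates backwards to the actions that led to it, which is how trial and error turns into a learned policy.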

Applications/ Examples of deep learning applications


AI in Finance: The financial technology sector has already started using AI to
save time, reduce costs, and add value. Deep learning is changing the lending
industry by using more robust credit scoring. Credit decision-makers can use AI
for robust credit lending applications to achieve faster, more accurate risk
assessment, using machine intelligence to factor in the character and capacity of
applicants.
Underwrite is a fintech company providing an AI solution for credit-making
companies. underwrite.ai uses AI to detect which applicants are more likely to pay
back a loan. Their approach radically outperforms traditional methods.

AI in HR: Under Armour, a sportswear company, revolutionized hiring and
modernized the candidate experience with the help of AI. In fact, Under Armour
reduced hiring time for its retail stores by 35%. Under Armour faced growing
popularity back in 2012. They received, on average, 30,000 resumes a month.
Reading all of those applications and starting the screening and interview
process was taking too long. The lengthy process to get people hired and
onboarded impacted Under Armour's ability to have their retail stores fully
staffed, ramped and ready to operate.
At that time, Under Armour had all of the 'must have' HR technology in place,
such as transactional solutions for sourcing, applying, tracking and onboarding,
but those tools weren't useful enough. Under Armour chose HireVue, an AI
provider for HR solutions, for both on-demand and live interviews. The results
were impressive; they managed to decrease the time to fill by 35%. In turn, they
hired higher-quality staff.

AI in Marketing: AI is a valuable tool for customer service management and
personalization challenges. Improved speech recognition in call-center
management and call routing, made possible by AI techniques, allows a more
seamless experience for customers.
For example, deep-learning analysis of audio allows systems to assess a
customer's emotional tone. If the customer is responding poorly to the AI
chatbot, the system can reroute the conversation to real, human operators
who take over the issue.
Apart from the three examples above, AI is widely used in other sectors and
industries.

Artificial Intelligence

Difference between Machine Learning and Deep Learning


Criterion | Machine Learning | Deep Learning

Data dependencies | Excellent performance on a small/medium dataset | Excellent performance on a big dataset

Hardware dependencies | Works on a low-end machine | Requires a powerful machine, preferably with a GPU: DL performs a significant amount of matrix multiplication

Feature engineering | Need to understand the features that represent the data | No need to understand the best features that represent the data

Execution time | From a few minutes to hours | Up to weeks: a neural network needs to compute a significant number of weights

Interpretability | Some algorithms are easy to interpret (logistic regression, decision tree); some are almost impossible (SVM, XGBoost) | Difficult to impossible

When to use ML or DL?
In the table below, we summarize the difference between machine learning and
deep learning.

Criterion | Machine learning | Deep learning

Training dataset | Small | Large

Choose features | Yes | No

Number of algorithms | Many | Few

Training time | Short | Long

With machine learning, you need less data to train the algorithm than with deep
learning. Deep learning requires an extensive and diverse set of data to
identify the underlying structure. Besides, machine learning provides a
faster-trained model. The most advanced deep learning architectures can take
days to a week to train. The advantage of deep learning over machine learning is
that it can be highly accurate. You do not need to understand which features
best represent the data; the neural network learns how to select critical
features. In machine learning, you need to choose for yourself which features to
include in the model.
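
To make the idea of choosing features by hand concrete for SMS data, the sketch
below turns one message into a few numeric features that a classical machine
learning model could consume. The feature names, the keyword list, and the
sample message are hypothetical and are not taken from this project's pipeline.

```python
import re

# Hypothetical keyword list; a real project would derive this from its data.
SPAM_KEYWORDS = ("free", "win", "prize", "urgent", "claim")


def extract_features(message):
    """Turn one SMS into a small dictionary of hand-picked numeric features."""
    words = re.findall(r"[a-z']+", message.lower())
    return {
        "length": len(message),
        "digit_count": sum(ch.isdigit() for ch in message),
        "uppercase_ratio": sum(ch.isupper() for ch in message) / max(len(message), 1),
        "keyword_hits": sum(word in SPAM_KEYWORDS for word in words),
        "has_url": int(bool(re.search(r"https?://|www\.", message.lower()))),
    }


features = extract_features(
    "URGENT! Claim your FREE prize now at www.example.com or call 0800123456"
)
print(features)
```

A deep learning model such as an LSTM would instead receive the tokenized text
and learn its own internal representation, which is the trade-off described
above.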
TensorFlow
The most famous deep learning library in the world is Google's TensorFlow.
Google uses machine learning in all of its products to improve the search
engine, translation, image captioning and recommendations.
To give a concrete example, Google users can experience a faster and more
refined search with AI. If the user types a keyword in the search bar, Google
provides a recommendation about what the next word could be.
Google wants to use machine learning to take advantage of its massive
datasets to give users the best experience. Three different groups use machine
learning:

● Researchers
● Data scientists
● Programmers.

They can all use the same toolset to collaborate with each other and improve
their efficiency.
Google does not just have a lot of data; it also has some of the world's
largest computing infrastructure, so TensorFlow was built to scale. TensorFlow
is a library developed by the Google Brain Team to accelerate machine learning
and deep neural network research.
It was built to run on multiple CPUs or GPUs and even on mobile operating
systems, and it has wrappers in several languages such as Python, C++ and
Java.



TensorFlow Architecture
TensorFlow architecture works in three parts:

● Preprocess the data
● Build the model
● Train and evaluate the model

It is called TensorFlow because it takes input as a multi-dimensional array,
also known as a tensor. You can construct a sort of flowchart of operations
(called a graph) that you want to perform on that input. The input goes in at
one end, flows through this system of multiple operations, and comes out the
other end as output.
This is why it is called TensorFlow: the tensor goes in, flows through a
list of operations, and comes out the other side.
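
The flow described above can be imitated in a few lines of plain Python. This
is only a conceptual sketch, not the TensorFlow API: the "tensor" is a nested
list, the "graph" is an ordered list of functions, and the operation names
(scale, add_bias, relu) are chosen here for illustration.

```python
def scale(tensor, factor):
    """Multiply every element of a 2-D tensor (list of rows) by a factor."""
    return [[value * factor for value in row] for row in tensor]


def add_bias(tensor, bias):
    """Add the same bias to every element."""
    return [[value + bias for value in row] for row in tensor]


def relu(tensor):
    """Replace negative elements with zero."""
    return [[max(value, 0.0) for value in row] for row in tensor]


# The "graph": an ordered list of operations the tensor flows through.
graph = [
    lambda t: scale(t, 2.0),
    lambda t: add_bias(t, -3.0),
    relu,
]


def run(graph, tensor):
    """Feed the tensor in at one end and pass it through each operation."""
    for operation in graph:
        tensor = operation(tensor)
    return tensor


output = run(graph, [[1.0, 2.0], [3.0, 4.0]])
print(output)  # [[0.0, 1.0], [3.0, 5.0]]
```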
Where can TensorFlow run?
TensorFlow hardware and software requirements can be classified into two
phases:
Development Phase: This is when you train the model. Training is usually done
on your desktop or laptop.
Run Phase or Inference Phase: Once training is done, TensorFlow can be run on
many different platforms. You can run it on

● Desktop running Windows, macOS or Linux
● Cloud as a web service
● Mobile devices like iOS and Android

You can train the model on multiple machines and then, once you have the
trained model, run it on a different machine.
The model can be trained and used on GPUs as well as CPUs. GPUs were
initially designed for video games. In late 2010, Stanford researchers found
that GPUs were also very good at matrix operations and algebra, which makes
them very fast for these kinds of calculations. Deep learning relies on a lot of
matrix multiplication. TensorFlow is very fast at computing matrix
multiplication because it is written in C++. Although it is implemented in C++,
TensorFlow can be accessed and controlled from other languages, mainly Python.
Finally, a significant feature of TensorFlow is TensorBoard. TensorBoard
lets you monitor graphically and visually what TensorFlow is doing.
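
To show why matrix multiplication dominates deep learning workloads, the sketch
below computes the output of one fully connected layer (without bias or
activation) as a single matrix product. It is written in plain Python for
clarity; the input and weight values are assumed, and a library such as
TensorFlow would run the same arithmetic in optimized C++ or on a GPU.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Two input examples with three features each (assumed values).
inputs = [[1.0, 0.0, 2.0],
          [0.0, 1.0, 1.0]]
# Weights of a layer with three inputs and two output units (assumed values).
weights = [[0.5, -1.0],
           [1.0, 0.0],
           [0.25, 2.0]]

layer_output = matmul(inputs, weights)
print(layer_output)  # [[1.0, 3.0], [1.25, 2.0]]
```

Every output value needs one multiplication per input feature, so the cost
grows quickly with layer size, which is where GPU parallelism helps.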
List of Prominent Algorithms supported by TensorFlow

● Linear regression: tf.estimator.LinearRegressor
● Classification: tf.estimator.LinearClassifier
● Deep learning classification: tf.estimator.DNNClassifier
● Boosted tree regression: tf.estimator.BoostedTreesRegressor
● Boosted tree classification: tf.estimator.BoostedTreesClassifier
PYTHON OVERVIEW
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses English
keywords where other languages use punctuation, and it has fewer syntactical
constructions than other languages.

● Python is Interpreted: Python is processed at runtime by the interpreter.
You do not need to compile your program before executing it. This is
similar to PERL and PHP.

● Python is Interactive: You can actually sit at a Python prompt and
interact with the interpreter directly to write your programs.

● Python is Object-Oriented: Python supports Object-Oriented style or
technique of programming that encapsulates code within objects.

● Python is a Beginner's Language: Python is a great language for
beginner-level programmers and supports the development of a wide
range of applications from simple text processing to WWW browsers to
games.

History of Python
Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer
Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C,
C++, Algol-68, SmallTalk, Unix shell, and other scripting languages.

Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU
General Public License (GPL).

Python is now maintained by a core development team, although Guido van
Rossum has long held a vital role in directing its progress.
Python Features
Python's features include:

Easy-to-learn: Python has few keywords, simple structure, and a clearly
defined syntax. This allows the student to pick up the language quickly.

Easy-to-read: Python code is clearly defined and easy on the eyes.

Easy-to-maintain: Python's source code is fairly easy to maintain.

A broad standard library: The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.

Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.

Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.

Databases: Python provides interfaces to all major commercial databases.

GUI Programming: Python supports GUI applications that can be created and
ported to many system calls, libraries, and windows systems, such as Windows
MFC, Macintosh, and the X Window system of Unix.

Scalable: Python provides a better structure and support for large programs
than shell scripting.

Apart from the above-mentioned features, Python has many other good features;
a few are listed below:

● It supports functional and structured programming methods as well as OOP.

● It can be used as a scripting language or can be compiled to byte-code for
building large applications.

● It provides very high-level dynamic data types and supports dynamic type
checking.

● It supports automatic garbage collection.

● It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
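
A few of the features listed above can be seen together in a short sketch. The
Message class and the sample texts are hypothetical and serve only to show
object-oriented style, dynamic typing, and functional style side by side.

```python
class Message:
    """Object-oriented style: data and behaviour live together."""

    def __init__(self, text):
        self.text = text

    def word_count(self):
        return len(self.text.split())


# Dynamic typing: the same name can refer to values of different types.
value = 42
value = "now a string"

# Functional style: map over a list of objects, then filter it.
messages = [Message("hello there"), Message("win a free prize today")]
counts = list(map(Message.word_count, messages))
long_messages = [m.text for m in messages if m.word_count() > 2]

print(type(value).__name__, counts, long_messages)
```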

Python is available on a wide variety of platforms including Linux and Mac OS
X. Let's understand how to set up our Python environment.

ANACONDA NAVIGATOR
Anaconda Navigator is a desktop graphical user interface (GUI) included in
Anaconda distribution that allows you to launch applications and easily manage
conda packages, environments and channels without using command-line
commands. Navigator can search for packages on Anaconda Cloud or in a local
Anaconda Repository. It is available for Windows, macOS and Linux.
Why use Navigator?
In order to run, many scientific packages depend on specific versions of
other packages. Data scientists often use multiple versions of many
packages, and use multiple environments to separate these different versions.
The command line program conda is both a package manager and an
environment manager, to help data scientists ensure that each version of each
package has all the dependencies it requires and works correctly.
Navigator is an easy, point-and-click way to work with packages and
environments without needing to type conda commands in a terminal window.
You can use it to find the packages you want, install them in an environment,
run the packages and update them, all inside Navigator.
WHAT APPLICATIONS CAN I ACCESS USING NAVIGATOR?
The following applications are available by default in Navigator:
● JupyterLab
● Jupyter Notebook
● Qt Console
● Spyder
● VS Code
● Glueviz
● Orange 3 App
● Rodeo
● RStudio
Advanced conda users can also build their own Navigator applications.
How can I run code with Navigator?
The simplest way is with Spyder. From the Navigator Home tab, click Spyder,
and write and execute your code.
You can also use Jupyter Notebooks the same way. Jupyter Notebooks are an
increasingly popular system that combines your code, descriptive text, output,
images and interactive interfaces into a single notebook file that is edited,
viewed and used in a web browser.
What’s new in 1.9?
● Add support for Offline Mode for all environment related actions.
● Add support for custom configuration of main windows links.
● Numerous bug fixes and performance enhancements.
TESTING
Software testing is an investigation conducted to provide stakeholders
with information about the quality of the product or service under test.
Software Testing also provides an objective, independent view of the
software to allow the business to appreciate and understand the risks at
implementation of the software. Test techniques include, but are not
limited to, the process of executing a program or application with the
intent of finding software bugs.
Software Testing can also be stated as the process of validating and
verifying that a software program/application/product:
● Meets the business and technical requirements that guided its
design and development.
● Works as expected and can be implemented with the same
characteristics.

TESTING METHODS
Functional Testing

Functional tests provide systematic demonstrations that the functions tested
are available as specified by the business and technical requirements,
system documentation, and user manuals.
Functional testing is centered on the following items:
● Functions: Identified functions must be exercised.
● Output: Identified classes of software outputs must be exercised.
● Systems/Procedures: The system and its procedures must work properly.
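
A functional test of this kind could be sketched with Python's unittest module
as follows. The classify_message function is a hypothetical rule-based stand-in
for the trained model, and the labels "spam" and "ham" (legitimate) are assumed;
in practice the tests would call the project's own prediction function.

```python
import unittest


def classify_message(text):
    """Stand-in classifier (hypothetical): flags a few obvious spam phrases."""
    spam_phrases = ("free prize", "claim now", "you have won")
    lowered = text.lower()
    return "spam" if any(phrase in lowered for phrase in spam_phrases) else "ham"


class ClassifierFunctionalTest(unittest.TestCase):
    def test_output_is_a_known_class(self):
        # Output: every prediction must be one of the defined classes.
        self.assertIn(classify_message("See you at 5"), {"spam", "ham"})

    def test_obvious_spam_is_flagged(self):
        # Functions: the spam-detection function is exercised on spam input.
        self.assertEqual(classify_message("You have WON a free prize"), "spam")

    def test_ordinary_message_passes(self):
        self.assertEqual(classify_message("Running late, start without me"), "ham")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifierFunctionalTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```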

Integration Testing
Software integration testing is the incremental integration testing of two
or more integrated software components on a single platform to produce
failures caused by interface defects.

Test Case for Excel Sheet Verification:

In this machine learning project the dataset is stored in Excel sheet
format, so each test case first checks the Excel file itself.
Classification then operates on the respective columns of the dataset.
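
A check of this kind could look like the sketch below. It assumes the sheet has
been exported to CSV and that it has two columns named "label" and "message"
with labels "spam" and "ham"; these names are assumptions for illustration, and
the real column names in the project's dataset may differ.

```python
import csv
import io

REQUIRED_COLUMNS = {"label", "message"}   # assumed column names
ALLOWED_LABELS = {"spam", "ham"}          # assumed label values


def verify_dataset(file_obj):
    """Return a list of problems found in the dataset; empty means it passed."""
    reader = csv.DictReader(file_obj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        if row["label"] not in ALLOWED_LABELS:
            problems.append(f"line {line_number}: unknown label {row['label']!r}")
        if not (row["message"] or "").strip():
            problems.append(f"line {line_number}: empty message")
    return problems


# In-memory sample standing in for the exported file.
sample = io.StringIO(
    "label,message\nham,See you at lunch\nspam,Claim your free prize\nmaybe,\n"
)
print(verify_dataset(sample))
```

With a real file, the same function can be called on an open file handle, for
example one returned by open(path, newline="", encoding="utf-8").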
RESULTS
Data mining is a process to extract knowledge from existing data. It is
used as a tool in banking and finance, in general, to discover useful
information from the operational and historical data to enable better
decision-making. It is an interdisciplinary field, the confluence of
Statistics, Database technology, Information science, Machine learning,
and Visualization. It involves steps that include data selection, data
integration, data transformation, data mining, pattern evaluation, and
knowledge presentation.
CONCLUSION
This work aims to predict whether a message is spam or legitimate. It
runs on efficient machine learning algorithms and technologies and
achieves good accuracy. The training datasets provide enough insight to
classify messages appropriately. Thus, the system helps users identify,
with reasonably accurate predictions, whether their messages are spam or
legitimate.
REFERENCES
[1] Online: https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/
[2] S. M. Abdulhamid, M. S. Abd Latif, Haruna Chiroma, “A Review on
Mobile SMS Spam Filtering Techniques”, IEEE Access, vol. 5,
pp. 15650-15666, 2017, doi: 10.1109/ACCESS.2017.2666785.
[3] Nilam Nur Amir Sjarif, N. F. Mohd Azmi, Suriayati Chuprat, “SMS
Spam Message Detection using Term Frequency-Inverse Document
Frequency and Random Forest Algorithm,” in The Fifth Information
Systems International Conference 2019, Procedia Computer Science 161
(2019) 509-515, ScienceDirect.
[4] A. Lakshmanarao, K. Chandra Sekhar, Y. Swathi, “An Efficient Spam
Classification System using Ensemble Machine Learning Algorithm,” in
Journal of Applied Science and Computations, Volume 5, Issue 9,
September 2018.
[5] Pavas Navaney, Gaurav Dubey, Ajay Rana, “SMS Spam Filtering
using Supervised Machine Learning Algorithms,” in 8th International
Conference on Cloud Computing, Data Science & Engineering,
978-1-5386-1719-9/18, 2018, IEEE.
[6] Luo GuangJun, Shah Nazir, Habib Ullah Khan, Amin Ul Haq, “Spam
Detection Approach for Secure Mobile Message Communication using
Machine Learning Algorithms,” in Hindawi, Security and
Communication Networks, Volume 2020, Article ID 8873639, July 2020.
[7] Tian Xia, Xuemin Chen, “A Discrete Hidden Markov Model for SMS
Spam Detection,” in Applied Sciences, MDPI, Appl. Sci. 2020, 10, 5011;
doi:10.3390/app10145011.
[8] M. Nivaashini, R. S. Soundariya, A. Kodieswari, P. Thangaraj, “SMS
Spam Detection using Deep Neural Network,” in International Journal of
Pure and Applied Mathematics, Volume 119, No. 18, 2018, 2425-2436.
[9] Mehul Gupta, Aditya Bakliwal, Shubhangi Agarwal, Pulkit
Mehndiratta, “A Comparative Study of Spam SMS Detection using
Machine Learning Classifiers,” in 2018 Eleventh International
Conference on Contemporary Computing (IC3), 2-4 August, 2018.
[10] Gomatham Sai Sravya, G. Pradeepini, Vaddeswaram, “Mobile Sms
Spam Filter Techniques Using Machine Learning Techniques,”
International Journal Of Scientific & Technology Research, Volume 9,
Issue 03, March 2020.
[11] M. Rubin Julis, S. Alagesan, “Spam Detection In Sms Using
Machine Learning through Textmining”, International Journal Of
Scientific & Technology Research, Volume 9, Issue 02, February 2020.
[12] K. Sree Ram Murthy, K. Kranthi Kumar, K. Srikar, CH. Nithya,
S. Alagesan, “SMS Spam Detection using RNN”, International
