
University of Maiduguri

Faculty of Engineering Seminar Series


Volume 9 number 1, July 2018
Random Forests Machine Learning Technique for Email Spam Filtering
E. G. Dada and S. B. Joseph
Department of Computer Engineering, University of Maiduguri, Maiduguri – Borno State,
Nigeria.
[email protected]; +2349084222298
Abstract
Email spam is one of the major challenges faced daily by every email user in the world. On a
daily basis, email users receive hundreds of spam mails with new content, from anonymous
addresses which are automatically generated by robot software agents. Traditional methods of
spam filtering, such as black lists and white lists (of domains, IP addresses and mailing
addresses), have proven grossly ineffective in curtailing the menace of spam messages. This has
brought to the fore the need for highly reliable email spam filters. Recently, machine learning
approaches have been successfully applied to detecting and filtering spam emails. This paper
proposes the use of the random forest machine learning algorithm for efficient classification of
email spam messages. The main purpose is to develop a spam email filter with better prediction
accuracy and fewer features. From the Enron public dataset, consisting of 5180 ham, spam and
normal emails, a set of prominent spam email features (from the literature) was extracted and
applied by the random forests algorithm, with a resultant classification accuracy of 99.92%, a
very low false positive rate (0.01) and a very high true positive rate of 0.999. All experiments
were conducted in the WEKA data mining and machine learning simulation environment.
Keywords: Machine learning, Spam filtering, Random Forests, Neural Networks, Support
Vector Machines, Naïve Bayes

1.0 Introduction
Recently, unsolicited commercial bulk emails, popularly referred to as spam, have constituted a
big problem on the internet. Spammers sending fraudulent emails harvest email addresses from
various websites and through viruses and malware (Awad and Foqaha, 2016). Spam hinders
internet users from maximizing storage capacity and network bandwidth. The presence of a large
volume of spam mails in computer networks is detrimental to the effective usage of email
servers' memory, bandwidth, CPU processing time and user time (Fonseca et al., 2016). Reports
showed that spam mails accounted for more than 77% of global email traffic (Kaspersky, 2017).
Spam emails are very annoying and inimical to users who have fallen victim to 419 internet
mails and other fraudulent practices that aim to lure unsuspecting persons into releasing
confidential information such as usernames and passwords, Bank Verification Numbers (BVN) and
credit card numbers. Several works proposing various approaches to email spam filtering have
been published in the literature, and these have been successfully applied to classify emails
as either spam or non-spam.
These techniques include probabilistic, decision tree, artificial immune system (Bahgat et al.,
2016), support vector machine (SVM) (Bouguila and Amayri, 2009), artificial neural networks
(ANN) (Cao et al., 2004), and case-based techniques (Fdez-Riverola et al., 2007). It has been
demonstrated that it is possible to use these machine learning techniques to filter out spam
mails by employing a content-based filtering approach that has the ability to identify particular

features in email messages (usually keywords frequently used in spam emails). The frequency
at which these features occur in an email determines the likelihood that the email will be
classified as spam when measured against a threshold value. Email messages that exceed the
threshold value are classified as spam (Mason, 2003). Karthika and Visalakshi (2015) compared the
performance of hybridized ACO and SVM with KNN, NB and SVM algorithms on spambase
dataset taken from UCI repository. Awad and Foqaha (2016) evaluated the performance of
PSO, RBFNN, MLP and ANN using the UCI spambase dataset. Sharma and Suryawanshi
(2016) compared the performance of kNN with spearman and kNN with Euclidean using the
spambase dataset taken from the UCI repository. Awad and Elseuofi (2011) reviewed six
state-of-the-art machine learning methods (Bayesian classification, k-NN, ANNs, SVMs, artificial
immune systems and rough sets) and their applicability to the problem of spam email
classification. Alkaht and Al-Khatib (2016) compared the performance of NN, MLP and Perceptron
on a dataset of randomly collected emails. Dhanaraj and Palaniswami (2014) evaluated
the performance of Firefly, NB, NN and PSO algorithm on CSDMC2010 spam corpus dataset.
Palanisamy, Kumaresan and Varalakshmi (2016) compared the performance of NSA, PSO,
SVM, NB and DFS-SVM using Ling spam dataset. Zavvar, Rezaei and Garavand (2016)
compared the performance of PSO, SOM, kNN and SVM on spambase datasets retrieved from
the UCI repository. Sosa (2010) evaluated the performance of Sinespam, a machine-learning spam
classification technique, on a corpus of 2200 e-mails from several senders to various receivers
gathered by an ISP. Akshita (2016) applied a deep learning technique to content-based spam
classification, using a DL4J deep network on the PU1, PU2, PU3, PUA and Enron spam datasets.
The main problem with many of the techniques discussed above is the low performance of the
filters; there is a need to increase their classification accuracy. In addition, many of them
are not robust and find it difficult to cope with the evolving nature of spam.

2.0 Materials and Methods


The majority of email spam filtering methods use text categorization approaches. Consequently,
many spam filters perform poorly and cannot efficiently prevent spam mails from getting to
users' inboxes. This work employs the Random Forests (RFs) algorithm to extract important
features from emails and classify the emails as ham, spam or normal. The Enron spam dataset was
used as the benchmark dataset. The Random Forests machine learning algorithm was simulated
using WEKA (Wang, 2005). WEKA provides a set of machine learning algorithms that can be used
for data preprocessing, classification, regression, clustering and association rules. Machine
learning techniques implemented in WEKA are helpful in solving different real-world problems.
The toolkit provides a well-defined structure for researchers and developers to experiment with
different machine learning algorithms and to build and evaluate their models. All experiments
were conducted on a machine with an AMD A10-7300 Radeon R6 (10 Compute Cores, 4C+6G, 1.90 GHz)
and 8.00 GB of RAM.

2.1 Random Forests


Random forests (RFs) is a classic ensemble learning technique for classification and regression,
well suited to solving data classification problems (Akinyelu and Adewumi, 2016). Breiman and
Cutler (2007) proposed the RFs algorithm. The algorithm classifies data into different classes
using decision trees. During the training phase, a number of decision trees are created and
later used for the classification tasks. Classification works by taking the class predicted by
each individual tree; the class with the highest number of votes is taken as the final result.
The RF algorithm has become very popular over the years and has been applied to related
problems in various fields of human endeavor (Fette et al., 2007; Koprinska et al., 2007;
Whittaker et al., 2010). Random forests have several advantages: they achieve reduced
classification error and better F-scores than many other machine learning techniques, and
their performance is generally as good as, or even superior to, that of SVMs. They can
efficiently handle unbalanced data sets with missing values, and serve as an effective method
for estimating missing values while maintaining accuracy even when a significant proportion of
the data is missing. The training time for RFs is usually shorter than that of SVMs and neural
networks (though this depends on the implementation). RF is better than many existing machine
learning algorithms in terms of accuracy, performs very well on large databases, and can
efficiently process hundreds of thousands of input variables. RF computes an internal unbiased
estimate of the generalization error as the forest is grown, and provides methods for balancing
error in class-imbalanced data sets. RFs can also process unlabeled data effectively, which
makes them an appropriate technique for clustering unlabeled data. Random forests are not
complicated to use and require few parameters relative to the number of observations, and the
user can grow as many trees as desired at high speed. RFs classify a new instance by passing
its input vector down each individual tree in the forest. Each tree carries out its own
classification, known as the tree's "vote" for a class; the forest then selects the class with
the highest overall number of votes. The steps for growing the trees are outlined below:
1. Let N be the number of training instances. Randomly sample N instances, with replacement,
   from the existing data. These instances are used as the training set for growing the tree.
2. Let P be the number of input variables. A number p << P is specified such that, at each
   node, p variables are selected at random from the P variables and the best split on these
   p variables is used to partition the node. The value of p is held constant while the forest
   is grown.
3. No pruning is performed; each tree is grown to the largest extent possible.
A tree is referred to as a strong classifier when it has a low error rate. The error rate of
the forest decreases as the strength of the individual trees increases and the correlation
between them decreases. Reducing the value of p decreases both the correlation between trees
and the strength of each tree, while increasing p increases both; somewhere in between is an
optimal range of p, which is usually quite wide. Using the out-of-bag (OOB) error (also known
as the out-of-bag estimate), a value of p within this range can be found quickly. This is the
only tuning parameter to which random forests are somewhat sensitive.
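As a concrete illustration of the OOB estimate described above, here is a hedged sketch using scikit-learn's random forest on synthetic stand-in data (the paper's own tuning is done inside WEKA, not with this library):

```python
# Grow a forest with bootstrap sampling and report the out-of-bag (OOB)
# error estimate: each tree is scored on the instances it never saw.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=25, random_state=0)
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
print("OOB accuracy estimate: %.3f" % forest.oob_score_)
```

Because the OOB estimate comes for free during training, no separate validation split is needed to compare candidate values of p (here, `max_features`).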
The algorithm below concisely outlines the steps required to create the forest of trees.
Start RF Algorithm
Input:  D: the training data
        N: number of features
        X: maximum number of nodes per tree
        Y: number of trees to be grown
Output: G: the class with the highest number of votes
For i = 1 to Y do
    Select a bootstrap sample S randomly, with replacement, from the training data D
    Create tree Ri from the bootstrap sample S using the steps below:
        (1) Select n features randomly from the N features, where n << N
        (2) Calculate the best splitting point for node d among the n features
        (3) Split the parent node into two child nodes using the optimal split
        (4) Repeat steps (1)-(3) until the maximum number of nodes (X) is reached
EndFor
To classify a new sample, pass it down each tree R1, ..., RY starting from the root node
In each tree, assign the sample the class of the leaf node it reaches (the tree's "vote")
Merge the votes of all the trees
Output the class with the highest number of votes (G)
End RF Algorithm
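The forest-growing loop above can be sketched in Python, using scikit-learn's decision trees as the individual learners (an illustrative sketch on synthetic data, not the authors' WEKA implementation; the names Y_TREES and G mirror the pseudocode):

```python
# Minimal random forest: Y bootstrap samples, one unpruned tree per sample
# with a random feature subset at each node, then a majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

Y_TREES = 25
trees = []
for _ in range(Y_TREES):
    # Bootstrap sample S: N instances drawn with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Grow an unpruned tree; at each node the best split is chosen
    # among a random subset of sqrt(N) features (n << N)
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Each tree votes; the class G with the most votes wins
votes = np.stack([t.predict(X) for t in trees])   # shape (Y_TREES, samples)
G = (votes.mean(axis=0) > 0.5).astype(int)        # majority vote (2 classes)
print("accuracy of the voted forest on the training set:", (G == y).mean())
```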

2.2 Dataset Used for Experiment


The Enron spam dataset, from the Enron Corporation, was used for our experiment (Koprinska et
al., 2007). The dataset consists of 5180 emails in three folders: norm for normal, ham for
non-spam and spam for spam emails. Of the 5180 instances, 3672 are ham, 8 are norm, and 1500
are spam emails. The dataset features are as follows:
i. Specific words or characters that recur in the emails.
ii. Run-length attributes (55-57) that measure the length of sequences of consecutive
capital letters.

2.3 Data Normalization Process


The original dataset used in our experiments consists of 5180 text files. The data contained in
those files are not normalized, so they have to be normalized before they can serve as input to
WEKA. All the data must be converted to a single .arff file before it can be given to WEKA for
training. To achieve this, we use the following command in the WEKA command-line interface:
“java weka.core.converters.TextDirectoryLoader -dir D:/Enron > D:/Spam_mails.arff”
After the normalization process, the normalized file was given to WEKA for pre-processing.
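For readers without WEKA, the same directory-to-dataset step can be approximated in Python with scikit-learn's load_files, which, like TextDirectoryLoader, treats each subfolder name as a class label (a sketch on a tiny throwaway corpus, not the actual Enron data):

```python
# Build a tiny two-folder corpus in a temp directory, then load it the way
# TextDirectoryLoader would: subfolder names become the class labels.
import tempfile
from pathlib import Path
from sklearn.datasets import load_files

root = Path(tempfile.mkdtemp())
for label, text in {"ham": "meeting moved to noon", "spam": "win cash now"}.items():
    (root / label).mkdir()
    (root / label / "0.txt").write_text(text)

corpus = load_files(root, encoding="utf-8")
print(len(corpus.data), "emails; classes:", list(corpus.target_names))
```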

2.4 Feature Extraction


After the pre-processing phase comes feature extraction. Feature extraction is the process of
choosing a subset of the terms occurring in the training set and using only this subset as
features in text classification. This is achieved using a set of rules. Feature extraction
makes training and applying a classifier more efficient by decreasing the size of the effective
vocabulary, and it usually also enhances classification accuracy by removing noise features.
Some of the important email features we used for our spam filtering include: message body and
subject, volume of the message, occurrence count of words, number of semantic discrepancy
patterns in the message, recipient age, sex and country, whether the recipient replied, adult
content, bag of words from the message content, domain name, IP address, and number of blank
lines in the body.
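One common realisation of the bag-of-words features listed above can be sketched as follows (an assumption for illustration; the paper uses WEKA's own filters, not this library):

```python
# Turn raw email text into word-count features with a bag-of-words model.
from sklearn.feature_extraction.text import CountVectorizer

emails = ["win cash now now", "meeting moved to noon"]
vec = CountVectorizer()
X = vec.fit_transform(emails)      # rows: emails, columns: vocabulary words
print(sorted(vec.vocabulary_))     # the effective vocabulary
print(X.toarray())                 # occurrence count of each word per email
```

Feature selection then amounts to keeping only the most informative columns of this matrix, which shrinks the effective vocabulary as described above.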

3.0 Results and Discussion


This section presents the results of the experiments performed. The Random Forest algorithm was
applied to classify and evaluate the dataset. We used the 10-fold cross-validation test, an
approach for appraising predictive models that divides the original set into training samples
to train the model and a test set for its evaluation. First, the training of the dataset was
performed with the feature vectors extracted by analyzing each message header and checking
keywords and whitelists/blacklists. The trained model was then evaluated for classification
accuracy using 10-fold cross-validation. Classification accuracy is one of the performance
metrics for email spam classification; it is measured as the ratio of the number of correctly
classified instances in the test dataset to the total number of test cases. In spam filtering,
false negatives mean that some spam mails were wrongly classified as non-spam and allowed to
enter the user's inbox. False positives mean that non-spam emails were mistakenly classified as
spam and moved to the spam folder or discarded. For most users, erroneously classifying valid
emails as spam is far more costly than receiving spam mails in their inbox. The false positive
rate is therefore also one of the performance metrics used in evaluating the effectiveness of
an email spam filter. Depicted in Figure 1 below is a screen shot of our output in the WEKA
simulation environment.

Fig. 1: Screen shot of Random Forests classification output for Enron spam emails datasets
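The 10-fold cross-validation protocol just described can be sketched as follows (synthetic stand-in data; the actual experiments were run in WEKA):

```python
# 10-fold cross-validated accuracy for a random forest classifier: the data
# are split into 10 folds, each fold serving once as the held-out test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=30, random_state=1)
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=10)
print("mean 10-fold accuracy: %.4f" % scores.mean())
```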
3.1 Effectiveness
In this section, we evaluate the effectiveness of the RFs classifier in terms of the time taken
to create the model, correctly classified instances, incorrectly classified instances and
classification accuracy. The results are shown in Table 1.
Table 1: Performance Evaluation of the RFs Algorithm

Evaluation Criteria                     RFs
Time taken to create model (s)          17.75
Correctly classified instances          5176
Incorrectly classified instances        4
Accuracy (%)                            99.92

To give a fair and more complete performance evaluation of the algorithm under consideration,
simulation error is also taken into account in this work. The effectiveness is assessed using
the following measures: Kappa statistic (KS), Mean Absolute Error (MAE), Root Mean Squared
Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE). The KS,
MAE and RMSE are numeric values; RAE and RRSE are percentages. The results are shown in Table 2.

Table 2. Training and Simulation Error of the RFs Algorithm

Evaluation Criteria                        RFs
Kappa Statistic (KS)                       0.9981
Mean Absolute Error (MAE)                  0.0296
Root Mean Squared Error (RMSE)             0.06
Relative Absolute Error (RAE) %            10.7404
Root Relative Squared Error (RRSE) %       16.1506
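For clarity, the MAE and RMSE reported in Table 2 are computed as follows (toy numbers, not the paper's predictions):

```python
# MAE and RMSE between true 0/1 labels and predicted spam probabilities.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.1, 0.8, 0.95, 0.2])   # predicted P(spam)
mae = np.abs(y_true - p_pred).mean()
rmse = np.sqrt(((y_true - p_pred) ** 2).mean())
print("MAE = %.3f, RMSE = %.3f" % (mae, rmse))
```

RAE and RRSE divide these quantities by the corresponding errors of a trivial predictor (the mean of the training labels), which is why they are reported as percentages.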

3.2 Efficiency
After creating the predictive model, the efficiency of the RFs algorithm was evaluated as
shown in table 3 below.

Table 3. Performance evaluation of the RFs algorithm based on TPR, FPR, Precision, and F-Score

Technique   TPR     FPR     Precision   F-Score   Class
RFs         0.999   0.001   1.000       0.998     Ham
            1.000   0.000   1.000       1.000     Norm
            0.999   0.001   0.998       0.998     Spam

From Table 1, it took RFs about 17.75 s to create its model, and RFs achieved a classification
accuracy of 99.92%. It is also clear from the results that RFs performed excellently in terms
of a very high number of correctly classified instances and a very low number of incorrectly
classified instances. The training and simulation errors depicted in Table 2 show that RFs
produced excellent agreement (a Kappa statistic of 0.9981) and a very low mean absolute error
(0.0296). Once the model has been created, the next step is to analyse the results generated to
determine the efficiency of the algorithm under consideration. Table 3 indicates that RFs
achieves very good results in terms of TPR, FPR, Precision and F-Score for the ham, norm and
spam classes. Table 4 below shows the confusion matrix of the RFs algorithm, which also
provides a practical way of assessing the performance of the classifier; each row of the table
denotes actual counts for a class, whereas each column indicates the predictions.

Table 4. Confusion Matrix for the RFs Algorithm

            Ham    Norm   Spam   Class
RFs         3669   0      3      Ham
            0      8      0      Norm
            1      0      1499   Spam

From Table 4 above, for the ham class RFs correctly predicts 3669 out of 3672 instances, with
3 ham instances wrongly predicted as spam. For the spam class, it correctly predicts 1499 out
of 1500 instances, with 1 spam instance wrongly predicted as ham. All 8 norm instances are
correctly classified. From our experiments it is clear that RFs performed excellently in terms
of effectiveness and efficiency, considering its classification accuracy, TPR, FPR, precision
and F-score.
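The per-class rates in Table 3 can be recovered directly from the confusion matrix in Table 4:

```python
# Derive TPR, FPR and precision for each class from the confusion matrix
# (rows: actual ham/norm/spam; columns: predicted).
import numpy as np

cm = np.array([[3669, 0,    3],
               [0,    8,    0],
               [1,    0, 1499]])
for i, cls in enumerate(["ham", "norm", "spam"]):
    tp = cm[i, i]
    fn = cm[i].sum() - tp            # actual cls, predicted as something else
    fp = cm[:, i].sum() - tp         # predicted cls, actually something else
    tn = cm.sum() - tp - fn - fp
    print(f"{cls}: TPR={tp/(tp+fn):.3f} FPR={fp/(fp+tn):.3f} "
          f"precision={tp/(tp+fp):.3f}")
```

Running this reproduces the 0.999/0.001 figures of Table 3, and the four off-diagonal entries sum to the 4 incorrectly classified instances of Table 1.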

4.0 Conclusion
Many of the existing email spam filtering techniques cannot effectively handle some of the spam
being sent daily by spammers. This is because spammers keep inventing more sophisticated
techniques for evading detection by spam filters. With the continuous adoption of new
techniques by spammers, email spam filtering has become a hot research area. In this study, we
proposed the Random Forests algorithm for effective and efficient email spam filtering, and
evaluated the performance of the RFs algorithm on the Enron spam dataset using accuracy, TPR,
FPR, precision and F-measure to determine the effectiveness and efficiency of the algorithm.
We conclude by stating that RFs is a promising algorithm that can be adopted either at the mail
server or at the mail client side to further decrease the volume of spam messages in email
users' inboxes.

References
Akinyelu A. A., and Adewumi A.O. (2016). Classification of Phishing Email Using
Random Forest Machine Learning Technique. Journal of Applied Mathematics, 2014, 6,
Article ID 425731, Retrieved on July 12, 2017 from
http://dx.doi.org/10.1155/2014/425731
Akshita T. (2016). Content Based Spam Classification- A Deep Learning Approach. A
Thesis Submitted to The Faculty Of Graduate Studies, University Of Calgary, Alberta,
Canada.
Alkaht I.J., Al-Khatib B. (2016). Filtering SPAM Using Several Stages Neural Networks.
International Review on Computers and Software, 11, 2.
Awad M. and Foqaha M. (2016). Email Spam Classification Using Hybrid Approach of RBF
Neural Network and Particle Swarm Optimization. July 2016 International Journal of
Network Security & Its Applications 8(4):17-28. DOI: 10.5121/ijnsa.2016.8402
Awad W.A. and Elseuofi S.M. (2011). Machine Learning Methods for Spam E-mail
Classification. International Journal of Computer Science and Information Technology,
3(1):173–184.
Bahgat E.M., Rady S. and Gad W. (2016). An e-mail filtering approach using classification
techniques. In The 1st International Conference on Advanced Intelligent System and
Informatics (AISI2015), November 28-30, 2015, BeniSuef, Egypt, Springer International
Publishing, 321-331.
Bouguila N. and Amayri O. (2009). A discrete mixture-based kernel for SVMs: application
to spam and image categorization, Information Processing & Management, 45(6): 631-
642.
Breiman L, Cutler A (2007). Random forests-classification description, Department of
Statistics Homepage, 2007, http://www.stat.berkeley.edu/∼breiman/RandomForests
/cchome.htm.
Cao Y, Liao X, Li Y (2004). An e-mail filtering approach using neural network, In
International Symposium on Neural Networks, Springer Berlin Heidelberg, 688-694.
Dhanaraj KR, Palaniswami V (2014). Firefly and Bayes Classifier for Email Spam
Classification in a Distributed Environment. Australian Journal of Basic and Applied
Sciences, 8(17):118-130.
Fdez-Riverola F, Iglesias EL, Diaz F, Méndez JR, Corchado JM (2007). SpamHunting: An
instance-based reasoning system for spam labelling and filtering, Decision Support
Systems, 43(3):722-736.
Fette I, Sadeh N, Tomasic A (2007). Learning to detect phishing emails, in Proceedings of
the 16th International World Wide Web Conference (WWW ’07), 649–656, Alberta,
Canada, May 2007.
Fonseca DM, Fazzion OH, Cunha E, Las-Casas I, Guedes PD, Meira W, Chaves M (2016).
Measuring, Characterizing, and Avoiding Spam Traffic Costs. IEEE Internet
Computing, 99.
Karthika R, Visalakshi P (2015). A Hybrid ACO Based Feature Selection Method for Email
Spam Classification. WSEAS Transaction on Computers, 14, pp. 171-177.
Kaspersky Lab Spam Report (2017). Visited on May 15, 2018.
https://www.securelist.com/en/analysis/204792230/Spam_Report_April_2012
Koprinska I., Poon J., Clark J., Chan J. (2007). Learning to classify e-mail, Information
Sciences, 177(10): 2167–2187.
Mason S (2003). New Law Designed to Limit Amount of Spam in E-Mail.
http://www.wral.com/technolog
Sharma A. and Suryawanshi A. (2016). A Novel Method for Detecting Spam Email using
KNN Classification with Spearman Correlation as Distance Measure. International
Journal of Computer Applications, 136(6): 28-34.
Sosa J.N. (2010). Spam Classification using Machine Learning Techniques – Sinespam.
Master of Science Thesis. Master in Artificial Intelligence (UPC-URV-UB).
Wang X. (2005). Learning to classify email: A survey. Proceedings of 2005 International
Conference on Machine Learning and Cybernetics.
Whittaker C., Ryner B., Nazif M. (2010). Large-scale automatic classification of phishing
pages. In: Proceedings of the 17th Annual Network & Distributed System Security
Symposium (NDSS ’10), The Internet Society, San Diego, Calif., USA.
Zavvar M., Rezaei M., Garavand S. (2016) Email Spam Detection using Combination of
Particle Swarm Optimization and Artificial Neural Network and Support Vector
Machine. International Journal of Modern Education and Computer Science, pp. 68-
74
