2012 IEEE Symposium on Robotics and Applications (ISRA)

Utilizing Improved Bayesian Algorithm to Identify Blog Comment Spam

LI Aiwu
Dept. of Computer Science
Guangdong Vocational College of Posts and Telecom
Guangzhou, China
e-mail: [email protected]

LIU Hongying
Dept. of Computer Science and Engineering
Guangzhou Vocational & Technical Institute of Industry & Commerce
Guangzhou, China
e-mail: [email protected]

978-1-4673-2207-2/12/$31.00 ©2012 IEEE

Abstract—Blog websites face a growing need to deal with comment spam. This paper analyzes the defects of the traditional Bayesian algorithm, which is based on statistical methods, and points out its deficiencies in practical application. It improves the basic Bayesian algorithm by computing spam probability values from the strings that appear in comments, and then combines those values with a geometric mean algorithm. The experimental results show that the modified Bayesian classification algorithm effectively improves spam classification: the error rates (afr) for spam comments, for legitimate comments, and on average all drop substantially.

Keywords-blog comment spam; Bayesian algorithm; geometric mean algorithm

I. INTRODUCTION
With the rapid development of the Internet, the blog is becoming one of the fastest and most economical means of communication. But while the blog has become an information communication tool, it is at the same time becoming a carrier of large amounts of commercial advertising and useless information, which requires users to spend a lot of time and energy dealing with these so-called "junk" comments. How to filter blog comments is a major concern for users, so research on anti-spam methods is an important subject in the processing of blog comments [1-2]. At present, some blog systems have adopted certain technical measures to deal with spam, but these technologies have shortcomings or are technically immature. Therefore, studying an effective anti-spam system is of vital significance [2-3].

Bayesian classification is a statistical classification method: it classifies using knowledge of probability and statistics. In many cases the naive Bayes (NB) classification algorithm is comparable to decision tree and neural network classification algorithms. It can be applied to large databases, and it is simple, accurate, and fast. Naive Bayes assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption often does not hold in practice, so its classification accuracy may decline. Therefore, many Bayesian classification algorithms that relax the independence assumption have been proposed, such as the TAN (tree augmented Bayes network) algorithm [4-5].

In theory, the Bayesian model has the lowest error rate of all classification algorithms. In practice, however, this is not always the case, because the assumptions made in applying it (such as class-conditional independence) are inaccurate and because the available probability data are lacking. Nevertheless, a variety of experiments show that in some areas this classification algorithm is comparable to decision tree and neural network classification algorithms. In addition, Bayesian classification can provide a theoretical justification for other classification algorithms that do not directly use Bayes' theorem [6].

Paul Graham proposed a new method of filtering spam email based on a statistical Bayesian algorithm in 2002, which greatly improved the accuracy of spam identification. With the rise of blog services, spam comments have become a problem, and the need to identify and block them has become increasingly pressing. Because the content of spam comments is similar to that of spam email, the Bayesian algorithm can also be used to identify spam blog comments [7-8].

The test data in this paper come from the blog comment sample library of the Institute for Information and Language Processing Systems of the University of Amsterdam (http://ilps.science.uva.nl/resources/commentspam), and the algorithms are implemented in Microsoft's C# programming language.

II. PROBLEM DESCRIPTION
Each data sample is described by an $n$-dimensional feature vector $X = \{x_1, x_2, \ldots, x_n\}$ holding the values of its $n$ attributes. Assume there are $m$ classes, denoted $C_1, C_2, \ldots, C_m$. Given an unknown data sample $X$ (i.e., one with no class label), the naive Bayes classifier assigns $X$ to class $C_i$ if and only if $P(C_i \mid X) > P(C_j \mid X)$ for $1 \le j \le m$, $j \ne i$.

By Bayes' theorem, $P(C_i \mid X) = P(X \mid C_i) P(C_i) / P(X)$. Because $P(X)$ is constant for all classes, maximizing the posterior probability $P(C_i \mid X)$ can be converted into maximizing $P(X \mid C_i) P(C_i)$. If the training data set has many attributes and tuples, computing $P(X \mid C_i)$ may be very expensive; therefore the attribute values are usually assumed to be independent of one another.
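Written out, the independence assumption of Section II lets the class-conditional probability factor into per-attribute terms. The display below is the standard textbook form of the naive Bayes decision rule, given here as an illustration rather than as a formula quoted from the paper; the symbol $\hat{C}$ for the chosen class is introduced only for this example.

```latex
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i),
\qquad
\hat{C} = \arg\max_{C_i} \; P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)
```

Each factor $P(x_k \mid C_i)$ involves a single attribute, which is why it can be estimated by simple counting over the training data.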

Under this assumption, the conditional probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), \ldots, P(x_n \mid C_i)$ can be obtained from the training data set.

According to this method, for a sample $X$ of unknown category, the value $P(X \mid C_i) P(C_i)$ can be calculated for every category $C_i$, and the category with the largest value is chosen as the category of $X$.

The naive Bayes algorithm is premised on the attributes being independent of one another. When the data set satisfies this independence assumption, classification accuracy is high; otherwise it may be lower. In addition, the algorithm does not output classification rules.

The basic steps of the Bayesian algorithm for identifying blog comment spam are as follows.

First, create two sample libraries, composed of spam samples and non-spam samples respectively. Suppose the numbers of comments in the two libraries are $B_{comment}$ and $G_{comment}$, and the numbers of occurrences of a string $w$ in the two libraries are $bad(w)$ and $good(w)$.

If a comment includes string $w$, the spam probability of this comment, $p_{spam \mid w}$, is given by:

$$r_g = \min\left(1,\; 2 \cdot good(w) / G_{comment}\right) \tag{1}$$

$$r_b = \min\left(1,\; bad(w) / B_{comment}\right) \tag{2}$$

$$p_{spam \mid w} = \max\left(0.01,\; \min\left(0.99,\; r_b / (r_g + r_b)\right)\right) \tag{3}$$

To determine whether a comment is spam, compute $p_{spam \mid w}$ for every string $w$ in the comment, then take the absolute value of the difference between $p_{spam \mid w}$ and 0.5. Sort these values from large to small and take the first $N$, that is, the $N$ strings whose probabilities lie farthest from 0.5. Suppose the selected strings are $w_1, w_2, \ldots, w_N$, and write $P_{spam \mid w_i}$ for the value of $p_{spam \mid w}$ at $w = w_i$. The spam probability of the comment is then:

$$P_{spam} = \frac{\prod_{i=1}^{N} P_{spam \mid w_i}}{\prod_{i=1}^{N} P_{spam \mid w_i} + \prod_{i=1}^{N} \left(1 - P_{spam \mid w_i}\right)} \tag{4}$$

If a string has not been seen in the sample libraries, its $p_{spam \mid w}$ value can be set to 0.4. If the value of $P_{spam}$ is greater than 0.99, the comment is classified as spam; otherwise it is classified as non-spam.

Analysis shows two flaws in the above algorithm:
(1) When computing $r_g$ and $r_b$, the algorithm uses the numbers of comments in the two sample libraries. If the number of strings per comment varies greatly, $r_g$ and $r_b$ cannot reflect the actual situation of the sample libraries, resulting in a low recognition rate.
(2) The spam probability produced by the algorithm is normally close to 0 or 1 and intermediate values do not appear, making it difficult to determine the extent to which a suspect comment is spam.

III. IMPROVED BAYESIAN ALGORITHM
Redefine $B_{comment}$ and $G_{comment}$ of the original algorithm as the total numbers of strings in the two sample libraries, and denote them by $B_{string}$ and $G_{string}$. The new algorithm is as follows:

$$r_g = \min\left(1,\; 2 \cdot good(w) / G_{string}\right) \tag{5}$$

$$r_b = \min\left(1,\; bad(w) / B_{string}\right) \tag{6}$$

$$p_{spam \mid w} = \max\left(0.01,\; \min\left(0.99,\; r_b / (r_g + r_b)\right)\right) \tag{7}$$

$$P = \prod_{i=1}^{N} P_{spam \mid w_i} \tag{8}$$

$$Q = \prod_{i=1}^{N} \left(1 - P_{spam \mid w_i}\right) \tag{9}$$

$$P_{spam} = P / (P + Q) \tag{10}$$

With these formulas, the calculation of $r_g$ and $r_b$ is not affected by comment length.

Equations (5) to (7) are identical to equations (11) to (13) in Section IV, so the C# code listed there for the per-string probability applies to this algorithm as well.

This algorithm significantly improves the recognition rate of comment spam. Table I compares the correct recognition rates of the two algorithms.

TABLE I. COMPARISON OF CORRECT RECOGNITION RATE OF TWO ALGORITHMS

Number of spam    Recognition rate of    Recognition rate of
comments          Bayesian algorithm     improved Bayesian algorithm
400               87.5%                  96.75%
500               84.4%                  95.6%
600               81.17%                 94%

IV. USING GEOMETRIC MEAN ALGORITHM
When computing $P$ and $Q$, use $\left(\prod_{i=1}^{N} \left(1 - P_{spam \mid w_i}\right)\right)^{1/N}$ and $\left(\prod_{i=1}^{N} P_{spam \mid w_i}\right)^{1/N}$ in place of the plain products. These are the geometric means of $1 - P_{spam \mid w_i}$ and $P_{spam \mid w_i}$ respectively, and using them gives a new algorithm.

The following three steps compute $p_{spam \mid w}$, the spam probability of string $w$:

$$r_g = \min\left(1,\; 2 \cdot good(w) / G_{string}\right) \tag{11}$$

$$r_b = \min\left(1,\; bad(w) / B_{string}\right) \tag{12}$$

$$p_{spam \mid w} = \max\left(0.01,\; \min\left(0.99,\; r_b / (r_g + r_b)\right)\right) \tag{13}$$

The corresponding C# code can be implemented as follows:

    private void CalculateTokenProbability(string token)
    {
        // Occurrences of the token in the non-spam library, scaled by
        // Knobs.GoodTokenWeight (the role of the factor 2 in equation (11)).
        int g = _good.Tokens.ContainsKey(token) ?
            _good.Tokens[token] * Knobs.GoodTokenWeight : 0;
        // Occurrences of the token in the spam library.
        int b = _bad.Tokens.ContainsKey(token) ? _bad.Tokens[token] : 0;
        // Score only tokens that have been seen often enough.
        if (g + b >= Knobs.MinCountForInclusion)
        {
            double goodfactor = Math.Min(1, (double)g / (double)_ngood);
            double badfactor = Math.Min(1, (double)b / (double)_nbad);
            // Clamped ratio. This listing clamps to [0.0001, 0.9999],
            // whereas equation (13) clamps to [0.01, 0.99].
            double prob = Math.Max(0.0001, Math.Min(0.9999,
                badfactor / (goodfactor + badfactor)));
            // Tokens never seen in the non-spam library get a fixed score.
            if (g == 0)
            {
                prob = (b > Knobs.CertainSpamCount) ?
                    Knobs.CertainSpamScore : Knobs.LikelySpamScore;
            }
            _prob[token] = prob;
        }
    }

In the above code, the variable goodfactor denotes $r_g$, badfactor denotes $r_b$, and prob denotes $p_{spam \mid w}$.

Using the geometric means of $1 - P_{spam \mid w_i}$ and $P_{spam \mid w_i}$, the spam probability of the comment can be obtained through the following steps:

$$P = 1 - \left(\prod_{i=1}^{N} \left(1 - P_{spam \mid w_i}\right)\right)^{1/N} \tag{14}$$

$$Q = 1 - \left(\prod_{i=1}^{N} P_{spam \mid w_i}\right)^{1/N} \tag{15}$$

$$P_{spam} = P / (P + Q) \tag{16}$$

The corresponding C# code can be implemented as follows:

    double p, q, s;
    double mult = 1;   // running product of P(spam|w_i)
    double comb = 1;   // running product of 1 - P(spam|w_i)
    int index = 0;     // number of probabilities combined so far
    foreach (string key in probs.Keys)
    {
        double prob = (double)probs[key];
        mult = mult * prob;
        comb = comb * (1 - prob);
        // Stop once index exceeds Knobs.InterestingWordCount.
        if (++index > Knobs.InterestingWordCount)
            break;
    }
    // Equations (14) to (16): N-th roots replace the plain products.
    p = 1 - Math.Pow(comb, (double)1 / (double)index);
    q = 1 - Math.Pow(mult, (double)1 / (double)index);
    s = p / (p + q);
    return s;

In the above code, the variable mult holds $\prod_{i=1}^{N} P_{spam \mid w_i}$ and comb holds $\prod_{i=1}^{N} \left(1 - P_{spam \mid w_i}\right)$. The variables p and q correspond to $P$ and $Q$ in equations (14) and (15): each is one minus the geometric mean of $1 - P_{spam \mid w_i}$ and of $P_{spam \mid w_i}$ respectively.

According to the results of actual tests, if $P_{spam} > 0.52$, the comment can be classified as spam.

Using the above geometric mean algorithm, the recognition rate of comment spam can be further improved. Table II compares the recognition rates of the two algorithms.

TABLE II. COMPARISON OF RECOGNITION RATE OF IMPROVED BAYESIAN ALGORITHM AND GEOMETRIC MEAN ALGORITHM

Number of spam    Recognition rate of            Recognition rate of
comments          improved Bayesian algorithm    geometric mean algorithm
400               96.75%                         97%
500               95.6%                          96.2%
600               94%                            94.83%

In addition, we get a more balanced distribution of spam probability, which can be used to determine the extent to which a suspect comment that has not been correctly recognized is spam. Both algorithms were applied to a sample library of 600 spam comments; Table III compares the spam comments that were not correctly recognized.

TABLE III. SPAM PROBABILITY DISTRIBUTION OF TWO ALGORITHMS

Using geometric mean algorithm             Using Bayesian algorithm
Probability scope     Number of comments   Probability scope      Number of comments
Pspam > 0.5           12                   0.82 < Pspam < 0.98    13
0.4 < Pspam < 0.5     14                   0.3 < Pspam < 0.53     3
Pspam < 0.4           5                    0 < Pspam < 0.2        20

V. CONCLUSION
The traditional Bayesian algorithm uses the numbers of comments in the spam and non-spam sample libraries as the basis of calculation, so comment length has a considerable impact on the identification results, leading to a low spam recognition rate. This paper instead uses the total numbers of strings in the two sample libraries to improve the Bayesian algorithm, which substantially increases the spam recognition rate. On this basis, using the geometric mean algorithm in place of the Bayesian combination further improves the spam recognition rate. It also makes the spam probability distribution of the comments that are not correctly identified more balanced, which can be used to determine the extent to which a suspect comment is spam.

REFERENCES
[1] Abu-Nimeh, S.; Chen, T. “Proliferation and Detection of Blog Spam”. Security & Privacy. Vol.8, No.5, pp.42-47, 2010.
[2] Kamaliha, E.; Riahi, F.; Qazvinian, V.; Adibi, J. “Characterizing Network Motifs to Identify Spam Comments”. 2008 IEEE International Conference on Data Mining Workshops. pp.919-928, 2008.
[3] Fei-Fei Li, Rob Fergus, Pietro Perona. “Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories”. Computer Vision and Image Understanding. Vol.106, No.1, pp.59-70, 2007.
[4] Liwei Wang, Xiao Wang, Jufu Feng. “Subspace distance analysis with application to adaptive Bayesian algorithm for face recognition”. Pattern Recognition. Vol.39, No.3, pp.456-464, 2006.
[5] Byoung-Tak Zhang and Ha-Young Jang. “A Bayesian Algorithm for In Vitro Molecular Evolution of Pattern Classifiers”. Lecture Notes in Computer Science. Vol.3384, pp.720-722, 2005.
[6] Di Michele, S.; Tassa, A.; Mugnai, A.; Marzano, F.S.; Bauer, P.; Baptista, J.P.V.P. “Bayesian algorithm for microwave-based precipitation retrieval: description and application to TMI measurements over ocean”. Geoscience and Remote Sensing. Vol.43, No.4, pp.778-791, 2005.
[7] Bhattarai, A.; Rus, V.; Dasgupta, D. “Characterizing comment spam in the blogosphere through content analysis”. Computational Intelligence in Cyber Security. pp.37-44, 2009.
[8] Beatrice Cynthia Dhinakaran, Dhinaharan Nagamalai and Jae-Kwang Lee. “Bayesian Approach Based Comment Spam Defending Tool”. Lecture Notes in Computer Science. Vol.5576, pp.578-587, 2009.
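As an illustration of the first flaw discussed in Section II, the per-string probability of equations (1) to (3) and of equations (5) to (7) can be sketched with one function that takes the normaliser as a parameter. This is a minimal sketch in Python rather than the paper's C#, chosen only to keep it self-contained. The function name, the toy library sizes, and the choice to return 0.4 for a string seen in neither library are assumptions made for this example, not details taken from the paper's implementation.

```python
def string_spam_probability(good_count, bad_count, good_total, bad_total):
    """Per-string spam probability, clamped to [0.01, 0.99].

    Passing comment counts as the totals reproduces equations (1)-(3);
    passing string counts reproduces equations (5)-(7).
    """
    r_g = min(1.0, 2.0 * good_count / good_total)
    r_b = min(1.0, bad_count / bad_total)
    if r_g + r_b == 0.0:
        # Assumption: a string seen in neither library gets the 0.4 default.
        return 0.4
    return max(0.01, min(0.99, r_b / (r_g + r_b)))


# Hypothetical toy libraries: 100 comments each, but the non-spam comments
# are ten times longer than the spam comments.
GOOD_COMMENTS, GOOD_STRINGS = 100, 5000
BAD_COMMENTS, BAD_STRINGS = 100, 500

# A string that occurs 10 times in each library.
p_by_comments = string_spam_probability(10, 10, GOOD_COMMENTS, BAD_COMMENTS)
p_by_strings = string_spam_probability(10, 10, GOOD_STRINGS, BAD_STRINGS)
print(f"normalised by comments: {p_by_comments:.3f}")
print(f"normalised by strings:  {p_by_strings:.3f}")
```

With these made-up counts the comment-based normaliser gives about 0.33, while the string-based normaliser gives about 0.83, because the string makes up a ten times larger share of the spam text. This is the length effect that the improved algorithm is meant to remove.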

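The two ways of combining per-string probabilities, equations (8) to (10) and equations (14) to (16), can be compared side by side in a short sketch. It is written in Python rather than the paper's C#, and the function names and the sample probabilities are hypothetical values chosen for illustration. The selection helper assumes that the "interesting" strings are the ones whose probabilities lie farthest from 0.5.

```python
import math


def most_interesting(probs, n):
    """Keep the n probabilities farthest from the neutral value 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def combine_bayes(probs):
    """Equations (8)-(10): P / (P + Q) using plain products."""
    p = math.prod(probs)
    q = math.prod(1.0 - x for x in probs)
    return p / (p + q)


def combine_geometric(probs):
    """Equations (14)-(16): N-th roots of the products replace the products.

    Expects a non-empty list of probabilities strictly between 0 and 1,
    which the clamping in equation (13) guarantees.
    """
    n = len(probs)
    p = 1.0 - math.prod(1.0 - x for x in probs) ** (1.0 / n)
    q = 1.0 - math.prod(probs) ** (1.0 / n)
    return p / (p + q)


# Hypothetical per-string probabilities for one comment.
sample = [0.99, 0.99, 0.9, 0.8, 0.2]
bayes_score = combine_bayes(sample)
geometric_score = combine_geometric(sample)
print(f"Bayesian combination:       {bayes_score:.5f}")
print(f"geometric mean combination: {geometric_score:.5f}")
```

For this sample the Bayesian combination is pushed almost to 1, whereas the geometric mean combination lands at about 0.74. That value is still above the paper's 0.52 threshold but far less extreme, which matches the more balanced distribution described in Section IV.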
