Logistic Regression Classification in Natural Language Processing (NLP) Final
COLLEGE OF INFORMATICS-AKRE
THE SIGMOID FUNCTION
Equation:
σ(z) = 1 / (1 + e^(−z)), where z = w·x + b
This is the Sigmoid Function Plot, showing how logistic regression maps input
values to probabilities between 0 and 1. The red dashed line represents the decision
threshold at 0.5.
TURNING A PROBABILITY INTO A CLASSIFIER
ŷ = 1 if w∙x + b > 0
ŷ = 0 if w∙x + b ≤ 0
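The decision rule above can be sketched in a few lines of Python. Note that sigmoid(z) > 0.5 exactly when z = w∙x + b > 0, so thresholding the probability at 0.5 is the same as checking the sign of the score (the function names here are illustrative, not from the slides):

```python
import math

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(w, x, b, threshold=0.5):
    """Return 1 if sigmoid(w.x + b) exceeds the threshold, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) > threshold else 0
```

For example, with weights w = [2.0, -1.0] and bias b = 0.5, the input x = [1, 0] gives z = 2.5 and is classified as 1, while x = [0, 1] gives z = -0.5 and is classified as 0.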
COMPONENTS OF LOGISTIC REGRESSION
Keyword  | Sender Type | Link Count | Spam (1) / Not Spam (0)
Discount | Company     | Many       | 1
Free     | Unknown     | Moderate   | 1
Meeting  | Individual  | Few        | 0
Offer    | Company     | Moderate   | 1
Hello    | Individual  | Few        | 0
Apply One-Hot Encoding
Each categorical value becomes its own binary column (Keyword: Discount, Free, Meeting, Offer, Hello; Sender Type: Company, Individual, Unknown; Link Count: Many, Moderate, Few):

Discount Free Meeting Offer Hello | Company Individual Unknown | Many Moderate Few | Spam
   1      0     0      0     0   |    1        0         0    |  1      0      0  |  1
   0      1     0      0     0   |    0        0         1    |  0      1      0  |  1
   0      0     1      0     0   |    0        1         0    |  0      0      1  |  0
   0      0     0      1     0   |    1        0         0    |  0      1      0  |  1
   0      0     0      0     1   |    0        1         0    |  0      0      1  |  0

Now our categorical features are transformed into numerical binary values, which can be used in logistic regression.
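The encoding step above can be reproduced with a small sketch; the helper name `one_hot` and the category orderings are assumptions made for illustration:

```python
# Rows from the spam example table: (keyword, sender type, link count, label).
rows = [
    ("Discount", "Company",    "Many",     1),
    ("Free",     "Unknown",    "Moderate", 1),
    ("Meeting",  "Individual", "Few",      0),
    ("Offer",    "Company",    "Moderate", 1),
    ("Hello",    "Individual", "Few",      0),
]

KEYWORDS = ["Discount", "Free", "Meeting", "Offer", "Hello"]
SENDERS  = ["Company", "Individual", "Unknown"]
LINKS    = ["Many", "Moderate", "Few"]

def one_hot(value, categories):
    """One binary column per category: 1 where value matches, 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]

# 11 binary feature columns per email; the label is kept separately.
X = [one_hot(k, KEYWORDS) + one_hot(s, SENDERS) + one_hot(l, LINKS)
     for k, s, l, _ in rows]
y = [label for *_, label in rows]
```

The first row ("Discount", "Company", "Many") encodes to [1,0,0,0,0, 1,0,0, 1,0,0], matching the first line of the matrix above. In practice a library encoder (e.g. scikit-learn's `OneHotEncoder`) would be used instead of a hand-rolled helper.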
LOGISTIC REGRESSION MODEL
Assuming the model has been trained, we define the learned weights as:
COMPUTE MODEL OUTPUT
Features (One-Hot Encoded) | g(z) | Predicted y
• Emails with strong spam keywords (like "Discount," "Free," or "Offer") are classified as Spam (1).
• Emails with neutral words (like "Meeting" or "Hello") are more likely to be Not Spam (0).
• The sender type and the number of links also influence classification.
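The scoring step can be sketched as follows. The weights and bias here are hypothetical values chosen only to illustrate the computation; the trained weights from the slides are not reproduced. Column order is the same as in the one-hot matrix: 5 keyword columns, 3 sender columns, 3 link-count columns.

```python
import math

# Hypothetical weights (NOT the slides' trained values), one per one-hot column:
w = [2.0, 2.0, -1.5, 1.8, -1.5,   # Discount, Free, Meeting, Offer, Hello
     0.8, -0.5, 0.6,              # Company, Individual, Unknown
     1.0, 0.3, -0.7]              # Many, Moderate, Few
b = -1.0

def g(z):
    """Sigmoid: squashes the score z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    """Return (g(z), predicted y) for a one-hot encoded email x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = g(z)
    return p, int(p > 0.5)

# "Discount" email from a company with many links -> high spam probability:
discount_email = [1,0,0,0,0, 1,0,0, 1,0,0]
# "Meeting" email from an individual with few links -> low spam probability:
meeting_email  = [0,0,1,0,0, 0,1,0, 0,0,1]
```

With these illustrative weights, the discount email scores z = 2.0 + 0.8 + 1.0 − 1.0 = 2.8 (predicted Spam), while the meeting email scores z = −1.5 − 0.5 − 0.7 − 1.0 = −3.7 (predicted Not Spam), mirroring the bullet points above.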
Conclusions