Protecting The Consumer Web

This document discusses protecting consumer websites from attackers. It covers how attackers can directly or indirectly monetize access to consumer websites. It then describes common types of abuse like account takeover, fraudulent account creation, and bot activity. It proposes using supervised machine learning models to classify account creation requests and detect abuse. Challenges include labeling data, handling false positives and negatives, and addressing large-scale attacks. The document concludes by discussing how clustering techniques can help detect and stop large attacks on consumer websites.

CHAPTER 6:

PROTECTING THE CONSUMER WEB

PRESENTED BY:
ANJALI R PILLAI
ANJANA MOHAN
WINLIYA JEWEL SUNNY
INTRODUCTION

• Here we consider attackers who use the functionality of a consumer-facing website or
application to achieve their goals.

• The ‘consumer web’ means any product that is accessible over the public internet.

• Attack surfaces: account access, payment interface, content generation.


MONETIZING CONSUMER WEB

• Consumer-facing websites make it possible for hackers to monetize directly by gaining access to
accounts.

• Payment fraud is another primary concern: hackers can use stolen credit cards to pay for whatever the
site offers.

• Click fraud is most common in advertising.

• Review fraud is another type, in which users post biased reviews of a product to inflate or deflate its rating.

• Even where direct monetization is impossible, some sites offer indirect
monetization.
TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM

Authentication and Account Takeover


• Making payments or creating content on a site requires authentication.

• Authentication usually relies on passwords: information that is known only to the user.

• Passwords have flaws: users reuse the same password everywhere, choose passwords that are easy to remember (and to guess), and share them with friends or family.

• Second-factor authentication is recommended for suspicious logins.

• But it also has some drawbacks.


TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM

Account Creation
• For a secure system, protection of account creation is essential.

• Two approaches: score the account creation request to block fake accounts at signup, or score existing accounts to delete, lock, or restrict the fake ones.

• Blocking at creation prevents attackers from doing any damage.

• The scoring model uses two kinds of features: velocity features and reputation scores.
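
A velocity feature counts how often something has been seen recently, for example signups per IP address in the last hour. The sketch below is one minimal way to compute such a count. The class name, the one-hour window, and the sample IP addresses are illustrative assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per key (e.g. IP address) inside a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key, timestamp):
        """Register one event and return the key's count within the window."""
        events = self._events[key]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events)


# Hypothetical signup stream: (IP address, time in seconds).
counter = VelocityCounter(window_seconds=3600)
signups = [("203.0.113.7", 0), ("203.0.113.7", 60), ("198.51.100.2", 90),
           ("203.0.113.7", 120), ("203.0.113.7", 4000)]
velocities = [counter.record(ip, ts) for ip, ts in signups]
```

The resulting count can be fed to the scoring model as one feature per signup; a reputation score would be a separate, longer-term feature for the same key.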
TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM

Bot Activity
• Attackers sometimes collect a great deal of value from a single victim.
• Bots are used for several kinds of abuse:
• Account creation
• Credential stuffing: trying leaked passwords on the login page to compromise accounts. These
bots should be stopped before they gain access, without leaking account information.
• Scraping: bots harvest data from the site; the site should avoid serving data to them.
• Ranking fraud: bots inflate views to increase the reach of content. These actions should be
discounted soon after they occur.
SUPERVISED LEARNING FOR ABUSE PROBLEMS
• Preventing and detecting attacks on commercial websites.
• Building an account creation classifier.
• We try to classify each account creation request as abusive or good.
LABELING DATA
• Positive and negative samples.
• There are certain risks involved:
• Blindness
• Feedback loop

• To mitigate these risks:


• Oversample manually labeled examples in the training data.
• Oversample false positives from the model when retraining.
• Undersample positive examples from previous iterations.
• If possible, manually label some accounts.
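
One way to apply two of these mitigations when assembling a training set is sketched below. It assumes the examples to oversample are the manually labeled ones and that each example records where its label came from. The field names ("source", "label"), the replication factor, and the keep rate are all hypothetical.

```python
import random


def build_training_set(examples, manual_factor=3, model_positive_keep=0.5, seed=0):
    """Oversample manual labels and undersample positives labeled by an older model."""
    rng = random.Random(seed)
    training = []
    for example in examples:
        if example["source"] == "manual":
            # Oversample: repeat each manually labeled example.
            training.extend([example] * manual_factor)
        elif example["source"] == "model" and example["label"] == 1:
            # Undersample: keep only a fraction of the model-labeled positives.
            if rng.random() < model_positive_keep:
                training.append(example)
        else:
            training.append(example)
    return training


# label 1 = abusive, 0 = good (assumed convention).
examples = [
    {"source": "manual", "label": 1},
    {"source": "manual", "label": 0},
    {"source": "model", "label": 1},
    {"source": "model", "label": 1},
    {"source": "other", "label": 0},
]
full = build_training_set(examples, model_positive_keep=1.0)
```

Many learning libraries accept per-example weights instead, which has a similar effect without duplicating rows.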
COLD START V/S WARM START
• Cold Start deploys a new machine-learning model without any pre-trained parameters.
• A warm start, by contrast, initializes a model with pre-trained parameters, so it starts from
a foundation of prior information.
• Different approaches can be implemented:
• Never throw away training data: train on the union of the v1 and v2 data.
• Run simultaneous models
• Sample positives from the deployed v1 model to augment the v2 training data.
FALSE POSITIVES AND FALSE NEGATIVES
• False negatives are accounts that our model passed as good but that turned out to be bad.
• They are very hard to detect.
• They end up mislabelled in the training data.
• False positives are accounts that were banned as bad but were actually good.
• False positives can be handled by giving flagged accounts a fraction of account access.
MULTIPLE RESPONSES
• Score thresholds determine the model's response:
• Block if the score is bad.
• Allow if the score is good.
• Issue a challenge if the score falls in the ‘grey area’.

-> Needed because false positives occur.
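
The three responses can be sketched as a function of the model score. The score range (0 to 1, higher meaning more likely abusive) and both threshold values are illustrative assumptions, not values from the chapter.

```python
def choose_response(score, block_above=0.9, allow_below=0.3):
    """Map an abuse score in [0, 1] (higher = more likely abusive) to an action."""
    if score >= block_above:
        return "block"
    if score < allow_below:
        return "allow"
    # Grey area: let the user prove legitimacy instead of blocking outright.
    return "challenge"


responses = [choose_response(s) for s in (0.05, 0.5, 0.95)]
```

Widening the grey area trades user friction for fewer good users blocked outright.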


LARGE ATTACKS
• If a large number of attack requests come from a single IP address, they can skew the model.
• The model overfits to that one attack.
• Solution: downsample large attacks.
• From an attack with x requests, sample log(x) for training.
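
A minimal sketch of this downsampling follows. It assumes an attack can be identified by a shared key such as the source IP, uses the natural logarithm rounded up (the base is not specified in the slides), and always keeps at least one request per group. The function and field names are hypothetical.

```python
import math
import random
from collections import defaultdict


def downsample_attacks(bad_requests, key, seed=0):
    """Keep about log(x) requests from each group of x bad requests."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for request in bad_requests:
        groups[request[key]].append(request)
    sampled = []
    for requests in groups.values():
        keep = max(1, math.ceil(math.log(len(requests))))
        sampled.extend(rng.sample(requests, keep))
    return sampled


# One large attack (1000 requests) and one small one (3 requests).
bad_requests = ([{"ip": "203.0.113.7", "id": i} for i in range(1000)]
                + [{"ip": "198.51.100.2", "id": i} for i in range(3)])
training_bad = downsample_attacks(bad_requests, key="ip")
```

After sampling, the large attack contributes 7 examples and the small one 2, so neither dominates the bad class.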
CLUSTERING ABUSE
• Group entities similar to one another.
• Verify each cluster and check whether it is legitimate or abusive.
• How does a cluster become bad?
-> In some cases, a single bad entity makes the entire cluster bad.
-> In other cases, most entities must be bad for the cluster to be labeled bad,
e.g., IP address.
CLUSTERING SPAM DOMAIN
• The objective is to maximize recall and maintain high precision.
• A cluster must contain at least 10 domains.
• At least 75% of its domains must be spam for it to be considered a bad cluster.
• This would minimize the chances of good domains getting caught up in clusters of bad
domains.
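
The size and spam-fraction rule can be written as a small labeling function. The sketch assumes each cluster is given as a list of booleans (True for a known spam domain); the function name and the example clusters are hypothetical.

```python
def is_bad_cluster(spam_flags, min_size=10, min_spam_fraction=0.75):
    """Return True if the cluster is large enough and mostly spam."""
    if len(spam_flags) < min_size:
        return False
    return sum(spam_flags) / len(spam_flags) >= min_spam_fraction


clusters = {
    "a": [True] * 9 + [False] * 3,    # 12 domains, 75% spam
    "b": [True] * 8,                  # all spam, but too small
    "c": [True] * 10 + [False] * 10,  # large, but only 50% spam
}
labels = {name: is_bad_cluster(flags) for name, flags in clusters.items()}
```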
GENERATING CLUSTERS
• Selecting features for our domains.
• Feature types: categorical, numerical, text-based.
• Bad things happen in bunches, but clustering is not perfect: we may get more good clusters than
bad ones.
• To choose the best clustering strategy, compare:
• Proportion of clusters that are labeled bad.
• Proportion of domains that fall in bad clusters.
• Recall of the bad clusters.
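
The three quantities can be computed from per-cluster counts, as in the sketch below. Reading the second quantity as the share of all domains that land in bad clusters, and recall as the share of all spam domains that land in bad clusters, are assumptions. The tuple layout and the example numbers are hypothetical.

```python
def clustering_metrics(clusters):
    """clusters: list of (size, spam_count, is_bad) tuples, one per cluster."""
    total_domains = sum(size for size, _, _ in clusters)
    total_spam = sum(spam for _, spam, _ in clusters)
    bad = [cluster for cluster in clusters if cluster[2]]
    return {
        "bad_cluster_fraction": len(bad) / len(clusters),
        "domains_in_bad_fraction": sum(c[0] for c in bad) / total_domains,
        "spam_recall": sum(c[1] for c in bad) / total_spam,
    }


metrics = clustering_metrics([(20, 18, True), (10, 1, False), (10, 1, False)])
```

Comparing these numbers across clustering strategies shows which one captures the most spam while pulling in the fewest good domains.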

Clustering can be done by grouping and by locality-sensitive hashing.


GENERATING CLUSTERS
• Grouping
Grouping is applied to features that have distinct values.
Grouping does not help us much in account classification because bad clusters are
underrepresented.

• Locality-sensitive hashing
It is used to find approximate matches in large datasets.

• Problems of k-means
Even though k-means is a widely used clustering method, it is not used here because the value of k is not
known in advance. Another problem is that k-means does not work with categorical features.
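
As an illustration of locality-sensitive hashing, the sketch below uses MinHash with banding, one common LSH scheme for text such as domain names. The slides do not say which scheme is used, so this choice is an assumption, as are the shingle length, the number of hash functions, the band size, and the example domains.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Set of overlapping k-character substrings (the whole text if shorter than k)."""
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def seeded_hash(seed, token):
    """Deterministic 64-bit hash; each seed acts as a different hash function."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=20):
    """Keep the minimum hash value over the tokens for each hash function."""
    return tuple(min(seeded_hash(seed, t) for t in tokens)
                 for seed in range(num_hashes))


def lsh_buckets(names, num_hashes=20, rows_per_band=2):
    """Return groups of names whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for name in names:
        signature = minhash_signature(shingles(name), num_hashes)
        for start in range(0, num_hashes, rows_per_band):
            band = signature[start:start + rows_per_band]
            buckets[(start, band)].add(name)
    return [group for group in buckets.values() if len(group) > 1]


domains = ["cheap-pills-001.example", "cheap-pills-002.example", "weather.example"]
candidate_clusters = lsh_buckets(domains)
```

Similar names are likely, but not guaranteed, to share a bucket; more bands with fewer rows each make matching looser. Unlike k-means, this needs no cluster count up front.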
SCORING CLUSTERS
We have to perform labeling and feature extraction to identify abusive and legitimate clusters in a large dataset.
• To label a cluster good or bad, choose a threshold t: if the percentage of bad accounts in the cluster is greater than t,
the cluster is bad; otherwise it is good.
• Feature extraction converts each cluster into a numerical vector. For numerical account-level features, compute:
1. Min, max, median, and quartiles
2. Mean and standard deviation
3. Percent of null or zero values

• For categorical account-level features, compute:

1. Number of distinct values
2. Percent of values belonging to the mode
3. Percent of null values
4. Entropy

• For classification, a random forest classifier, which is a non-linear classifier, is used.
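
The cluster-level aggregates listed above can be sketched with the Python standard library. Function names and sample values are hypothetical. The sketch assumes nulls are represented as None, that the mode percentage is taken over non-null values, and that the population standard deviation is wanted.

```python
import math
import statistics
from collections import Counter


def numeric_features(values):
    """Aggregate one numerical account-level feature over a cluster."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # needs >= 2 non-null values
    return {
        "min": min(present), "max": max(present),
        "median": statistics.median(present), "q1": q1, "q3": q3,
        "mean": statistics.mean(present), "std": statistics.pstdev(present),
        "pct_null_or_zero": sum(v is None or v == 0 for v in values) / len(values),
    }


def categorical_features(values):
    """Aggregate one categorical account-level feature over a cluster."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "distinct": len(counts),
        "pct_mode": counts.most_common(1)[0][1] / total,
        "pct_null": (len(values) - total) / len(values),
        "entropy": entropy,
    }


num = numeric_features([0, 2, 4, None, 10])
cat = categorical_features(["US", "US", "DE", "FR", None])
```

Concatenating these dictionaries across all account-level features gives one fixed-length vector per cluster, which is the input a classifier such as a random forest would be trained on.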


FURTHER DIRECTIONS IN CLUSTERING

Clustering can be extended by:

• Trying different clustering methods
• Trying different classifiers and parameters
• Adding new features at the item level
• Sampling data
• Updating the weights on items
• Adding a second classifier to detect false-positive items in the cluster
CONCLUSION
• We discussed consumer websites and apps and how to protect them from attackers.
• Machine learning has many challenges here: one is obtaining ground-truth data, and another is
balancing detection of known attacks against uncovering new attack techniques.
• Combining machine learning and clustering techniques helps to detect and stop large
attacks quickly.
