Protecting The Consumer Web
PRESENTED BY:
ANJALI R PILLAI
ANJANA MOHAN
WINLIYA JEWEL SUNNY
INTRODUCTION
• Here we consider attackers who use the functionality of a consumer-facing website or application
to achieve their goals.
• A ‘consumer web’ product is any product that is accessible over the public internet.
• Consumer-facing websites let hackers monetize directly by gaining access to user accounts.
• Fraud is another primary concern: hackers can use stolen credit cards to pay for whatever the
site offers.
• Review fraud is another: users post biased reviews of a product to inflate or deflate its rating.
• Even where direct monetization is impossible, some sites give hackers indirect ways to
monetize.
TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM
• Accounts are commonly protected by passwords: information that is known only to the user.
• Passwords have flaws: users reuse the same password everywhere, pick passwords that are easy to remember (and so easy to guess), and share them with friends or family.
Account Creation
• For a secure system, protection of account creation is essential.
• Two approaches: score the account creation request to stop fake accounts from being created, or score existing accounts to delete, lock, or restrict the fake ones.
• The scoring model uses two kinds of features: velocity features and reputation scores.
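A minimal sketch of how a velocity feature and a reputation score might feed a creation score. The class `VelocityCounter`, the function `score_request`, the one-hour window, and the equal weighting are all assumptions made for illustration; a real system would learn the weights from labeled data.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per key (e.g. IP address) within a sliding time window.

    Assumes timestamps (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key, timestamp):
        """Record one event and return the count for this key inside the window."""
        q = self._events[key]
        q.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= timestamp - self.window_seconds:
            q.popleft()
        return len(q)


def score_request(velocity, reputation, max_velocity=5):
    """Toy score in [0, 1]; higher means more suspicious.

    velocity: sign-ups seen from the same key inside the window.
    reputation: hypothetical prior in [0, 1], where 1 means fully trusted.
    """
    velocity_part = min(velocity / max_velocity, 1.0)
    return 0.5 * velocity_part + 0.5 * (1.0 - reputation)


# Usage: three sign-ups from one IP address; the first two are an hour before the third.
counter = VelocityCounter(window_seconds=3600)
counter.record("203.0.113.7", 0)
counter.record("203.0.113.7", 10)
recent = counter.record("203.0.113.7", 3700)  # earlier events have expired
suspicion = score_request(recent, reputation=0.9)
```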
Bot Activity
• Attackers can sometimes collect a great deal of value from a single victim; when each victim is worth little, they automate the attack with bots.
• Bots are used for:
• Account creation
• Credential stuffing: bots try leaked passwords on the login page to compromise accounts. The
defense must stop this access without leaking account information.
• Scraping: bots harvest the site's data; the defense is to avoid serving data to them.
• Ranking fraud: bots inflate views to increase an item's reach. Such actions should be discounted
soon after they occur.
SUPERVISED LEARNING FOR ABUSE PROBLEMS
• Goal: prevent and detect attacks on commercial websites.
• Build an account creation classifier.
• Classify each account creation request as abusive or good.
LABELING DATA
• Positive and negative samples.
• There are certain risks involved:
• Blindness
• Feedback loop
• Locality-sensitive hashing
It is used to find approximate matches in large datasets.
• Problems of k-means
Although k-means is a popular clustering method, it is not used here because the value of k is not known in
advance. Another problem with k-means is that it does not work with categorical features.
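The locality-sensitive hashing idea above can be sketched with MinHash signatures and banding, one common LSH scheme for set similarity. The slides do not say which LSH family is used, so this choice, the function names, and the parameters (`num_hashes`, `bands`) are assumptions for illustration. Each item is assumed to be a non-empty set of string tokens (for example, the words of a username or profile).

```python
import hashlib
from collections import defaultdict


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def lsh_buckets(items, num_hashes=32, bands=8):
    """Group item ids whose signatures agree on at least one band.

    items: dict mapping item id -> non-empty set of tokens.
    Returns the buckets that hold more than one item (candidate matches).
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for item_id, tokens in items.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(item_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


# Usage: "a" and "b" share the same tokens, "c" shares none.
candidates = lsh_buckets({
    "a": {"free", "gift", "card"},
    "b": {"free", "gift", "card"},
    "c": {"weekly", "team", "notes"},
})
```

Only items that land in the same bucket need to be compared directly, which is what makes the approach workable on large datasets without fixing a number of clusters in advance.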
SCORING CLUSTERS
We label clusters and extract features from them to identify abusive and legitimate clusters in a large dataset.
To label a cluster good or bad, choose a threshold t: if the percentage of bad accounts in the cluster is greater than
t, the cluster is bad; otherwise it is good.
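The threshold rule can be sketched as follows. The function name `label_cluster` and the representation of a cluster as a list of booleans (True for an account already known to be bad) are assumptions for illustration; the threshold is expressed as a fraction rather than a percentage.

```python
def label_cluster(is_bad_flags, threshold):
    """Return "bad" if the fraction of bad accounts exceeds threshold, else "good"."""
    if not is_bad_flags:
        raise ValueError("cluster is empty")
    bad_fraction = sum(is_bad_flags) / len(is_bad_flags)
    return "bad" if bad_fraction > threshold else "good"


# Usage: 3 of 4 accounts are bad, so the cluster exceeds t = 0.5.
label = label_cluster([True, True, True, False], threshold=0.5)
```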
In feature extraction, each cluster is converted into a numerical vector using these features:
1. Min, max, median, and quartiles
2. Mean and standard deviation
3. Percent of null or zero values
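A minimal sketch of the aggregation listed above, applied to one numeric column of a cluster (for example, the number of logins per account). The function name `cluster_features`, the use of `None` for missing values, and the choice of the population standard deviation are assumptions for illustration.

```python
import statistics


def cluster_features(values):
    """Summarize one numeric column of a cluster as a fixed-length vector.

    values: list of numbers or None (None marks a missing value).
    Returns [min, q1, median, q3, max, mean, std, percent null or zero].
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-null values")
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    null_or_zero = sum(1 for v in values if v is None or v == 0)
    return [
        min(present), q1, median, q3, max(present),
        statistics.mean(present),
        statistics.pstdev(present),
        100.0 * null_or_zero / len(values),
    ]


# Usage: one column from a five-account cluster.
features = cluster_features([1, 2, 3, 4, 5])
```

Repeating this per column and concatenating the results gives one vector per cluster, which can then be passed to a classifier.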