Protecting The Consumer Web

This document discusses protecting consumer websites from attackers. It covers how attackers can directly or indirectly monetize access to consumer websites. It then describes common types of abuse like account takeover, fraudulent account creation, and bot activity. It proposes using supervised machine learning models to classify account creation requests and detect abuse. Challenges include labeling data, handling false positives and negatives, and addressing large-scale attacks. The document concludes by discussing how clustering techniques can help detect and stop large attacks on consumer websites.

Uploaded by

Winliya Jewel Sunny


CHAPTER 6:

PROTECTING THE CONSUMER WEB

PRESENTED BY:
ANJALI R PILLAI
ANJANA MOHAN
WINLIYA JEWEL SUNNY
INTRODUCTION

• Here we consider attackers who use the functionality of a consumer-facing website or
application to achieve their goals.

• ‘Consumer web’ means any product that is accessible over the public internet.

• Attack surfaces: account access, payment interface, content generation.

MONETIZING CONSUMER WEB

• Consumer-facing websites let attackers monetize directly by gaining access to user
accounts.

• Payment fraud is another primary concern: attackers can use stolen credit cards to pay for whatever the
site offers.

• Click fraud is most common in advertising.

• Review fraud is another type, in which users post biased reviews of a product to inflate or deflate its rating.

• Even where direct monetization is impossible, some sites can still be monetized
indirectly.
TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM

Authentication and Account Takeover

• Making payments or creating content on a site requires authentication.

• Authentication usually relies on passwords: information known only to the user.

• Passwords have flaws: users reuse the same password everywhere, pick ones that are easy to remember (and so easy to guess), and share them with friends or family.

• A second factor is recommended when a login looks suspicious.

• But a second factor also has drawbacks, such as added friction for legitimate users.

TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM

Account Creation
• For a secure system, protecting account creation is essential.

• Two approaches: score the account creation request and block fake accounts up front, or score existing accounts to delete, lock, or restrict the fake ones.

• Blocking at creation time prevents attackers from doing any damage.

• The scoring model uses two kinds of features: velocity features and reputation scores.
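
The two feature kinds can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes a velocity feature is a count of creation requests per key (here an IP address) inside a sliding time window, and a reputation score is a smoothed fraction of past accounts from that key that turned out bad. The class name, function name, and smoothing priors are hypothetical.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per key (e.g. IP address) inside a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps, oldest first

    def add_and_count(self, key, timestamp):
        """Record one event and return how many events fall in the window."""
        q = self.events[key]
        q.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q)


def reputation(bad_count, total_count, prior_bad=1, prior_total=10):
    """Smoothed fraction of past accounts from a key that were bad.

    The priors (an assumption) keep keys with little history near 10%.
    """
    return (bad_count + prior_bad) / (total_count + prior_total)


# Usage: requests per IP in the last hour, plus the IP's historical badness.
velocity = VelocityCounter(window_seconds=3600)
features = {
    "ip_creations_last_hour": velocity.add_and_count("203.0.113.7", timestamp=0),
    "ip_reputation": reputation(bad_count=9, total_count=10),
}
```

Both values would then be fed to the scoring model alongside any other request features.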
TYPES OF ABUSE AND THE DATA THAT CAN STOP THEM

Bot Activity
• Attackers can sometimes collect a great deal of value from a single victim; when the value per victim is small, they scale up with bots.
• Bots are used for:
• Account creation
• Credential stuffing: trying leaked passwords against the login to compromise accounts. The
defense must stop this access without leaking information about the accounts.
• Scraping: downloading the site's data in bulk. The defense is to avoid serving data to these bots.
• Ranking fraud: inflating views to increase reach. Such actions can be discounted soon after they
occur.
SUPERVISED LEARNING FOR ABUSE PROBLEMS
• Goal: prevent and detect attacks on a commercial website.
• Example: building an account creation classifier.
• Each account creation request is classified as abusive or good.
LABELING DATA
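• Training needs positive (abusive) and negative (good) samples.
• There are certain risks involved:
• Blindness: labels come from abuse that was already caught, so the model cannot learn abuse that was missed.
• Feedback loop: a model retrained on its own decisions reinforces its own mistakes.

• To mitigate these risks:

• Oversample manually labeled examples in the training data.
• Oversample false positives from the model when retraining.
• Undersample positive examples from previous iterations of the model.
• If possible, manually label some accounts.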
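
One way to apply these mitigations without physically duplicating rows is to give each training example a weight. The sketch below assumes every example records where its label came from in a hypothetical `source` field; the source names and weight values are illustrative assumptions, not values from the presentation.

```python
def sample_weight(example, manual_boost=5.0, fp_boost=3.0, prior_positive_weight=0.5):
    """Return a training weight based on where the example's label came from."""
    source = example["source"]  # hypothetical field name
    if source == "manual":
        return manual_boost  # oversample manually labeled examples
    if source == "model_false_positive":
        return fp_boost  # oversample the model's known mistakes
    if source == "previous_model_positive":
        return prior_positive_weight  # undersample what older models already caught
    return 1.0


training_set = [
    {"source": "manual", "label": 1},
    {"source": "model_false_positive", "label": 0},
    {"source": "previous_model_positive", "label": 1},
    {"source": "organic", "label": 0},
]
weights = [sample_weight(example) for example in training_set]
```

Many learning libraries accept such per-example weights at training time, which has roughly the same effect as oversampling or undersampling the rows.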
COLD START V/S WARM START
• Cold start deploys a new machine-learning model without any pre-trained parameters.
• Warm start initializes a model with pre-trained parameters, so it starts from
a foundation based on prior information.
• Approaches for moving from a deployed v1 model to v2:
• Never throw away training data: train on the union of the v1 and v2 data.
• Run the models simultaneously.
• Sample positives from the deployed v1 model to augment the v2 training data.
FALSE POSITIVES AND FALSE NEGATIVES
• False negatives are accounts that our model accepted as good but that turned out to be bad.
• They are very hard to detect.
• They end up mislabelled in the training data.
• False positives are accounts that were banned as bad but were actually good.
• False positives can be handled by giving flagged accounts a fraction of account access instead of banning them.
MULTIPLE RESPONSES
• Threshold values on the model score decide the response.
• Block if the score is bad.
• Allow if the score is good.
• Issue a challenge for a ‘grey area’ score.

-> The grey area is where false positives occur; a challenge still lets good users through.

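The three responses amount to two thresholds on the model score. A minimal sketch, assuming the score is an abuse probability between 0 and 1 (higher means more likely abusive); the threshold values are placeholders that would have to be tuned on real data.

```python
def choose_response(score, allow_below=0.3, block_above=0.8):
    """Map an abuse score in [0, 1] to one of three responses."""
    if score >= block_above:
        return "block"
    if score < allow_below:
        return "allow"
    return "challenge"  # grey area: ask for extra verification


for s in (0.1, 0.5, 0.9):
    print(s, choose_response(s))
```

Widening the gap between the two thresholds sends more requests to the challenge, trading user friction for fewer outright false positives.
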
LARGE ATTACKS
• If a single attacker sends a very large number of requests, e.g. from one IP address, they dominate the training data.
• The model overfits to that attack.
• Solution: downsample large attacks.
• From an attack with x requests, sample log(x) requests for training.
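
The log(x) rule can be sketched as below. The assumptions are labeled in the code: attacks are identified by a shared key such as the IP address, the logarithm is the natural log (the slide does not name a base), and at least one example is always kept. The function name is hypothetical.

```python
import math
import random
from collections import defaultdict


def downsample_attacks(bad_examples, key, rng=None):
    """Keep about log(x) examples from each attack of size x."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    attacks = defaultdict(list)
    for example in bad_examples:
        attacks[example[key]].append(example)  # assumption: one key = one attack
    kept = []
    for members in attacks.values():
        x = len(members)
        n_keep = max(1, int(math.log(x)))  # natural log; always keep at least one
        kept.extend(rng.sample(members, n_keep))
    return kept


# Usage: one attack of 1000 requests from a single IP plus three small ones.
big = [{"ip": "203.0.113.7", "id": i} for i in range(1000)]
small = [{"ip": f"198.51.100.{i}", "id": 1000 + i} for i in range(3)]
kept = downsample_attacks(big + small, key="ip")
```

With the natural log, the 1000-request attack contributes 6 training examples, so it no longer swamps the three single-request attacks.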
CLUSTERING ABUSE
• Group entities that are similar to one another.
• Examine each cluster and decide whether it is legitimate or abusive.
• How does a cluster become bad?
-> In some cases, a single bad entity makes the entire cluster bad.
-> In other cases, most entities must be bad for the cluster to count as bad,
e.g. entities grouped by IP address.
CLUSTERING SPAM DOMAIN
• The objective is to maximize recall while maintaining high precision.
• A cluster must contain at least 10 domains.
• At least 75% of a cluster's domains must be spam for it to be considered a bad cluster.
• This minimizes the chance of good domains getting caught up in clusters of bad
domains.
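
The size and purity rules translate directly into a labeling function. A minimal sketch: the thresholds of 10 and 75% come from the slide, while the inclusive comparison (`>=`) and the separate "too_small" outcome for undersized clusters are assumptions.

```python
def label_cluster(is_spam, min_size=10, min_spam_fraction=0.75):
    """Label a cluster from per-domain spam flags (True = known spam)."""
    if len(is_spam) < min_size:
        return "too_small"  # assumption: undersized clusters are not scored
    spam_fraction = sum(is_spam) / len(is_spam)
    return "bad" if spam_fraction >= min_spam_fraction else "good"


print(label_cluster([True] * 8 + [False] * 2))  # 80% spam, size 10
print(label_cluster([True] * 7 + [False] * 3))  # 70% spam, size 10
print(label_cluster([True] * 5))                # only 5 domains
```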
GENERATING CLUSTERS
• Selecting features for our domains.
• Features -> categorical, numerical, text-based
• Bad things happen in bunches, but clustering is not perfect: it produces good clusters as well as bad
ones.
• To find the best clustering strategy, compare:
• Proportion of clusters that are labeled bad.
• Proportion of domains that fall in bad clusters.
• Recall of the bad clusters.

Clustering can be done by Grouping and Locality-sensitive hashing

GENERATING CLUSTERS
• Grouping
Grouping puts items that share the same value of a feature into one cluster, so it applies to features with distinct (categorical) values.
Grouping does not help us much in account classification because bad clusters are
underrepresented.

• Locality-sensitive hashing
It is used to find approximate matches in large datasets.

• Problems of k-means
Although k-means is the best-known clustering method, it is not used here because the value of K is unknown.
Another problem with k-means is that it does not work with categorical features.
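
Both clustering approaches can be sketched in a few lines. Grouping buckets items on an exactly equal feature value. For locality-sensitive hashing, the sketch uses MinHash over character 3-grams with banding, which is one common LSH scheme and an assumption here, since the slides do not say which scheme was used. All function names and parameters are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by(items, feature):
    """Grouping: items with exactly the same feature value share a cluster."""
    clusters = defaultdict(list)
    for item in items:
        clusters[item[feature]].append(item)
    return dict(clusters)


def shingles(text, k=3):
    """Set of overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(items, num_hashes=16):
    """MinHash signature: for each seeded hash, the smallest value over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha256(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        ))
    return signature


def lsh_candidate_groups(texts, num_hashes=16, bands=8):
    """Names whose signatures agree on at least one band land in the same bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, text in texts.items():
        signature = minhash(shingles(text), num_hashes)
        for b in range(bands):
            band = tuple(signature[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(name)
    return [names for names in buckets.values() if len(names) > 1]


# Usage: two identical page texts collide; an unrelated one does not.
texts = {"a": "cheap-pills-online", "b": "cheap-pills-online", "c": "zzzzqqqq"}
groups = lsh_candidate_groups(texts)
```

Unlike k-means, neither approach needs the number of clusters in advance, and grouping works directly on categorical values.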
SCORING CLUSTERS
We have to perform labeling and feature extraction to identify abusive and legitimate clusters in a large dataset.
• To label a cluster good or bad, choose a threshold t: if the percentage of bad accounts in the cluster is greater than
t, the cluster is bad; otherwise it is good.
• In feature extraction, each cluster is converted into a numerical vector. For numerical account-level features, these statistics are computed:
1. Min, max, median, and quartiles
2. Mean and standard deviation
3. Percent of null or zero values

• For categorical account-level features:

1. Number of distinct values
2. Percent of values belonging to the mode
3. Percent of null values
4. Entropy

• For classification, a random forest classifier, which is a non-linear classifier, is used.

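The cluster-level statistics listed above can be computed with the standard library alone. This is a sketch under stated assumptions: missing values are represented as `None`, the standard deviation is the population form, and percentages are returned as fractions. The function names are hypothetical, and the numeric version needs at least two non-null values.

```python
import math
import statistics
from collections import Counter


def numeric_features(values):
    """Summary statistics for one numerical feature across a cluster."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    return {
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
        "q1": q1,
        "q3": q3,
        "mean": statistics.mean(present),
        "std": statistics.pstdev(present),
        "pct_null_or_zero": sum(1 for v in values if v is None or v == 0) / len(values),
    }


def categorical_features(values):
    """Summary statistics for one categorical feature across a cluster."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "distinct": len(counts),
        "pct_mode": counts.most_common(1)[0][1] / total,
        "pct_null": (len(values) - total) / len(values),
        "entropy": entropy,
    }


# Usage: one cluster's account ages (days) and signup countries.
num = numeric_features([0, 2, 4, None])
cat = categorical_features(["a", "a", "b", "b", None])
```

Concatenating these dictionaries over all account-level features gives the numerical vector for the cluster, which is what the classifier is trained on.
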
FURTHER DIRECTION IN CLUSTERING

Clustering can be extended by using:

• Different clustering methods
• Different classifiers and parameters
• New features at the item level
• Sampled data
• Updated weights on items
• A second classifier to detect false-positive items in a cluster
CONCLUSION
• We discussed consumer websites and apps and how to protect them from attackers.
• Machine learning here has many challenges: one is obtaining ground-truth labels, and another is
balancing detection of known attacks against uncovering new attack techniques.
• Combining machine learning and clustering techniques helps to detect and stop large
attacks quickly.

