Big Data Analytics (2017 Regulation) : Insurance Fraud Detection

The document discusses various applications of big data analytics including fraud detection using machine learning, rideshare data analysis, cyber profiling criminals, call record detail analysis, and automatic clustering of IT alerts. It then covers advantages and disadvantages of k-means clustering and describes three methods to determine the optimal number of clusters: the elbow method, average silhouette method, and gap statistic method.

Uploaded by

cskinit


BIG DATA ANALYTICS (2017 REGULATION)

Insurance Fraud Detection

• Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection.
• Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns.
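As a minimal sketch of the proximity idea above: assuming centroids of fraud-related clusters have already been learned from historical claims, a new claim can be flagged when it lies within some distance of the nearest such centroid. The features, centroid values, threshold, and function names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical centroids of clusters that historically contained fraudulent claims.
# Each point is (claim amount in thousands, days between policy start and claim).
FRAUD_CENTROIDS = [(50.0, 5.0), (80.0, 2.0)]


def nearest_centroid_distance(claim, centroids):
    """Return the Euclidean distance from a claim to its closest centroid."""
    return min(math.dist(claim, c) for c in centroids)


def is_suspicious(claim, centroids=FRAUD_CENTROIDS, threshold=10.0):
    """Flag a claim that lies within `threshold` of a fraud-cluster centroid."""
    return nearest_centroid_distance(claim, centroids) <= threshold


print(is_suspicious((52.0, 4.0)))    # a claim close to the first centroid
print(is_suspicious((10.0, 300.0)))  # a claim far from both centroids
```

In practice the features would need to be put on comparable scales first, and the threshold would be tuned on labeled claims.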
Rideshare Data Analysis
• The publicly available Uber ride information dataset provides a large amount of valuable data around traffic, transit time, peak pickup localities, and more.
Cyber-Profiling Criminals
• Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations.
• The idea of cyber-profiling is derived from criminal profiles, which give the investigation division information for classifying the types of criminals who were at the crime scene.
Call Record Detail Analysis
• A call detail record (CDR) is the information captured by telecom companies during the call, SMS, and internet activity of a customer.
• This information provides greater insights about the customer’s needs when used with customer demographics.
Automatic Clustering of IT Alerts
• Components of large enterprise IT infrastructure, such as networks, storage, and databases, generate large volumes of alert messages.
• Because alert messages potentially point to operational issues, they must be screened and prioritized for downstream processes; clustering similar alerts can reduce the manual effort this requires.
Others: image segmentation, image compression, identifying cancerous data, search engines, etc.

Advantages of k-means clustering:
• It is fast
• Easy to understand
• Robust
• Comparatively efficient
• Gives the best results when the clusters in the data are distinct (well separated)
• Produces tighter clusters
• A data point can change cluster when the centroids are recomputed
• Flexible
• Easy to interpret
• Better computational cost
• Enhances accuracy

Disadvantages of k-means clustering:
• Randomly chosen initial centroids sometimes give poor results
• The number of cluster centers must be specified in advance
• Two highly overlapping groups of data cannot be distinguished, so the method cannot tell that there are two clusters
• Different representations of the data give different results
• Euclidean distance can weight the factors unequally
• Very large data sets can exceed the available memory or computing capacity, and the computation may crash
• Prediction issues
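The point in the list above about Euclidean distance weighting factors unequally can be illustrated with a small sketch: a feature measured on a large scale dominates the distance until the features are standardized. The data and the helper name `standardize` are assumptions made for illustration, and z-score standardization is one common remedy rather than the only one.

```python
import math
import statistics

# Hypothetical customers: (annual income in dollars, age in years).
customers = [(30_000.0, 25.0), (32_000.0, 60.0), (90_000.0, 27.0)]


def standardize(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


raw_near = math.dist(customers[0], customers[1])  # these two differ mostly in age
raw_far = math.dist(customers[0], customers[2])   # these two differ mostly in income
scaled = standardize(customers)
scaled_near = math.dist(scaled[0], scaled[1])
scaled_far = math.dist(scaled[0], scaled[2])

# On the raw data income swamps age; after scaling both features contribute.
print(f"raw:    d(0,1)={raw_near:.1f}  d(0,2)={raw_far:.1f}")
print(f"scaled: d(0,1)={scaled_near:.2f}  d(0,2)={scaled_far:.2f}")
```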

Determining Optimal Clusters:

• When using k-means clustering, users need some way to determine whether they are using the right number of clusters.
Methods:
1. Elbow Method
2. Average Silhouette Method
3. Gap Statistic Method

Cluster the observed data, varying the number of clusters from k = 1, …, kmax, and compute the corresponding total within-cluster variation Wk.

Elbow Method:
1. Run the clustering algorithm (e.g., k-means clustering) for different values of k, for instance by varying k from 1 to 10 clusters.
2. For each k, calculate the total within-cluster sum of squares (WSS).
3. Plot the curve of WSS against the number of clusters k.
4. The location of a bend (knee) in the plot is generally considered an indicator of the appropriate number of clusters.
5. In the example plotted on this slide, the bend occurs at k = 4, so 4 is the optimal number of clusters.
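The elbow steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the k-means helper seeds its centroids with a deterministic farthest-point rule (instead of the usual random start) so the run is reproducible, the toy data set of four blobs is made up, and the "largest second difference" rule in `elbow` is just one simple heuristic for locating the bend that is normally read off the plot.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids).
    Assumption: deterministic farthest-point seeding, not a random start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def wss_curve(points, k_max):
    """Steps 1 and 2: total within-cluster sum of squares for k = 1..k_max."""
    curve = []
    for k in range(1, k_max + 1):
        labels, centroids = kmeans(points, k)
        curve.append(sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels)))
    return curve


def elbow(curve):
    """Step 4 (heuristic): the k with the largest second difference of the curve."""
    bends = {k: curve[k - 2] - 2 * curve[k - 1] + curve[k] for k in range(2, len(curve))}
    return max(bends, key=bends.get)


# Made-up toy data: four small square blobs at the corners of a larger square.
POINTS = [
    (x + dx, y + dy)
    for x, y in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    for dx, dy in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
]

curve = wss_curve(POINTS, 6)
for k, value in enumerate(curve, start=1):  # step 3: inspect (or plot) WSS against k
    print(f"k={k}  WSS={value:8.2f}")
print("bend at k =", elbow(curve))
```

On this toy data the WSS drops sharply up to k = 4 and flattens afterwards, which is the bend the method looks for. Real data rarely gives such a clean knee, so the curve is usually inspected visually as well.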

Average Silhouette Method: (The average silhouette approach measures the quality of a clustering)
• For each data point i, compute the average distance to all other data points in the same cluster (ai).
• Compute the average distance from point i to all data points in the closest neighboring cluster (bi).
• Compute the coefficient: s(i) = (bi - ai) / max(ai, bi)
• The coefficient can take values in the interval [-1, 1].
• If it is 0 –> the sample is very close to the boundary between neighboring clusters.
• If it is 1 –> the sample is far away from the neighboring clusters.
• If it is -1 –> the sample is assigned to the wrong cluster, or the clusters overlap.
• A high average silhouette width indicates a good clustering.
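As a minimal sketch, the silhouette coefficient s(i) = (bi - ai) / max(ai, bi) can be computed directly from a set of points and their cluster labels. The function name, the toy points, and the two labelings below are assumptions for illustration; the sketch needs at least two clusters, and it assigns 0 to a point that is alone in its cluster, which is a common convention.

```python
import math


def silhouette_values(points, labels):
    """Return the silhouette coefficient s(i) of every point.

    a: mean distance from the point to the other points of its own cluster.
    b: smallest mean distance from the point to the points of any other cluster.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: a singleton cluster scores 0
            values.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        values.append((b - a) / max(a, b))
    return values


def average_silhouette(points, labels):
    """Average silhouette width of a clustering."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Made-up toy data: two tight pairs of points far apart on a line.
POINTS = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
good = average_silhouette(POINTS, [0, 0, 1, 1])  # neighbors grouped together
bad = average_silhouette(POINTS, [0, 1, 0, 1])   # neighbors split apart
print(f"good labeling: {good:.3f}   bad labeling: {bad:.3f}")
```

To choose k with this method, the average silhouette width is computed for each candidate k and the k with the highest value is preferred.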


Gap Statistic Method:

• The approach can be applied to any clustering method.
• The gap statistic compares the total intra-cluster variation for different values of k with their expected values under a null reference distribution of the data.
The gap statistic for a given k is defined as follows:
Gap(k) = E*[log(Wk)] - log(Wk)
where E* denotes the expectation under the null reference distribution.
For the example data plotted on this slide, the gap statistic indicates that k = 2 is the optimal number of clusters in the data.
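A minimal sketch of the gap statistic computation follows, under several assumptions that are not taken from the slides: reference data sets are drawn uniformly over the bounding box of the data (one common choice of null reference distribution), the expectation is estimated by averaging over `n_refs` such sets, k-means uses deterministic farthest-point seeding so that only the reference sampling is random, and k is chosen as the smallest k with Gap(k) >= Gap(k+1) - s(k+1). All names and the toy data are hypothetical.

```python
import math
import random
import statistics


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def total_wss(points, k, max_iter=100):
    """Run k-means and return the total within-cluster sum of squares Wk.
    Assumption: deterministic farthest-point seeding, not a random start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def gap_statistic(points, k_max, n_refs=20, seed=0):
    """Return (gaps, s), each with one entry per k = 1..k_max."""
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    # Reference data sets: same size as the data, uniform over its bounding box.
    refs = [
        [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
        for _ in range(n_refs)
    ]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        ref_logs = [math.log(total_wss(ref, k)) for ref in refs]
        gaps.append(statistics.fmean(ref_logs) - math.log(total_wss(points, k)))
        s.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_refs))
    return gaps, s


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); else the k with the largest gap."""
    for k in range(1, len(gaps)):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return max(range(1, len(gaps) + 1), key=lambda k: gaps[k - 1])


# Made-up toy data: two small blobs far apart.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]

gaps, s = gap_statistic(POINTS, k_max=4)
for k, (g, sk) in enumerate(zip(gaps, s), start=1):
    print(f"k={k}  gap={g:6.3f}  s={sk:.3f}")
print("chosen k =", choose_k(gaps, s))
```

Because the reference sets are random, the exact gap values depend on the seed and on `n_refs`; with so few points the estimate is noisy, so larger data sets and more reference sets give a more stable choice.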
