Case Study: Ensure To Insure


CASE STUDY

ENSURE TO INSURE
BIG DATA - GROUP 01
MEMBERS
✭ Đoàn Thảo Nguyên 718H1720
✭ Trần Thanh Vy 718H1819
✭ Nguyễn Thị Cẩm Thi 718H1764
✭ Nguyễn Hoàng Long 718H1691
✭ Trần Nguyễn Minh Thư 718H1769
✭ Phạm Phương Thảo 718H0200
TABLE OF CONTENTS

01 PRIMARY CONSIDERATIONS FOR BIG DATA ADOPTION

02 BIG DATA ANALYTICS LIFECYCLE

01 PRIMARY CONSIDERATIONS FOR BIG DATA ADOPTION
ORGANIZATION PREREQUISITES
IT members trained in Big Data pointed out that adopting Big Data is not as simple as deploying a technology platform.
ORGANIZATION PREREQUISITES

To ensure successful Big Data adoption, the IT team and managers got together to:
• Create a feasibility report.
• Create an environment where the gap between management's perceived expectations and what the IT team can actually deliver is narrowed.
ORGANIZATION PREREQUISITES

• Provide the IT team with insights that allow them to anticipate changes that may be needed in the future to keep the Big Data solution platform relevant to any emerging business requirements.
DATA PROCUREMENT
• Collect data from external data sources
such as social networks and census data.
• Collect data from third party data
providers to minimize costs.
PRIVACY
• Consider that collecting additional customer data could lead to customer distrust.
• Offer a promotional incentive, such as lower
premiums, to increase customer acceptance and
trust.
SECURITY
• Note that additional development effort will be required to ensure standardization and role-based access control across the Big Data solution environment and the open-source databases that will store non-relational data.
PROVENANCE
• Raise the question of how confident they can be in the results, because the analysis involves data from third-party data providers.
• Add and update metadata for each dataset that is stored and processed, so that provenance is maintained at all times and processing results can be traced all the way back to their constituent data sources (a sketch follows below).
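Below is a minimal sketch of how dataset-level provenance could be maintained; the catalog structure, function names, and dataset names are illustrative assumptions, not part of ETI's actual solution.

```python
# Minimal provenance-tracking sketch: every stored or processed dataset gets a
# metadata record, and derived datasets keep references to their parent datasets.
import hashlib
import json
from datetime import datetime, timezone

catalog = {}  # dataset_id -> metadata record

def register_dataset(dataset_id, payload: bytes, source, parents=()):
    """Record metadata for a dataset so results can be traced back to sources."""
    catalog[dataset_id] = {
        "source": source,
        "size_bytes": len(payload),
        "checksum": hashlib.sha256(payload).hexdigest(),
        "acquired": datetime.now(timezone.utc).isoformat(),
        "parents": list(parents),  # provenance links to constituent datasets
    }

def lineage(dataset_id):
    """Walk parent links back to the original data sources."""
    chain = [dataset_id]
    for parent in catalog[dataset_id]["parents"]:
        chain.extend(lineage(parent))
    return chain

register_dataset("claims_raw", b"...claims export...", source="claims management system")
register_dataset("weather_raw", b"<xml>...</xml>", source="third-party weather provider")
register_dataset("claims_enriched", b"...joined data...",
                 source="aggregation job", parents=["claims_raw", "weather_raw"])

print(json.dumps(catalog["claims_enriched"], indent=2))
print(lineage("claims_enriched"))  # traces back to both source datasets
```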
  LIMITED REAL-TIME SUPPORT
Although ETI's current goals of reducing the time it takes to settle claims and of detecting fraudulent claims call for results that are delivered in a timely manner, the IT team anticipates that support for real-time data analytics will not be required. In response, they are developing a batch-based Big Data solution built on open-source Big Data technology.
DISTINCT PERFORMANCE CHALLENGES
ETI's current IT infrastructure presents distinct performance challenges:
• The network is built on outdated standards.
• The specifications of most servers, such as processor speed, disk space, and disk speed, are unlikely to deliver optimal data processing performance.
• Consequently, the existing IT infrastructure needs to be upgraded before the Big Data solution can be designed and built.
 DISTINCT GOVERNANCE REQUIREMENTS
A governance framework is required to ensure that the data and the
solution environment itself are regulated, standardized and evolved in a
controlled manner.

DISTINCT METHODOLOGY
An iterative data analysis approach that includes
business personnel from the relevant department needs
to be adopted.
CLOUDS
• None of ETI's systems are currently hosted in the cloud.
• Consequently, the IT team does not possess cloud-related skill sets.
• These facts, alongside data privacy concerns, led the IT team to decide to build an on-premises Big Data solution.
02 BIG DATA ANALYTICS LIFECYCLE

Figure: The nine stages of the Big Data analytics lifecycle (Business Case Evaluation, Data Identification, Data Acquisition & Filtering, Data Extraction, Data Validation & Cleansing, Data Aggregation & Representation, Data Analysis, Data Visualization, and Utilization of Analysis Results).
STAGE 1. BUSINESS CASE EVALUATION

● Detecting fraudulent claims is intended to reduce monetary loss and support the comprehensive development of the business. Fraud detection can be performed across all four sectors of ETI.
STAGE 1. BUSINESS CASE EVALUATION

ETI provides building and contents insurance to domestic and commercial clients. Insurance fraud may be both opportunistic and organized. A KPI of reducing fraudulent claims by 15% is set as a way to determine the success of the Big Data initiative at detecting fraud.
STAGE 1. BUSINESS CASE EVALUATION
Budget: Mostly open-source technologies will be used to support batch processing. Further along the Big Data analytics lifecycle, however, the budget needs to be wider to cover the acquisition of data cleansing tools, additional data quality tools, and newer data visualization technologies. The benefit is expected to return the expense multiple times over if the targeted fraud detection KPI is achieved.
STAGE 2. DATA IDENTIFICATION

• Internal data: policy data, insurance application documents, claims data, claims adjustment notes, incident photographs, call center agent notes, and emails.
• External data: social media data, weather reports, geographical data, and census data.
• Claim data: multiple fields, one of which will determine whether the claim is fraudulent or legitimate.
STAGE 3. DATA ACQUISITION AND FILTERING
• Policy data is taken from the policy administration system.
• Claim data, incident photographs, and claims adjusters' notes are obtained from the claims management system.
• The insurance application documents are taken from the document management system.
STAGE 3. DATA ACQUISITION AND FILTERING
• Claims adjusters' notes are embedded within the claim data, so a separate process is used to extract them.
• A compressed copy of each dataset is stored on disk, with metadata added: name, source, size, format, checksum, acquisition date, and number of records.
• About 4% to 5% of records are corrupted, so data filtering jobs are set up to eliminate the corrupt records (see the sketch below).
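A minimal sketch of what such a data filtering job could look like; the CSV layout, field names, and corruption rules are assumptions for illustration only.

```python
# Stage 3 sketch: drop corrupt claim records and record dataset-level metadata.
import csv
import hashlib
from datetime import date

REQUIRED_FIELDS = ("claim_id", "policy_id", "claim_amount")

def is_corrupt(record):
    """Treat records with missing required fields or a non-numeric amount as corrupt."""
    if any(not record.get(f) for f in REQUIRED_FIELDS):
        return True
    try:
        float(record["claim_amount"])
    except ValueError:
        return True
    return False

def filter_claims(in_path, out_path):
    kept = dropped = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for record in reader:
            if is_corrupt(record):
                dropped += 1          # roughly 4% to 5% of records are expected here
            else:
                writer.writerow(record)
                kept += 1
    return {
        "name": out_path,
        "source": in_path,
        "records": kept,
        "dropped": dropped,
        "checksum": hashlib.sha256(open(out_path, "rb").read()).hexdigest(),
        "acquired": date.today().isoformat(),
    }

# Example usage (paths are hypothetical):
# print(filter_claims("claims_raw.csv", "claims_filtered.csv"))
```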
STAGE 4. DATA EXTRACTION
The tweets dataset is in JSON and includes the user ID, timestamp, and tweet text. The weather dataset is in a hierarchical format (XML) and contains timestamps, temperature forecasts, wind speed forecasts, wind direction forecasts, snow forecasts, and flood forecasts. The required fields are extracted, converted, and saved in tabular form.
Figure: Metadata is added to data from internal and external sources
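A minimal sketch of the Stage 4 extraction step, assuming illustrative JSON keys and XML tags rather than the real schemas used by ETI's data providers.

```python
# Stage 4 sketch: extract fields from JSON tweets and the hierarchical XML
# weather feed, then save them in tabular (CSV) form.
import csv
import json
import xml.etree.ElementTree as ET

def tweets_to_rows(json_text):
    """Flatten tweet objects into (user_id, timestamp, text) rows."""
    return [(t["user_id"], t["timestamp"], t["text"]) for t in json.loads(json_text)]

def weather_to_rows(xml_text):
    """Flatten <forecast> elements into tabular rows."""
    root = ET.fromstring(xml_text)
    return [
        (f.findtext("timestamp"), f.findtext("temperature"),
         f.findtext("wind_speed"), f.findtext("flood_risk"))
        for f in root.iter("forecast")
    ]

def save_csv(path, header, rows):
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)

tweets_json = '[{"user_id": "u1", "timestamp": "2024-01-01T10:00:00", "text": "storm damage"}]'
weather_xml = ("<forecasts><forecast><timestamp>2024-01-01</timestamp>"
               "<temperature>4</temperature><wind_speed>55</wind_speed>"
               "<flood_risk>high</flood_risk></forecast></forecasts>")

save_csv("tweets.csv", ["user_id", "timestamp", "text"], tweets_to_rows(tweets_json))
save_csv("weather.csv", ["timestamp", "temperature", "wind_speed", "flood_risk"],
         weather_to_rows(weather_xml))
```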
STAGE 5. DATA VALIDATION AND CLEANSING
The free versions of the weather and census datasets do not guarantee 100% accuracy. Based on the published field information, the extracted fields are checked for typographical errors and inaccuracies, and data type and range validation are applied. A record is not deleted if it contains some level of meaningful information, even though it may also contain invalid data.
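A minimal sketch of Stage 5 type and range validation in which partially valid records are kept rather than deleted; the fields and their permitted ranges are assumptions.

```python
# Stage 5 sketch: validate fields against published type/range rules, flag
# invalid fields, and retain records that still carry meaningful data.
FIELD_RULES = {
    "temperature": (float, -60.0, 60.0),   # degrees Celsius
    "wind_speed": (float, 0.0, 400.0),     # km/h
    "customer_age": (int, 18, 120),
}

def validate(record):
    """Return (cleaned_record, invalid_fields); never drop a partially valid record."""
    cleaned, invalid = {}, []
    for field, (cast, low, high) in FIELD_RULES.items():
        try:
            value = cast(record.get(field))
            if not (low <= value <= high):
                raise ValueError(f"{field} out of range")
            cleaned[field] = value
        except (TypeError, ValueError):
            cleaned[field] = None     # keep the slot, mark the field as invalid
            invalid.append(field)
    return cleaned, invalid

record = {"temperature": "12.5", "wind_speed": "-10", "customer_age": "44"}
print(validate(record))   # wind_speed is flagged, but the record is retained
```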
STAGE 6. DATA AGGREGATION AND REPRESENTATION

Policy data, claim data, and call center agent notes are combined so that they can be referenced through a single data query. The benefit is improved detection of fraudulent claims, risk assessment, and speedy settlement of claims. The resulting dataset is stored in a NoSQL database.
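A minimal sketch of the Stage 6 aggregation, using a plain dictionary as a stand-in for the NoSQL document store; all field names and sample values are illustrative.

```python
# Stage 6 sketch: combine policy data, claim data, and call center agent notes
# into one document per claim so a single query can reference all of them.
policies = {"P-100": {"holder": "A. Nguyen", "start_date": "2020-03-01"}}
claims = [{"claim_id": "C-1", "policy_id": "P-100", "amount": 12500.0}]
agent_notes = {"C-1": ["caller unsure about incident date", "follow-up requested"]}

document_store = {}   # stand-in for the NoSQL database

for claim in claims:
    document_store[claim["claim_id"]] = {
        "claim": claim,
        "policy": policies.get(claim["policy_id"], {}),
        "agent_notes": agent_notes.get(claim["claim_id"], []),
    }   # one denormalized document per claim

# A single lookup now returns everything needed for fraud detection,
# risk assessment, or claim settlement.
print(document_store["C-1"])
```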
STAGE 7. DATA ANALYSIS
The nature of fraudulent claims is analyzed to find characteristics that distinguish fraudulent claims from legitimate claims. An exploratory data analysis approach is applied, along with a wide range of analysis techniques. This stage is repeated several times: attributes that are less likely to indicate fraudulent claims are removed, while attributes with a direct relationship are kept or added.
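A minimal sketch of this iterative attribute selection; the sample claims, attribute names, and the correlation threshold are all illustrative assumptions.

```python
# Stage 7 sketch: score each attribute against the fraud label and drop
# attributes that show only a weak relationship.
from statistics import mean, pstdev

rows = [
    {"claim_value": 900,  "policy_age": 8, "prior_claims": 0, "customer_age": 30, "fraud": 0},
    {"claim_value": 8200, "policy_age": 1, "prior_claims": 3, "customer_age": 45, "fraud": 1},
    {"claim_value": 400,  "policy_age": 6, "prior_claims": 1, "customer_age": 50, "fraud": 0},
    {"claim_value": 9100, "policy_age": 1, "prior_claims": 4, "customer_age": 28, "fraud": 1},
]

def correlation(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

label = [r["fraud"] for r in rows]
attributes = [k for k in rows[0] if k != "fraud"]

# Keep attributes with a direct relationship to fraud; drop weakly related ones.
kept = [a for a in attributes
        if abs(correlation([r[a] for r in rows], label)) >= 0.5]
print(kept)   # customer_age is dropped in this toy example
```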
STAGE 8. DATA VISUALIZATION

Visualization methods:
• Bar charts
• Line graphs
• Scatter plots: analyze claim groups based on different factors, such as customer age, age of the policy, number of claims made, and claim value (see the sketch below).
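A minimal sketch of such a scatter plot; the sample data is made up, and matplotlib is only one possible charting tool.

```python
# Stage 8 sketch: claim value against customer age, with fraudulent and
# legitimate claims coloured differently.
import matplotlib.pyplot as plt

customer_age = [25, 31, 44, 52, 38, 29, 61, 47]
claim_value  = [900, 7800, 1200, 650, 8900, 7400, 400, 980]
fraudulent   = [0, 1, 0, 0, 1, 1, 0, 0]

colors = ["red" if f else "blue" for f in fraudulent]
plt.scatter(customer_age, claim_value, c=colors)
plt.xlabel("Customer age")
plt.ylabel("Claim value")
plt.title("Claims by customer age and value (red = flagged fraudulent)")
plt.show()
```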
STAGE 9. UTILIZATION OF
ANALYSIS RESULTS

An understanding of the nature of fraudulent claims is developed based on the results of the data analysis. A model based on a machine-learning technique is created and then integrated into the claim processing system to flag fraudulent claims.
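A minimal sketch of Stage 9; the case study does not name the machine-learning technique, so a decision tree from scikit-learn is used here purely for illustration, with made-up training data.

```python
# Stage 9 sketch: train a simple classifier on labelled claims and use it to
# flag new claims for fraud investigation.
from sklearn.tree import DecisionTreeClassifier

# Columns: claim value, policy age (years), number of prior claims.
X_train = [[900, 8, 0], [8200, 1, 3], [400, 6, 1],
           [9100, 1, 4], [1500, 5, 0], [7700, 2, 3]]
y_train = [0, 1, 0, 1, 0, 1]   # 1 = fraudulent, 0 = legitimate

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

def flag_claim(claim_features):
    """Return True if the claim should be routed for fraud investigation."""
    return bool(model.predict([claim_features])[0])

print(flag_claim([8600, 1, 2]))   # likely flagged
print(flag_claim([700, 7, 0]))    # likely not flagged
```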
THANK YOU!
Do you have any questions?
