
CHAPTER 1. UNDERSTANDING BIG DATA

Editor: David Loshin

BIG DATA FUNDAMENTALS: CONCEPTS, DRIVERS & TECHNIQUES – PRENTICE HALL

OUTLINE

Concepts and Terminology
Big Data Characteristics
Different Types of Data
Case Study Background
BIG DATA OVERVIEW

What is Big Data?
--> Big Data is a field dedicated to the analysis, processing, and storage of large collections of data that frequently originate from disparate sources.

When is Big Data used?
--> Big Data is used when traditional data analysis, processing and storage technologies and techniques are insufficient.


INSIGHTS AND BENEFITS OF BIG DATA
Operational optimization
Actionable intelligence
Identification of new markets
Accurate predictions
Fault and fraud detection
More detailed records
Improved decision-making
Scientific discoveries


CONCEPTS AND TERMINOLOGY

DATASETS

Datasets are collections or groups of related data.
Each group or dataset member (datum) shares the same set of attributes or properties as others in the same dataset.

Examples:
tweets stored in a flat file
a collection of image files in a directory
an extract of rows from a database table stored in a CSV-formatted file
historical weather observations stored as XML files


DATASETS

Figure 1.1: Three datasets based on three different data formats.
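As an illustration of the dataset examples above, the following minimal Python sketch loads three small datasets stored in three different formats: a flat file of tweets, a CSV extract of database rows, and an XML file of weather observations. The file names and the XML element name are hypothetical placeholders, and the snippet assumes the files exist locally.

```python
import csv
import xml.etree.ElementTree as ET

# Flat file: one tweet per line (hypothetical file name)
with open("tweets.txt", encoding="utf-8") as f:
    tweets = [line.strip() for line in f if line.strip()]

# CSV file: rows extracted from a database table (hypothetical file name)
with open("customers.csv", newline="", encoding="utf-8") as f:
    customer_rows = list(csv.DictReader(f))

# XML file: historical weather observations (hypothetical structure with <observation> elements)
observations = [
    obs.attrib for obs in ET.parse("weather.xml").getroot().iter("observation")
]

print(len(tweets), "tweets,", len(customer_rows), "customer rows,",
      len(observations), "weather observations")
```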


DATA ANALYSIS

Data analysis is the process of examining data to find facts, relationships, patterns, insights and/or trends.
The overall goal of data analysis is to support better decision making.

Example:
the analysis of ice cream sales data to determine how the number of ice cream cones sold is related to the daily temperature.
The results support decisions about how much ice cream a store should order in relation to weather forecast information.
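A minimal sketch of the ice cream example: given paired daily observations of temperature and cones sold, it computes the Pearson correlation to show how the two quantities are related. The figures are invented purely for illustration.

```python
from statistics import correlation  # available in the standard library from Python 3.10

daily_temperature_c = [18, 21, 24, 27, 30, 33]    # illustrative values
cones_sold = [110, 135, 160, 210, 260, 320]       # illustrative values

r = correlation(daily_temperature_c, cones_sold)
print(f"Correlation between temperature and cones sold: {r:.2f}")
# A strong positive correlation supports ordering more stock when warm weather is forecast.
```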


DATA ANALYSIS

Figure 1.2: The symbol used to represent data analysis.



DATA ANALYTICS

Data analytics is a discipline that includes
(a) the management of the complete data lifecycle, which encompasses collecting, cleansing, organizing, storing, analyzing and governing data, and
(b) the development of analysis methods, scientific techniques and automated tools.
It is a broader term that encompasses data analysis.
DATA ANALYTICS

Figure 1.3: The symbol used to represent data analytics.



DATA ANALYTICS

Different kinds of organizations use data analytics tools and techniques in different ways.

Example:
In business-oriented environments, data analytics results can lower operational costs and facilitate strategic decision-making.
In the scientific domain, data analytics can help identify the cause of a phenomenon to improve the accuracy of predictions.
In service-based environments like public sector organizations, data analytics can help strengthen the focus on delivering high-quality services by driving down costs.


DATA ANALYTICS

Four general categories of analytics are distinguished by the results they produce:
descriptive analytics, diagnostic analytics, predictive analytics and prescriptive analytics.


DATA ANALYTICS
Figure 1.4: Value and complexity increase from descriptive to prescriptive analytics.


DESCRIPTIVE ANALYTICS
To answer questions about events that have already occurred. It contextualizes data to generate information.

Sample questions:
What was the sales volume over the past 12 months?
What is the number of support calls received, categorized by severity and geographic location?
What is the monthly commission earned by each sales agent?

The resulting reports are generally static and display historical data in the form of data grids or charts.
Queries are executed on operational data stores from within an enterprise, for example a Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) system, via ad-hoc reporting or dashboards. (see Figure 1.5)


DESCRIPTIVE ANALYTICS
Figure 1.5: The operational systems, pictured left, are queried via descriptive analytics tools to generate reports or dashboards, pictured right.
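A minimal descriptive-analytics sketch, assuming sales records have already been extracted from an operational store into a pandas DataFrame (pandas assumed available; column names and amounts are hypothetical). It answers the first sample question above by summarizing sales volume per month, the kind of static historical summary a report or dashboard would show.

```python
import pandas as pd

# Hypothetical extract of sales transactions from an operational system
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2019-01-15", "2019-01-20", "2019-02-03", "2019-03-11"]),
    "amount": [1200.0, 800.0, 1500.0, 950.0],
})

# Descriptive question: what was the sales volume per month?
monthly_volume = sales.groupby(sales["order_date"].dt.to_period("M"))["amount"].sum()
print(monthly_volume)
```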


DIAGNOSTIC ANALYTICS
To determine the cause of a phenomenon that occurred in the past, using questions that focus on the reason behind the event.

Sample questions:
Why were Q2 sales less than Q1 sales?
Why have there been more support calls originating from the Eastern region than from the Western region?
Why was there an increase in patient re-admission rates over the past three months?

Queries are performed on multidimensional data held in analytic processing systems, performing drill-down and roll-up analysis. (see Figure 1.6)


DIAGNOSTIC ANALYTICS

Figure 1.6: Diagnostic analytics can result in data that is suitable for performing drill-down and roll-up analysis.
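A minimal drill-down and roll-up sketch on a small multidimensional dataset, assuming sales figures with quarter and region dimensions (pandas assumed available; the data is invented). The roll-up gives the total per quarter; the drill-down breaks each quarter out by region to help explain why Q2 was lower than Q1.

```python
import pandas as pd

sales = pd.DataFrame({
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "region": ["East", "West", "East", "West"],
    "amount": [500.0, 450.0, 300.0, 440.0],
})

# Roll-up: total sales per quarter
print(sales.groupby("quarter")["amount"].sum())

# Drill-down: break each quarter out by region to locate the drop
print(sales.pivot_table(values="amount", index="quarter", columns="region", aggfunc="sum"))
```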


PREDICTIVE ANALYTICS

To determine the outcome of a future event and generate future predictions based on a built model.

Sample questions (what-if rationale):
What are the chances that a customer will default on a loan if they have missed a monthly payment?
What will be the patient survival rate if Drug B is administered instead of Drug A?
If a customer has purchased Products A and B, what are the chances that they will also purchase Product C?


PREDICTIVE ANALYTICS

The strength and magnitude of the associations based upon past events will form the basis of models (which include patterns, trends and exceptions found in historical and current data).
The models have implicit dependencies on the conditions under which the past events occurred. If these underlying conditions change, then the models that make predictions need to be updated.


PREDICTIVE ANALYTICS
Figure 1.7: Predictive analytics tools can provide user-friendly front-end interfaces.
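A minimal predictive sketch for the loan-default sample question, assuming historical records with a missed-payments feature and a known default outcome. It fits a logistic regression (scikit-learn assumed available) and predicts the default probability for a new customer; the features and figures are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical history: [missed_payments, years_as_customer] -> defaulted (1) or not (0)
X = np.array([[0, 5], [1, 3], [2, 1], [0, 8], [3, 2], [1, 6], [4, 1], [0, 2]])
y = np.array([0, 0, 1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Predictive question: chance of default for a customer with 1 missed payment, 4 years of history
new_customer = np.array([[1, 4]])
print("Probability of default:", model.predict_proba(new_customer)[0, 1])
# If the conditions behind the historical data change, the model must be retrained.
```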


PRESCRIPTIVE ANALYTICS
Prescriptive analytics build upon the results of predictive analytics by prescribing actions that should be taken and explaining the reason "why" (because they embed elements of situational understanding).

Sample questions:
Among three drugs, which one provides the best results?
When is the best time to trade a particular stock?

Various outcomes are calculated, and the best course of action for each outcome is suggested. This approach shifts from explanatory to advisory and can include the simulation of various scenarios.


PRESCRIPTIVE ANALYTICS

Figure 1.8: Prescriptive analytics involves the use of business rules and internal and/or external data to perform an in-depth analysis.
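A minimal prescriptive sketch: for each candidate action (here, three hypothetical drugs), outcomes are simulated under a simple model and the action with the best result is recommended. The success probabilities and the decision rule are assumptions made purely to illustrate the shift from explanatory to advisory.

```python
import random

random.seed(42)

# Hypothetical predicted success probabilities per drug (e.g. taken from a predictive model)
success_probability = {"Drug A": 0.62, "Drug B": 0.71, "Drug C": 0.55}

def simulate_outcomes(p, patients=10_000):
    """Simulate how many of `patients` respond positively given success probability p."""
    return sum(random.random() < p for _ in range(patients))

results = {drug: simulate_outcomes(p) for drug, p in success_probability.items()}
best = max(results, key=results.get)

print(results)
print(f"Prescribed course of action: administer {best}")
```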


BUSINESS INTELLIGENCE (BI)

BI applies analytics to large amounts of an enterprise's data generated by its business processes and information systems (which has typically been consolidated into an enterprise data warehouse) to gain insight into the performance of the enterprise.
The output of BI can be surfaced to a dashboard that allows managers to access and analyze the results and potentially refine the analytic queries to further explore the data. (see Figure 1.9)


BUSINESS INTELLIGENCE (BI)
Figure 1.9: BI can be used to improve business applications, consolidate data in data warehouses and analyze queries via a dashboard.


KEY PERFORMANCE INDICATORS (KPI)
A KPI is a metric used
to gauge success within a particular business context.
to identify business performance problems and demonstrate regulatory compliance.
KPIs are quantifiable reference points for measuring a specific aspect of a business' overall performance.
KPIs are linked with an enterprise's overall strategic goals and objectives.
KPIs are often displayed via a KPI dashboard that compares the actual measurements with KPI threshold values. (see Figure 1.10)


KEY PERFORMANCE INDICATORS (KPI)
Figure 1.10: A KPI dashboard acts as a central reference point for gauging business performance.
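A minimal KPI-dashboard sketch that compares actual measurements against threshold values, as described above. The KPI names, actual values and thresholds are hypothetical.

```python
# Hypothetical KPIs: (actual value, threshold, True if higher is better)
kpis = {
    "Monthly revenue (USD)": (1_150_000, 1_000_000, True),
    "Claims settled within 7 days (share)": (0.82, 0.90, True),
    "Customer churn rate": (0.06, 0.05, False),
}

for name, (actual, threshold, higher_is_better) in kpis.items():
    ok = actual >= threshold if higher_is_better else actual <= threshold
    status = "on target" if ok else ("BELOW TARGET" if higher_is_better else "ABOVE TARGET")
    print(f"{name}: actual={actual}, threshold={threshold} -> {status}")
```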


BIG DATA CHARACTERISTICS
FIVE BIG DATA TRAITS

The five traits (Volume, Velocity, Variety, Veracity and Value) are commonly referred to as the Five Vs.
They differentiate "Big" data from other forms of data.


VOLUME
The anticipated volume of data is high, substantial and ever-growing.
Figure 1.12 provides a visual representation of the large volume of data being created daily by organizations and users worldwide.

Figure 1.12: Organizations and users worldwide create over 2.5 EBs of data a day. As a point of comparison, the Library of Congress currently holds more than 300 TBs of data.


VOLUME

Typical data sources for generating high data volumes:
online transactions, such as point-of-sale and banking
scientific and research experiments, such as the Large Hadron Collider and the Atacama Large Millimeter/Submillimeter Array telescope
sensors, such as GPS sensors, RFIDs, smart meters and telematics
social media, such as Facebook and Twitter
VELOCITY

Data can arrive at fast speeds, and enormous datasets can accumulate within very short periods of time.
Depending on the data source, velocity may not always be high. For example, MRI scan images are not generated as frequently as log entries from a high-traffic webserver.


VELOCITY
Figure 1.13: Examples of high-velocity Big Data can easily be generated in a given minute:
350,000 tweets
300 hours of video footage uploaded to YouTube
171 million emails
330 GBs of sensor data from a jet engine
VARIETY
Variety refers to the multiple formats and types of data.
Data variety brings challenges for enterprises in terms of data integration, transformation, processing and storage.
Figure 1.14 gives an example of data variety, which includes structured data (e.g. financial transactions), semi-structured data (e.g. emails) and unstructured data (e.g. images).


VERACITY
Veracity refers to the quality or fidelity of data.
Noise is data that cannot be converted into information and thus has no value.
Signals are data that have value and lead to meaningful information.

The signal-to-noise ratio:
Data with a high signal-to-noise ratio has more veracity than data with a lower ratio.
The signal-to-noise ratio is dependent upon the source data and data type. For example, data that is acquired in a controlled manner (via online customer registrations) usually contains less noise than data acquired via uncontrolled sources, such as blog postings.
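A minimal sketch of treating veracity as a signal-to-noise question: records that can be parsed into usable information count as signal, malformed ones as noise. The sample records and the validity rule are assumptions used only for illustration.

```python
# Hypothetical raw records: controlled registrations tend to be cleaner than blog-style input
records = ["age=34", "age=27", "age=???", "", "age=41", "asdfgh"]

def is_signal(record: str) -> bool:
    """A record counts as signal if it yields a usable integer age."""
    _, _, value = record.partition("age=")
    return record.startswith("age=") and value.isdigit()

signal = sum(is_signal(r) for r in records)
noise = len(records) - signal
print(f"signal-to-noise ratio: {signal}:{noise}")  # a higher ratio indicates higher veracity
```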
VALUE

Value is the usefulness of data for an enterprise.
Value is intuitively impacted by:
(a) the veracity
(b) the timeliness of generated analytic results (how long data processing takes)
(c) lifecycle-related concerns


VALUE

(a) the veracity
The higher the data fidelity, the more value it holds for the business.


VALUE
(b) the timeliness
Value and time are inversely related. The longer it takes to turn data into meaningful information, the less value it has for a business.
Analytics results have a shelf-life; for example, a 20-minute delayed stock quote has little to no value for making a trade compared to a quote that is 20 milliseconds old.
VALUE

(c) the lifecycle-related concerns
How well has the data been stored?
Were valuable attributes of the data removed during data cleansing?
Are the right types of questions being asked during data analysis?
Are the results of the analysis being accurately communicated to the appropriate decision-makers?


DIFFERENT TYPES OF DATA
THE DATA PROCESSED BY BIG DATA SOLUTIONS CAN BE:

Human-generated data
The result of human interaction with systems, such as online services and digital devices.

Machine-generated data
Generated by software programs and hardware devices in response to real-world events.
For example, a log file captures an authorization decision made by a security service, and a point-of-sale system generates a transaction against inventory to reflect items purchased by a customer.


Figure 1.16: Examples of human-generated data include social media, blog posts, emails, photo sharing and messaging.
Figure 1.17: Examples of machine-generated data include web logs, sensor data, telemetry data, smart meter data and appliance usage data.


PRIMARY TYPES OF DATA

Both human-generated and machine-generated data can be in various formats or types:
structured data
unstructured data
semi-structured data
metadata (another type)


STRUCTURED DATA
Conforms to a data model or schema and is often stored in tabular form.
It is used to capture relationships between different entities and is typically stored in a relational database.
Structured data is frequently generated by enterprise applications and information systems like ERP and CRM systems.
Examples: banking transactions, invoices and customer records.
Figure 1.18: The symbol used to represent structured data stored in a tabular form.


UNSTRUCTURED DATA
Data that does not conform to a data model or data schema; it is estimated to account for 80% of the data within any given enterprise.
Unstructured data has a faster growth rate than structured data.
Forms of unstructured data: textual or binary, often conveyed via files that are self-contained and non-relational.
A text file may contain the contents of various tweets or blog postings.
Binary files are often media files that contain image, audio or video data.


UNSTRUCTURED DATA

Special-purpose logic is usually required to process and store unstructured data.
For example, to play a video file, it is essential that the correct codec (coder-decoder) is available.
Unstructured data cannot be directly processed or queried using SQL.
If it is required to be stored within a relational database, it is stored in a table as a Binary Large Object (BLOB).
Alternatively, a Not-only SQL (NoSQL) database is a non-relational database that can be used to store unstructured data alongside structured data.
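A minimal sketch of the BLOB approach mentioned above, using Python's built-in sqlite3 module: a binary file (hypothetical path) is stored in a relational table as a Binary Large Object and read back as raw bytes, since its content cannot be queried with ordinary SQL predicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (name TEXT PRIMARY KEY, content BLOB)")

# Hypothetical binary file; any unstructured payload is handled the same way
with open("photo.jpg", "rb") as f:
    conn.execute("INSERT INTO media VALUES (?, ?)", ("photo.jpg", f.read()))

blob = conn.execute("SELECT content FROM media WHERE name = ?", ("photo.jpg",)).fetchone()[0]
print(f"Stored {len(blob)} bytes as a BLOB")  # opaque bytes: special-purpose logic is needed to interpret them
conn.close()
```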


SEMI-STRUCTURED DATA
Semi-structured data has a defined level of structure and consistency, but is not relational in nature.
It is hierarchical or graph-based, and is commonly stored in files that contain text.
It is more easily processed than unstructured data.
Figure 1.20: XML, JSON and sensor data are common forms of semi-structured data.


SEMI-STRUCTURED DATA

Examples of common sources of semi-structured data: electronic data interchange (EDI) files, spreadsheets, RSS feeds and sensor data.
Semi-structured data often has special pre-processing and storage requirements.
An example of pre-processing is the validation of an XML file to ensure that it conforms to its schema definition.
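A minimal sketch of the XML pre-processing example above, assuming the third-party lxml package is installed and that schema.xsd and orders.xml are hypothetical local files: the document is validated against its XML Schema definition before further processing.

```python
from lxml import etree  # third-party package, assumed installed

schema = etree.XMLSchema(etree.parse("schema.xsd"))  # hypothetical schema definition
document = etree.parse("orders.xml")                 # hypothetical semi-structured input

if schema.validate(document):
    print("XML conforms to its schema definition; safe to process further")
else:
    print("Validation errors:", schema.error_log)
```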


METADATA
Metadata provides information about a dataset's characteristics and structure.
It is mostly machine-generated and can be appended to data.
The tracking of metadata is crucial to Big Data processing, storage and analysis because it provides information about the pedigree of the data and its provenance during processing.

Examples:
XML tags providing the author and creation date of a document
attributes providing the file size and resolution of a digital photograph
METADATA
Big Data solutions rely on metadata, particularly when processing semi-structured and unstructured data.
Figure 1.21: The symbol used to represent metadata.
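A minimal sketch of appending metadata to a dataset member, in the spirit of the examples above: file-system attributes (size) and image resolution (via the third-party Pillow package, assumed installed) are collected for a hypothetical photograph and kept alongside the data to record its characteristics and provenance.

```python
import os
from datetime import datetime, timezone
from PIL import Image  # Pillow, third-party package assumed installed

path = "incident_photo.jpg"  # hypothetical dataset member

with Image.open(path) as img:
    width, height = img.size

metadata = {
    "file_name": path,
    "file_size_bytes": os.path.getsize(path),
    "resolution": f"{width}x{height}",
    "ingested_at": datetime.now(timezone.utc).isoformat(),  # records provenance during processing
}
print(metadata)
```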


CASE STUDY BACKGROUND

Company introduction
Company history and company structure
IT environment – technical infrastructure and automation environment
Business goals and obstacles to adopting a data-driven IT solution
Big Data adoption – case study example


COMPANY INTRODUCTION

Ensure to Insure (ETI) is a leading insurance company that provides a range of insurance plans in the health, building, marine and aviation sectors.
25 million globally dispersed customer base.
5,000 employees.
More than 350,000,000 USD annual revenue.


COMPANY HISTORY
50 years ago: started as an exclusive health insurance provider.
Later, ETI extended its services to property and casualty insurance plans in the building, marine and aviation sectors.
Each of the four sectors has a core team of specialized and experienced agents, actuaries, underwriters and claim adjusters.

ETI's key departments: Underwriting, Claims Settlement, Customer Care, Legal, Marketing, Human Resources, Accounts and IT.


Agents
• generating the company’s revenue by selling policies
Actuaries
• managing risk assessment
• designing new insurance plans and revising existing plans
• performing what-if analyses and making use of dashboards and scorecards for scenario evaluation
Underwriters
• evaluating new insurance applications and deciding on the premium amount
Claim adjusters
• dealing with investigating claims made against a policy
• arriving at a settlement amount for the policyholder

COMPANY HISTORY
Communication channels between the Customer Care department and prospective and existing customers:
telephone
email
social media

Core competence:
providing competitive policies and premium customer service that does not end once a policy has been sold
helping to achieve increased levels of customer acquisition and retention
relying heavily on its actuaries to create insurance plans that reflect the needs of its customers
IT ENVIRONMENT – TECHNICAL INFRASTRUCTURE AND AUTOMATION ENVIRONMENT

A set of client-server and mainframe platforms and systems, including:
policy quotation
policy administration
claims management
risk assessment
document management
billing
enterprise resource planning (ERP)
customer relationship management (CRM)


IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM
Policy quotation system
To create new insurance plans
To provide quotes to prospective customers
Is integrated with the website and customer care portal to give website visitors and customer care agents the ability to obtain insurance quotes

Policy administration system
To handle policy lifecycle management, including issuance, update, renewal and cancellation of policies

Claims management system
To deal with claim processing activities
A claim is registered when a policyholder makes a report; it is then assigned to a claim adjuster, who analyzes the claim in light of the information that was submitted when the claim was made, as well as other background information obtained from different internal and external sources. Based on the analyzed information, the claim is settled following a certain set of business rules.


IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM
Risk assessment system
Is used by the actuaries to assess any potential risk, such as a storm or a flood, that could result in policyholders making claims
To run probability-based risk evaluations that involve executing various mathematical and statistical models

Document management system
A central repository for all kinds of documents, including policies, claims, scanned documents and customer correspondence

Billing system
To keep track of premium collection from customers
To generate reminders, via email and postal mail, for customers who have missed their payment


IT ENVIRONMENT – FUNCTIONS OF EACH SYSTEM
ERP system
Supports the day-to-day running of ETI, including human resource management and accounts

CRM system
To record all aspects of customer communication via phone, email and postal mail
To serve as a portal for call center agents dealing with customer enquiries
To allow the marketing team to create, run and manage marketing campaigns

==> Data from these operational systems is exported to an Enterprise Data Warehouse (EDW)
To generate reports for financial and performance analysis
To generate reports for different regulatory authorities to ensure continuous regulatory compliance


BUSINESS GOALS AND OBSTACLES
Over the past few decades, ETI has suffered a falling share price and a decrease in market share.
A committee comprised of senior managers was formed to investigate and make recommendations.

Main reasons and their consequences:

Existing regulations change and new regulations are introduced very fast and frequently, but the company is slow to respond and has not been able to ensure full and continuous compliance.
--> Consequence: heavy fines had to be paid.

Insurance plans are created and policies are underwritten without a thorough risk assessment, leading to incorrect premiums being set and more payouts being made than anticipated.
--> Consequence: reduced profit made on investments.

Insurance plans are generally based on the actuaries' experience and analysis of the population as a whole, so they only apply to an average set of customers.
--> Consequence: customers whose circumstances deviate from the average set are not interested in such insurance plans.


The increased number of complex and hard-to-detect fraudulent claims and the associated payments being made against them.
--> Consequence: direct monetary loss plus indirect loss (due to the costs related to the processing of fraudulent claims).

A significant increase in natural disasters such as floods, storms and epidemics, increasing the number of high-end genuine claims.
--> Consequence: loss in revenue.

Customer defection due to slow claims processing and insurance products that no longer match the needs of customers.
--> Consequence: loss in the number of customers and declines in revenue.

The emergence of tech-savvy competitors that employ telematics to provide personalized policies.
--> Consequence: loss in the number of customers and declines in revenue.


STRATEGIC GOALS TO IMPROVE PROFITABILITY
1. Decrease losses by:
(a) improving risk evaluation and maximizing risk mitigation, which applies both to the creation of insurance plans and to the screening of new applications at the time of issuing a policy,
(b) implementing a proactive catastrophe management system that decreases the number of potential claims resulting from a calamity, and
(c) detecting fraudulent claims.

2. Decrease customer defection and improve customer retention with:
(a) speedy settlement of claims and
(b) personalized and competitive policies based on individual circumstances rather than demographic generalization alone.

3. Achieve and maintain full regulatory compliance at all times by employing enhanced risk management techniques that can better predict risks.


OBSTACLES TO ADOPT A DATA-DRIVEN IT SOLUTION
Acquiring, storing and processing unstructured data from internal and external data sources – currently, only structured data is stored and processed.
Processing large amounts of data in a timely manner – the amount of data currently processed cannot be classified as large, and the reports take a long time to generate.
Processing multiple types of data and combining structured data with unstructured data – unstructured data such as documents and call center logs cannot currently be processed, while structured data is used in isolation for all types of analyses.

==> A recommendation that ETI should adopt Big Data.


BIG DATA ADOPTION - CASE STUDY EXAMPLE
1. IT team and skills for Big Data implementation
Problems:
No in-house Big Data skills.
ETI had to choose between hiring a Big Data consultant and sending its IT team on a Big Data training course.
Solutions:
Send only the senior IT team members on the Big Data training course.
As a long-term plan, these trained team members become a permanent in-house Big Data resource and can also train junior team members to further increase the in-house Big Data skillset.

2. During the Big Data training course
Problems:
No common vocabulary of terms.
Lack of business exposure and understanding of BI and the establishment of appropriate KPIs.
Solutions:
Building a glossary of terms for datasets including claims, policies, quotes, customer profile data and census data.
Explaining BI by using the monthly report generation process for evaluating the previous month's performance as an example.
BIG DATA ADOPTION - CASE STUDY EXAMPLE
3. Data Analytics
ETI decided to use both descriptive and diagnostic analytics.
Descriptive analytics is used for:
querying the policy administration system to determine the number of policies sold each day
querying the claims management system to find out how many claims are submitted daily
querying the billing system to find out how many customers are behind on their premium payments
Diagnostic analytics is used for:
various BI activities, such as performing queries to answer questions like why last month's sales target was not met
performing drill-down operations to break down sales by type and location, so that it can be determined which locations underperformed for specific types of policies
In the future, ETI plans to utilize predictive and prescriptive analytics in a gradual manner, by first implementing predictive analytics and then slowly building up capabilities to implement prescriptive analytics:
predictive analytics will enable the detection of fraudulent claims by predicting which claims are fraudulent, and will address customer defection by predicting which customers are likely to defect
later, via prescriptive analytics, ETI will prescribe the correct premium amount considering all risk factors, or prescribe the best course of action for mitigating claims when faced with catastrophes such as floods or storms.


BIG DATA ADOPTION - CASE STUDY EXAMPLE
4. Identifying Data Characteristics
Volume
A large amount of transactional data is generated as a result of processing claims, selling new policies and making changes to existing policies.
Large volumes of unstructured data exist both inside and outside the company, including health records, documents submitted by customers at the time of submitting an insurance application, property schedules, fleet data, social media data and weather data.
Velocity
For in-flow data, some is low velocity (such as claims submission data and newly issued policies data) and some is high velocity (such as webserver logs and insurance quotes).
For out-flow data, social media data and weather data may arrive at a fast pace.
For catastrophe management and fraudulent claim detection, data needs to be processed quickly to minimize losses.


BIG DATA ADOPTION - CASE STUDY EXAMPLE
4. Identifying Data Characteristics (continued)
Variety
ETI has to incorporate a range of datasets that include health records, policy data, claim data, quote data, social media data, call center agent notes, claim adjuster notes, incident photographs, weather reports, census data, webserver logs and emails.
Veracity
Inside ETI's boundary, data has high veracity thanks to data validation performed at multiple stages, such as data entry, function-level input validation and data persistence.
Outside ETI's boundary, data has low veracity (such as social media data and weather data) and requires an increased level of data validation and cleansing.
Value
ETI has to draw maximum value out of the available datasets by ensuring that they are stored in their original form and subjected to the right type of analytics.
BIG DATA ADOPTION - CASE STUDY EXAMPLE
5. Identifying Types of Data
Structured data: policy data, claim data, customer profile data and quote data.
Unstructured data: social media data, insurance application documents, call center agent notes, claim adjuster notes and incident photographs.
Semi-structured data: health records, customer profile data, weather reports, census data, webserver logs and emails.
Metadata is a new concept, as ETI's current data management procedures do not create or append any metadata.
Why? --> Because all data currently stored and processed at ETI is structured in nature and originates from within the company. Hence, the origins and characteristics of the data are implicitly known.
Solution --> For the structured data, the data dictionary and the existing last-updated-timestamp and last-updated-user-id columns within the different relational database tables can be used as metadata.
THANK YOU
