44 Recognizing Your Data Types: Structured and Unstructured Data

This document discusses different types of data that are important to consider for predictive analytics projects. It describes structured versus unstructured data, with structured data being well-organized and easy for computers to analyze, while unstructured data is free-form and requires more preprocessing. It also discusses static versus streamed data, with static data being self-contained and streamed data changing continuously in real-time. The document provides examples and comparisons of these different data types to help categorize data sources.

Uploaded by

Manjush Rangaswamy


44 Part I: Getting Started with Predictive Analytics

Recognizing Your Data Types

If your company is like most others, you’ve gathered a large amount of data
through the years — simply as a result of operating a business. Some of
this data can be found in your databases; some may be scattered across
hard drives on your company’s computers or in its online content.

Your raw data may consist of presentations, individual text files, images,
audio and video files, and e-mails — for openers.

The sheer amount of this data can be overwhelming. If you categorize it,
however, you create the core of any predictive analytics effort. The more
you learn about your data, the better able you are to analyze and use it.
You can start by getting a good working knowledge of your data types — in
particular, structured versus unstructured data, and streamed versus static
data. The upcoming sections give you a closer look at these data types.

Structured and unstructured data

Data contained in databases, documents, e-mails, and other data files can be
categorized either as structured or unstructured data.

Structured data is well organized, follows a consistent order, is relatively easy
to search and query, and can be readily accessed and understood by a
person or a computer program.

A classic example of structured data is an Excel spreadsheet with labeled
columns. Such structured data is consistent; column headers — usually
brief, accurate descriptions of the content in each column — tell you
exactly what kind of content to expect. In a column labeled e-mail address,
for example, you can count on finding a list of (no surprise here) e-mail
addresses. Such overt consistency makes structured data amenable to
automated data management.

Structured data is usually stored under a well-defined schema, typically in a
database. It’s usually tabular, with columns and rows that clearly define its
attributes.
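
As a minimal sketch of why labeled columns make structured data easy to query, the following Python snippet loads a small, made-up table into an in-memory SQLite database. The table name, column names, and rows are hypothetical and serve only as an illustration.

```python
import sqlite3

# Hypothetical rows; the column names play the role of spreadsheet headers.
rows = [
    ("Ana", "ana@example.com", 120.0),
    ("Ben", "ben@example.com", 35.5),
    ("Cy", "cy@example.com", 80.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, total_spent REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Because every row follows the same schema, a single query answers the question
# "which e-mail addresses belong to customers who spent more than 50?"
big_spenders = [
    email
    for (email,) in conn.execute(
        "SELECT email FROM customers WHERE total_spent > ? ORDER BY name", (50,)
    )
]
conn.close()
print(big_spenders)
```

The query works only because every value in the email column can be trusted to be an e-mail address, which is the consistency described above.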

Unstructured data, on the other hand, tends to be free-form, non-tabular,
dispersed, and not easily retrievable; such data requires deliberate
intervention to make sense of it. Miscellaneous e-mails, documents,
web pages, and files (whether text, audio, and/or video) in scattered
locations are examples of unstructured data.

It’s hard to categorize the content of unstructured data. It tends to be mostly
text, it’s usually created in a hodgepodge of free-form styles, and finding any
attributes you can use to describe or group it is no small task.
Chapter 3: Exploring Your Data Types and Associated Techniques 45
The content of unstructured data is hard to work with or make sense of
programmatically. Computer programs cannot readily analyze or generate
reports on such data without preprocessing, because it lacks a consistent
structure, has no dominant underlying characteristic, and its individual
items have little common ground.

In general, there’s a higher percentage of unstructured data than structured
data in the world. Unstructured data requires more work to make it useful,
so it demands more attention and tends to consume more time. No wonder
the promise of a processing capability that can swiftly make sense of huge
bodies of unstructured data is a major selling point for predictive analytics.

Don’t underestimate the importance of structured data and the power it
brings to your analysis. It’s far more efficient to analyze structured data
than to analyze unstructured data. Unstructured data can also be costly
to preprocess for analysis as you’re building a predictive analytics
project. The selection of relevant data, its cleansing, and subsequent
transformations can be lengthy and tedious. The resultant newly
organized data from those necessary preprocessing steps can then be
used in a predictive analytics model. The wholesale transformation
of unstructured data, however, may have to wait until you have your
predictive analytics model up and running.

Data mining and text analytics are two approaches to structuring text
documents, linking their contents, grouping and summarizing their
data, and uncovering patterns in that data. Both disciplines provide
a rich framework of algorithms and techniques to mine the text
scattered across a sea of documents.
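
One very simple way to give text documents some structure is to count the terms they contain. The sketch below does this in Python; the file names, document texts, and stop-word list are invented for illustration, and real text-analytics tools go well beyond word counts.

```python
import re
from collections import Counter

# Hypothetical documents standing in for files scattered across a company.
documents = {
    "memo.txt": "Shipping delays hurt sales. Customers complained about shipping.",
    "note.txt": "Sales rose after the shipping fix.",
}

# A tiny, assumed stop-word list; production lists are much longer.
STOP_WORDS = {"the", "about", "after", "a", "an", "of"}

def term_counts(text):
    """Turn free-form text into a structured term -> count mapping."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

# One row per document, one "column" per term: a structured view of the text.
table = {name: term_counts(text) for name, text in documents.items()}

# Terms shared by both documents hint at a link between their contents.
shared_terms = sorted(set(table["memo.txt"]) & set(table["note.txt"]))
print(shared_terms)
```

Once the text is in this form, documents can be grouped, linked, or summarized with the same tools used for any other tabular data.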

It’s also worth noting that search engine platforms provide readily available
tools for indexing data and making it searchable.

Table 3-1 compares structured and unstructured data.

Table 3-1 Characteristics of Structured and Unstructured Data

Characteristics   Structured                 Unstructured
Association       Organized                  Scattered and dispersed
Appearance        Formally defined           Free-form
Accessibility     Easy to access and query   Hard to access and query
Availability      Percentagewise lower       Percentagewise higher
Analysis          Efficient to analyze       Additional preprocessing is needed

Unstructured data does not completely lack structure — you just have to ferret
it out. Even the text inside digital files still has some structure associated with
it, often showing up in the metadata — for example, document titles, dates the
files were last modified, and their authors’ names. The same thing applies to
e-mails: The contents may be unstructured, but structured data is associated
with them — for example, the date and time they were sent, the names of their
senders and recipients, whether they contain attachments.
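
To make the e-mail case concrete, here is a small sketch that uses Python's standard `email` module to separate the structured header fields from the unstructured body. The message itself is made up, and the field names in `record` are arbitrary choices for this example.

```python
from email import message_from_string

# A hypothetical raw message: headers first, then a blank line, then the body.
raw = (
    "From: ana@example.com\n"
    "To: ben@example.com\n"
    "Subject: Hi, there!\n"
    "Date: Mon, 06 Jan 2014 09:30:00 +0000\n"
    "\n"
    "Can we move the product launch to March?\n"
)

msg = message_from_string(raw)

# The headers are structured and directly queryable; the body is free-form text.
record = {
    "sender": msg["From"],
    "recipient": msg["To"],
    "subject": msg["Subject"],
    "sent": msg["Date"],
    "body": msg.get_payload().strip(),
}
print(record["subject"], "|", record["body"])
```

Note that the subject line here says nothing about the launch date discussed in the body, which is exactly the kind of mismatch that makes the structured fields only partly useful.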

The idea here is that you can still find some order you can use while you’re
going through all that “unstructured data”. Of course, you may have to do
some digging. The content of a thread of 25 e-mails shooting back and forth
between two correspondents may wander away from the subject line of the
original e-mail, even if the subject line stays the same. Additionally, the
very first subject line in that e-mail thread may not accurately reflect even
the content of that very first e-mail. (For example, the subject line may say
something as unhelpful as “Hi, there!”)

The separation line between the two data types isn’t always clear. In general,
you can always find some attributes of unstructured data that can be
considered structured data. Whether that structure is reflective of the
content of that data — or useful in data analysis — is unclear at best.
For that matter, structured data can hold unstructured data within it. In a
web form, for example, users may be asked to give feedback on a product
by choosing an answer from multiple choices — but also presented with
a comment box where they can provide additional feedback. The answers
from multiple choices are structured; the comment field is unstructured
because of its free-form nature. Such cases are best understood as a mix
of structured and unstructured data. Most data is a composite of both.
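
The web-form example can be sketched as a single record type that holds both kinds of data. The class and field names below are hypothetical; the point is only that the fixed-choice fields can be aggregated directly, while the comment field needs separate treatment.

```python
from dataclasses import dataclass

@dataclass
class FeedbackResponse:
    rating: int            # structured: one of a fixed set of choices (1 to 5)
    would_recommend: bool  # structured: a yes/no answer
    comment: str           # unstructured: free-form text from the comment box

# Hypothetical form submissions.
responses = [
    FeedbackResponse(5, True, "Love it, though the box arrived dented."),
    FeedbackResponse(2, False, ""),
    FeedbackResponse(4, True, "Works fine."),
]

# The structured part can be summarized immediately.
average_rating = sum(r.rating for r in responses) / len(responses)

# The unstructured part is set aside for reading or text analytics.
comments_to_review = [r.comment for r in responses if r.comment]
print(round(average_rating, 2), len(comments_to_review))
```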

Technically speaking, there will always be some exceptions in defining data
categories; the lines between the two can be blurry. But the idea is to make
a useful distinction between structured and unstructured data — and that is
almost always possible.

For a successful predictive analytics project, both your structured and
unstructured data must be combined in a logical format that can be analyzed.

Static and streamed data

Data can also be identified as streamed, static, or a mix of the two. Streamed
data changes continuously; examples include the constant stream of
Facebook updates, tweets on Twitter, and the constantly changing
stock prices while the market is still open.

Streamed data is continuously changing; static data is self-contained and
enclosed. The problems associated with static data include gaps, outliers,
or incorrect data, all of which may require some cleansing, preparation,
and preprocessing before you can use static data for an analysis.
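
As a rough sketch of the cleansing that static data may need, the snippet below fills gaps and replaces outliers in a small, invented series of readings. Using the median as the fill value and a median-absolute-deviation threshold (with an assumed factor `k`) is just one of many possible choices, not a recommended standard.

```python
from statistics import median

# Hypothetical static dataset: None marks a gap, 95.0 is a suspicious value.
readings = [10.0, 11.0, None, 12.0, 95.0, 11.5, None, 10.5]

known = [x for x in readings if x is not None]
mid = median(known)
# Median absolute deviation: a spread measure that outliers barely affect.
mad = median(abs(x - mid) for x in known)

def is_outlier(x, k=5.0):
    """Flag values more than k deviations from the median (k is an assumption)."""
    return abs(x - mid) > k * mad

# Replace gaps and outliers with the median of the known values.
cleaned = [mid if x is None or is_outlier(x) else x for x in readings]
print(cleaned)
```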
With streamed data, other problems arise. Volume can be a problem; the
sheer amount of non-stop data constantly arriving can be overwhelming. And
the faster the data is streaming in, the harder it is for the analysis to catch up.

The two main models for analyzing streamed data are as follows:

✓ Examine only the newest data points and make a decision about the state
of the model and its next move. This approach is incremental — essentially
building up a picture of the data as it arrives.
✓ Evaluate the entire dataset, or a subset of it, to make a decision each
time new data points arrive. This approach is inclusive of more data
points in the analysis — what constitutes the “entire” dataset changes
every time new data is added.
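
The two models above can be sketched side by side. In this illustration a simple average stands in for the "model"; the class names, the window size, and the price values are all assumptions made for the example.

```python
from collections import deque

class RunningMean:
    """Incremental model: update the state using only the newest data point."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

class WindowMean:
    """Subset model: re-evaluate the last `size` points on every arrival."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old points fall out automatically

    def update(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)

# Hypothetical stream of prices arriving one at a time.
stream = [10.0, 12.0, 11.0, 30.0]
running, windowed = RunningMean(), WindowMean(size=2)
for price in stream:
    running_mean = running.update(price)
    window_mean = windowed.update(price)

print(running_mean, window_mean)
```

The incremental version never stores past data, so it keeps up with fast streams; the windowed version reacts more strongly to the sudden jump at the end because it considers only recent points.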

Depending on the nature of your business and the anticipated impact of the
decision, one model may be preferable to the other.

Some business domains, such as the analysis of environmental, market, or
intelligence data, prize new data that arrives in real time. All this data must
be analyzed as it’s being streamed — and interpreted not only correctly but
right away. Based on the newly available information, the model redraws the
whole internal representation of the outside world. Doing so provides you
with the most up-to-date basis for a decision you may need to make and act
upon quickly.

For example, a predictive analytics model may process a stock price as a
data feed, even while the data is rapidly changing, analyze the data in the
context of immediate market conditions existing in real time, and then decide
whether to trade a particular stock.

Clearly, analyzing streamed data differs from analyzing static data. Analyzing
a mix of both data types can be even more challenging.

Identifying Data Categories

As a result of doing business, companies have gathered masses of data about
their business and customers, often referred to as business intelligence. To
help you develop categories for your data, what follows is a general rundown
of the types of data that are considered business intelligence:

Behavioral data derives from transactions, and can be collected automatically:

✓ Items bought
✓ Methods of payment
✓ Whether the purchased items were on sale

✓ The purchasers’ access information:
  • Address
  • Phone number
  • E-mail address

All of us have provided such data when making a purchase online (or even
when buying at a store or over the phone).

Other types of data can be collected from customers with their co-operation:

✓ Data provided by customers when they fill out surveys
✓ Customers’ answers to polls and questionnaires
✓ Information collected from customers who make direct contact with
companies
  • In a physical store
  • Over the phone
  • Through the company website

In addition, the type of data that a business collects from its operations can
provide information about its customers. Common examples include the
amount of time that customers spend on company websites, as well as
customers’ browsing histories. All that data combined can be analyzed to
answer some important questions:

✓ How can your business improve the customer experience?
✓ How can you retain existing customers and attract new ones?
✓ What would your customer base like to buy next?
✓ What purchases can you recommend to particular customers?

The first step toward answering these questions (and many others) is to
collect and use all customer-related operations data for a comprehensive
analysis. The data types that make up such data can intersect and could
be described and/or grouped differently for the purposes of analysis.

Some companies collect these types of data by giving customers personalized
experiences. For example, when a business provides its customers with
the tools they need to build personalized websites, it not only empowers
customers (and enriches their experience of dealing with the company), it
also allows the company to learn from a direct expression of its customers’
wants and needs: the websites they create.
Attitudinal data
Any information that can shed light on how customers think or feel is
considered attitudinal data.

When companies put out surveys that ask their customers for feedback and
for their thoughts about the companies’ lines of business and products, the
collected data is an example of attitudinal data.

Attitudinal data has a direct impact on the type of marketing campaign
a company can launch. It helps shape and target the message of that
campaign. Attitudinal data can help make both the message and
the products more relevant to the customers’ needs and wants —
allowing the business to serve existing customers better and attract
prospective ones.

The limitation of attitudinal data is a certain imperfection: Not everyone
objectively answers survey questions, and not everyone provides all the
relevant details that shaped their thinking at the time of the survey.

Behavioral data
Behavioral data derives from what customers do when they interact with the
business; it consists mainly of data from sales transactions. Behavioral data
tends to be more reliable than attitudinal data because it represents what
actually happened.

Businesses know, for example, what products are selling, who is buying them,
and how customers are paying for them.

Behavioral data is a by-product of normal operations, so it is available to a
company at no extra cost. Attitudinal data, on the other hand, requires
conducting surveys or commissioning market research to get insights
into the minds of the customers.

Attitudinal data is analyzed to understand why customers behave the way
they do, and details their views of your company. Behavioral data tells
you what is happening and records customers’ real actions. Attitudinal
data provides insight into motivations; behavioral data provides the
who-did-what — the overall context that led to customers’ particular
reactions. Your analysis should include groups for both types of data;
they are complementary.

Combining both attitudinal and behavioral data can make your predictive
analytics models more accurate by helping you define the segments of your
customer base, offer a more personalized customer experience, and identify
the drivers behind the business.
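
A minimal sketch of combining the two kinds of data: purchases (behavioral) are totaled per customer and joined with survey scores (attitudinal) to define a simple segment. The customer IDs, amounts, scores, and the spending and satisfaction thresholds are all hypothetical.

```python
# Hypothetical behavioral data: one record per purchase.
purchases = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
    {"customer_id": 1, "amount": 60.0},
]

# Hypothetical attitudinal data: survey satisfaction scores (1 to 5) by customer.
survey = {1: 2, 3: 5}

# Summarize behavior per customer.
total_spent = {}
for p in purchases:
    cid = p["customer_id"]
    total_spent[cid] = total_spent.get(cid, 0.0) + p["amount"]

# Join both sources on the customer ID; a missing survey answer stays None.
profiles = {
    cid: {"total_spent": total_spent.get(cid, 0.0), "satisfaction": survey.get(cid)}
    for cid in sorted(set(total_spent) | set(survey))
}

# Example segment: customers who spend a lot but report low satisfaction.
at_risk = [
    cid
    for cid, p in profiles.items()
    if p["total_spent"] >= 50
    and p["satisfaction"] is not None
    and p["satisfaction"] <= 2
]
print(at_risk)
```

Neither source alone identifies this segment: the transactions show who spends, and only the survey shows who is unhappy.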

Table 3-2 compares attitudinal and behavioral data.

Table 3-2 Comparing Attitudinal and Behavioral Data

Characteristics   Attitudinal              Behavioral
Data Source       Customers’ thoughts      Customers’ actions
Data Means        Collected from surveys   Collected from transactions
Data Type         Subjective               Objective
Data Cost         May cost extra           No extra cost

Demographic data
Demographic data comprises information including age, race, marital status,
education level, employment status, household income, and location. You
can get demographic data from the U.S. Census Bureau, other government
agencies, or through commercial entities.

The more data you have about your customers, the better the insight you’ll
have into identifying specific demographic and market trends as well as
how they may affect your business. Measuring the pulse of the demographic
trends will enable you to adjust to the changes and better market to, attract,
and serve those segments.

Different segments of the population are interested in different products.

Small businesses catering to specific locations should pay attention to
the demographic changes in those locations. All of us have witnessed
populations changing over time in certain neighborhoods. Businesses
must be aware of such changes; they may affect business significantly.

Demographic data, when combined with behavioral and attitudinal data,
allows marketers to paint an accurate picture of their current and
potential customers, allowing them to increase satisfaction, retention,
and acquisition.

Generating Predictive Analytics

There are two ways to go about generating or implementing predictive analytics:
purely on the basis of your data (with no prior knowledge of what you’re
after) or with a proposed business goal that the data may or may not support.
You don’t have to choose one or the other; the two approaches can be
complementary. Each has its advantages and disadvantages.
Whether you’re coming up with hypotheses to test, analyzing the results that
come out of your data analysis (and making sense of them), or starting to
examine your data with no prior assumptions of what you may find, the goal
of your analysis is always the same: to decide whether to act on what you
find. You have an active role in implementing the process needed for either
approach to predictive analytics. Both approaches to predictive analytics
have their limitations; keep risk management in mind as you cross-examine
their results. Ask yourself which approach promises good results while
remaining relatively safe.

Combining both types of analysis empowers your business and enables you
to expand your understanding, insight, and awareness of your business and
your customers. It makes your decision process smarter and subsequently
more profitable.

Data-driven analytics
If you’re basing your analysis purely on existing data, you can use internal
data — accumulated by your company over the years — or external data
(often purchased from a source outside your company) that is relevant to
your line of business.

To make sense of that data, you can employ data-mining tools to overcome
both its complexity and size; reveal some patterns you were not aware of;
uncover some associations and links within your data; and use your findings
to generate new categorizations, new insights and new understanding.
Data-driven analysis can even reveal a gem or two that can radically improve
your business — all of which gives this approach an element of surprise that
feeds on curiosity and builds anticipation.

Data-driven analysis is best suited for large datasets because it’s hard
for human beings to wrap their minds around huge amounts of data.
Data-mining tools and visualization techniques help us get a closer look
and cut the overwhelming mass of data down to size. Keep these general
principles in mind:

✓ The more complete your data is, the better the outcome of data-driven
analytics. If you have extensive data that holds key information about the
variables you’re measuring and spans an extended period of time,
you’re likely to discover something new about your business.
✓ Data-driven analytics is neutral: no prior knowledge about the
data is necessary, and you’re not pursuing a specific goal but
analyzing the data for its own sake.
✓ The nature of this analysis is broad; it does not concern itself with
a specific search or with validating a preconceived idea. This approach
to analytics can be viewed as broad, exploratory data mining.
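
As a small illustration of uncovering associations with no hypothesis in mind, the sketch below counts which items appear together in purchases. The baskets and the minimum count of 2 are invented for the example; dedicated data-mining tools use more refined measures than raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: the set of items in each purchase.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"coffee", "milk"},
    {"bread", "jam"},
    {"coffee", "milk", "bread"},
]

# Count every pair of items bought together, without deciding in advance
# which pairs are interesting.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep the pairs that occur at least twice (an assumed threshold).
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= 2}
print(frequent_pairs)
```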
