Data in BI

This document provides an overview of data and data types in data mining. It discusses the goal of business intelligence to help decision-makers. Most business data is unstructured or semi-structured. Structured data is what data mining algorithms use. Data can be categorical, such as nominal and ordinal data, or numerical, such as interval and ratio data. Unstructured and semi-structured data must be converted to structured data before data mining. Descriptive statistics describe data through measures like mean and standard deviation, while inferential statistics make inferences about populations from samples.

Uploaded by

maryam nabilah
Copyright
© © All Rights Reserved

Topic 2

Data in Data Mining

ISP642 – BUSINESS INTELLIGENCE
Learning Objectives
 Understand the concepts of data and the various sources of data
 Understand unstructured, structured and semi-structured data
 Understand a simple taxonomy of data in data mining
BI Revisited
 The goal of BI
 is to help decision-makers make more informed and better decisions to guide the business
 Decisions are made based on previous and current data
Data
 a collection of facts, usually resulting from
 Experiences
 Observations
 Experiments
 the lowest level of abstraction (from which information and knowledge are derived)
 Data may consist of
 Numbers
 Words
 Images
Sources of Data
 Abundant data available from:
 Business : Web, e-commerce, transactions, stock
 Science : Remote sensing, bioinformatics, scientific simulation
 Society and everyone : news, digital cameras
Firms and DATA
 Firms have a vast amount of data available to them
from a huge range of sources.
 BI tools
 are designed to help firms sift useful information from
all the sources,
 but, as with any tool, they should be used intelligently
to achieve optimal results
Digital data
 Types of digital data
 Structured
 Unstructured
 Semi-structured
 80–90% of business data is either unstructured or
semi-structured (Merrill Lynch)
 Difficult to extract information
Formats of Data

Source: Prasad & Acharya (2011)


Taxonomy of Data in Data Mining
Unstructured Data

Source: Prasad & Acharya (2011)


Example of Unstructured Data

Source: Prasad & Acharya (2011)


How to Store Unstructured Data?

Source: Prasad & Acharya (2011)


How to Store Unstructured Data?

Source: Prasad & Acharya (2011)


How to Extract Information from Unstructured Data?

Source: Prasad & Acharya (2011)

How to Extract Information from Unstructured Data?
Semi-structured Data
Where does Semi-structured Data Come from?
How to Manage Semi-structured Data?
How to Store Semi-structured Data?
How to Store Semi-structured Data?
How to Extract Information from Semi-structured Data?
How to Extract Information from Semi-structured Data?
XML – A Solution for Semi-structured Data Management
XML – A Solution for Semi-structured Data Management
Structured Data
Where does Structured Data Come from?
Structured Data: Everything in its Place
Semi-structured to Structured
Ease with Structured Data-Storage
Ease with Structured Data-Retrieval
Why Structured Data?
 Structured data is what data mining algorithms use.
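As one illustration of moving semi-structured data into the structured form that mining algorithms expect, the sketch below flattens a small XML fragment into rows with a fixed set of columns. The XML content, the tag names and the xml_to_rows helper are assumptions made up for this example; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input: the second record has no <phone> tag.
XML_TEXT = """
<customers>
  <customer id="1"><name>Aisha</name><phone>555-0101</phone></customer>
  <customer id="2"><name>Ben</name></customer>
</customers>
"""

def xml_to_rows(xml_text):
    """Flatten each <customer> element into a dict with the same three keys."""
    root = ET.fromstring(xml_text)
    rows = []
    for cust in root.findall("customer"):
        rows.append({
            "id": int(cust.get("id")),
            "name": cust.findtext("name"),
            "phone": cust.findtext("phone"),  # None when the tag is absent
        })
    return rows

rows = xml_to_rows(XML_TEXT)
for row in rows:
    print(row)
```

Every row ends up with the same columns, and a missing tag becomes an explicit missing value rather than a missing field, which is what a table-oriented algorithm generally needs.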
Categorical and Numerical
[Figure: taxonomy of data. Data divides into Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio)]


Categorical Data
 (Structured) → Categorical
 Categorical data (also known as discrete data)
 represent the labels of multiple classes used to divide a variable into specific groups.
 Examples:
 Race, sex, age group, educational level
 Further divided into
 Nominal data
 Ordinal data


Nominal Data
 (Structured) → (Categorical) → Nominal
 contains simple codes assigned to objects as labels; the codes are not measurements.
 Examples:
 Marital status
 Single
 Married, or
 Divorced
 yes/no
 true/false
 good/bad
 Red/green/blue
Ordinal Data
 (Structured) → (Categorical) → Ordinal
 contains codes assigned to objects or events as labels that also represent the rank order among them.
 Examples:
 Credit score
 Low
 Medium
 High
 Age group
 Child
 Young
 Educational level
 High school
 College
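The difference between nominal and ordinal labels matters when the data is encoded for an algorithm. The following is a minimal sketch, assuming hypothetical records and helper names (one_hot, CREDIT_RANK) that are not part of the slides: nominal labels get one indicator column per label, while ordinal labels get an integer code that preserves their rank.

```python
# Hypothetical records; the label values mirror the slide examples.
records = [
    {"marital_status": "Single", "credit_score": "Medium"},
    {"marital_status": "Married", "credit_score": "High"},
    {"marital_status": "Divorced", "credit_score": "Low"},
]

# Nominal: labels have no order, so each label becomes its own 0/1 indicator.
MARITAL_LABELS = ["Single", "Married", "Divorced"]

def one_hot(value, labels):
    """Return a list with 1 at the position of the matching label, 0 elsewhere."""
    return [1 if value == label else 0 for label in labels]

# Ordinal: labels carry a rank, so an order-preserving integer code is reasonable.
CREDIT_RANK = {"Low": 0, "Medium": 1, "High": 2}

encoded = [
    one_hot(r["marital_status"], MARITAL_LABELS) + [CREDIT_RANK[r["credit_score"]]]
    for r in records
]

# Sorting by rank is meaningful for ordinal data, but not for nominal data.
by_credit = sorted(records, key=lambda r: CREDIT_RANK[r["credit_score"]])

print(encoded)
print([r["credit_score"] for r in by_credit])
```

Coding marital status as 0, 1, 2 instead would suggest an ordering that the labels do not have, which is why the sketch treats the two kinds of label differently.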
Numerical Data
 (Structured) → (Numerical)
 Represent numeric values of specific variables
 Examples:
 Age
 Number of children
 Total household income (in US Dollars)
 Travel distance (in miles)
 Temperature (in Fahrenheit degrees)
 Can be
 Integer
 Real number


Interval Data
 (Structured) → (Numerical) → Interval
 Variables that can be measured on interval scale
 There is no absolute zero value
 Examples:
 Temperature (in Celsius scale)
 Difference between melting temperature and boiling temperature
 My level of happiness, rated from 1 to 10
 Time


Ratio Data
 (Structured) → (Numerical) → Ratio
 Measurement variables commonly found in
 physical sciences and
 engineering
 Examples:
 Mass
 Length
 Time
 Energy
 Electric charge
 Height
 Weight
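A short sketch of why the absolute zero matters: on an interval scale such as Celsius, differences are meaningful but ratios are not, while on a ratio scale such as Kelvin both are. The two temperature values below are arbitrary assumptions chosen for illustration.

```python
def celsius_to_kelvin(c):
    """Kelvin has an absolute zero, so it is a ratio scale."""
    return c + 273.15

a_c, b_c = 10.0, 20.0  # two hypothetical temperatures in Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales: this is what an interval scale guarantees.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios disagree: "20 C is twice as hot as 10 C" is not a valid statement.
ratio_c = b_c / a_c  # 2.0, an artifact of where Celsius places its zero
ratio_k = b_k / a_k  # about 1.035, the physically meaningful ratio

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```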
Unstructured & Semi-structured Data
 Other data types (Qualitative)
 Textual
 Spatial
 Imagery
 Voice
 MUST BE CONVERTED into structured data
 Only then can they be processed by data mining algorithms.
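One common way to convert textual data into a structured form is a bag-of-words table, where each document becomes a row of word counts. The sketch below is only an illustration of that idea under simple assumptions: the two sample sentences are made up, and the tokenizer keeps lowercase ASCII letters only.

```python
import re
from collections import Counter

# Hypothetical unstructured input: free-text customer comments.
documents = [
    "Delivery was late, and the package was damaged.",
    "Great product; delivery was fast!",
]

def tokenize(text):
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

# Count the words in each document.
counts = [Counter(tokenize(d)) for d in documents]

# The sorted vocabulary defines the columns of the structured table.
vocabulary = sorted(set().union(*counts))

# One row per document, one column per word, each cell a count.
matrix = [[c[word] for word in vocabulary] for c in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row now has the same fixed length, so the comments can be fed to algorithms that expect a table of numbers.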
Why is this Important?
 Important to understand the data
 To perform operation / statistical analysis
Summary of Data Operations
Statistical Analysis from Data
Statistical Methods
Descriptive vs. Inferential Statistics
Descriptive Statistics (example)
 Asking 35 people their favorite ice cream flavors
Descriptive Analysis
 Univariate Analysis
 the distribution
 Frequency distribution
 the central tendency
 Mean
 Median
 Mode
 the dispersion
 Range
 Standard deviation
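The univariate measures listed above can be computed with Python's standard library. This is a minimal sketch on made-up data: the 35 flavor answers and the list of ages are assumptions for illustration, not results from the slides.

```python
import statistics
from collections import Counter

# Hypothetical categorical sample: 35 answers to "favorite ice cream flavor".
flavors = ["vanilla"] * 12 + ["chocolate"] * 15 + ["strawberry"] * 8

# Distribution: a frequency table of the answers.
frequency = Counter(flavors)
# For categorical data the mode is the usual measure of central tendency.
modal_flavor = statistics.mode(flavors)

# Hypothetical numerical sample.
ages = [21, 25, 25, 30, 34, 41, 48]

# Central tendency.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Dispersion.
age_range = max(ages) - min(ages)
stdev_age = statistics.stdev(ages)  # sample standard deviation (n - 1)

print(frequency.most_common())
print(modal_flavor, mean_age, median_age, mode_age, age_range, round(stdev_age, 2))
```

The mean and standard deviation are computed only for the numerical sample; for the flavor answers, a frequency table and the mode are the measures that make sense.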
Inferential Statistics
 Confidence Interval:
 used when you want to estimate a population parameter
 Mean difference
 Significance Testing
 to assess the evidence provided by the data in favor of a hypothesis
 T-Tests
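Both ideas can be sketched for a single sample mean using only the standard library. The sample values and the hypothesized mean of 50 are assumptions for illustration; the constant 2.262 is the two-sided 95% critical value of the t distribution with 9 degrees of freedom, taken from a standard t table rather than computed.

```python
import math
import statistics

# Hypothetical sample of 10 measurements.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 51]
n = len(sample)

mean = statistics.mean(sample)
# Standard error of the mean, based on the sample standard deviation.
sem = statistics.stdev(sample) / math.sqrt(n)

# Two-sided 95% critical value of t for n - 1 = 9 degrees of freedom.
T_CRIT_DF9 = 2.262

# Confidence interval: an estimate of the population mean with a margin.
ci = (mean - T_CRIT_DF9 * sem, mean + T_CRIT_DF9 * sem)

# One-sample t-test of the null hypothesis "population mean equals mu0".
mu0 = 50
t_stat = (mean - mu0) / sem
reject_null = abs(t_stat) > T_CRIT_DF9

print(mean, tuple(round(x, 2) for x in ci))
print(round(t_stat, 3), reject_null)
```

With this sample the interval contains 50 and the test does not reject the null hypothesis, which shows how the two procedures give consistent answers at the same confidence level.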
Inferential Analysis
Descriptive Statistical Analysis: Others
END OF TOPIC
 THANK YOU FOR YOUR ATTENTION


Next Topic
References
1. RN Prasad & Seema Acharya (2011), Fundamentals of Business Analytics, Wiley India Pvt. Ltd.
2. Ramesh Sharda, Dursun Delen & Efraim Turban (2014), Business Intelligence and Analytics, 10th ed., Pearson Education Ltd.
3. Nathan Yau (2013), Data Points: Visualization That Means Something, Wiley.
4. http://www.graphpad.com/support/faqid/1089/
5. https://statistics.laerd.com/statistical-guides/hypothesis-testing.php
6. http://www.mymarketresearchmethods.com/descriptive-inferential-statistics-difference/
