TECH 4070-Ch02


Business Intelligence, Analytics, and Data Science: A Managerial Perspective
Chapter 2
Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization
Learning Objectives (1 of 2)

2.1 Understand the nature of data as it relates to business
intelligence (BI) and analytics
2.2 Learn the methods used to make real-world data
analytics ready
2.3 Describe statistical modeling and its relationship to
business analytics
2.4 Learn about descriptive and inferential statistics
2.5 Define business reporting, and understand its historical
evolution
Learning Objectives (2 of 2)

2.6 Understand the importance of data/information
visualization
2.7 Learn different types of visualization techniques
2.8 Appreciate the value that visual analytics brings to
business analytics
2.9 Identify different types of charts and graphs
The Nature of Data (1 of 2)

• Data: a collection of facts


– usually obtained as the result of experiences,
observations, or experiments
• Data may consist of numbers, words, images, …
• Data is the lowest level of abstraction (from which
information and knowledge are derived)
• Data is the source for information and knowledge
• Data quality and data integrity → critical to analytics
The Nature of Data (2 of 2)
Enterprise Resource Planning
Customer Relationship Management
Supply Chain Management
Universal Bata Bank
Metrics for Analytics Ready Data
• Data source reliability
• Data content accuracy
• Data accessibility
• Data security and data privacy
• Data richness
• Data consistency
• Data currency/data timeliness
• Data granularity
• Data validity and data relevancy
A Simple Taxonomy of Data (1 of 2)

• Data (datum - singular form of data): facts


• Structured data
– Targeted for computers to process
– Numeric versus nominal
• Unstructured/textual data
– Targeted for humans to process/digest
• Semi-structured data?
– XML, HTML, Log files, etc.
• Data taxonomy…
A Simple Taxonomy of Data (2 of 2)
Application Case 2.1
Medical Device Company Ensures Product Quality
While Saving Money
Questions for Discussion
1. What were the main challenges for the medical device
company?
2. Were they market or technology driven?
3. What was the proposed solution?
4. What were the results?
5. What do you think was the real return on investment
(ROI)?
The Art and Science of Data Preprocessing (1 of 2)

• Real-world data is dirty, misaligned, overly complex,
and inaccurate
– Not ready for analytics!
• Readying the data for analytics is needed
– Data preprocessing
▪ Data consolidation
▪ Data cleaning
▪ Data transformation
▪ Data reduction
• Art – it develops and improves with experience
The Art and Science of Data
Preprocessing (2 of 2)

• Data reduction
1. Variables
– Dimensional reduction
– Variable selection
2. Cases/samples
– Sampling
– Balancing / stratification
Data Preprocessing Tasks and Methods (1 of 3)

Table 2.1 A Summary of Data Preprocessing Tasks and Potential Methods
Main Task | Subtasks | Popular Methods
Data consolidation | Access and collect the data | SQL queries, software agents, Web services.
Data consolidation | Select and filter the data | Domain expertise, SQL queries, statistical tests.
Data consolidation | Integrate and unify the data | SQL queries, domain expertise, ontology-driven data mapping.
Data cleaning | Handle missing values in the data | Fill in missing values (imputations) with most appropriate values (mean, median, min/max, mode, etc.); recode the missing values with a constant such as “ML”; remove the record of the missing value; do nothing.
Data cleaning | Identify and reduce noise in the data | Identify the outliers in data with simple statistical techniques (such as averages and standard deviations) or with cluster analysis; once identified, either remove the outliers or smooth them by using binning, regression, or simple averages.
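The two data cleaning rows above can be illustrated with a minimal sketch. It assumes missing values are represented as None, and it uses a cutoff of k sample standard deviations from the mean as the outlier rule; the function names, the value of k, and the sample data are all hypothetical choices, not part of the table.

```python
from statistics import mean, median, stdev

def impute_missing(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

def flag_outliers(values, k=2.0):
    """Return values lying more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

# Hypothetical column with two missing entries
ages = [23, 25, None, 31, 27, None, 29]
filled = impute_missing(ages)

# Hypothetical sensor readings with one suspicious value
readings = [10, 12, 11, 13, 12, 11, 95]
suspects = flag_outliers(readings)

print(filled)
print(suspects)
```

Whether a flagged value is removed or smoothed remains a judgment call that depends on domain expertise, as the table notes.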
Data Preprocessing Tasks and Methods (2 of 3)
Main Task | Subtasks | Popular Methods
Data cleaning | Find and eliminate erroneous data | Identify the erroneous values in data (other than outliers), such as odd values, inconsistent class labels, odd distributions; once identified, use domain expertise to correct the values or remove the records holding the erroneous values.
Data transformation | Normalize the data | Reduce the range of values in each numerically valued variable to a standard range (e.g., 0 to 1 or -1 to +1) by using a variety of normalization or scaling techniques.
Data transformation | Discretize or aggregate the data | If needed, convert the numeric variables into discrete representations using range- or frequency-based binning techniques; for categorical variables, reduce the number of values by applying proper concept hierarchies.
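As a rough illustration of the two transformation rows above, the sketch below shows min-max normalization to a target range and range-based (equal-width) binning. The function names and the income figures are hypothetical; other scaling and binning schemes would fit the table equally well.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale values so the minimum maps to lo and the maximum to hi."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:                      # constant column: nothing to spread out
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / span for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width ranges."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin
    return [min(int((v - vmin) / width), n_bins - 1) for v in values]

# Hypothetical incomes in thousands
incomes = [20, 35, 50, 80, 120]
scaled = min_max_scale(incomes)
bins = equal_width_bins(incomes, 4)

print(scaled)
print(bins)
```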
Data Preprocessing Tasks and Methods (3 of 3)
Main Task | Subtasks | Popular Methods
Data transformation | Construct new attributes | Derive new and more informative variables from the existing ones using a wide range of mathematical functions (as simple as addition and multiplication or as complex as a hybrid combination of log transformations).
Data reduction | Reduce number of attributes | Principal component analysis, independent component analysis, chi-square testing, correlation analysis, and decision tree induction.
Data reduction | Reduce number of records | Random sampling, stratified sampling, expert-knowledge-driven purposeful sampling.
Data reduction | Balance skewed data | Oversample the less represented or undersample the more represented classes.
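The last row, balancing skewed data, can be sketched with simple random oversampling: records from the less represented classes are duplicated at random until every class matches the largest one. This is only one possible approach; the function name, the label accessor, and the toy records are hypothetical.

```python
import random
from collections import Counter

def oversample_minority(records, label_of, seed=0):
    """Duplicate randomly chosen records of smaller classes until all classes
    have as many records as the largest class."""
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    by_class = {}
    for r in records:
        by_class.setdefault(label_of(r), []).append(r)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)                                   # keep originals
        balanced.extend(rng.choices(rows, k=target - len(rows)))  # add copies
    return balanced

# Hypothetical data set: 2 "fraud" records versus 8 "ok" records
records = [(i, "fraud" if i < 2 else "ok") for i in range(10)]
balanced = oversample_minority(records, label_of=lambda r: r[1])
counts = Counter(label for _, label in balanced)

print(counts)
```

Undersampling the majority class works the same way in reverse, at the cost of discarding records.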
Statistical Modeling for Business Analytics (1 of 2)
Statistical Modeling for Business Analytics (2 of 2)

• Statistics
– A collection of mathematical techniques to
characterize and interpret data
• Descriptive Statistics
– Describing the data (as it is)
• Inferential statistics
– Drawing inferences about the population based on
sample data
• Descriptive statistics for descriptive analytics
Descriptive Statistics Measures of Central
Tendency

• Arithmetic mean

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• Median
– The number in the middle
• Mode
– The most frequent observation
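A short sketch of the three measures, using Python's standard statistics module alongside a hand computation of the mean for comparison. The sample scores are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical sample of six observations
scores = [2, 3, 3, 5, 7, 10]

# Arithmetic mean computed directly from its definition: sum of x_i over n
manual_mean = sum(scores) / len(scores)

center = {
    "mean": mean(scores),      # same quantity as manual_mean
    "median": median(scores),  # average of the two middle values when n is even
    "mode": mode(scores),      # most frequent observation
}

print(manual_mean)
print(center)
```

The mean is pulled toward the large value 10, while the median and mode are not, which is why all three are usually reported together.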
Descriptive Statistics Measures of
Dispersion (1 of 2)

• Dispersion
– Degree of variation in a given
variable
• Range
– Max - Min
• Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Standard Deviation
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
• Mean Absolute Deviation (MAD)
– Average absolute deviation from the mean
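The dispersion measures on this slide can be computed directly from their definitions. The sketch below uses the sample (n - 1) form for variance and standard deviation, and divides by n for the mean absolute deviation; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def dispersion_summary(values):
    """Range, sample variance, sample standard deviation, and MAD."""
    n = len(values)
    x_bar = mean(values)
    sum_sq = sum((x - x_bar) ** 2 for x in values)
    var = sum_sq / (n - 1)                      # sample variance s^2
    return {
        "range": max(values) - min(values),     # Max - Min
        "variance": var,
        "std_dev": sqrt(var),                   # s is the square root of s^2
        "mad": sum(abs(x - x_bar) for x in values) / n,
    }

# Hypothetical sample
values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(values)

# Cross-check the hand-rolled variance against the standard library
library_variance = variance(values)

print(summary)
print(library_variance)
```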
Descriptive Statistics Measures of
Dispersion (2 of 2)
• Quartiles
• Box-and-Whiskers Plot
– a.k.a. box-plot
– Versatile / informative
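The numbers behind a box-and-whiskers plot can be sketched as follows. Conventions differ between tools; this version assumes the "inclusive" quartile method of Python's statistics.quantiles and the common rule that whiskers extend to the most extreme points within 1.5 times the interquartile range of the box. Both choices, and the function name, are assumptions rather than something the slide specifies.

```python
from statistics import quantiles

def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers for a box-and-whiskers plot."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                                  # height of the box
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical sample with one extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
box = box_plot_summary(sample)

print(box)
```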
Descriptive Statistics Shape of a Distribution

• Histogram – frequency chart


• Skewness
– Measure of asymmetry

$$\text{Skewness} = S = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)\, s^3}$$
• Kurtosis
– Peak/tall/skinny nature of the distribution

$$\text{Kurtosis} = K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n\, s^4} - 3$$
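The two shape measures can be computed directly from the slide's formulas. The sketch assumes that s is the sample standard deviation (the n - 1 form from the dispersion slides); statistical packages often use slightly different normalizations, so their results may not match these exactly. Function names and data are hypothetical.

```python
from statistics import mean, stdev

def skewness(values):
    """S = sum((x_i - x_bar)^3) / ((n - 1) * s^3), with s the sample std dev."""
    n, x_bar, s = len(values), mean(values), stdev(values)
    return sum((x - x_bar) ** 3 for x in values) / ((n - 1) * s ** 3)

def kurtosis(values):
    """K = sum((x_i - x_bar)^4) / (n * s^4) - 3, with s the sample std dev."""
    n, x_bar, s = len(values), mean(values), stdev(values)
    return sum((x - x_bar) ** 4 for x in values) / (n * s ** 4) - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric sample
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical sample with a long right tail

print(skewness(symmetric))     # symmetric data: cubed deviations cancel out
print(skewness(right_tailed))  # positive: the tail extends to the right
print(kurtosis(symmetric))     # negative: flatter than a normal distribution
```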
Relationship Between Dispersion and
Shape Properties
Technology Insights 2.1 (1 of 2)
Descriptive Statistics in Excel
Technology Insights 2.1 (2 of 2)
Descriptive Statistics in Excel Creating box-plot in Microsoft Excel
Business Reporting Definitions and Concepts

• Report = Information → Decision


• Report?
– Any communication artifact prepared to convey
specific information
• A report can fulfill many functions
– To ensure proper departmental functioning
– To provide information
– To provide the results of an analysis
– To persuade others to act
– To create an organizational memory…
What is a Business Report?

• A written document that contains information regarding
business matters.
• Purpose: to improve managerial decisions
• Source: data from inside and outside the organization
(via the use of ETL)
• Format: text + tables + graphs/charts
• Distribution: in-print, email, portal/intranet
Data acquisition → Information generation → Decision
making → Process management
Business Reporting
Types of Business Reports

• Metric Management Reports


– Help manage business performance through metrics:
service level agreements (SLAs) for externals and
key performance indicators (KPIs) for internals
– Can be used as part of Six Sigma and/or Total
Quality Management (TQM)
• Dashboard-Type Reports
– Graphical presentation of several performance
indicators in a single page using dials/gauges
• Balanced Scorecard–Type Reports
– Include financial, customer, business process, and
learning & growth indicators
Different Types of Charts and Graphs
Which Chart or Graph Should You Use?
An Example Gapminder Chart Wealth and
Health of Nations

See gapminder.org for interesting animated examples


End of Chapter 2
