Learning Objectives: Amity Global Business School

The document discusses the concepts and processes involved in business intelligence and data mining, including defining key terms, describing common data mining methodologies like CRISP-DM and SEMMA, and providing examples of data mining applications in various industries like banking, retail, and healthcare. The focus is on introducing learners to the objectives, benefits, and wide-ranging uses of business analytics and data mining techniques.

Uploaded by samar1976
Copyright © All Rights Reserved

AMITY GLOBAL

BUSINESS SCHOOL Chandigarh

Learning Objectives
• Describe the business intelligence (BI) methodology and
concepts.
• Define data mining as an enabling technology for business
intelligence
• Understand the objectives and benefits of business
analytics and data mining
• Recognize the wide range of applications of data mining
• Learn the standardized data mining processes
– CRISP-DM
– SEMMA
– KDD

Learning Objectives
• Understand the steps involved in data preprocessing for
data mining
• Learn different methods and algorithms of data mining
• Build awareness of the existing data mining software tools
– Commercial versus free/open source
• Understand the pitfalls and myths of data mining

A Framework for Business Intelligence


• Definitions of business intelligence (BI)
• A brief history of BI
• The architecture of BI
– Data warehousing (DW) [as a foundation of BI]
– Business performance management (BPM)
– User interface (dashboard)

Evolution of Business Intelligence (BI)



The Origins and Drivers of BI


A High-Level Architecture of BI

Data Mining Concepts/Definitions


Why Data Mining?
• More intense competition at the global scale.
• Recognition of the value in data sources.
• Availability of quality data on customers, vendors,
transactions, Web, etc.
• Consolidation and integration of data repositories into
data warehouses.
• The exponential increase in data processing and storage
capabilities, and the decrease in their cost.
• Movement toward the conversion of information resources
into nonphysical (digital) form.

Definition of Data Mining


• The nontrivial process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns in data stored in structured databases.
– Fayyad et al. (1996)
• Keywords in this definition: Process, nontrivial,
valid, novel, potentially useful, understandable.
• Data mining: a misnomer?
• Other names: knowledge extraction, pattern
analysis, knowledge discovery, information
harvesting, pattern searching, data dredging,…

Data Mining is at the Intersection of Many Disciplines

[Figure: DATA MINING shown at the intersection of pattern recognition, machine learning, mathematical modeling, databases, and management science & information systems]

Data Mining Characteristics/Objectives


• Source of data for DM is often a consolidated data
warehouse (not always!).
• DM environment is usually a client-server or a
Web-based information systems architecture.
• Data is the most critical ingredient for DM, which
may include soft/unstructured data.
• The miner is often an end user
• Striking it rich requires creative thinking
• Data mining tools’ capabilities and ease of use are
essential (Web, parallel processing, etc.)

Data in Data Mining


• Data: a collection of facts usually obtained as the result of
experiences, observations, or experiments.
• Data may consist of numbers, words, images, …
• Data: lowest level of abstraction (from which information
and knowledge are derived).
Data
• Structured
– Categorical (Nominal, Ordinal)
– Numerical (Interval, Ratio)
• Unstructured or Semi-Structured
– Textual
– Multimedia
– HTML/XML

What Does DM Do? How Does it Work?


• DM extracts patterns from data
– Pattern? A mathematical (numeric and/or
symbolic) relationship among data items

• Types of patterns
– Association
– Prediction
– Cluster (segmentation)
– Sequential (or time series) relationships

Data Mining Applications


• Customer Relationship Management
– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers

• Banking & Other Financial


– Automate the loan application process
– Detect fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimize cash reserves with forecasting

Data Mining Applications (cont.)


• Retailing and Logistics
– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life

• Manufacturing and Maintenance


– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize
the use of manufacturing capacity
– Discover novel patterns to improve product quality

Data Mining Applications (cont.)


• Brokerage and Securities Trading
– Predict changes in certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading

• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities

Data Mining Applications (cont.)


Increasingly popular application areas for data mining:
• Computer hardware and software
• Science and engineering
• Government and defense
• Homeland security and law enforcement
• Travel industry
• Healthcare
• Medicine
• Entertainment industry
• Sports
• Etc.

Data Mining Process


• A manifestation of best practices
• A systematic way to conduct DM projects
• Different groups have different versions
• Most common standard processes:
– CRISP-DM (Cross-Industry Standard Process
for Data Mining)
– SEMMA (Sample, Explore, Modify, Model, and
Assess)
– KDD (Knowledge Discovery in Databases)

Data Mining Process: CRISP-DM

[Figure: the six CRISP-DM phases arranged in a cycle around the data sources: 1 Business Understanding, 2 Data Understanding, 3 Data Preparation, 4 Model Building, 5 Testing and Evaluation, 6 Deployment]

Data Mining Process: CRISP-DM


Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
• Steps 1-3 account for ~85% of total project time (!)
• The process is highly repetitive and
experimental (DM: art versus science?)

Data Preparation – A Critical DM Task


Input: Real-world Data
• Data Consolidation
– Collect data
– Select data
– Integrate data
• Data Cleaning
– Impute missing values
– Reduce noise in data
– Eliminate inconsistencies
• Data Transformation
– Normalize data
– Discretize/aggregate data
– Construct new attributes
• Data Reduction
– Reduce number of variables
– Reduce number of cases
– Balance skewed data
Output: Well-formed Data
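A few of the cleaning and transformation tasks above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method behind the slide: mean imputation stands in for "impute missing values", min-max scaling for "normalize data", and equal-width binning for "discretize data"; the `ages` list is made-up data.

```python
from statistics import mean

def impute_mean(values):
    """Data cleaning: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, k):
    """Data transformation: discretize values into k equal-width bins labeled 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0      # guard against a zero-width range
    return [min(int((v - lo) / width), k - 1) for v in values]

# Hypothetical raw attribute (customer age) with two missing values
ages = [23, None, 35, 41, None, 58]
clean = impute_mean(ages)
print(clean)                                           # [23, 39.25, 35, 41, 39.25, 58]
print([round(v, 2) for v in min_max_normalize(clean)])  # [0.0, 0.46, 0.34, 0.51, 0.46, 1.0]
print(equal_width_bins(clean, 3))                      # [0, 1, 1, 1, 1, 2]
```

Mean imputation is only one option; it preserves the attribute's mean but shrinks its variance, which is one reason real projects often compare several imputation strategies.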

Data Mining Process: SEMMA


[Figure: the SEMMA cycle]
• Sample (Generate a representative sample of the data)
• Explore (Visualization and basic description of the data)
• Modify (Select variables, transform variable representations)
• Model (Use a variety of statistical and machine learning models)
• Assess (Evaluate the accuracy and usefulness of the models)
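The Sample step only asks for a representative sample. One common way to keep a sample representative of a categorical target is stratified sampling, sketched below; SEMMA itself does not prescribe this technique, and the customer records and 10% fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum so class proportions are preserved."""
    rng = random.Random(seed)          # fixed seed for a reproducible sample
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for group in strata.values():
        n = round(len(group) * fraction)
        sample.extend(rng.sample(group, n))
    return sample

# Hypothetical customer records: 20 "churn" and 80 "stay"
customers = [{"id": i, "label": "churn" if i < 20 else "stay"} for i in range(100)]
s = stratified_sample(customers, key=lambda r: r["label"], fraction=0.1)
print(len(s), sum(r["label"] == "churn" for r in s))  # 10 2
```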

Data Mining Methods: Classification


• Most frequently used DM method
• Part of the machine-learning family
• Employs supervised learning
• Learn from past data, classify new data
• The output variable is categorical (nominal or
ordinal) in nature
• Classification versus regression?
• Classification versus clustering?
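To make "learn from past data, classify new data" concrete, here is a minimal k-nearest-neighbor classifier, one of many possible classification algorithms (the slides do not name a specific one). The loan records, the two features, and k = 3 are assumptions for illustration; in practice the features would usually be normalized first, since unscaled income dominates the distance here.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict a categorical label for `query` by majority vote of its k nearest training cases."""
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical past loan applications: (income in $k, debt ratio) -> outcome
past = [
    ((30, 0.60), "reject"), ((35, 0.55), "reject"), ((40, 0.50), "reject"),
    ((80, 0.20), "approve"), ((90, 0.15), "approve"), ((75, 0.25), "approve"),
]
print(knn_classify(past, (85, 0.18)))  # approve
print(knn_classify(past, (33, 0.58)))  # reject
```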

Assessment Methods for Classification


• Predictive accuracy
– Hit rate
• Speed
– Model building; predicting
• Robustness
• Scalability
• Interpretability
– Transparency, explainability
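Hit rate is the share of cases a classifier labels correctly, usually measured on data held out from training. The sketch below computes it together with the two-class confusion-matrix counts it is derived from; the actual and predicted labels are invented.

```python
def confusion_counts(actual, predicted, positive):
    """Tally true positives, false positives, true negatives, false negatives for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, fp, tn, fn

def hit_rate(actual, predicted):
    """Predictive accuracy: share of cases whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

actual    = ["yes", "yes", "no", "no", "no", "yes", "no", "no"]
predicted = ["yes", "no",  "no", "no", "yes", "yes", "no", "no"]
print(confusion_counts(actual, predicted, "yes"))  # (2, 1, 4, 1)
print(hit_rate(actual, predicted))                 # 0.75
```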

Cluster Analysis for Data Mining


• Used for automatic identification of
natural groupings of things
• Part of the machine-learning family
• Employs unsupervised learning
• Learns the clusters of things from past
data, then assigns new instances
• There is no output variable
• Also known as segmentation

Cluster Analysis for Data Mining


• Clustering results may be used to
– Identify natural groupings of customers
– Identify rules for assigning new cases to classes
for targeting/diagnostic purposes
– Provide characterization, definition, labeling of
populations
– Decrease the size and complexity of problems for
other data mining methods
– Identify outliers in a specific domain (e.g., rare-
event detection)
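Both cluster-analysis slides describe learning clusters from past data and then assigning new instances. k-means is one widely used algorithm that does this and is assumed here purely for illustration; the naive initialization (first k points), fixed iteration count, and customer data are simplifications, and real tools typically add smarter seeding, convergence checks, and feature scaling.

```python
import math

def kmeans(points, k, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move each centroid to the mean of its points."""
    centroids = list(points[:k])          # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids

def assign(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

# Hypothetical customers: (store visits per month, average basket in $)
customers = [(1, 20), (2, 25), (1.5, 22), (10, 200), (12, 210), (11, 190)]
centroids = kmeans(customers, k=2)
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.5, 22.33), (11.0, 200.0)]
print(assign((11.5, 205), centroids))                       # 1
```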

Association Rule Mining


• A very popular DM method in business
• Finds interesting relationships (affinities) between
variables (items or events)
• Part of the machine-learning family
• Employs unsupervised learning
• There is no output variable
• Also known as market basket analysis
• Often used as an example to describe DM to
ordinary people, such as the famous “relationship
between diapers and beers!”

Association Rule Mining


• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
“Customers who bought a laptop computer and virus protection
software also bought an extended service plan 70 percent
of the time.”
• How do you use such a pattern/knowledge?
– Put the items next to each other
– Promote the items as a package
– Place items far apart from each other!
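The 70 percent in the laptop example is a rule's confidence: among baskets containing the left-hand items, the share that also contain the right-hand item. Support measures how often the whole itemset occurs. A minimal sketch with hypothetical baskets (these are not the data behind the slide's example):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs): support of lhs ∪ rhs divided by support of lhs."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)

# Hypothetical point-of-sale baskets
baskets = [
    {"laptop", "antivirus", "service plan"},
    {"laptop", "antivirus", "service plan", "mouse"},
    {"laptop", "antivirus"},
    {"laptop", "mouse"},
    {"antivirus"},
]
lhs, rhs = {"laptop", "antivirus"}, {"service plan"}
print(support(baskets, lhs | rhs))               # 0.4
print(round(confidence(baskets, lhs, rhs), 2))   # 0.67
```

Full algorithms such as Apriori search for all itemsets above a minimum support and then keep the rules above a minimum confidence; this sketch only scores one given rule.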

Association Rule Mining


• Representative applications of association rule
mining include
– In business: cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing, and
sales/promotion configuration
– In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes and
their functions (to be used in genomics projects)
–…
Data Mining Software
• Commercial
– IBM SPSS Modeler (formerly Clementine)
– SAS Enterprise Miner
– IBM Intelligent Miner
– StatSoft Statistica Data Miner
– … many more
• Free and/or Open Source
– R
– RapidMiner
– Weka…

[Figure: bar chart of data mining software usage by number of votes (Source: KDNuggets.com): R (245), Excel (238), Rapid-I RapidMiner (213), KNIME (174), Weka / Pentaho (118), StatSoft Statistica (112), SAS (101), Rapid-I RapidAnalytics (83), MATLAB (80), IBM SPSS Statistics (62), IBM SPSS Modeler (54), SAS Enterprise Miner (46), Orange (42), Microsoft SQL Server (40), Other free software (39), TIBCO Spotfire / S+ / Miner (37), Tableau (35), Oracle Data Miner (35), Other commercial software (32), JMP (32), Mathematica (23), Miner3D (19), IBM Cognos (16), Stata (15), Zementis (14), KXEN (14), Bayesia (14), C4.5/C5.0/See5 (13), Revolution Computing (11), Salford SPM/CART/MARS/TreeNet/RF (9), XLSTAT (7), SAP (BusinessObjects/Sybase/Hana) (7), Angoss (7), RapidInsight/Veera (5), Teradata Miner (4), 11 Ants Analytics (4), WordStat (3), Predixion Software (3)]



Data Mining Myths


• Data mining …
– provides instant solutions/predictions
– is not yet viable for business applications
– requires a separate, dedicated database
– can only be done by those with advanced
degrees
– is only for large firms that have lots of
customer data
– is another name for good old statistics

Common Data Mining Blunders
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really
can/cannot do
3. Not leaving sufficient time for data acquisition, selection and
preparation
4. Looking only at aggregated results and not at individual
records/predictions
5. Being sloppy about keeping track of the data mining procedure and
results
6. Ignoring suspicious findings and quickly moving on.
7. Running mining algorithms repeatedly and blindly. It is important to
think hard about the next stage of data analysis. Data mining is a very
hands-on activity.
8. Believing everything you are told about the data.
9. Believing everything you are told about your own data mining analysis.
10. Measuring your results differently from the way your sponsor measures
them.

Neural Network Concepts


• Neural networks (NN): a brain metaphor for
information processing
• Neural computing
• Artificial neural network (ANN)
• Many uses of ANN, including
– pattern recognition, forecasting, prediction, and
classification
• Many application areas
– finance, marketing, manufacturing, operations,
information systems, and so on

Biological Neural Networks


[Figure: two biological neurons, each labeled with dendrites, soma, and axon, connected at synapses]

• Two interconnected brain cells (neurons)


Processing Information in ANN


[Figure: a single neuron (PE): inputs x1, x2, …, xn, weighted by w1, w2, …, wn, feed a summation function and then a transfer function, producing outputs Y1, Y2, …, Yn]

$$S = \sum_{i=1}^{n} x_i w_i, \qquad Y = f(S)$$

• A single neuron (processing element – PE)
with inputs and outputs
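The summation and transfer steps of a single processing element translate directly into code. The slide leaves the transfer function f unspecified; the sigmoid used by default below is an assumption (a common choice), and the threshold function `step` is a hypothetical alternative.

```python
import math

def sigmoid(s):
    """A common choice of transfer function; squashes S into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def step(s):
    """Hypothetical threshold transfer function: fire (1) when S reaches 0.5."""
    return 1 if s >= 0.5 else 0

def neuron_output(inputs, weights, transfer=sigmoid):
    """Single processing element: summation S = sum of x_i * w_i, then output Y = f(S)."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    s = sum(x * w for x, w in zip(inputs, weights))   # summation function
    return transfer(s)                                # transfer function

print(neuron_output([1.0, 2.0, -1.0], [0.5, 0.25, 1.0]))          # 0.5 (S = 0)
print(neuron_output([1.0, 2.0], [0.5, 0.25], transfer=step))      # 1 (S = 1.0)
```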

Biology Analogy

Text Mining Concepts


• 85-90 percent of all corporate data is in some
kind of unstructured form (e.g., text)
• Unstructured corporate data is doubling in size
every 18 months
• Tapping into these information sources is not an
option, but a need to stay competitive
• Answer: text mining
– A semi-automated process of extracting knowledge
from unstructured data sources
– a.k.a. text data mining or knowledge discovery in
textual databases

Data Mining versus Text Mining


• Both seek novel and useful patterns
• Both are semi-automated processes
• Difference is the nature of the data:
– Structured versus unstructured data
– Structured data: in databases
– Unstructured data: Word documents, PDF
files, text excerpts, XML files, and so on
• Text mining – first, impose structure on
the data, then mine the structured data.

Text Mining Concepts


• Benefits of text mining are obvious, especially in
text-rich data environments
– e.g., law (court orders), academic research (research
articles), finance (quarterly reports), medicine
(discharge summaries), biology (molecular
interactions), technology (patent files), marketing
(customer comments), etc.
• Electronic communication records (e.g., Email)
– Spam filtering
– Email prioritization and categorization
– Automatic response generation

Text Mining Application Areas


• Information extraction
• Topic tracking
• Summarization
• Categorization
• Clustering
• Concept linking
• Question answering

Text Mining Terminology


• Unstructured or semi-structured data
• Corpus (and corpora)
• Terms
• Concepts
• Stemming
• Stop words (and include words)
• Synonyms (and polysemes)
• Tokenizing

Text Mining Terminology


• Term dictionary
• Word frequency
• Part-of-speech tagging
• Morphology
• Term-by-document matrix
– Occurrence matrix
• Singular value decomposition
– Latent semantic indexing
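Several of these terms (tokenizing, stop words, stemming, term-by-document matrix) fit together in a few lines of Python. This is a minimal sketch: the stop-word list and the suffix-stripping `crude_stem` function are hypothetical stand-ins for a real stop list and stemmer (such as the Porter stemmer), and the three-document corpus is made up.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "is", "to", "was"}   # tiny illustrative stop list

def tokenize(text):
    """Tokenizing: lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Very crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def term_by_document(corpus):
    """Occurrence matrix: one row per term, one column per document."""
    counts = [Counter(crude_stem(t) for t in tokenize(doc) if t not in STOP_WORDS) for doc in corpus]
    terms = sorted(set().union(*counts))
    return terms, [[c[term] for c in counts] for term in terms]

corpus = [
    "The claim was filed and the claim was approved.",
    "Customers filed complaints about the delivery.",
    "Approved claims reduce complaints.",
]
terms, matrix = term_by_document(corpus)
for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

Latent semantic indexing would then apply singular value decomposition to a matrix like this one, typically with a linear-algebra library; that step is not shown.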

Natural Language Processing (NLP)


• Structuring a collection of text
– Old approach: bag-of-words
– New approach: natural language processing
• NLP is …
– a very important concept in text mining
– a subfield of artificial intelligence and computational
linguistics
– the study of “understanding” natural human
language
• Syntax versus semantics-based text mining

Natural Language Processing (NLP)


• What is “Understanding”?
– Humans understand; what about
computers?
– Natural language is vague, context driven
– True understanding requires extensive
knowledge of a topic
– Can/will computers ever understand natural
language the same way, and as accurately, as we do?

Natural Language Processing (NLP)


• Challenges in NLP
– Part-of-speech tagging
– Text segmentation
– Word sense disambiguation
– Syntax ambiguity
– Imperfect or irregular input
– Speech acts

• Dream of AI community
– to have algorithms that are capable of automatically
reading and obtaining knowledge from text

Natural Language Processing (NLP)


• WordNet
– A laboriously hand-coded database of English words,
their definitions, sets of synonyms, and various
semantic relations between synonym sets.
– A major resource for NLP.
– Needs automation to be completed.
• Sentiment Analysis
– A technique used to detect favorable and unfavorable
opinions toward specific products and services
– SentiWordNet
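A very simple lexicon-based approach to sentiment analysis scores a text by adding up word polarities. SentiWordNet provides such polarity scores for WordNet synsets; the tiny `LEXICON`, the one-word negation rule, and the example reviews below are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical mini-lexicon of word polarities (invented numbers, not SentiWordNet scores)
LEXICON = {"great": 1.0, "love": 1.0, "fast": 0.5, "slow": -0.5, "broken": -1.0, "terrible": -1.0}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum word polarities, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def label(text):
    """Map the numeric score to a favorable/unfavorable/neutral opinion label."""
    s = sentiment_score(text)
    return "favorable" if s > 0 else "unfavorable" if s < 0 else "neutral"

print(label("Great phone, and delivery was fast"))               # favorable
print(label("The screen arrived broken and support was slow"))   # unfavorable
print(label("Not great"))                                        # unfavorable
```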

NLP Task Categories


• Information retrieval, information extraction
• Named-entity recognition
• Question answering
• Automatic summarization
• Natural language generation & understanding
• Machine translation
• Foreign language reading & writing
• Speech recognition
• Text proofing, optical character recognition

Text Mining Applications


• Marketing applications
– Enables better CRM
• Security applications
– ECHELON, OASIS
– Deception detection (…)
• Medicine and biology
– Literature-based gene identification (…)
• Academic applications
– Research stream analysis

Web Mining Overview


• Web is the largest repository of data
• Data is in HTML, XML, text format
• Challenges (of processing Web data)
– The Web is too big for effective data mining
– The Web is too complex
– The Web is too dynamic
– The Web is not specific to a domain
– The Web has everything
• Opportunities and challenges are great!

Web Mining
• Web mining (or Web data mining) is the
process of discovering intrinsic relationships
from Web data (textual, linkage, or usage)
• Is it the same as data mining on data
generated on the Internet?
• Web data?
– Content, Link, Log, …
• Web Mining versus Web Analytics
– Look at the simple taxonomy on the next slide

Web Mining
[Figure: a simple taxonomy of Web mining, which draws on data mining and text mining]
• Web Content Mining
– Source: unstructured textual content of the Web pages (usually in HTML format)
• Web Structure Mining
– Source: the uniform resource locator (URL) links contained in the Web pages
• Web Usage Mining
– Source: the detailed description of a Web site’s visits (sequence of clicks by sessions)
• Related techniques and applications: search engines, sentiment analysis, semantic webs, Web analytics, PageRank, information retrieval, graph mining, social analytics, clickstream analysis, search engine optimization, social network analysis, social media analytics, log analysis, marketing attribution, customer analytics, 360 customer view


Web Content/Structure Mining


• Mining the textual content on the Web
• Data collection via Web Crawlers/Spiders

• Web pages include hyperlinks
– Authoritative pages
– Hubs
– Hyperlink-induced topic search (HITS) algorithm
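HITS scores every page twice: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The sketch below runs that mutual reinforcement as a simple power iteration with normalization; the link graph is hypothetical, and the fixed 20 iterations stand in for a proper convergence test.

```python
import math

def hits(links, iterations=20):
    """HITS: authority = sum of hub scores of pages linking in; hub = sum of authority scores of pages linked to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: page -> pages it links to
web = {
    "portal": ["review1", "review2", "vendor"],
    "blog": ["review1", "review2"],
    "review1": ["vendor"],
    "vendor": ["review1"],
    "review2": [],
}
hub, auth = hits(web)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # portal review1
```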

Business Performance Management (BPM)
• Business Performance Management (BPM) is…
A real-time system that alerts managers to
potential opportunities, impending problems
and threats, and then empowers them to react
through models and collaboration.
• Also called corporate performance management
(CPM by Gartner Group), enterprise
performance management (EPM by Oracle),
strategic enterprise management (SEM by SAP)

Business Performance Management (BPM)


• BPM refers to the business processes,
methodologies, metrics, and technologies used by
enterprises to measure, monitor, and manage
business performance.
• BPM encompasses three key components
– A set of integrated, closed-loop management and
analytic processes, supported by technology …
– Tools for businesses to define strategic goals and then
measure/manage performance against them
– Methods and tools for monitoring key performance
indicators (KPIs), linked to organizational strategy

A Closed-Loop Process to Optimize Business Performance
• Process Steps
1. Strategize
2. Plan
3. Monitor/analyze
4. Act/adjust

Each with its own process steps

Strategize:
Where Do We Want to Go?
• Strategic planning
– Common tasks for the strategic planning
process:
1. Conduct a current situation analysis
2. Determine the planning horizon
3. Conduct an environment scan
4. Identify critical success factors
5. Complete a gap analysis
6. Create a strategic vision
7. Develop a business strategy
8. Identify strategic objectives and goals

Plan:
How Do We Get There?
• Operational planning
– Operational plan: a plan that translates an
organization’s strategic objectives and goals
into a set of well-defined tactics and
initiatives, resource requirements, and
expected results for some future time period
(usually a year).
• Operational planning can be
– Tactic-centric (operationally focused)
– Budget-centric (financially focused)
Monitor/Analyze:
How Are We Doing?
• A comprehensive framework for
monitoring performance should
address two key issues:
– What to monitor?
• Critical success factors
• Strategic goals and targets
•…
– How to monitor?
•…
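The slide leaves "How to monitor?" open. One simple, hypothetical approach is to compare each KPI's actual value with its target and flag shortfalls beyond a tolerance, which is the kind of alerting the BPM definition describes. The KPI names, targets, and 5% tolerance below are invented.

```python
from dataclasses import dataclass

@dataclass
class KPI:
    name: str
    target: float
    actual: float
    higher_is_better: bool = True
    tolerance: float = 0.05  # hypothetical: alert when more than 5% worse than target

    def variance(self):
        """Relative gap to target, signed so that positive means better than target."""
        gap = (self.actual - self.target) / self.target
        return gap if self.higher_is_better else -gap

    def status(self):
        """'on target', 'watch' (small shortfall), or 'alert' (shortfall beyond tolerance)."""
        v = self.variance()
        if v >= 0:
            return "on target"
        return "alert" if -v > self.tolerance else "watch"

kpis = [
    KPI("Quarterly revenue ($M)", target=12.0, actual=12.6),
    KPI("Customer churn rate (%)", target=4.0, actual=4.1, higher_is_better=False),
    KPI("On-time delivery (%)", target=95.0, actual=88.0),
]
for k in kpis:
    print(f"{k.name:28s} {k.variance():+.1%}  {k.status()}")
```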

Act and Adjust: What Do We Need to Do Differently?
• Success (or mere survival) depends on new
projects: creating new products, entering
new markets, acquiring new customers (or
businesses), or streamlining some process.
• Many new projects and ventures fail!
• What is the chance of failure?
– 60% of Hollywood movies fail
– 70% of large IT projects fail, …
