LECTURE 3-BDM 411 Data Analytics and BIG Data

Uploaded by

Saadie Essie

BDM 411: BUSINESS INTELLIGENCE AND ANALYTICS
LECTURE 3: BIG DATA ANALYTICS AND DATA MINING
INTRODUCTION TO BIG DATA ANALYTICS

Definition:
Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.
Sedkaoui S. (2018)
INTRODUCTION TO BIG DATA
• Big Data is created digitally and collected automatically.
• Large amounts of data are collected and organized to benefit an organization and its users and clients.
• Big Data resources are built from scratch: neither the data nor the big data technologies existed before big data.
Sedkaoui S. (2018)
Business Analytics, BI, Big Data, Data Mining - What's the difference?
• Business Analytics – Tools to explore past data to gain insight into future business decisions.
• BI – Tools and techniques to turn data into meaningful information.
• Big Data – Data sets that are so large or complex that traditional data processing applications are inadequate.
• Data Mining – Tools for discovering patterns in large data sets.
INTRODUCTION TO BIG DATA
BIG DATA MARKET REVENUE
BIG DATA ANALYTICS PROCESS AND OBJECTIVES
CHARACTERISTICS OF BIG DATA
STAGES OF BIG DATA ANALYTICS
TOOLS USED IN BIG DATA ANALYTICS

Characteristics of Data for Good Decision Making

Source: speakingdata blog

LECTURE 3: INTRODUCTION TO DATA MINING
OUTLINE
Tasks in Data mining
Data mining Process
Classification of Data mining Systems
Major issues in Data mining
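Introduction
Data mining is a process of discovering various models, summaries, and derived values from a given collection of data.
The general experimental procedure adapted to data-mining problems involves five steps, described under Data mining Process: state the problem and formulate the hypothesis, collect the data, preprocess the data, estimate the model, and interpret the model and draw conclusions.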
Tasks in Data mining
Data mining involves six common classes of tasks:
Anomaly detection (Outlier/change/deviation detection) – The identification of unusual data records, which might be interesting, or of data errors that require further investigation.

Association rule learning (Dependency modelling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering – The task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
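The market basket example above can be sketched in a few lines of Python. This is a minimal illustration, not a full Apriori implementation: it assumes a hypothetical list of five baskets, looks only at pairs of items, and uses support and confidence thresholds (`min_support`, `min_confidence`) chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set is the items bought in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (X, Y, support, confidence) for pair rules X -> Y that pass both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Count each unordered pair once per basket (sorted so a pair always has the same key).
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y in basket | x in basket)
            if confidence >= min_confidence:
                rules.append((x, y, round(support, 2), round(confidence, 2)))
    return sorted(rules)


for x, y, support, confidence in pair_rules(transactions):
    print(f"{x} -> {y}: support={support}, confidence={confidence}")
```

With these toy baskets, `butter -> milk` reaches confidence 1.0 because every basket containing butter also contains milk, while the single `beer` basket never reaches the support threshold.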
Tasks in Data mining
• Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
• Regression – Attempts to find a function which models the data with the least error.
• Summarization – Providing a more compact representation of the data set, including visualization and report generation.
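Regression's "least error" criterion can be made concrete with ordinary least squares. The sketch below assumes a straight-line model y = a + b*x and squared error as the error measure; both are common choices but are assumptions here, and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (a, b) minimizing sum((y - (a + b*x))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def squared_error(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: advertising spend (x) against sales (y).
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(xs, ys)
print(f"sales = {a:.2f} + {b:.2f} * spend")
print("error of fitted line:", round(squared_error(xs, ys, a, b), 3))
print("error of y = 1 + 2x: ", round(squared_error(xs, ys, 1.0, 2.0), 3))
```

Any other straight line, such as y = 1 + 2x, gives a larger squared error on the same points, which is what "least error" means under these assumptions.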
Data mining Process
1. State the problem and formulate the hypothesis
• Gathering domain-specific knowledge and experience is usually necessary in order to come up with a meaningful problem statement.
• The modeler usually specifies a set of variables for the unknown dependency and, if possible, a general form of this dependency as an initial hypothesis.
• This step requires close interaction between the data-mining expert and the application expert.
Data mining Process
2. Collect the data
• This step is concerned with the generation and gathering of data.
In general, there are two distinct possibilities:
The first is when the data-generation process is under the control of an expert (modeler): this approach is known as a designed experiment.
The second possibility is when the expert cannot influence the data-generation process: this is known as the observational approach.
Data mining Process
3. Preprocessing the data
In the observational setting, data are usually "collected" from existing databases, data warehouses, and data marts. Data preprocessing usually includes at least two common tasks:
• Outlier detection (and removal) – Outliers are unusual data values that are not consistent with most observations.
• Commonly, outliers result from measurement errors, coding and recording errors, and, sometimes, are natural, abnormal values.
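One common way to flag the unusual values described above is Tukey's interquartile-range (IQR) rule. The lecture does not prescribe a specific detection method, so treat this as one illustrative choice; the readings and the conventional multiplier `k=1.5` are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's fences at k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical sensor readings; 98 looks like a recording error.
readings = [12, 13, 12, 14, 13, 15, 14, 98]
kept, outliers = iqr_outliers(readings)
print("kept:", kept)
print("outliers:", outliers)
```

The IQR rule is based on quartiles rather than the mean, so a single extreme value such as 98 does not inflate the fences that are used to detect it.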
Data mining Process
3. Preprocessing the data Cont'd
Scaling, encoding, and selecting features – Data preprocessing includes several steps such as variable scaling and different types of encoding.
• For example, one feature with the range [0, 1] and another with the range [−100, 1000] will not have the same weights in the applied technique; scaling brings both features to a comparable range so that they carry the same weight.
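The scaling example above can be sketched with min-max scaling, which linearly maps each feature onto the same target range (here [0, 1]). The feature values are hypothetical, and min-max is only one of several possible scaling methods.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to new_min
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


# Hypothetical features with very different ranges.
feature_a = [0.0, 0.25, 0.5, 1.0]           # already in [0, 1]
feature_b = [-100.0, 175.0, 450.0, 1000.0]  # range [-100, 1000]
print(min_max_scale(feature_a))
print(min_max_scale(feature_b))
```

After scaling, the second feature spans [0, 1] just like the first, so neither dominates a distance-based technique simply because of its units.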
Data mining Process
4. Estimate the model
The selection and implementation of the appropriate data-mining technique is the main task in this phase.
This process is not straightforward; usually, in practice, the implementation is based on several models, and selecting the best one is an additional task.
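Selecting the best of several candidate models is often done by fitting each one on training data and comparing their errors on held-out validation data. The sketch below assumes two hypothetical candidates (a constant "mean" model and a least-squares straight line), mean squared error as the score, and a toy train/validation split.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate 2: least-squares straight line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_model(candidates, train, valid):
    """Fit every candidate on train, score it on valid, return (best_name, scores)."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical train/validation split.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
valid = ([7, 8], [14.1, 15.9])
best, scores = select_model({"mean": fit_mean, "linear": fit_linear}, train, valid)
print(best, scores)
```

Scoring on data the models did not see during fitting is the design choice that matters here: it favours the model that generalizes, not the one that merely memorizes the training points.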
Data mining Process
5. Interpret the model and draw conclusions
Modern data-mining methods are expected to yield highly accurate results using high-dimensional models.
The problem of interpreting these models, which is also very important, is considered a separate task, with specific techniques to validate the results.
Data Preprocessing Techniques
• Data Cleaning techniques
• Data Integration techniques
• Data Transformation techniques
• Data Reduction techniques
Data Preprocessing Techniques-Integration
Data Integration:
It combines data from multiple sources into a coherent data store, as in data warehousing. These sources may include multiple databases, data cubes, or flat files.
Data Preprocessing Techniques-Integration
Issues in Data integration:
Schema integration and object matching: How can the data analyst or the computer be sure that customer id in one database and customer number in another refer to the same attribute?
Redundancy:
An attribute (such as annual revenue, for instance) may be redundant if it can be derived from another attribute or set of attributes. Inconsistencies in attribute or dimension naming can also cause redundancies in the resulting data set.
Detection and resolution of data value conflicts:
For the same real-world entity, attribute values from different sources may differ.
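The integration issues above can be illustrated with two hypothetical sources that store the same customers under different key names (`customer_id` versus `customer_number`). The sketch assumes the analyst has already decided these keys refer to the same attribute, and it only reports conflicting values rather than resolving them; the records and the `integrate` helper are hypothetical.

```python
# Hypothetical records: two sources describe the same customers under different schemas.
crm = [
    {"customer_id": 1, "name": "Amina", "city": "Nairobi"},
    {"customer_id": 2, "name": "Brian", "city": "Mombasa"},
]
billing = [
    {"customer_number": 1, "name": "Amina", "city": "Nairobi"},
    {"customer_number": 2, "name": "Brian", "city": "Kisumu"},
]


def integrate(left, right, left_key, right_key):
    """Hypothetical helper: merge records whose keys match and list value conflicts.

    Records that appear only in `right` are ignored (a left join).
    """
    # Schema integration: assume right_key and left_key name the same attribute.
    right_by_key = {row[right_key]: row for row in right}
    merged, conflicts = [], []
    for row in left:
        combined = dict(row)
        other = right_by_key.get(row[left_key])
        if other is not None:
            for attr, value in other.items():
                if attr == right_key:
                    continue  # already represented by left_key
                if attr in combined and combined[attr] != value:
                    # Value conflict: keep the left value, but record the disagreement.
                    conflicts.append((row[left_key], attr, combined[attr], value))
                elif attr not in combined:
                    combined[attr] = value
        merged.append(combined)
    return merged, conflicts


merged, conflicts = integrate(crm, billing, "customer_id", "customer_number")
print(merged)
print(conflicts)
```

Keeping the left value on conflict is an arbitrary placeholder; a real pipeline would need an explicit resolution rule, such as trusting the most recently updated source.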
Data Preprocessing Techniques-Data Transformation
In data transformation, the data are transformed or consolidated into forms appropriate for mining.
Data transformation can involve the following:
• Smoothing, which works to remove noise from the data. Such techniques include binning, regression, and clustering.
• Aggregation, where summary or aggregation operations are applied to the data. For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts. This step is typically used in constructing a data cube for analysis of the data at multiple granularities.
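Both transformations above can be sketched in plain Python: smoothing by bin means (one of the binning techniques mentioned) and aggregating daily sales into monthly totals. The equal-frequency bins of size 3, the price list, and the `YYYY-MM-DD` date format are assumptions for illustration.

```python
from collections import defaultdict


def smooth_by_bin_means(values, bin_size):
    """Sort values into equal-frequency bins and replace each value by its bin mean.

    Note: the result is in sorted order, not the original order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


def monthly_totals(daily_sales):
    """Aggregate (date 'YYYY-MM-DD', amount) records into per-month totals."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        totals[date[:7]] += amount  # the 'YYYY-MM' prefix identifies the month
    return dict(totals)


# Hypothetical inputs.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
daily = [("2024-01-05", 120.0), ("2024-01-20", 80.0), ("2024-02-03", 150.0)]
print(smooth_by_bin_means(prices, 3))
print(monthly_totals(daily))
```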
Data Preprocessing Techniques-Data Transformation
• Generalization of the data, where low-level or "primitive" (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or country.
• Normalization, where the attribute data are scaled so as to fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0.
• Attribute construction (or feature construction), where new attributes are constructed and added from the given set of attributes to help the mining process.
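Generalization through a concept hierarchy and attribute construction can be sketched with small lookup tables and a derived field. The street-to-city-to-country mapping below is a hypothetical hierarchy invented for the example, and `area = height * width` is just one illustrative constructed attribute.

```python
# Hypothetical concept hierarchy for illustration only: street -> city -> country.
STREET_TO_CITY = {
    "Harbour Road": "Mombasa",
    "Market Street": "Nairobi",
    "Station Road": "Nairobi",
}
CITY_TO_COUNTRY = {"Mombasa": "Kenya", "Nairobi": "Kenya"}


def generalize_street(street, level):
    """Replace a street with a higher-level concept: 'city' or 'country'."""
    city = STREET_TO_CITY[street]
    if level == "city":
        return city
    if level == "country":
        return CITY_TO_COUNTRY[city]
    raise ValueError(f"unknown level: {level!r}")


def add_area(record):
    """Attribute construction: derive a new 'area' attribute from height and width."""
    return {**record, "area": record["height"] * record["width"]}


addresses = ["Harbour Road", "Market Street", "Station Road"]
print([generalize_street(s, "city") for s in addresses])
print([generalize_street(s, "country") for s in addresses])
print(add_area({"height": 2.0, "width": 3.5}))
```

Each step up the hierarchy merges more raw values into fewer categories: three streets become two cities and then one country.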
Data Preprocessing Techniques-Data Reduction
Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.
That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results.
Data Preprocessing Techniques-Data Reduction
Strategies for data reduction include the following:
• Data cube aggregation, where aggregation operations are applied to the data in the construction of a data cube.
• Attribute subset selection, where irrelevant, weakly relevant, or redundant attributes or dimensions may be detected and removed.
• Dimensionality reduction, where encoding mechanisms are used to reduce the dataset size.
• Numerosity reduction, where the data are replaced or estimated by alternative, smaller data representations such as parametric models (which need to store only the model parameters instead of the actual data) or nonparametric methods such as clustering, sampling, and the use of histograms.
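Two of the nonparametric numerosity-reduction methods named above, sampling and histograms, can be sketched as follows. The values, sample size, fixed random seed, and number of equal-width buckets are all assumptions chosen for illustration.

```python
import random


def simple_random_sample(values, k, seed=0):
    """Simple random sample without replacement; seeded so results are repeatable."""
    rng = random.Random(seed)
    return rng.sample(values, k)


def equal_width_histogram(values, n_buckets):
    """Summarize values as (low, high, count) buckets of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(lo, hi, len(values))]
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # The maximum value would index one past the end, so clamp it to the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


values = list(range(1, 31))  # hypothetical attribute values 1..30
print(simple_random_sample(values, 5))
for low, high, count in equal_width_histogram(values, 3):
    print(f"[{low:.1f}, {high:.1f}]: {count}")
```

Either representation (5 sampled values, or 3 bucket counts) is much smaller than the 30 original values, which is the aim of numerosity reduction.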
Classification of Data mining Systems
The data mining system can be classified according to the following criteria:
Database Technology
Statistics
Machine Learning
Information Science
Visualization
Other Disciplines
Data mining Classifications
Some Other Classification Criteria:
• Classification according to kind of databases mined
• Classification according to kind of knowledge mined
• Classification according to kinds of techniques utilized
• Classification according to applications adapted
Data mining Classifications
Classification according to kind of databases mined
We can classify the data mining system according to the kind of databases mined.
Database systems can be classified according to different criteria, such as data models or types of data, and the data mining system can be classified accordingly.
For example, if we classify the database according to data model, then we may have a relational, transactional, object-relational, or data warehouse mining system.
Data mining Classifications
Classification according to kind of knowledge mined
We can classify the data mining system according to the kind of knowledge mined. That is, data mining systems are classified on the basis of functionalities such as:
Characterization
Discrimination
Association and Correlation Analysis
Classification
Prediction
Clustering
Outlier Analysis
Evolution Analysis
Data mining Classifications
Classification according to kinds of techniques utilized
• We can classify the data mining system according to the kind of techniques used.
• We can describe these techniques according to the degree of user interaction involved or the methods of analysis employed.
Data mining Classifications
Classification according to applications adapted
We can classify the data mining system according to the applications adapted. These applications are as follows:
Finance
Telecommunications
DNA
Stock Markets
E-mail
Major issues in Data Mining
• Mining different kinds of knowledge in databases: Different users have different needs and may be interested in different kinds of knowledge. Therefore, data mining needs to cover a broad range of knowledge discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction: The data mining process needs to be interactive because it allows users to focus the search for patterns, providing and refining data mining requests based on returned results.
Major issues in Data Mining
Incorporation of background knowledge:
Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but at multiple levels of abstraction.
Data mining query languages and ad hoc data mining:
A data mining query language that allows the user to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for
Major issues in Data Mining
• Presentation and visualization of data mining results: Once the patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable by the users.
• Handling noisy or incomplete data: Data cleaning methods are required that can handle noise and incomplete objects while mining the data regularities. Without such data cleaning methods, the accuracy of the discovered patterns will be poor.
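One simple data-cleaning method for incomplete data is to fill each missing value with the mean of the observed values for that attribute. This is a minimal sketch assuming missing entries are represented as `None`; mean imputation is only one option and can understate the attribute's spread.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical monthly incomes (in thousands) with two missing entries.
incomes = [42.0, None, 38.0, 40.0, None]
print(fill_missing_with_mean(incomes))
```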
Major issues in Data Mining
• Pattern evaluation: This refers to the interestingness of the discovered patterns. Discovered patterns may be uninteresting because they either represent common knowledge or lack novelty.
• Efficiency and scalability of data mining algorithms: In order to effectively extract information from huge amounts of data in databases, data mining algorithms must be efficient and scalable.
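Pattern evaluation is often supported by interestingness measures. As an illustrative choice (the lecture does not name a specific measure), the sketch below computes confidence and lift for a single rule: a rule can have high confidence yet a lift below 1 when the consequent is simply very common, which makes the rule uninteresting. The baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    p_a = sum(antecedent in t for t in transactions) / n
    p_c = sum(consequent in t for t in transactions) / n
    p_both = sum(antecedent in t and consequent in t for t in transactions) / n
    confidence = p_both / p_a  # P(consequent | antecedent)
    lift = confidence / p_c    # > 1: positively associated; < 1: negatively associated
    return confidence, lift


# Hypothetical baskets: milk appears in 9 of 10, bread in 5 of 10.
transactions = [{"bread", "milk"}] * 4 + [{"bread"}] + [{"milk"}] * 5
confidence, lift = rule_metrics(transactions, "bread", "milk")
print(f"bread -> milk: confidence={confidence:.2f}, lift={lift:.2f}")
```

Here 80% confidence looks strong, but buying bread actually makes milk slightly less likely than its 90% base rate, so lift flags the rule as uninteresting.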
Businesses Need Support for Decision Making
• Uncertain economics
• Rapidly changing environments
• Global competition
• Demanding customers
• Taking advantage of information acquired by companies is a Critical Success Factor.
The Information Gap
• The shortfall between gathering information and using it for decision making.
• Firms have inadequate data warehouses.
• Business Analysts spend 2 days a week gathering and formatting data, instead of performing analysis (Data Warehousing Institute).
• Business Intelligence (BI) seeks to bridge the information gap.
Summary
• Explained BI, Analytics, Data Marts and Big Data.
• Defined the characteristics of data for good decision making.
• Described data mining in detail.
• Explained and gave examples of market basket and cluster analysis.
REVIEW QUESTIONS
1. Discuss the relationship between Digital Transformation and Business Intelligence.
2. Summarize the benefits of Business Intelligence in an organization in the 21st century.
3. Differentiate Digital Transformation and Business Intelligence.
4. Discuss Porter's Five Forces in relation to Business Intelligence.
5. Choose one sector of the economy in Kenya and discuss how Business Intelligence has been used to achieve efficiency in operations.
CASE STUDY
From the Case Study:
ADDRESSING 4 CORE BUSINESS INTELLIGENCE CHALLENGES ON THE SEARCH FOR ACTIONABLE INSIGHTS By Jennifer Bresnick
1. Discuss the four core Business Intelligence challenges faced by organizations today.
2. Propose specific solutions to the above challenges.
