
LECTURE 3-BDM 411 Data Analytics and BIG Data

Uploaded by

Saadie Essie
Copyright
© © All Rights Reserved

BDM 411: BUSINESS INTELLIGENCE AND ANALYTICS
LECTURE 3: BIG DATA ANALYTICS AND DATA MINING
INTRODUCTION TO BIG DATA ANALYTICS

Definition:
Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.
Sedkaoui S. (2018)
INTRODUCTION TO BIG DATA
• Big Data is created digitally and collected automatically.
• Large amounts of data are collected and organized to benefit an organization and its users and clients.
• Big Data resources are built from scratch: no data and no big data technologies exist before the big data effort begins.
Sedkaoui S. (2018)
Business Analytics, BI, Big Data, Data Mining - What’s the difference?
• Business Analytics – Tools to explore past data to gain insight into future business decisions.
• BI – Tools and techniques to turn data into meaningful information.
• Big Data – Data sets that are so large or complex that traditional data processing applications are inadequate.
• Data Mining – Tools for discovering patterns in large data sets.
INTRODUCTION TO BIG DATA
BIG DATA MARKET REVENUE
BIG DATA ANALYTICS PROCESS AND OBJECTIVES
CHARACTERISTICS OF BIG DATA
STAGES OF BIG DATA ANALYTICS
TOOLS USED IN BIG DATA ANALYTICS

Characteristics of Data for Good Decision Making

Source: speakingdata blog


LECTURE 3: INTRODUCTION TO DATA MINING
OUTLINE
Tasks in Data mining
Data mining Process
Classification of Data mining Systems
Major issues in Data mining
Introduction

Data mining is a process of discovering various models, summaries, and derived values from a given collection of data.

The general experimental procedure adapted to data-mining problems involves the following steps:
Tasks in Data mining
Data mining involves six common classes of tasks:
• Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation.
• Association rule learning (dependency modelling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
• Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
Tasks in Data mining


• Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
• Regression – Attempts to find a function that models the data with the least error.
• Summarization – Providing a more compact representation of the data set, including visualization and report generation.
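
As an illustration of the association rule learning task described above, the following is a minimal sketch of market basket analysis. The transactions, item names, and the helper functions pair_support and confidence are assumptions made up for this example, not part of the lecture. Support is the fraction of baskets containing a pair of items; confidence estimates how often one item appears given that another does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets holding `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


support = pair_support(transactions)
most_common_pair = max(support, key=support.get)
print("Most frequent pair:", most_common_pair, support[most_common_pair])
print("confidence(butter -> bread):", confidence(transactions, "butter", "bread"))
```

Real association rule miners (for example, Apriori-style algorithms) also prune rare item sets before counting; this sketch counts every pair, which is only practical for small data.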
Data mining Process
1. State the problem and formulate the hypothesis
• Domain-specific knowledge and experience are usually necessary in order to come up with a meaningful problem statement.
• The modeler usually specifies a set of variables for the unknown dependency and, if possible, a general form of this dependency as an initial hypothesis.
• This step requires close interaction between the data-mining expert and the application expert.
Data mining Process
2. Collect the data
• Concerned with the generation and gathering of data.
In general, there are two distinct possibilities:
• The first is when the data-generation process is under the control of an expert (modeler): this approach is known as a designed experiment.
• The second is when the expert cannot influence the data-generation process: this is known as the observational approach.
Data mining Process
3. Preprocessing the data
In the observational setting, data are usually "collected" from existing databases, data warehouses, and data marts. Data preprocessing usually includes at least two common tasks:
• Outlier detection (and removal) – Outliers are unusual data values that are not consistent with most observations.
• Commonly, outliers result from measurement errors or coding and recording errors; sometimes they are natural, abnormal values.
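
One common way to flag outliers is the interquartile range (IQR) rule, sketched below. The lecture does not prescribe a particular method; the rule, the multiplier k=1.5, the function name iqr_outliers, and the sample ages are all assumptions chosen for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages; 230 is most likely a recording error.
ages = [23, 25, 27, 29, 31, 33, 35, 230]
print(iqr_outliers(ages))
```

Whether a flagged value should be removed still depends on domain knowledge: a recording error can be dropped or corrected, while a natural but abnormal value may be exactly the record of interest.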
Data mining Process
3. Preprocessing the data Cont’d
• Scaling, encoding, and selecting features – Data preprocessing includes several steps such as variable scaling and different types of encoding.
• For example, one feature with the range [0, 1] and another with the range [−100, 1000] will not have the same weight in the applied technique.
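
The effect of unequal ranges can be seen with a distance calculation. The sketch below assumes min-max scaling and Euclidean distance as the "applied technique"; the feature values and function names are hypothetical. Before scaling, the wide-range feature dominates the distance; after scaling, both features contribute on the same [0, 1] scale.

```python
import math


def min_max_scale(column):
    """Rescale a list of numbers linearly onto the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records: feature_a lies in [0, 1], feature_b in [-100, 1000].
feature_a = [0.0, 1.0, 0.1]
feature_b = [-100.0, -90.0, 1000.0]

raw = list(zip(feature_a, feature_b))
scaled = list(zip(min_max_scale(feature_a), min_max_scale(feature_b)))

# Records 0 and 1 differ fully on feature_a but only slightly on feature_b.
print("raw distance:   ", euclidean(raw[0], raw[1]))
print("scaled distance:", euclidean(scaled[0], scaled[1]))
```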
Data mining Process
4. Estimate the model
The selection and implementation of the appropriate data-mining technique is the main task in this phase.

This process is not straightforward; in practice, the implementation is usually based on several models, and selecting the best one is an additional task.
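
A minimal sketch of "several models, then select the best one" follows. It assumes two simple candidate models (a constant mean predictor and a least-squares line) and compares them by mean squared error on held-out validation data. The data, the candidate set, and the choice of a single holdout split are illustrative assumptions, not the lecture's prescribed procedure.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, split into a training part and a validation part.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
valid_x, valid_y = [7, 8], [14.0, 16.1]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {
    name: mse(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores)
print("selected model:", best)
```

Scoring on data that was not used for fitting is the key design choice here: a model judged only on its own training data can look better than it really is.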
Data mining Process
5. Interpret the model and draw conclusions
Modern data-mining methods are expected to yield highly accurate results using high-dimensional models.

The problem of interpreting these models, also very important, is considered a separate task, with specific techniques to validate the results.
Data Preprocessing Techniques
Data Cleaning techniques
Data Integration techniques
Data Transformation techniques
Data Reduction techniques
Data Preprocessing Techniques-Integration
Data Integration:
It combines data from multiple sources into a coherent
data store, as in data warehousing.
These sources may include multiple databases, data
cubes, or flat files.
Data Preprocessing Techniques-Integration
Issues in Data integration:
Schema integration and object matching: How can the data analyst or the computer be sure that customer id in one database and customer number in another refer to the same attribute?
Redundancy:
An attribute (such as annual revenue, for instance) may be redundant if it
can be derived from another attribute or set of attributes. Inconsistencies in
attribute or dimension naming can also cause redundancies in the resulting
data set.
Detection and resolution of data value conflicts:
For the same real-world entity, attribute values from different sources may
differ.
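
The integration issues above can be sketched in a few lines. The two sources, their column names, the COLUMN_MAP schema mapping, and the policy of keeping the primary source's value on a conflict are all assumptions for illustration; in practice the mapping and the conflict policy come from metadata and domain experts.

```python
# Two hypothetical sources that name the same key attribute differently.
crm_rows = [
    {"customer_id": 1, "city": "Nairobi", "annual_revenue": 1200},
    {"customer_id": 2, "city": "Mombasa", "annual_revenue": 800},
]
billing_rows = [
    {"customer_number": 1, "city": "Nairobi"},
    {"customer_number": 2, "city": "Kisumu"},
]

# Assumed schema mapping: customer_number means the same as customer_id.
COLUMN_MAP = {"customer_number": "customer_id"}


def rename_columns(row, column_map):
    """Schema integration: rename source columns to the shared names."""
    return {column_map.get(key, key): value for key, value in row.items()}


def integrate(primary, secondary, key):
    """Merge rows on `key`; report value conflicts instead of overwriting."""
    merged = {row[key]: dict(row) for row in primary}
    conflicts = []
    for row in secondary:
        target = merged.setdefault(row[key], {})
        for field, value in row.items():
            if field in target and target[field] != value:
                # Same entity, different values: keep primary, log the conflict.
                conflicts.append((row[key], field, target[field], value))
            else:
                target[field] = value
    return merged, conflicts


aligned = [rename_columns(row, COLUMN_MAP) for row in billing_rows]
merged, conflicts = integrate(crm_rows, aligned, "customer_id")
print(merged)
print(conflicts)
```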
Data Preprocessing Techniques-Data Transformation
In data transformation, the data are transformed or consolidated into
forms appropriate for mining.

Data transformation can involve the following:
• Smoothing, which works to remove noise from the data. Such techniques include binning, regression, and clustering.
• Aggregation, where summary or aggregation operations are applied to the data. For example, daily sales data may be aggregated to compute monthly and annual totals. This step is typically used in constructing a data cube for analysis of the data at multiple granularities.
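
Smoothing and aggregation can be sketched as follows. The example assumes smoothing by bin means over equal-frequency bins (one of several binning variants) and aggregation of daily sales keyed by ISO date strings; the price list, sales records, and function names are hypothetical.

```python
from collections import defaultdict


def smooth_by_bin_means(values, bin_size):
    """Sort values, split them into bins of `bin_size`, replace each by its bin mean.

    Note: the result is in sorted order, not the original input order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def monthly_totals(daily_sales):
    """Aggregate (ISO date string, amount) pairs into totals per 'YYYY-MM'."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        totals[date[:7]] += amount  # first 7 characters are the year and month
    return dict(totals)


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))

daily_sales = [("2024-01-30", 100.0), ("2024-01-31", 150.0), ("2024-02-01", 80.0)]
print(monthly_totals(daily_sales))
```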
Data Preprocessing Techniques-Data Transformation
• Generalization of the data, where low-level or "primitive" (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or country.
• Normalization, where the attribute data are scaled so as to fall within a small specified range, such as −1.0 to 1.0, or 0.0 to 1.0.
• Attribute construction (or feature construction), where new attributes are constructed from the given set of attributes and added to help the mining process.
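
The three transformations above can each be sketched in a few lines. The concept hierarchy STREET_TO_CITY, the sample values, and the constructed "area" attribute are hypothetical; normalization here is the linear min-max form, which is one of several possible schemes.

```python
def normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values onto the range [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        return [new_min for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]


# Hypothetical concept hierarchy mapping each street to its city.
STREET_TO_CITY = {"Street A": "City X", "Street B": "City X", "Street C": "City Y"}


def generalize(values, hierarchy):
    """Replace low-level values by their higher-level concept."""
    return [hierarchy.get(v, "unknown") for v in values]


def add_area(records):
    """Attribute construction: derive 'area' from 'width' and 'height'."""
    return [{**r, "area": r["width"] * r["height"]} for r in records]


incomes = [200, 300, 400, 600, 1000]
print(normalize(incomes, -1.0, 1.0))
print(generalize(["Street A", "Street C", "Street Z"], STREET_TO_CITY))
print(add_area([{"width": 2, "height": 3}]))
```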
Data Preprocessing Techniques-Data Reduction
Data reduction techniques can be applied to obtain a
reduced representation of the data set that is much
smaller in volume, yet closely maintains the integrity of
the original data.

That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results.
Data Preprocessing Techniques-Data Reduction
Strategies for data reduction include the following:
• Data cube aggregation, where aggregation operations are applied to the data in the construction of a data cube.
• Attribute subset selection, where irrelevant, weakly relevant, or redundant attributes or dimensions may be detected and removed.
• Dimensionality reduction, where encoding mechanisms are used to reduce the data set size.
• Numerosity reduction, where the data are replaced or estimated by alternative, smaller data representations such as parametric models (which need store only the model parameters instead of the actual data) or nonparametric methods such as clustering, sampling, and the use of histograms.
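
Two of the nonparametric numerosity reduction methods named above, histograms and sampling, can be sketched as follows. The price list, the choice of three equal-width buckets, the sample size, and the fixed random seed are assumptions for illustration.

```python
import random


def equal_width_histogram(values, bins):
    """Summarize values as (bucket_start, bucket_end, count) triples."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all values identical; any positive width works
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def simple_random_sample(values, size, seed=0):
    """Draw `size` values without replacement; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(values, size)


prices = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 14, 15, 18, 20, 20, 21, 25, 25, 28, 30]

histogram = equal_width_histogram(prices, bins=3)
for start, end, count in histogram:
    print(f"{start:6.2f} to {end:6.2f}: {count}")

sample = simple_random_sample(prices, size=5)
print(sample)
```

The histogram replaces twenty values with three bucket counts, and the sample with five values; both trade detail for a smaller representation, which is the point of numerosity reduction.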
Classification of Data mining Systems

A data mining system can be classified according to the following criteria:
Database Technology
Statistics
Machine Learning
Information Science
Visualization
Other Disciplines
Data mining Classifications
Some Other Classification Criteria:
• Classification according to kind of databases mined
• Classification according to kind of knowledge mined
• Classification according to kinds of techniques utilized
• Classification according to applications adapted
Data mining Classifications
Classification according to kind of databases mined
We can classify a data mining system according to the kind of databases mined.
Database systems can be classified according to different criteria such as data models and types of data, and the data mining system can be classified accordingly.
For example, if we classify the database according to data model, we may have a relational, transactional, object-relational, or data warehouse mining system.
Data mining Classifications
Classification according to kind of knowledge mined
We can classify a data mining system according to the kind of knowledge mined. This means data mining systems are classified on the basis of functionalities such as:
Characterization
Discrimination
Association and Correlation Analysis
Classification
Prediction
Clustering
Outlier Analysis
Evolution Analysis
Data mining Classifications
Classification according to kinds of techniques utilized
• We can classify a data mining system according to the kind of techniques used.
• We can describe these techniques according to the degree of user interaction involved or the methods of analysis employed.
Data mining Classifications
Classification according to applications adapted
We can classify a data mining system according to the application it is adapted to. These applications are as follows:
Finance
Telecommunications
DNA
Stock Markets
E-mail
Major issues in Data Mining
• Mining different kinds of knowledge in databases: Different users have different needs and may be interested in different kinds of knowledge. Therefore, data mining needs to cover a broad range of knowledge discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction: The data mining process needs to be interactive, because this allows users to focus the search for patterns, providing and refining data mining requests based on returned results.
Major issues in Data Mining
• Incorporation of background knowledge: Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but also at multiple levels of abstraction.
• Data mining query languages and ad hoc data mining: A data mining query language that allows the user to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for
Major issues in Data Mining
• Presentation and visualization of data mining results: Once patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable by users.
• Handling noisy or incomplete data: Data cleaning methods are required that can handle noise and incomplete objects while mining data regularities. Without data cleaning methods, the accuracy of the discovered patterns will be poor.
Major issues in Data Mining
• Pattern evaluation: This refers to the interestingness of the discovered patterns. Many discovered patterns may be uninteresting because they represent common knowledge or lack novelty, so patterns need to be evaluated for how interesting they are.
• Efficiency and scalability of data mining algorithms: In order to effectively extract information from the huge amount of data in databases, data mining algorithms must be
Businesses Need Support for Decision Making
• Uncertain economics
• Rapidly changing environments
• Global competition
• Demanding customers
• Taking advantage of information acquired by companies is a Critical Success Factor.
The Information Gap
• The shortfall between gathering information and using it for decision making.
• Firms have inadequate data warehouses.
• Business analysts spend 2 days a week gathering and formatting data, instead of performing analysis (Data Warehousing Institute).
• Business Intelligence (BI) seeks to bridge the information gap.
Summary
• Explained BI, Analytics, Data Marts and Big Data.
• Defined the characteristics of data for good decision making.
• Described data mining in detail.
• Explained and gave examples of market basket and cluster analysis.
REVIEW QUESTIONS
1. Discuss the relationship between Digital Transformation and Business Intelligence.
2. Summarize the benefits of Business Intelligence in an organization in the 21st century.
3. Differentiate between Digital Transformation and Business Intelligence.
4. Discuss Porter's Five Forces in relation to Business Intelligence.
5. Choose one sector of the economy in Kenya and discuss how Business Intelligence has been used to achieve efficiency in operations.
CASE STUDY

From the Case Study:
ADDRESSING 4 CORE BUSINESS INTELLIGENCE CHALLENGES ON THE SEARCH FOR ACTIONABLE INSIGHTS By Jennifer Bresnick

1. Discuss the four core Business Intelligence challenges faced by organizations today.
2. Propose specific solutions to the above challenges.
