Summary

Uploaded by

Jaja Nie Si Yao

Business Intelligence:
• storing data, looking backward

Business analytics:
• analyzes data, looking into the future
• scientific process of transforming data into insights for the purpose of making better
decisions
• action-driven approach
• ensures that the analysis drives business action

Methods of business analytics:

1. Descriptive: the analysis of historical data using two key methods: data aggregation
and data mining.
• represents what has happened in the past
• uses fairly simple analysis techniques
• forms the core of everyday reporting
2. Predictive: uses probabilities to assess what could happen in the future.
• uses data mining
• uses statistical modelling and machine learning techniques
• shows companies the raw results of their potential actions
3. Prescriptive: shows companies which option is the best.
• borrows heavily from mathematics and computer science, using a variety of
statistical methods
• emphasises actionable insights instead of data monitoring
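
The three methods can be contrasted on a toy example. This is only a sketch: the sales history, the target, and all function names are hypothetical, and the "prediction" is a plain historical frequency standing in for a real statistical model.

```python
from statistics import mean

# Hypothetical history: (discount offered?, units sold) per month.
history = [(False, 100), (False, 110), (True, 150),
           (False, 90), (True, 160), (True, 140)]
TARGET = 120  # hypothetical monthly sales target


def describe(records):
    """Descriptive: aggregate what has already happened."""
    units = [sold for _, sold in records]
    return {"total": sum(units), "mean": mean(units), "max": max(units)}


def probability_of_hitting_target(records, discount):
    """Predictive: estimate P(target met | action) from historical frequencies."""
    outcomes = [sold >= TARGET for offered, sold in records if offered == discount]
    return sum(outcomes) / len(outcomes)


def prescribe(records, actions=(False, True)):
    """Prescriptive: recommend the action with the highest estimated probability."""
    return max(actions, key=lambda a: probability_of_hitting_target(records, a))


summary = describe(history)
p_with_discount = probability_of_hitting_target(history, True)
recommended_discount = prescribe(history)
```

The descriptive step only summarises the past, the predictive step attaches a probability to each possible action, and the prescriptive step turns those probabilities into a recommendation.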

What is data?
Data: values or measurements that represent conditions, objects or ideas
In the context of Data Mining, data is often assumed to be available in a tabular form
Conventional tools:
• reporting
• query languages
• OLAP and spreadsheets
Disadvantages of conventional tools:
• Often, only quite simple questions can be answered
• Automation is difficult
• Only small amounts of data can be handled (esp. spreadsheets)
• Only primitive statistical methods are involved
• OLAP: query-focused, with low complexity of analysis

Data mining: the process of sorting through large data sets to identify patterns and
relationships that can help solve business problems through data analysis.
“Data mining is the analysis of (often large) observational data sets to find unsuspected
relationships and to summarize the data in novel ways that are both understandable and
useful to the data owner.” (Hand et al. 2001)
Clustering: attempts to group individuals in a population together by their similarity, but
without regard to any specific purpose
• Useful in preliminary domain exploration

Classification: attempts to predict, for each individual in a population, which of a (small) set
of classes that individual belongs to
• Classification algorithms provide models that determine which class a new individual
belongs to
• Classification is related to scoring
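
A minimal sketch of classification and scoring, assuming a k-nearest-neighbour rule (one of many possible classifiers, not one named in these notes). The training data, labels, and function names are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labelled individuals: (features, class label).
training = [
    ((1.0, 1.0), "stays"),
    ((1.5, 2.0), "stays"),
    ((2.0, 1.5), "stays"),
    ((6.0, 6.5), "churns"),
    ((7.0, 6.0), "churns"),
    ((6.5, 7.5), "churns"),
]


def class_scores(point, data, k=3):
    """Scoring: share of each class among the k nearest neighbours."""
    neighbours = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    counts = Counter(label for _, label in neighbours)
    return {label: count / k for label, count in counts.items()}


def classify(point, data, k=3):
    """Classification: return the class with the highest score."""
    scores = class_scores(point, data, k)
    return max(scores, key=scores.get)


new_individual = (6.2, 6.0)
scores = class_scores(new_individual, training)
predicted = classify(new_individual, training)
```

The score says how strongly the model leans towards a class; classification simply picks the class with the top score.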

Regression (value estimation): attempts to estimate or predict, for each individual, the
numerical value of some variable for that individual
• Generates a regression model by looking at other, similar individuals in the population
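
A minimal sketch of value estimation, assuming a straight-line model fitted by ordinary least squares on one attribute. The population data and names are hypothetical.

```python
from statistics import mean

# Hypothetical population: (years as customer, yearly spend).
population = [(1, 120.0), (2, 150.0), (3, 180.0), (4, 210.0), (5, 240.0)]


def fit_line(pairs):
    """Fit y = intercept + slope * x by ordinary least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def estimate(model, x):
    """Estimate the numerical value for a new individual."""
    intercept, slope = model
    return intercept + slope * x


model = fit_line(population)
spend_for_new_customer = estimate(model, 6)
```

The model is learned from the known individuals and then applied to a new one whose value is unknown.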

KDD process: KDD (Knowledge Discovery in Databases) is the systematic process of identifying
valid, practical, and understandable patterns in massive and complicated data sets.
CRISP-DM: Cross-Industry Standard Process for Data Mining

Accuracy: closeness between the value in the data and the true value
• Low accuracy of numerical attributes due to noisy measurements, limited precision,
wrong measurements, transposition of digits when entered manually.
• Low accuracy of categorical attributes due to erroneous entries and typos.

Syntactic accuracy is violated if an entry does not belong to the domain of the attribute.
Semantic accuracy is violated if an entry is not correct although it belongs to the domain of
the attribute.
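
The two kinds of accuracy can be sketched as two checks. The attribute domain and the example values are hypothetical, and the semantic check assumes the true value is known, which in practice it usually is not.

```python
# Hypothetical attribute domain, for illustration only.
CITY_DOMAIN = {"Berlin", "Hamburg", "Munich"}


def syntactically_accurate(value, domain):
    """Syntactic accuracy: the entry belongs to the attribute's domain."""
    return value in domain


def semantically_accurate(value, true_value, domain):
    """Semantic accuracy: the entry is in the domain and equals the true value."""
    return value in domain and value == true_value


typo_ok = syntactically_accurate("Munihc", CITY_DOMAIN)               # typo, outside the domain
wrong_city_syntax = syntactically_accurate("Berlin", CITY_DOMAIN)     # valid city name
wrong_city_semantics = semantically_accurate("Berlin", "Hamburg", CITY_DOMAIN)
```

A domain check catches the typo, but only a comparison with the true value reveals that a valid-looking entry is wrong.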

Missing at random (MAR)
• The probability that a value for 𝑌 is missing does not depend on the true value of 𝑌
→ $P(Y = ? \mid X_{obs}) = P(Y = ? \mid Y, X_{obs})$
(here $Y = ?$ means that the value of $Y$ is missing, and $X_{obs}$ are the observed attributes)
• Example: The maintenance staff does not change the batteries of a sensor when it
rains. Thus, the sensor does not always provide measurements when it rains.

Nonignorable
• The probability that a value for 𝑌 is missing depends on the true value of 𝑌.
• Example: A sensor for the temperature will not work when there is frost.
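
The two sensor examples can be simulated to show the difference. This is a sketch under assumed numbers: the rain probability, the temperature range, and the 50% failure rate in the rain are all made up.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical readings: (raining?, true temperature in degrees Celsius).
readings = [(rng.random() < 0.3, rng.uniform(-10.0, 25.0)) for _ in range(1000)]


def observe_mar(raining, temperature):
    """MAR: missingness depends only on the observed attribute 'raining'."""
    if raining and rng.random() < 0.5:
        return None  # batteries were not changed in the rain
    return temperature


def observe_nonignorable(raining, temperature):
    """Nonignorable: missingness depends on the true temperature itself."""
    return None if temperature < 0.0 else temperature


mar = [observe_mar(r, t) for r, t in readings]
nonignorable = [observe_nonignorable(r, t) for r, t in readings]
```

In the nonignorable case every frost reading is lost, so the observed temperatures are systematically too warm; in the MAR case the gaps can be explained by the observed rain attribute.
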
Pearson’s Correlation Coefficient

The (sample) Pearson's correlation coefficient is a measure of the linear relationship between
two numerical attributes 𝐴_1 and 𝐴_2.
The larger the absolute value of the Pearson correlation coefficient, the stronger the linear
relationship between the two attributes.
Positive (negative) correlation indicates a line with positive (negative) slope.
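
A short sketch of the sample coefficient, computed as the sum of products of deviations from the means divided by the square root of the product of the summed squared deviations. The attribute values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(a1, a2):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    m1, m2 = mean(a1), mean(a2)
    numerator = sum((x - m1) * (y - m2) for x, y in zip(a1, a2))
    denominator = sqrt(
        sum((x - m1) ** 2 for x in a1) * sum((y - m2) ** 2 for y in a2)
    )
    return numerator / denominator


# Hypothetical attribute values.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # perfectly linear, positive slope
falling = [10, 8, 6, 4, 2]   # perfectly linear, negative slope
symmetric_x = [-2, -1, 0, 1, 2]
squared = [4, 1, 0, 1, 4]    # strong but non-linear relationship

r_rising = pearson(hours, rising)
r_falling = pearson(hours, falling)
r_squared = pearson(symmetric_x, squared)
```

The last pair shows that the coefficient only measures linear relationships: the values are perfectly related by squaring, yet the coefficient is zero.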

Outliers in single attributes:
• Categorical attributes: an outlier is a value that occurs with a much lower frequency
than all other values
• Numerical attributes: outliers can be identified in boxplots or by statistical tests.
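
Both checks can be sketched in a few lines. The 5% frequency cut-off is an assumption, the numerical check uses the common boxplot rule of 1.5 times the interquartile range, and the data are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def categorical_outliers(values, max_share=0.05):
    """Flag categories whose relative frequency is below max_share (assumed cut-off)."""
    counts = Counter(values)
    return {v for v, c in counts.items() if c / len(values) < max_share}


def boxplot_outliers(values, whisker=1.5):
    """Flag values lying more than whisker * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical data: one mistyped colour and one implausible measurement.
colours = ["red"] * 50 + ["blue"] * 49 + ["rde"]
measurements = [10, 11, 12, 12, 13, 13, 14, 15, 95]

rare_colours = categorical_outliers(colours)
extreme_values = boxplot_outliers(measurements)
```

Quartile conventions differ between tools, so the exact whisker positions may vary slightly from what a particular boxplot implementation draws.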

Supervised Learning: specific target

Unsupervised Learning: no specific target

What is Cluster Analysis?

Clustering is the process of grouping data into classes or clusters so that objects within a
cluster have high similarity in comparison to one another, but are very dissimilar to objects in
other clusters.
Cluster = collection of data objects that are similar to each other
Clustering is unsupervised learning

Hierarchical-Based Clustering: Creates a hierarchical decomposition of a set of data objects

Two main approaches:
• Agglomerative approach (bottom-up): start with each object as a separate group,
then successively merge groups until a certain termination condition holds
• Divisive approach (top-down): start with all objects in one cluster, then successively
split up into smaller clusters until a certain termination condition holds
Measures for distances between two clusters:
• Single linkage: minimum distance between two data points of different clusters
• Complete linkage: maximum distance between two data points of different clusters
• Mean distance: distance between the means of the clusters
• Average linkage: average of all pairwise distances between data points of different
clusters
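
The four cluster distances can be written down directly. A sketch assuming Euclidean distance between points; the two example clusters are hypothetical.

```python
from itertools import product
from math import dist
from statistics import mean


def single_linkage(c1, c2):
    """Minimum distance between two points of different clusters."""
    return min(dist(p, q) for p, q in product(c1, c2))


def complete_linkage(c1, c2):
    """Maximum distance between two points of different clusters."""
    return max(dist(p, q) for p, q in product(c1, c2))


def average_linkage(c1, c2):
    """Average of all pairwise distances between the clusters."""
    return mean(dist(p, q) for p, q in product(c1, c2))


def mean_distance(c1, c2):
    """Distance between the cluster means."""
    def cluster_mean(cluster):
        return tuple(mean(coords) for coords in zip(*cluster))
    return dist(cluster_mean(c1), cluster_mean(c2))


# Hypothetical clusters of 2-D points.
left = [(0, 0), (0, 1)]
right = [(3, 0), (4, 0)]

d_single = single_linkage(left, right)
d_complete = complete_linkage(left, right)
d_average = average_linkage(left, right)
d_mean = mean_distance(left, right)
```

Single and complete linkage bound the average linkage from below and above, since the average of the pairwise distances lies between their minimum and maximum.
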
The dendrogram visualizes all splits/merges and helps to identify a suitable number of
clusters after the computation
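
A minimal sketch of the agglomerative approach with single linkage, assuming Euclidean distance and a target number of clusters as the termination condition. The recorded merge history is the information a dendrogram visualizes; the points are hypothetical.

```python
from itertools import combinations, product
from math import dist


def single_linkage(c1, c2):
    """Minimum distance between two points of different clusters."""
    return min(dist(p, q) for p, q in product(c1, c2))


def agglomerative(points, linkage=single_linkage, n_clusters=1):
    """Bottom-up clustering: merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the merge history as
    (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[p] for p in points]  # start: every object is its own cluster
    history = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        history.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Hypothetical 2-D points forming three visible groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, history = agglomerative(points, n_clusters=3)
```

Running it with `n_clusters=1` records every merge; a large jump in the merge distances suggests where to cut the dendrogram. This brute-force search recomputes all distances in every round, so it is only suitable for small data sets.
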
Prediction = estimating an unknown value
Induction = generalizing from specific cases to general rules

Entropy = a measure of disorder that can be applied to a set 𝑆

Misclassification Rate = the sum of the fractions of samples that belong to a minority class
(any class other than the most frequent one) inside a set
Gini Index = Fraction of times in which any randomly drawn instance from the set 𝑆 would be
labeled incorrectly if its label was randomly chosen according to the label distribution inside
the set
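
The three impurity measures can be compared on the same label set. A sketch assuming the usual Shannon entropy with base-2 logarithm (the notes do not give a formula); the label sets are hypothetical.

```python
from collections import Counter
from math import log2


def class_fractions(labels):
    """Fraction of the set that belongs to each class."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the class fractions p."""
    return -sum(p * log2(p) for p in class_fractions(labels))


def misclassification_rate(labels):
    """Share of samples outside the majority class: 1 - max(p)."""
    return 1 - max(class_fractions(labels))


def gini_index(labels):
    """Probability of mislabelling a randomly drawn instance: 1 - sum(p ** 2)."""
    return 1 - sum(p * p for p in class_fractions(labels))


# Hypothetical label sets.
pure = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]

pure_scores = (entropy(pure), misclassification_rate(pure), gini_index(pure))
mixed_scores = (entropy(mixed), misclassification_rate(mixed), gini_index(mixed))
```

All three measures are zero for a pure set and reach their maximum when the classes are evenly mixed.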
Correlation = statistical relationship between two attributes
