2 Data Mining Terms & Concepts

Uploaded by

saharsh0812

DATA MINING TERMS & CONCEPTS
DBMS
• A database system is the traditional way of storing and retrieving data.
• The major task of a database system is query processing.
• These systems are generally referred to as online transaction processing (OLTP) systems.
• These systems are used for the day-to-day operations of an organization.
Data Warehouse
• A data warehouse is a place where a huge amount of data is stored.
• It is meant for users or knowledge workers in the role of data analysis and decision making.
• These systems are referred to as online analytical processing (OLAP) systems.
DBMS and Data Warehouse Difference
OLTP and OLAP
• OLTP
Transaction-oriented applications.
Mainly concerned with the entry, storage, and retrieval of data.
Designed for day-to-day operations such as purchasing, inventory, payroll, accounting, etc.
Basically supports DML operations.
Users of OLTP
Almost all industries, including:
Airlines
Supermarkets
Banking
Insurance
Etc.
• Data captured in OLTP systems is usually stored in commercial relational databases.
• For example, the database of a supermarket store consists of the following tables to store data about its transactions, products, inventory, employees, etc.:
• Transactions
• ProductName
• EmployeeDetails
• InventorySupplies
• Suppliers
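As a rough illustration of the DML-style work an OLTP system does against such tables, here is a minimal sketch using Python's built-in sqlite3 module. The table names come from the list above; the column names and sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database; the column names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Suppliers (supplier_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Transactions ("
    "txn_id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER, amount REAL)"
)

# DML: insert, update, and delete individual rows.
conn.execute("INSERT INTO Suppliers (name) VALUES (?)", ("Acme Foods",))
conn.execute(
    "INSERT INTO Transactions (product, quantity, amount) VALUES (?, ?, ?)",
    ("Milk", 2, 3.0),
)
conn.execute(
    "INSERT INTO Transactions (product, quantity, amount) VALUES (?, ?, ?)",
    ("Bread", 1, 1.5),
)
conn.execute("UPDATE Transactions SET quantity = 3 WHERE product = 'Milk'")
conn.execute("DELETE FROM Transactions WHERE product = 'Bread'")
conn.commit()

rows = conn.execute("SELECT product, quantity FROM Transactions").fetchall()
print(rows)
```

Each statement touches a single row, which is typical of the short read/write operations that OLTP systems are built for.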
Advantages of OLTP
• Simplicity
• Efficiency
• Allows users to read, write, and delete data quickly
• Fast query processing
• Responds to user actions immediately and also supports transaction processing on demand.
Challenges
• Security
• It requires concurrency control (locking) and a recovery mechanism.
• The data content of an OLTP system is not suitable for decision making.
• A typical OLTP system manages the current data within the enterprise/organization. This data is too far removed from decision making.
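The recovery requirement can be sketched with a transaction that is rolled back when it would leave the data in an invalid state. This is only an illustration with sqlite3: the Inventory table, its columns, and the sell helper are hypothetical, and the example shows rollback rather than locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative stock.
conn.execute(
    "CREATE TABLE Inventory (product TEXT PRIMARY KEY, stock INTEGER CHECK (stock >= 0))"
)
conn.execute("INSERT INTO Inventory VALUES ('Milk', 5)")
conn.commit()

def sell(conn, product, quantity):
    """Decrement stock atomically; return True if the sale was committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Inventory SET stock = stock - ? WHERE product = ?",
                (quantity, product),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = sell(conn, "Milk", 3)   # stock goes from 5 to 2
ok_second = sell(conn, "Milk", 4)  # would go negative, so it is rolled back
stock = conn.execute(
    "SELECT stock FROM Inventory WHERE product = 'Milk'"
).fetchone()[0]
print(ok_first, ok_second, stock)
```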
Answer
The supermarket store is deciding on introducing a new product. The key issues under debate are: “Which product should they introduce?” and “Should it be specific to a few customer segments?”

The supermarket store is looking at offering some discount on its year-end sale. The questions here are: “How much discount should they offer?” and “Should different discounts be given to different customer segments?”
Answer: OLAP

• OLAP differs from a traditional database in the way the data is conceptualized and stored.
• OLAP data is held in dimensional form rather than relational form.
• The lifeblood of OLAP is the multidimensional data model.
• The multidimensional data model views the data in the form of a data cube.
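One way to picture a data cube is as a table of aggregates keyed by one value per dimension. The sketch below is a toy version in plain Python; the dimensions (product, region, quarter), the sales figures, and the use of "*" to mean "all values" are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, amount).
sales = [
    ("Milk", "North", "Q1", 100),
    ("Milk", "South", "Q1", 80),
    ("Bread", "North", "Q1", 50),
    ("Milk", "North", "Q2", 120),
]

def build_cube(facts):
    """Aggregate the measure over every combination of dimension values.

    '*' stands for "all values" along a dimension.
    """
    cube = defaultdict(int)
    for product, region, quarter, amount in facts:
        for p in (product, "*"):
            for r in (region, "*"):
                for q in (quarter, "*"):
                    cube[(p, r, q)] += amount
    return cube

cube = build_cube(sales)
milk_total = cube[("Milk", "*", "*")]   # all regions, all quarters
north_q1 = cube[("*", "North", "Q1")]   # all products
grand_total = cube[("*", "*", "*")]
print(milk_total, north_q1, grand_total)
```

Questions such as "how much did each segment buy?" then become lookups along one or more dimensions instead of row-by-row queries.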
Distributed Data Store (Distributed Database)
• A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion. The term usually refers specifically to a distributed database, where users store information on a number of nodes.
Multidimensional Schema
• A multidimensional schema is designed specifically to model data warehouse systems.
• The schemas are designed to address the unique needs of very large databases built for analytical purposes (OLAP).
• The two main types of schema are:
• Star Schema
• Snowflake Schema
Star Schema
• A star schema is a data warehouse schema in which the center of the star has one fact table and a number of associated dimension tables.
• It is known as a star schema because its structure resembles a star.
• The star schema data model is the simplest type of data warehouse schema.
Star schema
Star schema Example
Characteristics of star schema
• Every dimension in a star schema is represented by only one dimension table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
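These characteristics can be sketched with sqlite3: one fact table holding foreign keys, and dimension tables that are joined only to the fact table. The table names, column names, and data below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- The fact table references each dimension through a foreign key.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Milk', 'Dairy'), (2, 'Bread', 'Bakery');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0);
""")

# Each dimension joins to the fact table; dimensions never join to each other.
star_rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(star_rows)
```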
Snowflake Schema
Characteristics of Snowflake Schema

• It uses less disk space.
• It is easier to implement a dimension that is added to the schema.
• Query performance is reduced because of the multiple tables.
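The slides do not spell out the layout of a snowflake schema, so the sketch below assumes the usual definition: dimension tables are normalized into further sub-dimension tables. All table and column names are hypothetical. Note the extra join needed to reach the category, which is where the reduced query performance comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category is stored once, in its own table, instead of on every product.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_category VALUES (1, 'Dairy'), (2, 'Bakery');
INSERT INTO dim_product  VALUES (1, 'Milk', 1), (2, 'Cheese', 1), (3, 'Bread', 2);
INSERT INTO fact_sales   VALUES (1, 10.0), (2, 7.0), (3, 5.0);
""")

# Two joins are needed to group sales by category.
snowflake_rows = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(snowflake_rows)
```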

Difference
ETL
• ETL is a process in data warehousing; it stands for Extract, Transform, and Load.
• It is a process in which an ETL tool extracts data from various source systems, transforms it in the staging area, and finally loads it into the data warehouse system.
Extraction
• The first step of the ETL process is extraction.
• In this step, data is extracted into the staging area from various source systems, which can be in various formats such as relational databases, NoSQL, XML, and flat files.
• It is important to extract the data from the various source systems and store it in the staging area first, not directly in the data warehouse, because the extracted data is in various formats and may also be corrupted.
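A minimal sketch of extraction into a staging area, assuming two made-up sources: a flat file in CSV form and a JSON export standing in for a NoSQL source. The payloads and helper names are hypothetical.

```python
import csv
import io
import json

# Hypothetical source payloads standing in for a flat file and a NoSQL export.
csv_source = "id,name,country\n1,Asha,U.S.A\n2,Ravi,India\n"
json_source = '[{"id": 3, "name": "Mei", "country": "America"}]'

def extract_csv(text):
    """Read a flat file into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_json(text):
    """Read a JSON document into a list of dict records."""
    return json.loads(text)

# The staging area is just a list here; records keep their source-specific
# quirks (for example, CSV ids are strings while JSON ids are integers).
staging = extract_csv(csv_source) + extract_json(json_source)
print(staging)
```

The mixed id types in the staged records show why the data is not loaded straight into the warehouse: it still needs to be brought into one standard format.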
Transformation
• In this step, a set of rules or functions is applied to the extracted data to convert it into a single standard format. It may involve the following processes/tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling up the NULL values with some default values, mapping U.S.A, United States, and America into USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the key attribute).
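The five tasks above can be sketched as one small function. The record layout (full_name, street, city, country, notes), the default value "UNKNOWN", and the sample rows are assumptions for illustration; the country mapping is the one given in the cleaning example.

```python
COUNTRY_ALIASES = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(records):
    """Apply cleaning, splitting, joining, filtering, and sorting."""
    out = []
    for rec in records:
        # Cleaning: fill missing values and normalize country spellings.
        country = rec.get("country") or "UNKNOWN"
        country = COUNTRY_ALIASES.get(country, country)
        # Splitting: one full-name attribute becomes two attributes.
        first, _, last = rec["full_name"].partition(" ")
        # Joining: two address attributes become one.
        address = f'{rec["street"]}, {rec["city"]}'
        # Filtering: only the attributes below reach the warehouse.
        out.append({"id": int(rec["id"]), "first_name": first,
                    "last_name": last, "address": address, "country": country})
    # Sorting: order tuples by the key attribute.
    return sorted(out, key=lambda r: r["id"])

raw = [
    {"id": "2", "full_name": "Ravi Kumar", "street": "5 Lake Rd",
     "city": "Pune", "country": None, "notes": "x"},
    {"id": "1", "full_name": "Asha Rao", "street": "1 Main St",
     "city": "Austin", "country": "U.S.A", "notes": "y"},
]
clean = transform(raw)
print(clean)
```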
Loading

• In this step, the transformed data is finally loaded into the data warehouse.
• Sometimes the data warehouse is updated by loading data very frequently, and sometimes loading is done at longer but regular intervals.
• The rate and period of loading depend solely on the requirements and vary from system to system.
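A sketch of periodic batch loading, assuming a hypothetical customer table in the warehouse and a policy in which a re-loaded id overwrites the earlier row. Other systems may append or merge instead; how often load is called is a scheduling decision outside this example.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT)")

def load(conn, batch):
    """Load one batch of transformed rows; re-loaded ids overwrite old rows."""
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT OR REPLACE INTO customer (id, country) VALUES (:id, :country)",
            batch,
        )

# Two load cycles; the second one updates id 2 and adds id 3.
load(warehouse, [{"id": 1, "country": "USA"}, {"id": 2, "country": "India"}])
load(warehouse, [{"id": 2, "country": "USA"}, {"id": 3, "country": "Japan"}])
loaded = warehouse.execute("SELECT id, country FROM customer ORDER BY id").fetchall()
print(loaded)
```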
Pipelining
Data mining
• Data mining has been defined as the non-trivial extraction
of implicit, previously unknown, and potentially useful
information from large data sets or databases.
Knowledge Discovery
• Knowledge discovery is the process of finding novel, interesting, and useful patterns in data.
• Data mining is a subset of knowledge discovery, although the term is often used as a synonym for Knowledge Discovery in Databases (KDD).
Information Retrieval
• Information retrieval is the automatic retrieval of all relevant documents while retrieving as few non-relevant documents as possible.
• Its primary goals are indexing text and searching for useful documents in a collection.
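Both goals can be sketched with a toy inverted index. The documents and relevance judgments below are made up, and precision and recall are the standard measures (not named in the slides) for "few non-relevant" and "all relevant" respectively.

```python
from collections import defaultdict

# Hypothetical document collection.
docs = {
    1: "data warehouse stores historical data",
    2: "data mining finds patterns",
    3: "star schema for a warehouse",
}

def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

index = build_index(docs)
retrieved = search(index, "data")
relevant = {1, 3}  # hypothetical relevance judgments for this query
precision = len(retrieved & relevant) / len(retrieved)
recall = len(retrieved & relevant) / len(relevant)
print(retrieved, precision, recall)
```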
Triplet
• Data is an expression of feedback; a statement (rightly or
wrongly so) about an observation.
• Information is contextualized data.
• Knowledge is a phenomenon that implies our ability to
use the information for reasoning and decision making,
i.e., it is the basis of what you can, will, would, should or
might do with information.
Information Extraction
• Information Extraction has the goal of transforming a
collection of documents, usually with the help of an IR
system, into information that is more readily digested and
analyzed.
Knowledge Representation
• Knowledge representation is the presentation of knowledge to the user for visualization in terms of trees, tables, rules, graphs, charts, matrices, etc.
Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
• Depending on the type of the ordering relation, we distinguish several types of concept hierarchies.
Set Group Hierarchy
• Concept hierarchies may also be defined by discretizing
or grouping values for a given dimension or attribute,
resulting in a set-grouping hierarchy.
Schema Hierarchy
• A concept hierarchy that is a total or partial order among
attributes in a database schema is called a schema
hierarchy.
Different user viewpoints
• There may be more than one concept hierarchy for a given attribute or dimension, based on different user viewpoints.
• For instance, a user may prefer to organize price by defining ranges for inexpensive, moderately_priced, and expensive.
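Such a user-defined grouping might look like the sketch below. The band names come from the example above; the thresholds of 50 and 200 are arbitrary assumptions that a different user could set differently.

```python
def price_band(price, low=50, high=200):
    """Map a numeric price to a user-defined band (thresholds are assumptions)."""
    if price < low:
        return "inexpensive"
    if price < high:
        return "moderately_priced"
    return "expensive"

bands = [price_band(p) for p in (20, 50, 199.99, 200)]
print(bands)
```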
Schema hierarchy

• Relates concepts by their generality.
• The ordering reflects the generality of the attribute values, e.g. street < city < state < country.
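A schema hierarchy lets the same rows be summarized at any level of the ordering. The sketch below counts rows per value at a chosen level; the location rows and the helper names are hypothetical.

```python
from collections import Counter

# Attribute ordering from the example: street < city < state < country.
LEVELS = ["street", "city", "state", "country"]

# Hypothetical location rows.
locations = [
    {"street": "MG Road", "city": "Pune", "state": "Maharashtra", "country": "India"},
    {"street": "FC Road", "city": "Pune", "state": "Maharashtra", "country": "India"},
    {"street": "Marine Drive", "city": "Mumbai", "state": "Maharashtra", "country": "India"},
]

def roll_up(rows, level):
    """Count rows per value at the given level of the hierarchy."""
    return Counter(row[level] for row in rows)

def is_more_general(a, b):
    """True if attribute a is strictly more general than attribute b."""
    return LEVELS.index(a) > LEVELS.index(b)

by_city = roll_up(locations, "city")
by_state = roll_up(locations, "state")
print(by_city, by_state, is_more_general("state", "city"))
```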
Set-grouping hierarchy
• The ordering relation is the subset relation (⊆). Applies to
set values.

• Example:
• {13, ..., 39} = young; {13, ..., 19} = teenage;
• {13, ..., 19} ⊆ {13, ..., 39} ⇒ teenage < young
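The age example maps directly onto Python sets, where the subset test gives the ordering. The is_below helper is a hypothetical name, and it uses a proper-subset test so that a group is never below itself.

```python
# The two groups from the example, as sets of ages.
young = set(range(13, 40))    # {13, ..., 39}
teenage = set(range(13, 20))  # {13, ..., 19}

def is_below(a, b):
    """True when group a sits below group b, i.e. a is a proper subset of b."""
    return a < b

print(is_below(teenage, young), is_below(young, teenage))
```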
Operation-derived hierarchy
• Produced by applying an operation (encoding, decoding, information extraction).
• For example: [email protected] instantiates the hierarchy user-name < department < university < education
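The decoding operation can be sketched as a parser that splits an address into the four levels. The address used here is a made-up placeholder, not the one from the slide, and the function assumes the domain has exactly three dot-separated labels.

```python
def email_hierarchy(address):
    """Decode an address into [user-name, department, university, education].

    Assumes the form user@department.university.tld (exactly three labels).
    """
    user, _, domain = address.partition("@")
    department, university, top_level = domain.split(".")
    return [user, department, university, top_level]

# Hypothetical address used only for illustration.
levels = email_hierarchy("jsmith@cs.example.edu")
print(" < ".join(levels))
```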
Rule-based hierarchy

• Uses rules to define the partial order.
• For example, the rule “if antecedent then consequent” defines the order antecedent < consequent.
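A sketch of how a set of such rules induces an order: each rule is stored as an (antecedent, consequent) pair, and one concept is below another if it can be reached by chaining rules. The rules and concept names are hypothetical.

```python
def is_below(rules, low, high):
    """True if `high` is reachable from `low` by chaining rules."""
    seen, frontier = set(), [low]
    while frontier:
        node = frontier.pop()
        for antecedent, consequent in rules:
            if antecedent == node and consequent not in seen:
                seen.add(consequent)
                frontier.append(consequent)
    return high in seen

# Hypothetical rules: "if teenage then young", "if young then person".
rules = [("teenage", "young"), ("young", "person")]
print(is_below(rules, "teenage", "person"), is_below(rules, "person", "teenage"))
```

Chaining is what makes the order transitive: teenage < young and young < person together give teenage < person.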
