Adbms Unit5

Uploaded by

ofurusamuelchimenim


DATA WAREHOUSING

W.H. Inmon defined a data warehouse as “a subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management's decisions.”

Data warehouses contain consolidated data from many sources, augmented with summary
information and covering a long time period. Warehouses are much larger than other kinds of
databases; sizes ranging from several gigabytes to terabytes are common.

Comparison between Operational database & data warehouse

Characteristic     Operational Database                   Data Warehouse

Subject-oriented   Functional/process-oriented data,      Data are subject-oriented,
                   e.g., invoices, credits, debits        e.g., sales, products
Integrated         Similar data can have different        Provides a unified view of data
                   representations                        with a common representation
Time-variant       Current transactions are stored        Data is historic in nature
Non-volatile       Data updates/deletes are common        Once data are stored, no
                                                          changes are allowed

Main Components of Data warehouses

1. Data Acquisition
2. Data Storage
3. Data Access

Data Acquisition

An organization's daily operations access and modify operational databases. Data from these
operational databases and other external sources (e.g., customer profiles supplied by external
consultants) are extracted by using gateways, or standard external interfaces supported by the
underlying DBMSs. A gateway is an application program interface that allows client programs to
generate SQL statements to be executed at a server. Standards for gateways include Open Database
Connectivity (ODBC) and Object Linking and Embedding, Database (OLE DB) from Microsoft, and Java
Database Connectivity (JDBC).

Data is extracted from operational databases and external sources, cleaned to minimize errors and fill in
missing information when possible, and transformed to reconcile semantic mismatches. Transforming
data is typically accomplished by defining a relational view over the tables in the data sources (the
operational databases and other external sources).

Data Storage

Loading data consists of materializing such views and storing them in the warehouse. The cleaned and
transformed data is finally loaded into the warehouse. Additional preprocessing such as sorting and
generation of summary information is carried out at this stage. Data is partitioned and indexes are built
for efficiency. Due to the large volume of data, loading is a slow process. Loading a terabyte of data
sequentially can take weeks, and loading even a gigabyte can take hours. Parallelism is therefore
important for loading warehouses.

After data is loaded into a warehouse, additional measures must be taken to ensure that the data in the
warehouse is periodically refreshed to reflect updates to the data sources and to periodically purge data
that is too old from the warehouse.
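
The extract, clean, transform and load steps described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the two source tables, their columns and the sample rows are hypothetical, and rows with a missing product are simply dropped rather than filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical operational sources that record the same kind of fact
# with different representations (amounts in cents vs. in dollars).
cur.execute("CREATE TABLE store_a_sales (sale_date TEXT, product TEXT, amount_cents INTEGER)")
cur.execute("CREATE TABLE store_b_sales (sold_on TEXT, item TEXT, amount_dollars REAL)")
cur.executemany("INSERT INTO store_a_sales VALUES (?, ?, ?)",
                [("2024-01-05", "bread", 250), ("2024-01-05", "milk", 120)])
cur.executemany("INSERT INTO store_b_sales VALUES (?, ?, ?)",
                [("2024-01-06", "Bread", 2.75), ("2024-01-06", None, 1.10)])

# Transform: a relational view over the sources that reconciles column names,
# units and letter case, and leaves out rows whose product is missing.
cur.execute("""
    CREATE VIEW unified_sales AS
    SELECT sale_date, LOWER(product) AS product, amount_cents / 100.0 AS amount
    FROM store_a_sales WHERE product IS NOT NULL
    UNION ALL
    SELECT sold_on, LOWER(item), amount_dollars
    FROM store_b_sales WHERE item IS NOT NULL
""")

# Load: materialize the view as a warehouse table, then add summary
# information and an index.
cur.execute("CREATE TABLE wh_sales AS SELECT * FROM unified_sales")
cur.execute("""
    CREATE TABLE wh_sales_summary AS
    SELECT product, COUNT(*) AS n_sales, SUM(amount) AS total
    FROM wh_sales GROUP BY product
""")
cur.execute("CREATE INDEX idx_wh_sales_product ON wh_sales (product)")

summary = cur.execute(
    "SELECT product, n_sales, total FROM wh_sales_summary ORDER BY product"
).fetchall()
print(summary)
```

A periodic refresh could, in the simplest case, rebuild wh_sales from the view; real warehouses usually refresh incrementally.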

Data Access

The data access component provides end users with access to the stored warehouse information. Tools for
querying, reporting, OLAP (Online Analytical Processing), statistics, and graphical and geographical
information systems can be used.

Characteristics of DW

• Multidimensional conceptual view
• Generic dimensionality
• Client/server architecture
• Multi-user support
• Accessibility
• Flexible reporting
• Transparency and intuitive data manipulation

Benefits of DW

• High return on investment
• Cost effective
• Competitive advantage
• Enterprise intelligence
• Enhanced customer service
• Business reengineering

Limitations of DW

• Query intensive
• Performance tuning is hard
• Scalability can be a problem
• High demand for resources
• High maintenance
• Complexity of integration

DATA WAREHOUSE ARCHITECTURE

Legacy Systems

• Legacy systems are older-generation systems that are incompatible with current-generation
standards and systems but are still in production use.
• E.g., applications written in COBOL that run on mainframes.
• Today's hot new system is tomorrow's legacy system!

Operational Data Store

An operational data store (ODS) is a repository of current, integrated operational data used for
analysis. It is created when the legacy system is incapable of reporting. It is one of the most recent
concepts in data warehousing. Data in an ODS is subject-oriented, integrated, volatile, and current or
near-current.

Data Warehouse

Data enters the data warehouse in an integrated structure and format. The process involves
conversion, summarization, filtering, and condensation of data.

The data warehouse DBMS is the cornerstone of the data warehousing environment. It is
implemented on RDBMS technology. Approaches such as parallel databases, multi-relational
databases (MRDBs), and multidimensional databases (MDDBs) are also used in the environment.

Metadata

Metadata is data about the data; it describes the data warehouse.

Data Marts

Data mart is a general term used to describe data in a data warehouse environment. A data mart is a
subsidiary of the data warehouse: a small, localized, single-purpose data warehouse implementation.

Advantages:

• It enables departments to customize their data as it flows into the data mart.
• It enables departments to select a subset of historic data.
• Departments can select the software for their data mart.
• It is very cost effective.

Disadvantages:

• It is difficult to extend to other departments.
• Scalability problems
• Data integration problems

MDDBs (Multidimensional Databases)

They are tightly coupled with OLAP.

OLAP (Online Analytical Processing)

• OLAP is the interactive analysis of data, allowing data to be summarized and viewed in different
ways in an online fashion (with negligible delay).
• Data that can be modeled as dimension attributes and measure attributes are called
multidimensional data.
   o Given a relation used for data analysis, we can identify some of its attributes as measure
   attributes, since they measure some value and can be aggregated upon. For instance,
   the attribute number of the sales relation is a measure attribute, since it measures the
   number of units sold.
   o Some of the other attributes of the relation are identified as dimension attributes, since
   they define the dimensions on which measure attributes, and summaries of measure
   attributes, are viewed.
• The earliest OLAP systems used multidimensional arrays in memory to store data cubes, and are
referred to as multidimensional OLAP (MOLAP) systems.
• OLAP implementations using only relational database features are called relational OLAP
(ROLAP) systems.
• Hybrid systems, which store some summaries in memory and store the base data and other
summaries in a relational database, are called hybrid OLAP (HOLAP) systems.
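
As an illustration of dimension and measure attributes, the sketch below aggregates the measure attribute number over every subset of the dimension attributes, which is what a data cube stores. The sales rows and the dimension names item, color and size are hypothetical sample data, and the in-memory dictionary is only a stand-in for the arrays a MOLAP system would use.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales relation: item, color and size are dimension attributes,
# number is the measure attribute (units sold).
sales = [
    {"item": "skirt", "color": "dark", "size": "small", "number": 2},
    {"item": "skirt", "color": "dark", "size": "large", "number": 5},
    {"item": "dress", "color": "dark", "size": "small", "number": 4},
    {"item": "dress", "color": "white", "size": "small", "number": 3},
]
DIMENSIONS = ("item", "color", "size")


def data_cube(rows, dimensions, measure):
    """Sum the measure for every subset of the dimensions (every group-by)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = data_cube(sales, DIMENSIONS, "number")
print(cube[("item",)])          # totals per item
print(cube[("item", "color")])  # totals per (item, color) pair
print(cube[()])                 # grand total over all rows
```

With three dimensions the cube holds 2^3 = 8 group-bys, which is why the number of summaries grows quickly as dimensions are added.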

Data Mining

It is the process of extracting valid, previously unknown, comprehensible and actionable information
from large databases and using it for crucial business decisions.

It is a subarea of statistics (exploratory data analysis) and of AI (knowledge discovery and machine
learning).

• It is the process of semi-automatically analyzing large databases to find patterns that are:
   o valid: hold on new data with some certainty
   o novel: non-obvious to the system
   o useful: it should be possible to act on the pattern
   o understandable: humans should be able to interpret the pattern
• It is also known as Knowledge Discovery in Databases (KDD).

Applications of Data Mining

• Banking: loan/credit card approval
   o predict good customers based on old customers
• Customer relationship management:
   o identify those who are likely to leave for a competitor
• Targeted marketing:
   o identify likely responders to promotions
• Fraud detection: telecommunications, financial transactions
   o from an online stream of events, identify fraudulent events
• Manufacturing and production:
   o automatically adjust knobs when process parameters change
• Medicine: disease outcome, effectiveness of treatments
   o analyze patient disease history: find relationships between diseases
• Molecular/pharmaceutical: identify new drugs
• Scientific data analysis:
   o identify new galaxies by searching for sub-clusters
• Web site/store design and promotion:
   o find affinity of visitors to pages and modify layout

KDD PROCESS/ STEPS

• Problem formulation
• Data collection
   o subset the data: sampling might hurt if the data is highly skewed
   o feature selection: principal component analysis, heuristic search
• Pre-processing: cleaning
   o name/address cleaning, reconciling different terms with the same meaning (annual, yearly),
   duplicate removal, supplying missing values
• Transformation:
   o map complex objects (e.g., time series data) to features (e.g., frequency)
• Choosing the mining task and mining method
• Result evaluation and visualization
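
The pre-processing step above can be sketched in a few lines of Python. The record layout (name and period fields), the synonym table and the "unknown" placeholder are assumptions made for this illustration, not part of the notes.

```python
def clean(records):
    """Normalize names, unify synonymous labels, fill gaps, drop duplicates."""
    synonyms = {"yearly": "annual"}  # different words, same meaning
    seen, cleaned = set(), []
    for rec in records:
        # Name cleaning: collapse whitespace and use consistent capitalization.
        name = " ".join(rec["name"].split()).title()
        # Supplying a missing value with a placeholder.
        period = (rec.get("period") or "unknown").lower()
        period = synonyms.get(period, period)
        key = (name, period)
        if key not in seen:  # duplicate removal
            seen.add(key)
            cleaned.append({"name": name, "period": period})
    return cleaned


raw = [
    {"name": "  john   SMITH ", "period": "Yearly"},
    {"name": "John Smith", "period": "annual"},
    {"name": "mary jones", "period": None},
]
cleaned = clean(raw)
print(cleaned)
```

The first two raw records collapse into one because they differ only in spacing, case and the yearly/annual wording.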

Data Mining Techniques

1. Association Rules (AR)
Data is regarded as a collection of transactions, each involving a set of items. An association
rule correlates the presence of a set of items with another range of values for
another set of variables.
Ex: bread ⇒ milk; DB-Concepts, OS-Concepts ⇒ Networks
   o Left-hand side: antecedent; right-hand side: consequent
   o An association rule must have an associated population; the population
   consists of a set of instances.

2. Classification Trees
Classification is the process of learning a model that describes different classes of data. The
classes are predetermined.
E.g., given a new automobile insurance applicant, should he or she be classified as low
risk, medium risk or high risk?
Classification rules can be compactly shown as a decision tree.

3. Sequential Patterns
This technique finds sequential patterns across transactions. Ex: If a person undergoes
cardiac surgery, he or she may suffer from kidney failure in the next 14 years.

4. Patterns within time series
This technique detects similarities within positions of a time series of data taken at regular
intervals of time. Ex: daily closing stock prices, daily sales.

5. Clustering
A given population of events can be partitioned into sets of similar elements. Each such set is
called a cluster, and each record belongs to exactly one cluster. Ex: a population of women
can be grouped as ‘most-likely-to-buy’ and ‘least-likely-to-buy’.
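
To make the association-rule technique concrete, the sketch below scores the rule bread ⇒ milk with support and confidence. These two measures are the ones commonly used for association rules but are not defined in these notes, and the four transactions are hypothetical; the population here is simply the list of transactions.

```python
def support(transactions, items):
    """Fraction of transactions in the population that contain all the items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical population of market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here bread and milk occur together in 2 of 4 transactions, and 2 of the 3 transactions containing bread also contain milk.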

Goals of DM

• Prediction
Predict the future behavior of certain attributes within data. Ex: On the basis of seismic
wave patterns, the probability of an earthquake can be predicted.
• Identification
DM can identify the existence of an event, item or activity on the basis of data patterns.
Ex: identification of the existence of genes based on a DNA sequence.
• Classification
DM can partition the data so that different classes can be identified based on a
combination of parameters. Ex: loyal and regular customers.
• Optimization
DM can optimize the use of limited resources such as time, space, money and materials to
maximize output variables.

SPATIAL DATABASES

• Spatial databases store information related to spatial locations, and support efficient storage,
indexing and querying of spatial data.
• Special-purpose index structures are important for accessing spatial data, and for processing
spatial join queries.
• Computer Aided Design (CAD) databases store design information about how objects are
constructed, e.g., designs of buildings, aircraft, and layouts of integrated circuits.
• Geographic databases store geographic information (e.g., maps); they are often called geographic
information systems or GIS.

Representation of Spatial Data

• Various geometric constructs can be represented in a database in a normalized fashion.
• Represent a line segment by the coordinates of its endpoints.
• Approximate a curve by partitioning it into a sequence of segments:
   o Create a list of vertices in order, or
   o Represent each segment as a separate tuple that also carries with it the identifier of the
   curve (2D features such as roads).
• Closed polygons:
   o List of vertices in order, where the starting vertex is the same as the ending vertex, or
   o Represent boundary edges as separate tuples, with each containing the identifier of the
   polygon, or
   o Use triangulation: divide the polygon into triangles.
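
The representations listed above can be sketched with plain Python tuples. The identifiers "road-1" and "plot-1" and the coordinates are hypothetical, and the fan triangulation shown is a simple stand-in that is only valid for convex polygons.

```python
# A curve approximated by a list of vertices in order.
road = [(0, 0), (2, 1), (5, 1)]
# A closed polygon: the starting vertex is repeated as the ending vertex.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]


def to_segment_tuples(object_id, vertices):
    """One tuple per segment, each carrying the identifier of its curve or polygon."""
    return [(object_id, vertices[i], vertices[i + 1])
            for i in range(len(vertices) - 1)]


def fan_triangulate(closed_vertices):
    """Divide a convex polygon into triangles that all share its first vertex."""
    ring = closed_vertices[:-1]  # drop the repeated closing vertex
    return [(ring[0], ring[i], ring[i + 1]) for i in range(1, len(ring) - 1)]


road_segments = to_segment_tuples("road-1", road)
square_edges = to_segment_tuples("plot-1", square)
triangles = fan_triangulate(square)
print(road_segments)
print(triangles)
```

The segment-tuple form fits a normalized relation with a fixed number of columns, whereas a vertex list needs a variable-length attribute.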

Spatial Database Queries

• Nearness queries request objects that lie near a specified location.
• Nearest neighbor queries, given a point or an object, find the nearest object
that satisfies given conditions.
• Region queries deal with spatial regions, e.g., ask for objects that lie
partially or fully inside a specified region.
• Queries that compute intersections or unions of regions.
• Spatial join of two spatial relations, with the location playing the role of join attribute.
• Spatial data is typically queried using a graphical query language; results are also
displayed in a graphical manner.
• A graphical interface constitutes the front end.
• Extensions of SQL with abstract data types, such as lines, polygons and bit maps,
have been proposed to interface with the back end.
   o This allows relational databases to store and retrieve spatial information.
   o Queries can use spatial conditions (e.g., contains or overlaps).
   o Queries can mix spatial and nonspatial conditions.
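
A brute-force sketch of a nearest neighbor query and a region query over point objects follows. The places list and its fields are hypothetical, and a linear scan is used only for clarity; a real system would answer these queries through a spatial index.

```python
import math

# Hypothetical point objects with a nonspatial attribute (kind).
places = [
    {"name": "cafe", "kind": "food", "xy": (1.0, 1.0)},
    {"name": "diner", "kind": "food", "xy": (6.0, 5.0)},
    {"name": "bank", "kind": "finance", "xy": (2.0, 2.0)},
]


def nearest(objects, point, condition=lambda o: True):
    """Nearest neighbor query: closest object that satisfies the condition."""
    candidates = [o for o in objects if condition(o)]
    return min(candidates, key=lambda o: math.dist(o["xy"], point), default=None)


def in_region(objects, xmin, ymin, xmax, ymax):
    """Region query: objects lying inside an axis-aligned rectangle."""
    return [o for o in objects
            if xmin <= o["xy"][0] <= xmax and ymin <= o["xy"][1] <= ymax]


closest = nearest(places, (2.5, 2.5))
# Mixing a spatial condition (nearness) with a nonspatial one (kind).
closest_food = nearest(places, (2.5, 2.5), lambda o: o["kind"] == "food")
inside = [o["name"] for o in in_region(places, 0, 0, 3, 3)]
print(closest["name"], closest_food["name"], inside)
```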

Indexing in Spatial DB

Quadtrees

• Each node of a quadtree is associated with a rectangular region of space; the
top node is associated with the entire target space.
• Each non-leaf node divides its region into four equal-sized quadrants;
correspondingly, each such node has four child nodes, one for each of the four
quadrants, and so on.
• Leaf nodes have between zero and some fixed maximum number of points.
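
The quadtree description above can be sketched as a small point quadtree. The class name, the capacity of two points per leaf and the half-open square regions are choices made for this illustration; the sketch also assumes distinct points, since many identical points would make a leaf split indefinitely.

```python
class QuadTree:
    """Node covering the square region [x, x + size) by [y, y + size)."""

    def __init__(self, x, y, size, capacity=2):
        self.x, self.y, self.size, self.capacity = x, y, size, capacity
        self.points = []      # points held while this node is a leaf
        self.children = None  # four child nodes once the node has been split

    def contains(self, p):
        return (self.x <= p[0] < self.x + self.size
                and self.y <= p[1] < self.y + self.size)

    def insert(self, p):
        if not self.contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._split()
        # Exactly one quadrant contains p, so stop at the first success.
        return any(child.insert(p) for child in self.children)

    def _split(self):
        """Divide the region into four equal quadrants and push points down."""
        half = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, half, self.capacity)
                         for dx in (0, half) for dy in (0, half)]
        for q in self.points:
            any(child.insert(q) for child in self.children)
        self.points = []

    def query(self, xmin, ymin, xmax, ymax):
        """Return the stored points inside the closed query rectangle."""
        if (xmax < self.x or xmin >= self.x + self.size
                or ymax < self.y or ymin >= self.y + self.size):
            return []  # region and query rectangle do not intersect
        if self.children is None:
            return [p for p in self.points
                    if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
        return [p for child in self.children
                for p in child.query(xmin, ymin, xmax, ymax)]


tree = QuadTree(0, 0, 8)
for point in [(1, 1), (2, 2), (6, 6), (7, 1), (1, 7)]:
    tree.insert(point)
print(tree.query(0, 0, 3, 3))
```

A region query skips every quadrant that does not intersect the query rectangle, which is where the index saves work over a full scan.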

R-Trees

• R-trees are an N-dimensional extension of B+-trees, useful for indexing sets of
rectangles and other polygons.
• They are supported in many modern database systems, along with variants like R+-trees
and R*-trees.
• Basic idea: generalize the notion of a one-dimensional interval associated with
each B+-tree node to an N-dimensional interval, that is, an N-dimensional rectangle.
• We consider only the two-dimensional case (N = 2).
   o Generalization for N > 2 is straightforward, although R-trees work well only for
   relatively small N.
• A rectangular bounding box is associated with each tree node.
   o The bounding box of a leaf node is a minimum-sized rectangle that contains all the
   rectangles/polygons associated with the leaf node.
   o The bounding box associated with a non-leaf node contains the bounding boxes
   associated with all its children.
   o The bounding box of a node serves as its key in its parent node (if any).
   o Bounding boxes of children of a node are allowed to overlap.
• A polygon is stored only in one node, and the bounding box of the node must
contain the polygon.
   o The storage efficiency of R-trees is better than that of k-d trees or quadtrees,
   since a polygon is stored only once.
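
The bounding-box rules above can be illustrated with a fixed two-level structure. This is not a full R-tree (there is no insertion or node splitting); the rectangles, given as (xmin, ymin, xmax, ymax), and their grouping into leaves are hypothetical.

```python
def bounding_box(rects):
    """Minimum rectangle (xmin, ymin, xmax, ymax) containing all given rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def overlaps(a, b):
    """True if two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical two-level tree: each leaf holds rectangles, and the root
# holds one bounding box per leaf as that leaf's key.
leaves = [
    [(0, 0, 2, 2), (1, 1, 3, 4)],
    [(2, 3, 6, 5), (5, 0, 7, 2)],
]
leaf_boxes = [bounding_box(leaf) for leaf in leaves]
root_box = bounding_box(leaf_boxes)


def search(query):
    """Descend only into leaves whose bounding box overlaps the query."""
    if not overlaps(root_box, query):
        return []
    return [r for leaf, box in zip(leaves, leaf_boxes) if overlaps(box, query)
            for r in leaf if overlaps(r, query)]


print(leaf_boxes)   # the two leaf boxes overlap each other, which is allowed
print(search((2.5, 3.5, 2.8, 3.8)))
```

Because sibling bounding boxes may overlap, a search can have to follow more than one child, as the example query does.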
