DM Unit 1 PDF

OLAP stands for Online Analytical Processing and allows users to analyze multidimensional data from multiple database systems simultaneously. OLAP databases are divided into cubes that can be analyzed using five basic operations: drill down, roll up, dice, slice, and pivot. These operations allow users to view data at different levels of granularity. Data mining techniques like association, classification, clustering, sequential patterns, and decision trees are used to extract useful knowledge and patterns from large amounts of data. These techniques help organizations make better decisions. Descriptive techniques characterize data properties while predictive techniques infer patterns to make predictions.

Uploaded by

Ayush

OLAP Operations

OLAP stands for Online Analytical Processing. It is a software technology
that allows users to analyze information from multiple database systems at the same
time. It is based on the multidimensional data model and lets the user query
multidimensional data (e.g. Delhi -> 2018 -> Sales data).

OLAP databases are divided into one or more cubes, and these cubes are known
as hypercubes.

OLAP OPERATIONS:
There are five basic analytical operations that can be performed on an OLAP cube:

1. Drill down: In the drill-down operation, less detailed data is converted into more
detailed data. It can be done by:

• Moving down in the concept hierarchy

• Adding a new dimension

In the cube given in the overview section, the drill-down operation is performed by moving
down in the concept hierarchy of the Time dimension (Quarter -> Month).

2. Roll up: It is the opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by:

• Climbing up in the concept hierarchy

• Reducing the dimensions

In the cube given in the overview section, the roll-up operation is performed by
climbing up in the concept hierarchy of the Location dimension (City -> Country).

3. Dice: It selects a sub-cube from the OLAP cube by applying criteria on two or more
dimensions. In the cube given in the overview section, a sub-cube is selected using the
following criteria:

• Location = “Delhi” or “Kolkata”

• Time = “Q1” or “Q2”

• Item = “Car” or “Bus”

4. Slice: It performs a selection on a single dimension of the OLAP cube, which results in
a new sub-cube. In the cube given in the overview section, slice is performed on the
Time dimension with the criterion Time = “Q1”.

5. Pivot: It is also known as rotation operation as it rotates the current view to get a new
view of the representation. In the sub-cube obtained after the slice operation,
performing pivot operation gives a new view of it.
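
The five operations above can be sketched on a tiny in-memory cube. This is only an illustration, assuming a hypothetical fact table of sales records. The row values and the helper names `aggregate`, `dice`, `slice_`, and `pivot` are made up for this example and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, country, quarter, month, item, units)
FACTS = [
    ("Delhi", "India", "Q1", "Jan", "Car", 10),
    ("Delhi", "India", "Q1", "Feb", "Bus", 4),
    ("Delhi", "India", "Q2", "Apr", "Car", 7),
    ("Kolkata", "India", "Q1", "Mar", "Car", 5),
    ("Kolkata", "India", "Q2", "May", "Bus", 3),
    ("Toronto", "Canada", "Q1", "Jan", "Car", 8),
    ("Toronto", "Canada", "Q3", "Jul", "Bike", 6),
]
FIELDS = ("city", "country", "quarter", "month", "item", "units")
ROWS = [dict(zip(FIELDS, fact)) for fact in FACTS]


def aggregate(rows, dims):
    """Sum units grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["units"]
    return dict(totals)


def dice(rows, **criteria):
    """Keep rows whose value on each named dimension is in the allowed set."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]


def slice_(rows, dim, value):
    """Fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def pivot(table):
    """Rotate a two-dimensional aggregate by swapping the order of its keys."""
    return {(b, a): v for (a, b), v in table.items()}


by_quarter = aggregate(ROWS, ["city", "quarter"])
by_month = aggregate(ROWS, ["city", "quarter", "month"])  # drill down: Quarter -> Month
by_country = aggregate(ROWS, ["country", "quarter"])      # roll up: City -> Country
sub_cube = dice(ROWS, city={"Delhi", "Kolkata"}, quarter={"Q1", "Q2"}, item={"Car", "Bus"})
q1 = slice_(ROWS, "quarter", "Q1")                        # slice: Time = "Q1"
q1_view = aggregate(q1, ["city", "item"])
q1_rotated = pivot(q1_view)                               # pivot: (city, item) -> (item, city)
```

Drill down and roll up are the same aggregation run at a finer or coarser level of a hierarchy, which is why both reuse `aggregate` here.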

DATA MINING TECHNIQUES

Extracting important knowledge from a very large amount of data can be crucial to
organizations for the process of decision-making.

Some data mining techniques are:

1 Association

2 Classification

3 Clustering

4 Sequential patterns

5 Decision tree

1 Association Technique

The association technique helps to find patterns in large amounts of data, based on a
relationship between two or more items in the same transaction. It is used in market
analysis, where it helps us to analyze people's buying habits.
For example, you might find that a customer always buys ice cream whenever he
comes to watch a movie, so when the customer comes to watch a movie again he is
likely to want ice cream again.

2 Classification Technique

Classification is the most common data mining technique. In classification
we use mathematical techniques such as decision trees, neural networks, and statistics to
predict the class of unknown records. This technique helps in deriving important
information about data.

Assume you have a set of records, each containing a set of attributes. Depending
on these attributes, you can predict unseen or unknown records. For example,
given the records of all employees who left the company, you can use classification
to predict who will probably leave the company in a future period.

3 Clustering Technique

Clustering is one of the oldest techniques used in data mining. The main
aim of clustering is to make clusters (groups) from pieces of data that share
common characteristics. Clustering helps to identify the differences and
similarities between the data.

Take the example of a shop in which many items are for sale. The challenge is how to
arrange those items so that a customer can easily find the item he needs. By using
clustering, you can keep items that share some similarities in one corner
and items that share other similarities in another corner.

4 Sequential patterns

Sequential pattern analysis is a useful method for identifying trends and similar patterns.
For example, if customer data shows that a customer buys a particular product at a
particular time of year, you can use this information to suggest that product to the
customer at that time of year.

5 Decision tree

A decision tree is one of the most commonly used data mining techniques because its model
is easy for users to understand. In a decision tree you start with a simple question that has
two or more answers. Each answer leads to a further question with two or more answers,
which helps us to make a final decision. The root node of a decision tree is a simple question.

Take the example of a flood warning system.

[Figure: Decision tree]

First check the water level. If the water level is > 50 ft, an alert is sent. If the water
level is < 50 ft, check it again: if it is > 30 ft, a warning is sent, and if it is
< 30 ft, the water is in the normal range.
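
The flood warning tree can be written as two nested threshold tests. This is a minimal sketch. The function name `flood_status` and the returned labels are chosen for this example, and since the text does not say what happens at exactly 50 ft or 30 ft, the sketch assumes those values fall into the lower branch.

```python
def flood_status(water_level_ft: float) -> str:
    """Classify a water level using the two threshold tests from the text."""
    if water_level_ft > 50:   # root node: is the level above 50 ft?
        return "alert"
    if water_level_ft > 30:   # second node: is the level above 30 ft?
        return "warning"
    return "normal"           # leaf: the level is in the normal range


for level in (55, 42, 12):
    print(level, flood_status(level))
```

Each `if` corresponds to one internal node of the tree and each `return` to one leaf.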

Data Mining Functionalities

Data mining functionalities are used to specify the kind of patterns to be found in data
mining tasks. Data mining tasks can be classified into two categories: descriptive and
predictive.

Descriptive mining tasks characterize the general properties of the data in the database.

Predictive mining tasks perform inference on the current data in order to make
predictions.

Concept/Class Description: Characterization and Discrimination

Data can be associated with classes or concepts. For example, in an electronics store,
classes of items for sale include computers and printers, and concepts of customers
include bigSpenders and budgetSpenders.

Data characterization

Data characterization is a summarization of the general characteristics or features of a
target class of data.

Data discrimination

Data discrimination is a comparison of the general features of target class data objects
with the general features of objects from one or a set of contrasting classes.

Mining Frequent Patterns, Associations, and Correlations

Frequent patterns are patterns that occur frequently in data. There are many kinds of
frequent patterns, including itemsets, subsequences, and substructures.

Association analysis

Suppose, as a marketing manager, you would like to determine which items are frequently
purchased together within the same transactions. An example of such a rule is:

buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]

where X is a variable representing a customer. Confidence = 50% means that if a customer
buys a computer, there is a 50% chance that she will buy software as well.

Support = 1% means that 1% of all of the transactions under analysis showed that
computer and software were purchased together.
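
Support and confidence for such a rule can be computed directly from a list of transactions. The sketch below assumes a hypothetical set of eight transactions, so its numbers differ from the 1% and 50% in the rule above. In this toy data the rule has support 2/8 and confidence 2/5. The function names `support` and `confidence` are chosen for this example.

```python
# Hypothetical transactions; each set holds the items bought together.
TRANSACTIONS = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
    {"software"},
    {"computer", "printer"},
    {"printer", "paper"},
    {"computer"},
]


def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


rule_support = support(TRANSACTIONS, {"computer", "software"})
rule_confidence = confidence(TRANSACTIONS, {"computer"}, {"software"})
print(f"support={rule_support:.0%} confidence={rule_confidence:.0%}")
```

Note that `confidence` divides by the support of the antecedent, so it is only defined when the antecedent appears in at least one transaction.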

Classification and Prediction

Classification is the process of finding a model that describes and distinguishes data
classes for the purpose of being able to use the model to predict the class of objects whose
class label is unknown.

“How is the derived model presented?” The derived model may be represented in various
forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or
neural networks.

A decision tree is a flow-chart-like tree structure, where each node denotes a test on an
attribute value, each branch represents an outcome of the test, and tree leaves represent
classes or class distributions.

[Figure: Decision tree]

A neural network, when used for classification, is typically a collection of neuron-like
processing units with weighted connections between the units.

[Figure: Neural network]

Cluster Analysis

Classification and prediction analyze class-labeled data objects, whereas clustering
analyzes data objects without consulting a known class label.

[Figure: Cluster analysis]

The objects are grouped based on the principle of maximizing the intraclass similarity and
minimizing the interclass similarity. That is, clusters of objects are formed so that objects
within a cluster have high similarity in comparison to one another, but are very dissimilar
to objects in other clusters.
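
One common way to form such groups is k-means, which the text above does not name; it is used here only as an illustration. The sketch assumes one-dimensional data, a hypothetical list of item prices, and two arbitrary starting centers. The function name `k_means_1d` is chosen for this example.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group's mean."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign the value to the closest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, centers


# Hypothetical item prices; the two starting centers are arbitrary guesses.
prices = [2, 3, 4, 40, 42, 45]
clusters, centers = k_means_1d(prices, centers=[0, 50])
print(clusters, centers)
```

No class labels are supplied anywhere: the groups come only from how close the values are to one another.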

Outlier Analysis

A database may contain data objects that do not comply with the general behavior or
model of the data. These data objects are outliers. Most data mining methods discard
outliers as noise or exceptions. The analysis of outlier data is referred to as outlier mining.
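
As a simple illustration of outlier mining, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score test). This is one possible method among many, and the text does not prescribe it. The readings, the threshold of 2.0, and the function name `find_outliers` are assumptions made for this example.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical: nothing can deviate
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 12, 11, 13, 12, 95]  # hypothetical measurements
print(find_outliers(readings))
```

Whether a flagged value is noise to discard or the interesting case to study depends on the application.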
