DM Unit 1 PDF
OLAP databases are divided into one or more cubes, and these cubes are known
as hypercubes.
OLAP OPERATIONS:
There are five basic analytical operations that can be performed on an OLAP cube:
1. Drill down: The drill-down operation converts less detailed data into more
detailed data. It can be done by moving down in the concept hierarchy or by adding a
new dimension.
2. Roll up: It is the opposite of the drill-down operation. It performs aggregation on the
OLAP cube, either by climbing up in the concept hierarchy or by reducing the number of
dimensions. In the cube given in the overview section, the roll-up operation is performed by
climbing up the concept hierarchy of the Location dimension (City -> Country).
3. Dice: It selects a sub-cube from the OLAP cube by applying selection criteria on two or
more dimensions. In the cube given in the overview section, a sub-cube is selected by
selecting the following dimensions with criteria:
4. Slice: It performs a selection on a single dimension of the OLAP cube, which results in a
new sub-cube. In the cube given in the overview section, the slice is performed on the
dimension Time = “Q1”.
5. Pivot: It is also known as the rotation operation because it rotates the current view to
give a new representation of the data. Performing a pivot on the sub-cube obtained after the
slice operation gives a new view of it.
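The operations above can be sketched on a tiny in-memory cube. This is only an illustration in plain Python, not how an OLAP engine implements them. The sales figures, the city-to-country mapping, and the helper names (roll_up, slice_cube, dice, pivot) are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (city, quarter, item) cell of the cube.
SALES = [
    {"city": "Delhi",   "quarter": "Q1", "item": "Mobile", "units": 10},
    {"city": "Delhi",   "quarter": "Q2", "item": "Mobile", "units": 20},
    {"city": "Mumbai",  "quarter": "Q1", "item": "Modem",  "units": 5},
    {"city": "Toronto", "quarter": "Q1", "item": "Mobile", "units": 7},
    {"city": "Toronto", "quarter": "Q2", "item": "Modem",  "units": 3},
]

# Assumed concept hierarchy for the Location dimension (City -> Country).
CITY_TO_COUNTRY = {"Delhi": "India", "Mumbai": "India", "Toronto": "Canada"}


def roll_up(rows):
    """Aggregate units from city level up to country level."""
    totals = defaultdict(int)
    for r in rows:
        key = (CITY_TO_COUNTRY[r["city"]], r["quarter"], r["item"])
        totals[key] += r["units"]
    return dict(totals)


def slice_cube(rows, dimension, value):
    """Fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]


def dice(rows, criteria):
    """Keep rows whose value is in the allowed set for every listed dimension."""
    return [r for r in rows
            if all(r[dim] in allowed for dim, allowed in criteria.items())]


def pivot(rows, row_dim, col_dim):
    """Arrange cells as a 2-D view: view[row][col] = total units."""
    view = defaultdict(lambda: defaultdict(int))
    for r in rows:
        view[r[row_dim]][r[col_dim]] += r["units"]
    return {k: dict(v) for k, v in view.items()}


by_country = roll_up(SALES)
q1 = slice_cube(SALES, "quarter", "Q1")
sub_cube = dice(SALES, {"city": {"Delhi", "Toronto"}, "quarter": {"Q1", "Q2"}})
city_by_item = pivot(q1, "city", "item")
item_by_city = pivot(q1, "item", "city")  # same cells, axes rotated
```

Drill-down is the reverse direction of roll_up: going from the country-level totals in by_country back to the city-level rows in SALES.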
Extracting important knowledge from a very large amount of data can be crucial to
organizations for the process of decision-making.
1 Association
2 Classification
3 Clustering
4 Sequential patterns
5 Decision tree.
1 Association Technique
The association technique finds patterns in large amounts of data, based on a
relationship between two or more items of the same transaction. It is used for
market analysis, that is, it helps us to analyze people's buying habits.
For example, you might find that a customer always buys ice cream whenever he
comes to watch a movie, so when that customer comes to watch a movie again he will
probably want to buy ice cream again.
2 Classification Technique
Assume you have a set of records, each containing a set of attributes. Based on these
attributes you can predict the class of unseen or unknown records. For example,
given the records of all employees who left the company, the classification
technique lets you predict who will probably leave the company in a future period.
3 Clustering Technique
Clustering is one of the oldest techniques used in the process of data mining. The main
aim of the clustering technique is to make clusters (groups) from pieces of data that share
common characteristics. The clustering technique helps to identify the differences and
similarities between the data.
Take the example of a shop in which many items are for sale. The challenge is how to
arrange those items so that a customer can easily find the required item. By using the
clustering technique, you can keep items that are similar to one another in one corner
and items that share a different set of similarities in another corner.
4 Sequential patterns
Sequential pattern mining is a useful method for identifying trends and recurring patterns.
For example, if customer data shows that a customer buys a particular product at a
particular time of year, you can use this information to suggest that product to the
customer at that time of year.
5 Decision tree
The decision tree is one of the most commonly used data mining techniques because its model
is easy for users to understand. In a decision tree you start with a simple question that has
two or more answers. Each answer leads to a further question with two or more answers, which
helps us to make a final decision. The root node of the decision tree is a simple question.
Decision tree
First check the water level: if it is > 50 ft, an alert is sent. If it is < 50 ft, check the
water level again: if it is > 30 ft, a warning is sent, and if it is < 30 ft, the water is in
the normal range.
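The water-level tree can be written as two nested questions. This is a minimal sketch; the function name water_level_status and the returned labels are chosen for illustration. The text does not say what happens at exactly 50 ft or 30 ft, so the sketch assumes those boundary values fall into the lower branch.

```python
def water_level_status(level_ft: float) -> str:
    """Walk the two-question decision tree for a water level given in feet."""
    if level_ft > 50:   # root node: is the level above 50 ft?
        return "alert"
    if level_ft > 30:   # second question, reached only when level <= 50 ft
        return "warning"
    return "normal"     # assumption: exactly 30 ft counts as normal


statuses = [water_level_status(x) for x in (60, 40, 10)]
```

Each if statement corresponds to one internal node of the tree, and each return corresponds to a leaf.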
Data mining functionalities are used to specify the kind of patterns to be found in data
mining tasks. Data mining tasks can be classified into two categories: descriptive and
predictive.
Descriptive mining tasks characterize the general properties of the data in the database.
Predictive mining tasks perform inference on the current data in order to make
predictions.
Data can be associated with classes or concepts. For example, in the Electronics store,
classes of items for sale include computers and printers, and concepts of customers
include bigSpenders and budgetSpenders.
Data characterization
Data characterization is a summarization of the general characteristics or features of a
target class of data.
Data discrimination
Data discrimination is a comparison of the general features of target class data objects
with the general features of objects from one or a set of contrasting classes.
Frequent patterns are patterns that occur frequently in data. There are many kinds of
frequent patterns, including itemsets, subsequences, and substructures.
Association analysis
Suppose, as a marketing manager, you would like to determine which items are frequently
purchased together within the same transactions.
buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]
where X is a variable representing a customer. Confidence = 50% means that if a customer
buys a computer, there is a 50% chance that she will buy software as well.
Support = 1% means that 1% of all of the transactions under analysis showed that
computer and software were purchased together.
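Support and confidence can be computed directly from a list of transactions. The sketch below uses four made-up transactions, so the resulting numbers differ from the 1% and 50% quoted in the rule. The function names support and confidence are chosen for illustration.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Made-up transactions for illustration only.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"printer", "paper"},
]

rule_support = support(transactions, {"computer", "software"})           # 1 of 4
rule_confidence = confidence(transactions, {"computer"}, {"software"})   # 1 of 2
```

Here one of the four transactions contains both items, and one of the two computer purchases also includes software.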
Classification is the process of finding a model that describes and distinguishes data
classes for the purpose of being able to use the model to predict the class of objects whose
class label is unknown.
“How is the derived model presented?” The derived model may be represented in various
forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or
neural networks.
Decision tree
Cluster Analysis
The objects are grouped based on the principle of maximizing the intraclass similarity and
minimizing the interclass similarity. That is, clusters of objects are formed so that objects
within a cluster have high similarity in comparison to one another, but are very dissimilar
to objects in other clusters.
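One way to check this principle on a given grouping is to compare the average distance inside each cluster with the average distance between clusters. The sketch below assumes 1-D points and absolute difference as the distance measure. Both are illustrative choices, and the data is made up.

```python
from itertools import combinations, product

# Made-up 1-D data, already split into two clusters.
cluster_a = [1.0, 2.0, 3.0]
cluster_b = [10.0, 11.0, 12.0]


def mean_intra_distance(cluster):
    """Average distance between pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def mean_inter_distance(c1, c2):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(c1, c2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

# A good grouping has small distances inside clusters and large ones between them.
well_separated = max(intra_a, intra_b) < inter_ab
```

A small distance within a cluster corresponds to high intraclass similarity, and a large distance between clusters corresponds to low interclass similarity.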
Outlier Analysis
A database may contain data objects that do not comply with the general behavior or
model of the data. These data objects are outliers. Most data mining methods discard
outliers as noise or exceptions. The analysis of outlier data is referred to as outlier mining.
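As a simple illustration of outlier mining, values that lie far from the mean can be flagged. The z-score rule and the threshold of 2 standard deviations used below are assumptions for this sketch, not a method prescribed by these notes, and the readings are made up.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:      # all values identical, so nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up readings; 50 does not follow the general behavior of the rest.
readings = [10, 11, 9, 10, 12, 10, 50]
outliers = z_score_outliers(readings)
```

With such a small sample, a single extreme value inflates the standard deviation, so the threshold usually has to be tuned to the data.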