DWDM Unit 2 and 3
Example 5.1 Market basket analysis. Suppose, as manager of an All Electronics branch, you
would like to learn more about the buying habits of your customers. Specifically, you wonder,
“Which groups or sets of items are customers likely to purchase on a given trip to the store?”.
Let I = {I1, I2, ..., Im} be a set of items. Let D, the task-relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with an identifier, called a TID. An association rule A ⇒ B holds in D with support(A ⇒ B) = P(A ∪ B) and confidence(A ⇒ B) = P(B|A).
A set of items is referred to as an itemset. An itemset that contains k items is a k-itemset. The
set {computer, antivirus software} is a 2-itemset. The occurrence frequency of an itemset is the
number of transactions that contain the itemset.
1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count, min sup.
2. Generate strong association rules from the frequent itemsets: By definition, these rules
must satisfy minimum support and minimum confidence.
In this section, you will learn methods for mining the simplest form of frequent patterns—single-
dimensional, single-level, Boolean frequent item sets, such as those discussed for market basket
analysis.
Quantitative association rules are one of the four kinds of association rules described here. What sets them apart from the others is the presence of a numeric attribute on at least one side of the rule. This is in contrast to the generalized association rule, where the left and right sides consist of categorical attributes.
A data mining procedure can uncover thousands of rules from a given set of data, most of which end up being uninteresting or redundant to the users. Users often have a good sense of which "direction" of mining may lead to interesting patterns and of the "form" of the patterns or rules they would like to discover.
Therefore, a good heuristic is to have the users specify such intuition or expectations as constraints that confine the search space. This strategy is called constraint-based mining.
Constraint-based algorithms use constraints to reduce the search space in the frequent itemset generation step (the association rule generation step is identical to that of exhaustive algorithms). The most common constraint is the minimum support threshold. If a constraint can be pushed into the mining phase, its inclusion can significantly reduce the exploration space, because it defines a boundary inside the search-space lattice beyond which exploration is not needed.
The importance of constraints is clear: they produce only association rules that are appealing to users. The method is straightforward, and the rule space is reduced so that only the rules satisfying the constraints remain.
Knowledge type constraints − These define the type of knowledge to be mined, such as association or correlation.
Data constraints − These define the set of task-relevant data.
Dimension/level constraints − These define the desired dimensions (or attributes) of the data, or levels of the concept hierarchies, to be used in mining.
Interestingness constraints − These define thresholds on numerical measures of rule interestingness, such as support, confidence, and correlation.
Rule constraints − These define the form of rules to be mined. Such constraints can be expressed as metarules (rule templates), as the maximum or minimum number of predicates that can appear in the rule antecedent or consequent, or as relationships among attributes, attribute values, and/or aggregates.
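For example, a rule constraint can be given as a metarule in the usual textbook form, with P1 and P2 as predicate variables to be instantiated during mining:
P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "software")
which might be instantiated to a concrete rule such as age(X, "30..39") ∧ income(X, "42K..48K") ⇒ buys(X, "software").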
The above constraints can be specified using a high-level declarative data mining query language and user interface. This form of constraint-based mining allows users to describe the rules that they would like to uncover, thereby making the data mining process more focused. Furthermore, a sophisticated mining query optimizer can be used to exploit the constraints specified by the user, thereby making the mining process more efficient. Constraint-based mining encourages interactive exploratory mining and analysis.
APRIORI ALGORITHM
Apriori algorithm, a classic algorithm, is useful in mining frequent item sets and relevant
association rules. Usually, you operate this algorithm on a database containing a large number of
transactions.
Example: The items customers buy at a supermarket. It helps the customers buy their items with
ease, and enhances the sales performance of the departmental store.
The Apriori algorithm involves three significant measures. They are as follows.
Support
Confidence
Lift
Example: you need a big database. Suppose you have 2000 customer transactions in a supermarket. You have to find the Support, Confidence, and Lift for two items, say bread and jam, because people frequently buy these two items together.
Out of the 2000 transactions, 200 contain jam and 300 contain bread. These 300 transactions include 100 that contain both bread and jam. Using this data, we shall find the support, confidence, and lift.
Support
Support is the default popularity of an item. It is calculated as the number of transactions containing that item divided by the total number of transactions. Hence, in our example,
Support(jam) = (Transactions containing jam) / (Total transactions) = 200/2000 = 10%
Confidence
In our example, Confidence is the likelihood that a customer who bought jam also bought bread. Dividing the number of transactions that include both bread and jam by the number of transactions that include jam gives the Confidence figure.
Confidence(jam ⇒ bread) = (Transactions involving both bread and jam) / (Total transactions involving jam)
= 100/200 = 50%. It implies that 50% of customers who bought jam bought bread as well.
Lift
According to our example, Lift is the increase in the likelihood of selling bread when jam is sold. The mathematical formula of Lift is as follows.
Lift(jam ⇒ bread) = Confidence(jam ⇒ bread) / Support(bread) = 50% / 15% ≈ 3.33
(where Support(bread) = 300/2000 = 15%)
It says that a customer who buys jam is about 3.3 times as likely to buy bread as a customer picked at random. If the Lift value is less than 1, it means that the customers are unlikely to buy both items together; the greater the value, the better the combination.
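These three calculations are easy to verify with a few lines of Python using the counts assumed in this example (2000 transactions, 300 with bread, 200 with jam, 100 with both):

# Counts from the bread-and-jam example above.
total_transactions = 2000
bread_count = 300          # transactions containing bread
jam_count = 200            # transactions containing jam
both_count = 100           # transactions containing both bread and jam

support_jam = jam_count / total_transactions              # 0.10 -> 10%
support_bread = bread_count / total_transactions          # 0.15 -> 15%
confidence_jam_bread = both_count / jam_count             # 0.50 -> 50%
lift_jam_bread = confidence_jam_bread / support_bread     # about 3.33

print(f"support(jam) = {support_jam:.0%}")
print(f"confidence(jam => bread) = {confidence_jam_bread:.0%}")
print(f"lift(jam => bread) = {lift_jam_bread:.2f}")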
Example:
Consider a supermarket scenario where the itemset is I = {Onion, Burger, Potato, Milk, Beer}.
The database consists of six transactions where 1 represents the presence of the item and 0 the
absence.
Step 1
Create a frequency table of all the items that occur in the transactions. Now, prune the frequency table to include only those items having a support level of at least 50%. We arrive at this frequency table.
Step 2
Make pairs of items such as OP, OB, OM, PB, PM, BM. This frequency table is what you arrive
at.
Step 3
Apply the same support threshold of 50% and keep the pairs whose count meets it (in this case, 3 and above).
Step 4
Look for a set of three items that the customers buy together. Thus we get this combination.
Step 5
Determine the frequency of these two item sets. You get this frequency table.
If you apply the threshold assumption, you can deduce that the set of three items frequently
purchased by the customers is OPB. In reality, you have hundreds and thousands of such
combinations.
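A minimal Python sketch of this level-wise search; since the original 0/1 transaction table is not reproduced here, the six transactions below are hypothetical stand-ins chosen so that the frequent pairs and the final triple {Onion, Potato, Burger} come out as in the steps above:

# Hypothetical stand-in transactions for the missing 0/1 table.
transactions = [
    {"Onion", "Potato", "Burger"},
    {"Potato", "Burger", "Milk"},
    {"Milk", "Beer"},
    {"Onion", "Potato", "Burger", "Milk"},
    {"Onion", "Potato", "Burger", "Beer"},
    {"Onion", "Potato", "Milk"},
]
min_support = 0.5                                  # 50% threshold => count >= 3 out of 6
min_count = min_support * len(transactions)

def support_count(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions)

# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if support_count({i}) >= min_count}]

# Level k: join frequent (k-1)-itemsets into k-item candidates, then prune by support.
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if support_count(c) >= min_count})
    k += 1

for level, itemsets in enumerate(frequent, start=1):
    for s in sorted(itemsets, key=sorted):
        print(level, sorted(s), support_count(s))

With these stand-in transactions the only frequent 3-itemset is {Onion, Potato, Burger}, matching the conclusion of Step 5.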
The Apriori algorithm has some drawbacks. At times it generates a very large number of candidate itemsets and rules, which can become computationally expensive.
Calculating support is also expensive because every pass has to scan the entire database.
Once the frequent itemsets from the transactions in a database D have been found, it is straightforward to generate strong association rules from them (where strong association rules satisfy both minimum support and minimum confidence). This can be done using Equation (5.4) for confidence, which we show again here for completeness:
confidence(A ⇒ B) = P(B|A) = support_count(A ∪ B) / support_count(A)
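A small sketch of this rule-generation step: every non-empty proper subset of a frequent itemset is tried as an antecedent, and a rule is kept only if its confidence meets the threshold. The support counts below are hypothetical, matching the illustrative transactions used earlier:

from itertools import combinations

# Hypothetical support counts for the frequent itemset and its subsets.
support_count = {
    frozenset(["Onion", "Potato", "Burger"]): 3,
    frozenset(["Onion", "Potato"]): 4,
    frozenset(["Onion", "Burger"]): 3,
    frozenset(["Potato", "Burger"]): 4,
    frozenset(["Onion"]): 4,
    frozenset(["Potato"]): 5,
    frozenset(["Burger"]): 4,
}
itemset = frozenset(["Onion", "Potato", "Burger"])
min_conf = 0.6

# confidence(A => B) = support_count(A u B) / support_count(A)
for r in range(1, len(itemset)):
    for antecedent in combinations(sorted(itemset), r):
        a = frozenset(antecedent)
        conf = support_count[itemset] / support_count[a]
        if conf >= min_conf:
            print(f"{sorted(a)} => {sorted(itemset - a)} (confidence {conf:.0%})")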
Mining Various Kinds of Association Rules
We have studied efficient methods for mining frequent item sets and association rules. In this
section, we consider additional application requirements by extending our scope to include
mining multilevel association rules, multidimensional association rules, and quantitative
association rules in transactional and/or relational databases and data warehouses. Multilevel
association rules involve concepts at different levels of abstraction. Multidimensional association
rules involve more than one dimension or predicate (e.g., rules relating what a customer buys as
well as the customer’s age.).
For many applications, it is difficult to find strong associations among data items at low or
primitive levels of abstraction due to the sparsity of data at those levels. Strong associations
discovered at high levels of abstraction may represent commonsense knowledge. Moreover, what
may represent common sense to one user may seem novel to another. Therefore, data mining
systems should provide capabilities for mining association rules at multiple levels of abstraction,
with sufficient flexibility for easy traversal among different abstraction spaces.
Mining multilevel association rules. Suppose we are given the task-relevant set of transactional
data in Table 5.6 for sales in an All Electronics store, showing the items purchased for each
transaction.
FP-Growth Algorithm:
FP stands for frequent pattern.
It is an efficient and scalable method for mining the complete set of frequent patterns using a tree structure.
The tree structure used for storing the information about frequent patterns is called the FP-tree.
Example:
The following table shows the transactions and the data items selected in each transaction; here the minimum support is 30%.
Transaction-ID Items
1 E,A,D,B
2 D,A,E,C,B
3 C,A,B,E
4 B,A,D
5 D
6 D,B
7 A,D,E
8 B,C
Solution:
Step-1: Find the frequency and priority of each item from the above data items.
Frequency: B = 6, D = 6, A = 5, E = 4, C = 3 (every item meets the 30% minimum support, i.e. a count of at least 3 out of 8).
Order: B, D, A, E, C
Step-2: Order the items in each transaction according to the priority.
Transaction-ID Ordered Items
1 B,D,A,E
2 B,D,A,E,C
3 B,A,E,C
4 B,D,A
5 D
6 B,D
7 D,A,E
8 B,C
The FP-tree is then built by inserting these ordered transactions one at a time, sharing common prefixes and incrementing the node counts:
[FP-tree: root → B:6 → D:4 → A:3 → E:2 → C:1; B:6 → A:1 → E:1 → C:1; B:6 → C:1; root → D:2 → A:1 → E:1]
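A minimal Python sketch of Step-1 and Step-2: count the items over the eight transactions, keep those meeting the 30% minimum support, and reorder each transaction by descending frequency (ties broken alphabetically so that the order matches B, D, A, E, C above):

from collections import Counter

# Transactions from the example (transaction 1 read as E, A, D, B).
transactions = [
    ["E", "A", "D", "B"],
    ["D", "A", "E", "C", "B"],
    ["C", "A", "B", "E"],
    ["B", "A", "D"],
    ["D"],
    ["D", "B"],
    ["A", "D", "E"],
    ["B", "C"],
]
min_support = 0.3
min_count = min_support * len(transactions)   # 2.4 -> an item needs a count of 3 or more

# Step-1: frequency and priority of each item.
counts = Counter(item for t in transactions for item in t)
frequent = {item: c for item, c in counts.items() if c >= min_count}
priority = sorted(frequent, key=lambda item: (-frequent[item], item))
print("Order:", priority)                     # ['B', 'D', 'A', 'E', 'C']

# Step-2: reorder each transaction by priority, dropping infrequent items.
ordered = [
    sorted([i for i in t if i in frequent], key=priority.index)
    for t in transactions
]
for tid, t in enumerate(ordered, start=1):
    print(tid, t)

The FP-tree itself is built by inserting each of these ordered lists from the root, reusing nodes along shared prefixes.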
UNIT-3
a) Normalization – scaling the data so that it falls within a specified range.
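A one-function Python sketch of min-max normalization (the attribute range and the target range [0, 1] below are only illustrative):

def min_max_normalize(v, old_min, old_max, new_min=0.0, new_max=1.0):
    # Scale v from [old_min, old_max] into [new_min, new_max].
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

print(min_max_normalize(73, old_min=12, old_max=98))   # a made-up mark rescaled into [0, 1]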
(1) Suppose we are given data tuples of students along with their average marks; initially all the tuples are at a single node.
Example: Average
(2) If every student has an average above 80%, all the tuples at the node belong to a single class and the node becomes a leaf.
Example: Avg > 80%
If not all the students have an average above 80%, the algorithm proceeds further and splits the node on the attribute.
[Decision tree: the node Avg splits into branches Above 80%, >65% and <80%, >40% and <65%, ...]
Tree Pruning:-
When a decision tree is built, many of the branches will reflect anomalies because of noise and
outliers in the training data. Such branches should be removed.
1) Pre pruning
2) Post pruning
Pre pruning: In the pre-pruning approach, a tree is pruned by halting its construction early.
Post pruning: In this approach the subtrees are removed from a "fully grown" tree.
A subtree at a given node is pruned by removing its branches and replacing it with a leaf.
The leaf is labeled with the most frequent class among the subtree being replaced.
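As a rough illustration (not the textbook's algorithm), scikit-learn's decision tree offers pre-pruning parameters such as max_depth and min_samples_leaf, and cost-complexity post-pruning through ccp_alpha; the data here is generated purely for the sketch:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data, generated only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Pre-pruning: stop growing the tree early.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow fully, then prune back with cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

print("pre-pruned depth :", pre_pruned.get_depth())
print("post-pruned depth:", post_pruned.get_depth())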
Bayesian Classification:
Bayes' Theorem:
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Bayesian classification is based on Bayes' theorem.
Example: Class A has 2 girls and 3 boys, and class B has 3 girls and 4 boys.
P(A ∧ G) = P(A) · P(G|A)
P(G) = P(A ∧ G) + P(B ∧ G)
Bayes' theorem:
P(H|X) = P(X|H) · P(H) / P(X)
where P(H|X) is the posterior probability of the hypothesis H given the data X, P(X|H) is the probability of X given H, P(H) is the prior probability of H, and P(X) is the prior probability of X.
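Assuming a student is picked uniformly at random from the 12 students in the two classes, the numbers work out as follows:
P(A) = 5/12, P(B) = 7/12, P(G|A) = 2/5, P(G|B) = 3/7
P(A ∧ G) = P(A) · P(G|A) = 5/12 × 2/5 = 2/12
P(B ∧ G) = P(B) · P(G|B) = 7/12 × 3/7 = 3/12
P(G) = 2/12 + 3/12 = 5/12
P(A|G) = P(A ∧ G) / P(G) = (2/12) / (5/12) = 2/5
So if the selected student is a girl, the probability that she comes from class A is 2/5.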
Naive Bayes Classification: Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining.
In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
P(T|Short) = 0
P(T|Medium) = 0
P(T|Tall) = 0.5
Comparing the three class labels, the highest value is the one obtained for Tall, so the given tuple is classified as belonging to the Tall class.
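The same idea can be tried with scikit-learn's Gaussian naive Bayes on a small made-up height dataset (the numbers are purely illustrative and are not the table behind the probabilities above):

from sklearn.naive_bayes import GaussianNB

# Made-up training data: heights in metres with Short / Medium / Tall labels.
heights = [[1.5], [1.55], [1.6], [1.65], [1.7], [1.75], [1.85], [1.9], [1.95]]
labels = ["Short", "Short", "Short", "Medium", "Medium", "Medium", "Tall", "Tall", "Tall"]

model = GaussianNB().fit(heights, labels)

# The classifier picks the class with the highest posterior probability.
print(model.predict([[1.92]]))          # expected: ['Tall']
print(model.predict_proba([[1.92]]))    # posterior probability for each class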
Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form IF condition THEN conclusion, for example: IF age = youth AND student = yes THEN buys_computer = yes.
Neural network:-
A neural network is a set of connected input/output units in which each connection has a weight associated with it.
The illustration below shows the basic idea behind Support Vector Machines. Here we see the original objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical functions known as kernels. The process of rearranging the objects is known as mapping (transformation). Note that in this new setting, the mapped objects (right side of the schematic) are linearly separable and, thus, instead of constructing the complex curve (left schematic), all we have to do is find an optimal line that can separate the GREEN and the RED objects.
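A brief scikit-learn sketch of this idea: an RBF kernel lets the SVM separate classes that a straight line cannot, here on the standard two-moons toy dataset (used only for illustration):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles: not separable by a straight line in the original space.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)         # the kernel maps the data implicitly

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy   :", rbf_svm.score(X, y))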
Prediction:
Following are the examples of cases where the data analysis task is Prediction −
Suppose the marketing manager needs to predict how much a given customer will spend during
a sale at his company. In this example we are interested in predicting a numeric value. Therefore the data analysis task is an example of numeric prediction. In this case, a model or predictor is constructed that predicts a continuous-valued function, or ordered value.
It involves a response variable Y and a single predictor variable X
Y=W0+W1X
W1 = Σ i=1..|D| (xi − x̄)(yi − ȳ) / Σ i=1..|D| (xi − x̄)² ; W0 = ȳ − W1·x̄
X Y
Mid Exam Final Exam
72 84
50 63
81 71
74 78
94 96
86 75
59 49
83 79
65 77
33 52
88 74
81 90
W1 = [(72 − 72.16)(84 − 74) + (50 − 72.16)(63 − 74) + ...] / [(72 − 72.16)² + (50 − 72.16)² + ...]
W1 = 1940 / 3286.96 = 0.59
W0 = 74 − 0.59(72.16) = 31.4
Y = 31.4 + 0.59X
For a student who scored 87 on the midterm exam: Y = 31.4 + 0.59(87) ≈ 83
[Plot: midterm scores (X) versus final exam scores (Y) with the fitted line Y = 31.4 + 0.59X]
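A short Python check of this calculation, computing W1 and W0 directly from the twelve (midterm, final) pairs above; because of rounding in the hand-worked figures it gives W1 ≈ 0.60 and W0 ≈ 30.4 rather than 0.59 and 31.4, but the predicted final-exam score for a midterm mark of 87 is again about 83:

# (midterm, final) pairs from the table above.
scores = [(72, 84), (50, 63), (81, 71), (74, 78), (94, 96), (86, 75),
          (59, 49), (83, 79), (65, 77), (33, 52), (88, 74), (81, 90)]

n = len(scores)
x_bar = sum(x for x, _ in scores) / n      # about 72.17
y_bar = sum(y for _, y in scores) / n      # 74.0

w1 = sum((x - x_bar) * (y - y_bar) for x, y in scores) / sum((x - x_bar) ** 2 for x, _ in scores)
w0 = y_bar - w1 * x_bar

print(f"W1 = {w1:.2f}, W0 = {w0:.2f}")                          # roughly W1 = 0.60, W0 = 30.4
print(f"predicted final for midterm 87: {w0 + w1 * 87:.1f}")    # about 83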
Lazy learners:
Lazy learning refers to machine learning processes in which generalization of the training data is
delayed until a query is made to the system. This type of learning is also known as Instance-
based Learning. Lazy classifiers are very useful when working with large datasets that have a
few attributes.
Learning systems have computation occurring at two different times: training time and consultation time.
Training time is the time before the consultation time. During training time, the system derives
inferences from training data to prepare for the consultation time.
Consultation time is the time between the moment when an object is presented to a system so
that the system can make an inference and the moment when the system finishes making the
inference.
In a lazy learning algorithm, most of the computation is done during consultation time.
Essentially, a lazy algorithm defers the processing of examples till it receives an explicit request
for information.
Instance-based learning, local regression, K-Nearest Neighbors (K-NN), and Lazy Bayesian
Rules are some examples of lazy learning.
It is very useful when not all the examples are available in advance but need to be collected online. In such a situation, a newly observed example only requires an update to the database.
In lazy learning, collecting examples about an operating regime does not degrade the
modeling performance of other operating regimes. Essentially, lazy learning is not prone
to suffering from data interference.
The problem-solving capabilities of a lazy learning algorithm increase with every newly
presented case.
Lazy learning is easy to maintain because the learner will adapt automatically to changes
in the problem domain.
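K-Nearest Neighbors is the classic lazy learner: "training" only stores the examples, and the distance computation happens at consultation time. A minimal scikit-learn sketch with made-up points:

from sklearn.neighbors import KNeighborsClassifier

# Made-up two-attribute examples; fitting only stores them (lazy learning).
X_train = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]]
y_train = ["low", "low", "low", "high", "high", "high"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# The real work (finding the 3 nearest stored examples) happens at query time.
print(knn.predict([[2, 2], [6, 7]]))   # expected: ['low' 'high']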
By Rule (6.47), a customer who has had a job for at least two years will receive credit if
her income is, say, $50,000, but not if it is $49,000. Such harsh thresholding may seem
unfair.
PageRank Algorithm:
Consider four web pages A, B, C and D with the following links. Every page starts with an initial rank of 1/4, and in each iteration a page's rank is divided equally among the pages it links to.
A → B, C
B → D
C → A, B, D
D → C
Iteration 1:
PR(A) = PR(C)/3 = (1/4)/3 = 1/12
PR(B) = PR(A)/2 + PR(C)/3 = (1/4)/2 + (1/4)/3 = 2.5/12
PR(C) = PR(A)/2 + PR(D)/1 = (1/4)/2 + (1/4)/1 = 4.5/12
PR(D) = PR(B)/1 + PR(C)/3 = (1/4)/1 + (1/4)/3 = 4/12
Iteration 2:
PR(A) = PR(C)/3 = (4.5/12)/3 = 1.5/12
PR(B) = PR(A)/2 + PR(C)/3 = (1/12)/2 + (4.5/12)/3 = 2/12
PR(C) = PR(A)/2 + PR(D)/1 = (1/12)/2 + (4/12)/1 = 4.5/12
PR(D) = PR(B)/1 + PR(C)/3 = (2.5/12)/1 + (4.5/12)/3 = 4/12
The web page with the highest value has the highest rank. In this example, web page C has the highest rank with 4.5/12.
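A short Python sketch of the same iteration for the four-page graph above, starting every page at 1/4 and printing the ranks as twelfths so they can be compared with the hand calculation:

# Outgoing links of each page, as in the example above.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

rank = {page: 1 / len(links) for page in links}    # every page starts at 1/4

for iteration in range(1, 3):
    new_rank = {page: 0.0 for page in links}
    for page, out_links in links.items():
        share = rank[page] / len(out_links)         # rank is split among out-links
        for target in out_links:
            new_rank[target] += share
    rank = new_rank
    print(f"iteration {iteration}:",
          {p: f"{r * 12:.1f}/12" for p, r in rank.items()})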