
UNIT-II

Mining frequent patterns, associations and correlations:


Frequent pattern mining searches for recurring relationships in a given data set. This section
introduces the basic concepts of frequent pattern mining for the discovery of interesting
associations and correlations between item sets in transactional and relational databases.
Frequent item set mining leads to the discovery of associations and correlations among items in
large transactional or relational data sets. A typical example of frequent item set mining is
market basket analysis. This process analyzes customer buying habits by finding associations
between the different items that customers place in their “shopping baskets”.

Example 5.1 Market basket analysis. Suppose, as manager of an All Electronics branch, you
would like to learn more about the buying habits of your customers. Specifically, you wonder,
"Which groups or sets of items are customers likely to purchase on a given trip to the store?"
The discovery of such an association can be expressed in rule form, for example:

Computer ⇒ antivirus software [support = 2%, confidence = 60%]

Frequent Item sets, Closed Item sets, and Association Rules

Let I = {I1, I2, ..., Im} be a set of items. Let D, the task-relevant data, be a set of database
transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is
associated with an identifier, called a TID. Let A be a set of items. A transaction T is said to
contain A if and only if A ⊆ T.

The rule A ⇒ B has support s in the transaction set D, where s is the percentage of transactions
in D that contain A ∪ B (i.e., both A and B). The rule A ⇒ B has confidence c in the transaction
set D, where c is the percentage of transactions in D containing A that also contain B. This is
taken to be the conditional probability P(B|A). That is,

support(A ⇒ B) = P(A ∪ B)

confidence(A ⇒ B) = P(B|A).

A set of items is referred to as an itemset. An itemset that contains k items is a k-itemset. The
set {computer, antivirus software} is a 2-itemset. The occurrence frequency of an itemset is the
number of transactions that contain the itemset.

In general, association rule mining can be viewed as a two-step process:

1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count, min sup.

2. Generate strong association rules from the frequent itemsets: By definition, these rules
must satisfy minimum support and minimum confidence.
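A minimal sketch of how these two measures can be computed directly from their definitions (the transactions below are hypothetical, chosen only for illustration):

# Sketch: computing support and confidence from their definitions.
# The transactions are hypothetical, chosen only for illustration.
transactions = [
    {"computer", "antivirus software"},
    {"computer", "printer"},
    {"computer", "antivirus software", "printer"},
    {"printer"},
]

def support(itemset, db):
    # Fraction of transactions that contain every item in `itemset`.
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    # confidence(A => B) = support(A ∪ B) / support(A) = P(B|A)
    return support(antecedent | consequent, db) / support(antecedent, db)

A = {"computer"}
B = {"antivirus software"}
print(support(A | B, transactions))      # support(A => B) = 0.5
print(confidence(A, B, transactions))    # confidence(A => B) ≈ 0.67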

Efficient and Scalable Frequent Item set Mining Methods

In this section, you will learn methods for mining the simplest form of frequent patterns—single-
dimensional, single-level, Boolean frequent item sets, such as those discussed for market basket
analysis.

Association rule mining:

An association rule is a learning technique that helps identify
dependencies between two data items. Based on the discovered
dependencies, items can be mapped to one another so that the
knowledge can be exploited more profitably. Association rule mining,
furthermore, looks for interesting associations among the variables
of a dataset. It is one of the most important concepts of machine
learning and has been used in different settings, such as data
mining and continuous production, among others. However, like all
other techniques, association mining, too, has its own set of
disadvantages, which are discussed briefly later in this article.

An association rule has 2 parts:

 an antecedent (if) and


 a consequent (then)
An antecedent is something that’s found in data, and a consequent
is an item that is found in combination with the antecedent. Have a
look at this rule for instance:

"If a customer buys bread, he is 70% likely to also buy milk."

In the above association rule, bread is the antecedent and milk is
the consequent. Simply put, it can be understood as a retail store's
association rule, used to target its customers better. If the above
rule is the result of a thorough analysis of some data sets, it can
be used not only to improve customer service but also to improve the
company's revenue.
Association rules are created by thoroughly analyzing data and
looking for frequent if/then patterns. Then, depending on the
following two parameters, the important relationships are
observed:

1. Support: Support indicates how frequently the if/then relationship
appears in the database.
2. Confidence: Confidence indicates how often the if/then relationship
has been found to be true.
Types Of Association Rules In Data Mining

There are typically four different types of association rules in data


mining. They are

 Multi-relational association rules


 Generalized Association rule
 Interval Information Association Rules
 Quantitative Association Rules
Multi-Relational Association Rule

Also known as MRARs, multi-relational association rules are a
class of association rules that are usually derived from
multi-relational databases. Each rule in this class involves
one entity with different relationships, representing the
indirect relationships between entities.

Generalized Association Rule

Moving on to the next type of association rule, the generalized


association rule is largely used for getting a rough idea about the
interesting patterns that often tend to stay hidden in data.

Quantitative Association Rules

Quantitative association rules are distinguished from the other
types by the presence of a numeric attribute on at least one
side of the rule. This is in contrast to the
generalized association rule, where the left- and right-hand
sides consist of categorical attributes.

Constraint-Based Association Mining:

A data mining procedure can uncover thousands of rules from a given set of data, most of
which end up being uninteresting or redundant to the users. Users often have a good sense of
which "direction" of mining can lead to interesting patterns and of the "form" of the patterns or
rules they would like to discover.

Therefore, a good heuristic is to have the users specify such intuition or expectations as
constraints that restrict the search space. This strategy is called constraint-based mining.

Constraint-based algorithms use constraints to reduce the search space in the frequent itemset
generation step (the association rule generation step is identical to that of exhaustive algorithms).
The most general constraint is the minimum support threshold. If a constraint can be exploited, its
inclusion in the mining phase can significantly reduce the exploration space by defining a
boundary inside the search-space lattice beyond which exploration is not needed.

The importance of constraints is clear: they produce only association rules that are
appealing to users. The method is quite simple, and the rule space is reduced so that the
remaining rules satisfy the constraints.

Constraint-based clustering discovers clusters that satisfy user-defined preferences or constraints.

Depending on the characteristics of the constraints, constraint-based clustering can adopt rather
different approaches.

The constraints can include the following:

Knowledge type constraints − These specify the type of knowledge to be mined, such as
association or correlation.
Data constraints − These specify the set of task-relevant data.
Dimension/level constraints − These specify the desired dimensions (or attributes) of the data, or
levels of the concept hierarchies, to be used in mining.
Interestingness constraints − These specify thresholds on statistical measures of rule
interestingness, such as support, confidence, and correlation.
Rule constraints − These specify the form of rules to be mined. Such constraints can be expressed
as metarules (rule templates), as the maximum or minimum number of predicates that can appear
in the rule antecedent or consequent, or as relationships among attributes, attribute values,
and/or aggregates.
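For example, a rule constraint may be given as a metarule template of the form P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "software"), which restricts the search to rules with two predicates in the antecedent and the predicate buys in the consequent (the predicate names P1 and P2 here are placeholders, not attributes of any particular data set).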

These constraints can be specified using a high-level declarative data mining query
language and user interface. This form of constraint-based mining allows users to describe the
rules that they would like to uncover, thereby making the data mining process more effective.

Furthermore, a sophisticated mining query optimizer can be used to exploit the constraints specified
by the user, thereby making the mining process more efficient. Constraint-based mining encourages
interactive exploratory mining and analysis.

APRIORI ALGORITHM
The Apriori algorithm, a classic algorithm, is useful for mining frequent item sets and relevant
association rules. Usually, you operate this algorithm on a database containing a large number of
transactions.

Example: the items customers buy at a supermarket. Such analysis helps customers buy their items
with ease and enhances the sales performance of the store.

Three significant components comprise the apriori algorithm. They are as follows.

 Support
 Confidence
 Lift

Example: you need a big database. Let us suppose you have 2,000 customer transactions in a
supermarket. You have to find the Support, Confidence, and Lift for two items, say bread and
jam, because people frequently buy these two items together.

Out of the 2,000 transactions, 200 contain jam whereas 300 contain bread. These 300 transactions
include 100 that contain bread as well as jam. Using this data, we shall find the support,
confidence, and lift.

Support

Support is the default popularity of any item. You calculate the Support by dividing the number
of transactions containing that item by the total number of transactions. Hence, in our example,

Support (Jam) = (Transactions involving jam) / (Total transactions)

= 200/2000 = 10%

Confidence

In our example, Confidence is the likelihood that a customer who bought jam also bought bread.
Dividing the number of transactions that include both bread and jam by the number of transactions
involving jam gives the Confidence figure.
Confidence = (Transactions involving both bread and jam) / (Total transactions involving jam)
= 100/200 = 50%. It implies that 50% of customers who bought jam bought bread as well.

Lift

In our example, Lift is the increase in the likelihood of buying bread when jam has already been
bought, compared with how often bread is bought overall. The formula for Lift is as follows.

Lift (Jam ⇒ Bread) = Confidence (Jam ⇒ Bread) / Support (Bread)

Here Support (Bread) = 300/2000 = 15%, so Lift = 50 / 15 ≈ 3.3.

It says that jam and bread are bought together about 3.3 times more often than would be expected
if the two purchases were independent. If the Lift value is less than 1, it entails that the customers
are unlikely to buy both the items together. The greater the value, the better is the combination.
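The three measures of this worked example can be reproduced with a few lines of arithmetic; the sketch below takes the counts (2000, 300, 200, 100) from the text and uses the standard lift definition, confidence divided by the consequent's support:

# Reproducing the bread/jam example (counts taken from the text above).
total = 2000          # total transactions
bread = 300           # transactions containing bread
jam = 200             # transactions containing jam
both = 100            # transactions containing both bread and jam

support_jam = jam / total                    # 0.10
confidence_jam_bread = both / jam            # 0.50 (half of jam buyers also buy bread)
support_bread = bread / total                # 0.15
lift = confidence_jam_bread / support_bread  # ≈ 3.33
print(support_jam, confidence_jam_bread, round(lift, 2))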

Working of Apriori Algorithm in Data Mining


The Apriori Algorithm makes the following assumptions.
 All subsets of a frequent item set must be frequent.
 In the same way, all supersets of an infrequent item set must be infrequent.
 Set a threshold support level. In our case, we shall fix it at 50%.

Example:

Consider a supermarket scenario where the itemset is I = {Onion, Burger, Potato, Milk, Beer}.
The database consists of six transactions where 1 represents the presence of the item and 0 the
absence.
Step 1

Create a frequency table of all the items that occur in all the transactions. Now, prune the
frequency table to include only those items having a threshold support level over 50%. We arrive
at this frequency table.

This table signifies the items frequently bought by the customers.

Step 2

Make pairs of items such as OP, OB, OM, PB, PM, BM. This frequency table is what you arrive
at.
Step 3

Apply the same threshold support of 50% and consider the items that exceed 50% (in this case 3
and above).

Thus, you are left with OP, OB, PB, and PM

Step 4

Look for sets of three items that the customers buy together. Thus we get these combinations.

OP and OB give OPB

PB and PM give PBM

Step 5
Determine the frequency of these two item sets. You get this frequency table.

If you apply the threshold assumption, you can deduce that the set of three items frequently
purchased by the customers is OPB. In reality, you have hundreds and thousands of such
combinations.
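Because the original transaction table is not reproduced here, the following is only a sketch of the generate-and-prune loop described in the steps above, run on hypothetical transactions over the same items:

from itertools import combinations

def apriori(transactions, min_support):
    # Compact Apriori sketch; `transactions` is a list of sets of items.
    n = len(transactions)
    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    all_frequent = list(frequent)
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        frequent_set = set(frequent)
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent_set for s in combinations(c, k - 1))}
        frequent = [c for c in candidates if support(c) >= min_support]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent

# Hypothetical transactions over the same item set (not the original table).
db = [{"Onion", "Potato", "Burger"}, {"Potato", "Burger", "Milk"},
      {"Onion", "Potato", "Burger", "Milk"}, {"Milk", "Beer"},
      {"Onion", "Burger"}, {"Onion", "Potato", "Burger", "Beer"}]
print(apriori(db, min_support=0.5))   # ends with the frequent 3-itemset {Onion, Potato, Burger}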

Apriori Algorithm – Pros

 Easy to understand and implement

 Can be used on large item sets

Apriori Algorithm – Cons

 At times, it generates a large number of candidate item sets, which can become
computationally expensive.
 Calculating support is also expensive because the calculation has to scan the
entire database.

Generating Association Rules from Frequent Item sets

Once the frequent item sets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong association rules
satisfy both minimum support and minimum confidence). This can be done using the equation for
confidence, which we show again here for completeness:

confidence(A ⇒ B) = P(B|A) = support_count(A ∪ B) / support_count(A)

where support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and
support_count(A) is the number of transactions containing the itemset A.
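A sketch of that procedure, assuming the frequent itemsets and the transaction database are already available: for every frequent itemset l and every nonempty proper subset s of l, the rule s ⇒ (l − s) is output if its confidence meets the minimum confidence threshold.

from itertools import chain, combinations

def generate_rules(frequent_itemsets, transactions, min_conf):
    # For each frequent itemset l and nonempty proper subset s,
    # emit s => (l - s) when support_count(l) / support_count(s) >= min_conf.
    def count(itemset):
        return sum(itemset <= t for t in transactions)
    rules = []
    for l in frequent_itemsets:
        subsets = chain.from_iterable(combinations(l, r) for r in range(1, len(l)))
        for s in map(frozenset, subsets):
            conf = count(l) / count(s)
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules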
Mining Various Kinds of Association Rules

We have studied efficient methods for mining frequent item sets and association rules. In this
section, we consider additional application requirements by extending our scope to include
mining multilevel association rules, multidimensional association rules, and quantitative
association rules in transactional and/or relational databases and data warehouses. Multilevel
association rules involve concepts at different levels of abstraction. Multidimensional association
rules involve more than one dimension or predicate (e.g., rules relating what a customer buys to
the customer's age).

Mining Multilevel Association Rules

For many applications, it is difficult to find strong associations among data items at low or
primitive levels of abstraction due to the sparsity of data at those levels. Strong associations
discovered at high levels of abstraction may represent commonsense knowledge. Moreover, what
may represent common sense to one user may seem novel to another. Therefore, data mining
systems should provide capabilities for mining association rules at multiple levels of abstraction,
with sufficient flexibility for easy traversal among different abstraction spaces.

Let’s examine the following example.

Mining multilevel association rules. Suppose we are given the task-relevant set of transactional
data in Table 5.6 for sales in an All Electronics store, showing the items purchased for each
transaction.
FP-Growth algorithm:
 FP stands for frequent pattern.
 It is an efficient and scalable method for mining the complete set of frequent
patterns using a tree structure.
 The tree structure used for storing the information about frequent patterns is called the FP-tree.

Example:
The following table shows the transactions and the data items selected in each
transaction. Here the minimum support is 30%.

Transaction-ID   Items
1                E, A, D, B
2                D, A, E, C, B
3                C, A, B, E
4                B, A, D
5                D
6                D, B
7                A, D, E
8                B, C

Solution:
Step-1: Find out the frequency and priority of each item using the
above data items.

Item set Frequency Priority


A 5 3
B 6 1
C 3 5
D 6 2
E 4 4

Order: B,D,A,E,C
Step-2: Order the items in the above table according to the
priority.

Transaction-ID Items Ordered items


1 E,A,D,B B,D,A,E
2 D,A,E,C,B B,D,A,E,C
3 C,A,B,E B,A,E,C
4 B,A,D B,D,A
5 D D
6 D,B B,D
7 A,D,E D,A,E
8 B,C B,C
Step-3: Construct the FP-tree using the above ordered items. Starting from a NULL root, each
ordered transaction is inserted as a path, and the count of every node along the path is
incremented:

NULL
├── B:6
│   ├── D:4
│   │   └── A:3
│   │       └── E:2
│   │           └── C:1
│   ├── A:1
│   │   └── E:1
│   │       └── C:1
│   └── C:1
└── D:2
    └── A:1
        └── E:1
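A compact sketch of the tree-building step described above (count the items, reorder each transaction by descending frequency, then insert it into a prefix tree); the mining of conditional pattern bases is omitted:

from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count item frequencies and fix a priority order
    # (higher frequency first, ties broken alphabetically).
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    order = {item: i for i, (item, _) in
             enumerate(sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])))}
    # Pass 2: insert each ordered transaction into the tree.
    root = Node(None, None)
    for t in transactions:
        items = sorted([i for i in t if freq[i] >= min_count], key=order.get)
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            else:
                node.children[item].count += 1
            node = node.children[item]
    return root

# The eight transactions of the example; minimum support 30% of 8 transactions,
# i.e., a count of at least 3.
db = [{"E","A","D","B"}, {"D","A","E","C","B"}, {"C","A","B","E"}, {"B","A","D"},
      {"D"}, {"D","B"}, {"A","D","E"}, {"B","C"}]
tree = build_fp_tree(db, min_count=3)
print(tree.children["B"].count, tree.children["D"].count)   # 6 and 2, as in the tree above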

UNIT-3

CLASSIFICATION AND PREDICTION:

Classification is used to predict categorical labels, such as "safe" or "risky" for loan
application data, "yes" or "no" for marketing data, and "treatment A", "treatment B", or
"treatment C" for medical data.

Issues regarding classification and prediction:

1. Data cleaning (reduce noise and handle missing values).

2. Relevance analysis (remove redundant attributes).

3. Data transformation and reduction:

a) Normalization – scaling the data so that it falls within a specified range.

b) Generalization – used for continuous-valued attributes.

Example: result – first class, second class, third class, fail.

Classification by decision tree induction:


A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node)
denotes a test on an attribute, each branch represents an outcome of the test, and each leaf
node holds a class label.

(1) Suppose we are given data tuples of students together with their averages; then we start with
a single node.

Example: Average

(2) Suppose the average of every student is above 80%; then all the data tuples belong to a single
class.

Example: Avg > 80%

However, if all the students do not have an average above 80%, the algorithm proceeds
further.

(3) An attribute selection method is used to identify the splitting criterion, for example:

(i) Above 80%
(ii) Above 65% and less than 80%
(iii) Above 40% and less than 65%

Avg
├── >80%
├── >65% and <80%
└── >40% and <65%

Tree Pruning:-
When a decision tree is built, many of the branches will reflect anomalies because of noise and
outliers in the training data. Such branches should be removed.

There are two types of pruning:

1) Pre-pruning

2) Post-pruning

Pre-pruning: In the pre-pruning approach, a tree is "pruned" by halting its construction early.

Post-pruning: In this approach, subtrees are removed from a "fully grown" tree.

A subtree at a given node is pruned by removing its branches and replacing it with a leaf.
The leaf is labeled with the most frequent class among the subtree being replaced.
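A brief sketch using scikit-learn (assumed to be available): max_depth and min_samples_split act as simple pre-pruning (early-stopping) controls, while ccp_alpha > 0 applies cost-complexity post-pruning to a fully grown tree.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing the tree early.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10).fit(X, y)

# Post-pruning: grow fully, then prune back with cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())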
Bayesian Classification:
Bayes' Theorem:

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities,
such as the probability that a given tuple belongs to a particular class. Bayesian
classification is based on Bayes' theorem.

Example: Class A has 2 girls and 3 boys; class B has 3 girls and 4 boys.

 Class 'A' is chosen:

P(G|A) = 2/5

 A student is selected who
(i) should be a girl ('G'), and
(ii) should be from class 'A':

P(A ∩ G) = P(A) · P(G|A)

The probability that a selected student is a girl, considering both classes, is

P(G) = P(A ∩ G) + P(B ∩ G)

Bayes' theorem:

P(A|G) = P(A) · P(G|A) / P(G)

where

P(G|A) = the probability of the evidence given the hypothesis,

P(A) = the prior probability of the hypothesis,

P(G) = the prior probability of the evidence.
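Plugging the classroom numbers into these formulas (a sketch that assumes class A or B is chosen with equal probability 1/2, which the example does not state explicitly):

p_A, p_B = 1/2, 1/2        # assumed: either class is chosen with equal probability
p_G_given_A = 2/5          # 2 girls among the 5 students of class A
p_G_given_B = 3/7          # 3 girls among the 7 students of class B

p_G = p_A * p_G_given_A + p_B * p_G_given_B     # ≈ 0.414
p_A_given_G = p_A * p_G_given_A / p_G           # ≈ 0.48
print(round(p_A_given_G, 2))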

Naive Bayes Classification: Naive Bayes is one of the most efficient and effective inductive
learning algorithms for machine learning and data mining.

In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a
class is unrelated to the presence of any other feature.

Person id Name Gender Height Class


1 A F 1.6M Short
2 B M 2.0M Tall
3 C F 1.9M Medium
4 D F 1.85M Medium
5 E M 2.8M Tall
6 F M 1.7M Short
7 G M 1.8M Medium
8 H F 1.6M Short
9 I F 1.65M Short

Find the prior probability of each class:

P(Short) = 4/9
P(Medium) = 3/9
P(Tall) = 2/9

Class-conditional counts and probabilities (S = Short, M = Medium, T = Tall):

Attribute   Value        Count (S, M, T)   Probability (S, M, T)
Gender      Male         1, 1, 2           1/4, 1/3, 2/2
Gender      Female       3, 2, 0           3/4, 2/3, 0
Height      0–1.6        2, 0, 0           2/4, 0, 0
Height      1.61–1.7     2, 0, 0           2/4, 0, 0
Height      1.71–1.8     0, 1, 0           0, 1/3, 0
Height      1.81–1.9     0, 2, 0           0, 2/3, 0
Height      1.91–2.0     0, 0, 1           0, 0, 1/2
Height      2.1–         0, 0, 1           0, 0, 1/2
Using Bayesian classification and the given data, classify the tuple t = {J, M, 2.2 m}.

Step 1: P(t|class) = P(Male|class) × P(height 2.1–|class)

P(t|Short) = P(M|Short) × P(2.1–|Short) = 1/4 × 0 = 0

P(t|Medium) = P(M|Medium) × P(2.1–|Medium) = 1/3 × 0 = 0

P(t|Tall) = P(M|Tall) × P(2.1–|Tall) = 1 × 1/2 = 0.5

Step 2: likelihood for each class label

Short = P(t|Short) × P(Short) = 0 × 4/9 = 0

Medium = P(t|Medium) × P(Medium) = 0 × 3/9 = 0

Tall = P(t|Tall) × P(Tall) = 1/2 × 2/9 = 0.11

P(t) = 0 + 0 + 0.11 = 0.11

Step 3: P(class|t) = P(class) × P(t|class) / P(t)

P(Short|t) = P(t|Short) × P(Short) / P(t) = 0 × 4/9 / 0.11 = 0

P(Medium|t) = P(t|Medium) × P(Medium) / P(t) = 0 × 3/9 / 0.11 = 0

P(Tall|t) = P(t|Tall) × P(Tall) / P(t) = (1/2 × 2/9) / 0.11 = 1

Comparing the three class labels, the highest value, 1, is obtained for Tall, so the given
tuple (10, J, M, 2.2 m) is classified as Tall.
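The same hand calculation can be reproduced in a few lines (a sketch that simply reads the class priors and the two relevant conditional probabilities off the tables above):

p_class = {"Short": 4/9, "Medium": 3/9, "Tall": 2/9}          # class priors
p_male = {"Short": 1/4, "Medium": 1/3, "Tall": 2/2}           # P(Male | class)
p_height_2_1_up = {"Short": 0, "Medium": 0, "Tall": 1/2}      # P(height 2.1- | class)

likelihood = {c: p_male[c] * p_height_2_1_up[c] * p_class[c] for c in p_class}
evidence = sum(likelihood.values())                            # P(t) ≈ 0.11
posterior = {c: likelihood[c] / evidence for c in likelihood}
print(max(posterior, key=posterior.get), posterior)            # Tall, with P(Tall|t) = 1.0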

Rule-based classification:

Using IF-THEN rules for classification:

Rules are a good way of representing information or bits of knowledge. A rule-based classifier
uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form

IF condition THEN conclusion


Example rules of this kind:

 If Outlook = “sunny” and Humidity = “high”


Then Play=”No”
 If Outlook = “sunny” and Humidity = ”Normal”
Then Play=”Yes”
 If Outlook = “Overcast” Then Play = ”Yes”

 If Outlook = “Rainy” and windy = ”Strong”


Then Play=”No”
 If Outlook = “Rainy” and windy = ”Weak”
Then Play=”Yes”
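A small sketch of how such a rule set can be applied: the rules above are stored as (condition, class) pairs and checked in order, and the first matching rule assigns the class (a default class covers tuples that no rule matches):

rules = [
    (lambda t: t["Outlook"] == "Sunny" and t["Humidity"] == "High",   "No"),
    (lambda t: t["Outlook"] == "Sunny" and t["Humidity"] == "Normal", "Yes"),
    (lambda t: t["Outlook"] == "Overcast",                            "Yes"),
    (lambda t: t["Outlook"] == "Rainy" and t["Windy"] == "Strong",    "No"),
    (lambda t: t["Outlook"] == "Rainy" and t["Windy"] == "Weak",      "Yes"),
]

def classify(tuple_, default="Yes"):
    # Fire the first rule whose IF-condition is satisfied by the tuple.
    for condition, label in rules:
        if condition(tuple_):
            return label
    return default

print(classify({"Outlook": "Sunny", "Humidity": "High", "Windy": "Weak"}))   # "No"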

Classification by backpropagation:-

Backpropagation is a neural network learning algorithm.

Neural network:-

A neural network is a set of connected input/output units in which each connection has a weight
associated with it.

Multi-layer feed-forward Neural Network:-

The backpropagation algorithm performs learning on a multilayer feed-forward neural network.
A multilayer feed-forward neural network consists of:

- an input layer
- one or more hidden layers
- an output layer
Example:

Initial input, weight, and bias values:

x1  x2  x3  w14  w15   w24  w25  w34   w35  w46   w56   θ4    θ5   θ6
1   0   1   0.2  -0.3  0.4  0.1  -0.5  0.2  -0.3  -0.2  -0.4  0.2  0.1

Net input: Ij = Σi wij Oi + θj
Output:    Oj = 1 / (1 + e^(-Ij))

The net input and output calculation:

Unit   Net input Ij                                      Output Oj
4      (0.2)(1) + (0.4)(0) + (−0.5)(1) + (−0.4) = −0.7   1/(1 + e^0.7) = 0.332
5      (−0.3) + 0 + 0.2 + 0.2 = 0.1                      1/(1 + e^−0.1) = 0.525
6      (−0.3)(0.332) + (−0.2)(0.525) + 0.1 = −0.105      1/(1 + e^0.105) = 0.474

 Errj = Oj(1 − Oj)(Tj − Oj)        for an output-layer unit
 Errj = Oj(1 − Oj) Σk Errk wjk     for a hidden-layer unit

Calculate the error of each node:

6: 0.474(1 − 0.474)(1 − 0.474) = 0.1311

5: 0.525(1 − 0.525)(0.1311)(−0.2) = −0.0065

4: 0.332(1 − 0.332)(0.1311)(−0.3) = −0.0087

Updated weights and biases (learning rate L = 0.9):

 wij = wij + (L) Errj Oi

 θj = θj + (L) Errj

Weight/bias   Updated value
w46   (−0.3) + (0.9)(0.1311)(0.332) = −0.261
w56   (−0.2) + (0.9)(0.1311)(0.525) = −0.138
w14   (0.2) + (0.9)(−0.0087)(1) = 0.192
w15   (−0.3) + (0.9)(−0.0065)(1) = −0.306
w24   (0.4) + (0.9)(−0.0087)(0) = 0.4
w25   (0.1) + (0.9)(−0.0065)(0) = 0.1
w34   (−0.5) + (0.9)(−0.0087)(1) = −0.508
w35   (0.2) + (0.9)(−0.0065)(1) = 0.194
θ4    (−0.4) + (0.9)(−0.0087) = −0.408
θ5    (0.2) + (0.9)(−0.0065) = 0.194
θ6    (0.1) + (0.9)(0.1311) = 0.218
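A sketch of one complete backpropagation pass over this 3-input, 2-hidden-unit, 1-output network, reproducing the tabulated numbers (learning rate L = 0.9 and target output T = 1, as above):

import math

sigmoid = lambda v: 1 / (1 + math.exp(-v))

x = {1: 1, 2: 0, 3: 1}                                    # input tuple
w = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}
L, T = 0.9, 1

# Forward pass: net input Ij = sum(wij * Oi) + theta_j, output Oj = sigmoid(Ij).
o4 = sigmoid(sum(w[(i, 4)] * x[i] for i in x) + theta[4])   # ≈ 0.332
o5 = sigmoid(sum(w[(i, 5)] * x[i] for i in x) + theta[5])   # ≈ 0.525
o6 = sigmoid(w[(4, 6)] * o4 + w[(5, 6)] * o5 + theta[6])    # ≈ 0.474

# Backward pass: error at the output unit, then at the hidden units.
err6 = o6 * (1 - o6) * (T - o6)                             # ≈ 0.1311
err5 = o5 * (1 - o5) * err6 * w[(5, 6)]                     # ≈ -0.0065
err4 = o4 * (1 - o4) * err6 * w[(4, 6)]                     # ≈ -0.0087

# Updates: wij += L * err_j * O_i, theta_j += L * err_j.
w[(4, 6)] += L * err6 * o4                                  # ≈ -0.261
w[(1, 4)] += L * err4 * x[1]                                # ≈ 0.192
theta[6] += L * err6                                        # ≈ 0.218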

 Support Vector Machines (SVM)


 Support Vector Machines are based on the concept of decision planes that define decision
boundaries. A decision plane is one that separates between a set of objects having
different class memberships. A schematic example is shown in the illustration below. In
this example, the objects belong either to class GREEN or RED. The separating line
defines a boundary on the right side of which all objects are GREEN and to the left of
which all objects are RED. Any new object (white circle) falling to the right is labeled,
i.e., classified, as GREEN (or classified as RED should it fall to the left of the separating
line).
 The above is a classic example of a linear classifier, i.e., a classifier that separates a set of
objects into their respective groups (GREEN and RED in this case) with a line. Most
classification tasks, however, are not that simple, and often more complex structures are
needed in order to make an optimal separation, i.e., correctly classify new objects (test
cases) on the basis of the examples that are available (train cases). This situation is
depicted in the illustration below. Compared to the previous schematic, it is clear that a
full separation of the GREEN and RED objects would require a curve (which is more
complex than a line). Classification tasks based on drawing separating lines to distinguish
between objects of different class memberships are known as hyperplane classifiers.
Support Vector Machines are particularly suited to handle such tasks.


 The illustration below shows the basic idea behind Support Vector Machines. Here we
see the original objects (left side of the schematic) mapped, i.e., rearranged, using a set of
mathematical functions known as kernels. The process of rearranging the objects is
known as mapping (transformation). Note that in this new setting, the mapped objects
(right side of the schematic) are linearly separable; thus, instead of constructing the
complex curve (left schematic), all we have to do is find an optimal line that can
separate the GREEN and the RED objects.
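A brief sketch with scikit-learn (assumed to be available), contrasting a linear SVM with a kernel SVM on data that is not linearly separable in its original space:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # tries to separate with a hyperplane
kernel_svm = SVC(kernel="rbf").fit(X, y)      # kernel maps the data, then separates linearly

print(linear_svm.score(X, y), kernel_svm.score(X, y))   # the kernel model fits far better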

Prediction:

Following is an example of a case where the data analysis task is prediction −
Suppose the marketing manager needs to predict how much a given customer will spend during
a sale at his company. In this example we are asked to predict a numeric value; therefore the
data analysis task is an example of numeric prediction. In this case, a model or a predictor is
constructed that predicts a continuous-valued function, or ordered value.

Straight-line (linear) regression involves a response variable Y and a single predictor variable X:

Y = w0 + w1·X

Method of least squares: it estimates the best-fitting straight line.

w1 = Σ_{i=1}^{|D|} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{|D|} (xi − x̄)²

w0 = ȳ − w1·x̄

X Y
Mid Exam Final Exam
72 84
50 63
81 71
74 78
94 96
86 75
59 49
83 79
65 77
33 52
88 74
81 90

(1) Least squares line: Y = w0 + w1·X

(2) Regression coefficients:

w1 = Σ_{i=1}^{|D|} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{|D|} (Xi − X̄)²

w0 = Ȳ − w1·X̄

where X̄ and Ȳ are the means of X and Y:

X̄ = 866/12 = 72.16
Ȳ = 888/12 = 74

w1 = [(72 − 72.16)(84 − 74) + (50 − 72.16)(63 − 74) + …] / [(72 − 72.16)² + (50 − 72.16)² + …]

w1 = 1940 / 3286.96 = 0.59
w0 = 74 − 0.59(72.16) = 31.4

Y = 31.4 + 0.59·X

For a mid-exam mark of X = 87: Y = 31.4 + 0.59(87) ≈ 83
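The same least-squares fit can be computed directly from the table of marks (a sketch; because the hand calculation above rounds its intermediate sums, the slope comes out slightly different, but the prediction for a mid-exam mark of 87 is again about 83):

xs = [72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81]   # mid-exam marks (X)
ys = [84, 63, 71, 78, 96, 75, 49, 79, 77, 52, 74, 90]   # final-exam marks (Y)

x_bar = sum(xs) / len(xs)                               # ≈ 72.17
y_bar = sum(ys) / len(ys)                               # 74.0

w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)                # ≈ 0.60 (slope)
w0 = y_bar - w1 * x_bar                                 # ≈ 30.4 (intercept)

print(round(w0 + w1 * 87, 1))                           # ≈ 83.0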

Plot: the fitted line Y = 31.4 + 0.59·X against the (mid-exam, final-exam) data points.

Lazy learners:
Lazy learning refers to machine learning processes in which generalization of the training data is
delayed until a query is made to the system. This type of learning is also known as instance-based
learning. Lazy classifiers are very useful when working with large datasets that have
few attributes.
Learning systems have computation occurring at two different times: training time and
consultation time.
Training time is the time before the consultation time. During training time, the system derives
inferences from training data to prepare for the consultation time.
Consultation time is the time between the moment when an object is presented to a system so
that the system can make an inference and the moment when the system finishes making the
inference.
In a lazy learning algorithm, most of the computation is done during consultation time.
Essentially, a lazy algorithm defers the processing of examples till it receives an explicit request
for information.
Instance-based learning, local regression, K-Nearest Neighbors (K-NN), and Lazy Bayesian
Rules are some examples of lazy learning.
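A minimal sketch of one such lazy learner, k-nearest neighbours: the training tuples (hypothetical points below) are simply stored, and all the work happens at consultation time, when a query arrives:

import math
from collections import Counter

# Stored training tuples: (feature vector, class label). Hypothetical data.
train = [([1.0, 1.1], "A"), ([1.2, 0.9], "A"), ([3.0, 3.2], "B"), ([2.9, 3.1], "B")]

def knn_classify(query, k=3):
    # At query time: sort stored tuples by Euclidean distance, vote among the k nearest.
    neighbours = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

print(knn_classify([1.1, 1.0]))   # "A"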

Difference between lazy learning and eager learning:


Lazy learning and eager learning are very different methods. Here are some of the differences:
Lazy learning systems just store training data or conduct minor processing upon it. They wait
until test tuples are given to them.
Eager learning systems, on the other hand, take the training data and construct a classification
layer before receiving test data.
So while lazy learning systems have a low or non-existent training time and a high consultation
time, eager learning systems have a high training time and a low consultation time.
In eager learning, the system needs to commit to a single hypothesis that covers the entire
instance space, while in lazy learning, systems can make use of a richer hypothesis space
because it employs multiple local linear functions to form its implicit global approximation to the
target function.
Advantages of lazy learning:

 It is very useful when not all the examples are available in advance but need to be collected
online. In such a situation, a newly observed example only requires an update to the
database.
 In lazy learning, collecting examples about an operating regime does not degrade the
modeling performance of other operating regimes. Essentially, lazy learning is not prone
to suffering from data interference.
 The problem-solving capabilities of a lazy learning algorithm increase with every newly
presented case.
 Lazy learning is easy to maintain because the learner will adapt automatically to changes
in the problem domain.

Other Classification Methods:


 In this section, we give a brief description of several other classification methods,
including genetic algorithms, rough set approach, and fuzzy set approaches. In general,
these methods are less commonly used for classification.
Genetic Algorithms
 Genetic algorithms attempt to incorporate ideas of natural evolution. In general, genetic
learning starts as follows. An initial population is created consisting of randomly
generated rules. Each rule can be represented by a string of bits. As a simple example,
suppose that samples in a given training set are described by two Boolean attributes, A1
and A2, and that there are two classes, C1 and C2. The rule "IF A1 AND NOT A2
THEN C2" can be encoded as the bit string "100," where the two leftmost bits represent
attributes A1 and A2, respectively, and the rightmost bit represents the class. Similarly,
the rule "IF NOT A1 AND NOT A2 THEN C1" can be encoded as "001." If an attribute
has k values, where k > 2, then k bits may be used to encode the attribute's values.
Classes can be encoded in a similar fashion.
Rough Set Approach
 Rough set theory can be used for classification to discover structural relationships within
imprecise or noisy data. It applies to discrete-valued attributes. Continuous-valued
attributes must therefore be discretized before its use. Rough set theory is based on the
establishment of equivalence classes within the given training data.
Fuzzy Set Approaches
 Rule-based systems for classification have the disadvantage that they involve sharp
cutoffs for continuous attributes. For example, consider the following rule for customer
credit application approval. The rule essentially says that applications for customers who
have had a job for two or more years and who have a high income (i.e., of at least
$50,000) are approved:

 IF (years_employed ≥ 2) AND (income ≥ 50K) THEN credit = approved. (6.47)

 By Rule (6.47), a customer who has had a job for at least two years will receive credit if
her income is, say, $50,000, but not if it is $49,000. Such harsh thresholding may seem
unfair.
PageRank Algorithm:

The PageRank algorithm ranks web pages by assigning a rank to each page. The rank is decided
based on the importance of the page, measured through the pages that link to it.
The following formula is used for calculating the rank:

PR(p) = (1 − d) + d (PR(N1)/C(N1) + PR(N2)/C(N2) + … + PR(Nn)/C(Nn))

here
N1, …, Nn = the web pages that link to page p
C(Ni) = the number of outbound links on page Ni
d = a damping constant with a value between 0 and 1
Example of the PageRank algorithm:

Consider four web pages A, B, C, and D linked as follows: A links to B and C; B links to D;
C links to A, B, and D; D links to C. Each page starts with an initial rank of 1/4, and the
simplified iteration PR(p) = Σ PR(Ni)/C(Ni) is applied.

Iteration 1:

PR(A) = PR(C)/3 = (1/4)/3 = 1/12

PR(B) = PR(A)/2 + PR(C)/3 = (1/4)/2 + (1/4)/3 = 2.5/12

PR(C) = PR(A)/2 + PR(D)/1 = (1/4)/2 + (1/4)/1 = 4.5/12

PR(D) = PR(C)/3 + PR(B)/1 = (1/4)/3 + (1/4)/1 = 4/12

Iteration 2:

PR(A) = (4.5/12)/3 = 1.5/12

PR(B) = (1/12)/2 + (4.5/12)/3 = 2/12

PR(C) = (1/12)/2 + (4/12)/1 = 4.5/12

PR(D) = (4.5/12)/3 + (2.5/12)/1 = 4/12

 The web page with the highest value has the highest rank. In this
example, web page C has the highest rank with 4.5/12.
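A sketch of the simplified iteration used in this example (initial rank 1/4 per page, no damping factor), with the link structure A → B, C; B → D; C → A, B, D; D → C:

# Each page starts at 1/4; in every iteration a page receives PR(q)/C(q)
# from every page q that links to it, where C(q) is q's number of out-links.
links = {"A": ["B", "C"], "B": ["D"], "C": ["A", "B", "D"], "D": ["C"]}

pr = {p: 1 / len(links) for p in links}
for _ in range(2):                                  # two iterations, as above
    new_pr = {p: 0.0 for p in links}
    for q, outs in links.items():
        for p in outs:
            new_pr[p] += pr[q] / len(outs)
    pr = new_pr

print({p: round(v * 12, 1) for p, v in pr.items()})   # in twelfths: C is highest (4.5/12)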
