

Chapter 3
Data Mining Techniques
3. DM Techniques

3.1 Statistics
Statistics is a vital component of data selection, sampling, data mining, and knowledge
evaluation. In the data cleaning process, statistics offers techniques to detect outliers, to
simplify data when necessary, and to estimate noise; it also deals with missing data using
estimation techniques [65], [66].
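
As a brief illustration of these roles, the following Python sketch shows two common statistical operations of this kind: flagging outliers with a z-score test and filling missing values with a mean estimate. It is a minimal sketch using NumPy; the score list, the threshold value and the function names are illustrative assumptions, not part of the cited techniques.

    import numpy as np

    def zscore_outliers(values, threshold=2.0):
        # Flag values lying more than `threshold` standard deviations from the mean.
        values = np.asarray(values, dtype=float)
        z = (values - np.nanmean(values)) / np.nanstd(values)
        return np.abs(z) > threshold

    def impute_missing_with_mean(values):
        # Replace missing entries (NaN) with the mean of the observed values.
        values = np.asarray(values, dtype=float)
        filled = values.copy()
        filled[np.isnan(filled)] = np.nanmean(values)
        return filled

    scores = [55, 60, 58, np.nan, 62, 300]   # hypothetical scores: one missing, one suspect
    print(zscore_outliers(impute_missing_with_mean(scores)))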

3.2 Classification and Prediction


One of the most useful data mining techniques for e-learning is classification. Classification
maps data into a predefined group of classes and is a supervised learning approach. Predicting
performance with high accuracy is especially beneficial for identifying students with low
academic performance at an early stage.

Classification [67] is the process of finding a set of models that describe and distinguish
data classes or concepts. The derived results may be represented in various forms, such as
classification (IF-THEN) rules, decision trees, or neural networks. The models can then be used
to predict the class label of data objects. In many applications, there is a need to predict
missing data values rather than class labels; this is the case when the predicted values are
numerical, and the task is then specifically referred to as prediction. Classification is a data
mining technique and a supervised learning technique in which a training data set is the input
to the classifier [87].

3.2.1 Decision Trees

Decision tree classifiers are used in data mining to generate trees from the training set and to
formulate predictions [88].

Decision tree induction follows a greedy strategy in which the records are split based on an
attribute test that satisfies certain criteria. The conventional models used for classification are
decision trees, neural networks, and statistical and clustering techniques [89]. In order to determine the
best split of nodes, we have to measure node impurity. The nodes with a low degree of impurity
are preferred.

A decision tree is a tree-like structure which starts with root attributes and ends with leaf
nodes [90]. Decision-tree algorithms usually work top-down by choosing, at each step, the
variable that best splits the set of items [91].

Information Gain is one of the metrics used by the ID3 and J48 tree-generation algorithms. It is
based on the concept of entropy, which is a common way to measure impurity; higher entropy
means higher information content. The information gain measure is used in the J48 algorithm to
select the test attribute at each node in the tree: the attribute having the largest information
gain is selected as the test attribute for the current node. To choose one of the attributes as
the root node, the information entropy is calculated for all attributes present in the educational
dataset, and the attribute with the highest information gain is selected as the root. After the
decision tree is built, the tree pruning phase is applied [87].

A decision tree recursively partitions the training set and consists of leaf and non-leaf nodes.
Each non-leaf node of the tree contains a split point, which is a test on one or more attributes
and determines how the data is partitioned. The tree is developed in two phases, a growth phase
and a prune phase. In the growth phase, the goal at each node is to determine the split point
that best divides the training records belonging to that node [92].

A decision tree is constructed from a training set which consists of data tuples. Each tuple is
completely described by a set of attributes and a class label. Based on the attribute values of
the tuple, the path from the root to a leaf can be followed; the class of that leaf is the class
predicted by the decision tree for the tuple [93].
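
As an illustration, the following sketch fits such a classifier with scikit-learn, using information gain (the "entropy" criterion) to choose splits. The small student dataset (attendance, internal marks, assignments submitted) and its pass/fail labels are hypothetical examples, not the dataset used in this work.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical student records: [attendance %, internal marks, assignments submitted]
    X = [[85, 70, 9], [40, 35, 3], [90, 80, 10], [55, 45, 5], [30, 20, 2], [75, 65, 8]]
    y = ["pass", "fail", "pass", "fail", "fail", "pass"]

    # criterion="entropy" selects the split attribute by information gain
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
    tree.fit(X, y)

    print(export_text(tree, feature_names=["attendance", "internal_marks", "assignments"]))
    print(tree.predict([[60, 50, 6]]))   # predict the class label of a new tuple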

3.2.1.1 Calculation of Information Gain

Entropy (H) is the quantity used to decide which attribute best classifies the data. Entropy is a
good measure of the information carried by an ensemble of events. Suppose set S contains two
classes of examples, positive and negative; the entropy of S is denoted by H(S).


If S is a sample of training events with n classes and p_i is the probability of occurrence of
class i, then the entropy is given by:

H(S) = - Σ_{i=1}^{n} p_i log2 p_i        (3.1)

such that:

Σ_{i=1}^{n} p_i = 1

In decision trees, the information gain is calculated for each attribute. Information gain is a
statistical quantity measuring how well an attribute classifies the data, and the attribute with
the highest information gain is chosen for decision-making.

Gain(S, A) = H(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) H(S_v)        (3.2)

Information gain is thus the metric for how well an attribute A classifies the training data:
the gain for a particular attribute measures the information about the target function given the
value of that attribute, i.e. it is computed from the conditional entropy.
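
A minimal Python sketch of Equations 3.1 and 3.2 is given below; the attendance/result example is hypothetical and only illustrates how the attribute with the highest gain would be identified.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # H(S) = -sum_i p_i * log2(p_i), Equation (3.1)
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(attribute_values, labels):
        # Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v), Equation (3.2)
        n = len(labels)
        remainder = 0.0
        for v in set(attribute_values):
            subset = [lab for a, lab in zip(attribute_values, labels) if a == v]
            remainder += (len(subset) / n) * entropy(subset)
        return entropy(labels) - remainder

    # Hypothetical attribute (attendance level) against a pass/fail class label
    attendance = ["high", "low", "high", "low", "high", "high"]
    result = ["pass", "fail", "pass", "fail", "pass", "fail"]
    print(information_gain(attendance, result))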

3.3 Clustering

Clustering groups data that is not predefined, and it can identify dense and sparse regions in
the object space. Unlike classification and prediction, which analyze class-labeled data objects,
clustering analyzes data objects without consulting a known class label. The class labels are not
present in the training data, and clustering can be used to generate such labels. Clusters of objects
are formed so that objects within a cluster have high similarity to one another but
are very dissimilar to objects in other clusters. Each cluster formed can be viewed as a class of
objects, from which rules can be derived [33]. Clustering is the unsupervised classification of
patterns into clusters [8]; in education, it can help, for example, to group students with
similar characteristics.

3.3.1 K-Means Clustering

Using data clustering, we extract previously unknown and hidden patterns from large datasets
[94]. The K-means clustering algorithm uses the Euclidean distance measure, where the distance is
computed by finding the squared difference between each pair of scores, summing the squares, and
taking the square root of the sum [95].


3.3.1.1 Implementation of K-Means Data Mining Model

The K-means algorithm is a cluster analysis algorithm used as a partitioning method; it was
developed by MacQueen in 1967 [96]. The goal of the algorithm is to minimize the total distance
between the objects in each cluster and its corresponding centroid. Cluster analysis, also called
segmentation analysis, creates groups, or clusters, of data. Clusters are formed in such a way
that objects in the same cluster are very similar and objects in different clusters are very
distinct. The measure of similarity depends on the application.

K-Means clustering is a partitioning method. The function k-means partitions data into k clusters
and returns the index of the cluster to which it has assigned each observation. K-means clustering
operates on actual observations and creates a single level of clusters. K-means clustering is more
suitable than hierarchical clustering for large amounts of data because it treats each observation
in data as an object having a location in space. It finds a partition in which objects within each
cluster are as close to each other as possible, and as far from objects in other clusters as possible.
Table 3.1 shows the basic differences between Partitioning and Hierarchical Clustering.

Each cluster in the partition is defined by its member objects and by its centroid, or center. The
centroid for each cluster is the point to which the sum of distances from all objects in that cluster
is minimized. K-Means computes cluster centroids differently for each distance measure, in order to
minimize the sum with respect to the measure that is specified.

Table 3.1 Difference between Partitioning and Hierarchical Clustering

1. Partitioning clustering produces a single partitioning; hierarchical clustering can give
   different partitionings.
2. Partitioning clustering needs the number of clusters to be specified; hierarchical clustering
   does not need the number of clusters to be specified.
3. Partitioning clustering is usually more efficient; hierarchical clustering can be slow.
4. In partitioning clustering, cluster membership is determined by calculating the centroid for
   each group and assigning each object to the group with the closest centroid; a hierarchical
   algorithm combines or divides existing groups, creating a hierarchical structure that reflects
   the order in which groups are merged or divided.


K-Means uses an iterative algorithm that minimizes the sum of distances from each object to its
cluster centroid, over all clusters. The algorithm moves objects between clusters until the sum
cannot be decreased further. The result is a set of clusters that are as compact and well-separated
as possible. The details of the minimization can be controlled through several optional input
parameters to K-Means, including ones for the initial values of the cluster centroids and for the
maximum number of iterations [97].

3.3.1.2 Calculating K-Means

K-means uses a two-phase iterative algorithm to minimize the sum of point-to-centroid distances,
summed over all k clusters. The first phase uses batch updates, where each iteration consists of
reassigning points to their nearest cluster centroid, all at once, followed by recalculation of the
cluster centroids; this phase usually does not converge to a solution that is a local minimum. The
second phase uses online updates, where points are individually reassigned if doing so reduces the
sum of distances, and the cluster centroids are recomputed after each reassignment. Each iteration
during the second phase consists of one pass through all the points, and this phase does converge
to a local minimum. Table 3.2 shows the steps for calculating K-Means.

Table 3.2 Steps for Calculating K-Means


Step 1   K different clusters are selected.
Step 2   Each cluster is associated with a centroid (center point). A centroid is typically the
         mean of the points in the cluster.
Step 3   The Euclidean distance of each object to the centroids is determined:

         d(x, c) = sqrt( Σ_j (x_j - c_j)^2 )        (3.3)

Step 4   Objects are grouped based on the minimum distance.
Step 5   Loop until all the objects are assigned to their closest centroid, recomputing the
         centroid of each cluster.
Step 6   Stop when no further changes to the object clusters can be made.
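
The steps of Table 3.2 can be sketched in Python with NumPy as follows; the two-dimensional sample points, the choice of k = 2 and the initialization by random sampling are assumptions made only for illustration.

    import numpy as np

    def kmeans(points, k, max_iter=100, seed=0):
        # Batch K-means: assign points to the nearest centroid, then recompute centroids.
        rng = np.random.default_rng(seed)
        points = np.asarray(points, dtype=float)
        centroids = points[rng.choice(len(points), size=k, replace=False)]   # Steps 1-2
        for _ in range(max_iter):
            # Steps 3-4: Euclidean distance to each centroid, assign to the nearest
            dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 5: recompute each centroid as the mean of its assigned points
            new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new_centroids, centroids):   # Step 6: stop when stable
                break
            centroids = new_centroids
        return labels, centroids

    data = [[1, 1], [1.5, 2], [8, 8], [9, 9], [1, 0], [8.5, 9.5]]
    labels, centers = kmeans(data, k=2)
    print(labels, centers)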

3.4 Association Rule Mining


Association rule mining aims to find sets of binary variables that occur together repeatedly in a
transaction database. Apriori is the best-known association rule mining algorithm. Association
analysis is the discovery of association rules showing attribute-value conditions that occur
frequently together in a given set of data. The association rule A=>B covers those database tuples
that satisfy the conditions in A as well as in B.

In data mining, association rule learning is a method for discovering interesting relations
between variables in large databases [98]. The ARM task is to discover the hidden association
relationships between the different item sets in a transaction database [99]. An X => Y type
association rule expresses a close correlation between items in a database [100]. Association
rules provide information in the form of if-then statements, and they are probabilistic in nature.

In ARM, the "if" part is the antecedent and the "then" part is the consequent. Association rules
analyze the antecedent and consequent for a set of items, called an itemset, that is disjoint,
i.e. the antecedent and consequent have no items in common. For examining each rule in the
database, a user has to set two threshold values: the first value is called the support for the
rule and the second is called the confidence for the rule. Support is simply the number of
transactions that include all the items in the antecedent and consequent parts of the rule; it can
also be expressed as a percentage of the total number of records in the database. In ARM [101],
rules are selected only if they satisfy both a minimum support and a minimum confidence threshold.

Support of the rule A=>B is shown in Equation 3.4:

support(A=>B [s, c]) = P(A ∪ B) = support({A, B})        (3.4)

Here s and c represent support and confidence. Equation 3.4 denotes the frequency of the rule
within all transactions in the database, i.e. the probability that a transaction contains both A
and B, whereas Equation 3.5 denotes the percentage of transactions containing A which also contain
B, i.e. the probability that a transaction containing A also contains B [102]. The confidence of
the rule A=>B is shown in Equation 3.5:

confidence(A=>B [s, c]) = P(B|A) = P(A ∪ B) / P(A) = support({A, B}) / support({A})        (3.5)
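
A minimal sketch of Equations 3.4 and 3.5 over a small transaction list is given below; the course-name transactions are hypothetical.

    def support(transactions, itemset):
        # Fraction of transactions containing every item in `itemset` (Equation 3.4).
        itemset = set(itemset)
        return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

    def confidence(transactions, antecedent, consequent):
        # support({A, B}) / support({A}) (Equation 3.5).
        return (support(transactions, set(antecedent) | set(consequent))
                / support(transactions, antecedent))

    # Hypothetical transactions (e.g. courses taken together by students)
    T = [{"maths", "physics"}, {"maths", "stats"}, {"maths", "physics", "stats"}, {"physics"}]
    print(support(T, {"maths", "physics"}))        # support of the rule maths => physics
    print(confidence(T, {"maths"}, {"physics"}))   # confidence of maths => physics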

3.4.1 Mining using APRIORI Algorithm

The Apriori algorithm is the best-known algorithm to mine association rules. The Apriori property
states that all nonempty subsets of a frequent itemset must also be frequent; an itemset is any
subset of all the items in the database. Table 3.3 shows the steps followed by the Apriori
algorithm.


The Apriori algorithm traverses the itemsets one level at a time, from frequent 1-itemsets to the
maximum size of frequent itemsets. At each level, new candidate itemsets are generated from the
frequent itemsets found in the previous iteration, and the support of each candidate is then
counted and checked against the minimum threshold value [103].

Table 3.3 Steps Followed by APRIORI Algorithm


Step 1   Initially make a single pass over the dataset to determine the support of each item.
Step 2   Find all frequent 1-itemsets.
Step 3   Iteratively generate new candidate sets for 2-itemsets, 3-itemsets and so on from the
         frequent itemsets found in the previous iteration.
Step 4   Repeat Step 2 to find all frequent 2-itemsets, 3-itemsets, and so on.
Step 5   The support for each candidate is then counted and checked against the given minsup
         threshold value.
Step 6   The algorithm eliminates, or prunes, all candidate itemsets whose support count is less
         than the given minsup threshold value.
Step 7   Finally, the algorithm terminates when no new frequent itemsets are generated.
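
A simplified level-wise sketch of these steps is given below. It follows Table 3.3 but, for brevity, generates candidates by joining frequent itemsets without the full subset-based pruning of the Apriori property; the market-basket transactions and the minsup value of 0.5 are illustrative assumptions.

    from itertools import combinations

    def apriori(transactions, minsup):
        # Level-wise frequent itemset mining following the steps in Table 3.3.
        transactions = [set(t) for t in transactions]
        n = len(transactions)

        def sup(itemset):
            return sum(1 for t in transactions if itemset <= t) / n

        # Steps 1-2: frequent 1-itemsets
        items = {i for t in transactions for i in t}
        frequent = [{frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}]

        # Steps 3-6: join frequent k-itemsets into (k+1)-candidates, prune by minsup
        while frequent[-1]:
            prev = frequent[-1]
            size = len(next(iter(prev))) + 1
            candidates = {a | b for a in prev for b in prev if len(a | b) == size}
            frequent.append({c for c in candidates if sup(c) >= minsup})

        # Step 7: stop when no new frequent itemsets are generated; return all levels found
        return [s for level in frequent for s in level]

    T = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"}, {"milk", "butter"}]
    print(apriori(T, minsup=0.5))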

3.5 Neural Networks

Artificial Neural Networks involve training and learning. In supervised learning, for a given
input there is an expected output, provided a certain pattern of weights; in unsupervised
learning, the network does not require a step-by-step method to achieve the desired outcome but
is instead directed to carry out the task. As the training process continues, the weights
converge and the network gains knowledge [104].

Based on neural networks, there are two approaches to educational data mining. The first
approach is based on a self-organizing map (SOM), which is a type of artificial neural network
(ANN) that is trained using unsupervised learning to produce low-dimensional views of high-
dimensional data. Using this approach in the educational sector, students can be clustered into
natural classes based on certain attributes, so that similar students are grouped together. The
second approach uses pattern recognition through a two-layer feed-forward network to classify
inputs into a set of target categories. A neural network is basically a group of interconnected
neurons that uses computational or mathematical models to process information. A neural network
usually learns by example. It consists of three layers, i.e. input, hidden and output. Each node
in the input layer is connected to a node in the hidden layer, and every node in the hidden layer
is connected to a node in the output layer; usually, a weight is associated with every
connection [105].

3.5.1 Clustering and Pattern Recognition Using Neural Network

Data pre-processing can consist of simple transformations performed on single variables. The SOM
algorithm uses the Euclidean metric to measure distances between vectors [106]. An ANN consists
of training and learning phases. In the training phase, for some input there is a predictable
output, provided a certain pattern of weights. The neural network does not require a step-by-step
procedure to perform the desired task; instead, the network can be taught to do the task. As the
training process proceeds, the weights converge to values that perform useful computations: the
network knows nothing initially and gradually gains knowledge. This phase is known as the
unsupervised learning phase [104].

3.5.1.1 Self-Organizing Map (SOM)

A self-organizing map consists of neurons, and each neuron is associated with a weight vector of
the same dimension as the input data vectors. SOM is an unsupervised neural network algorithm
that projects high-dimensional data onto a two-dimensional map, in which similar data items are
mapped to nearby locations.

Figure 3.1 NN based SOM Clustering (flow: Input Dataset → Load Data → Self-Organizing Map →
Train Network → Iterations Performed → Clustering)


In SOM, for each randomly selected input, the winner neuron is computed and updated. The
iterations are performed over all the input data. The methodology used for clustering is shown in
Figure 3.1, where data is fed into the network, which is trained to perform clustering.

The weight vectors shift towards the winner neuron among all the input vectors, or towards the
neighbors of a winner, and the data is classified on this basis.

A two-layer feedforward network is used for pattern recognition; it transforms sets of input
signals into a set of output signals. This type of learning is unsupervised, and a good
internal representation of the input is obtained [107].

Figure 3.2 Self-Organizing Map Network (a 6-element input vector is connected through a weight
layer W to a 10x10 SOM layer of 100 neurons, which produces the output)


Figure 3.2 shows the process of self-organizing map clustering, which consists of a competitive
layer that classifies a dataset with any number of dimensions into as many classes as the layer
has neurons. After collection of data, the data is pre-processed: it is cleaned and transformed
into an appropriate format to be mined.

3.5.1.2 Pattern Identification

Figure 3.3 NN based Pattern Recognition (flow: Input Dataset and Target Class → Load Data →
Pattern Recognition → Train Network → Classification)


In pattern recognition using a neural network, target classes are provided together with the
inputs to be classified. The methodology for pattern recognition is displayed in Figure 3.3, where
a sample of the desired target class shown in Table 3.4 is fed into the network to classify the
data. In pattern recognition, the neural network classifies inputs into a set of target
categories. Neural pattern recognition helps to select data, create and train a network, and
evaluate its performance using cross-entropy and confusion matrices.
Table 3.4 Desired Target Class

1 0 1 1 1 1 1 1 1 1 ---- n
1 0 1 0 1 1 1 1 0 0 ---- n
1 1 1 1 1 1 1 1 1 1 ---- n

The neural network architecture for pattern recognition is shown in Figure 3.4, where data
selection and training of the network are done. After training, the performance evaluation is
represented in confusion matrices. In pattern recognition using a neural network, a sample input
dataset is presented to the network together with the target class shown in Table 3.4, which is
the desired network output.

The input dataset is classified according to the target class, and the network is trained to
assign the inputs according to the targets. The data is sampled into training, validation and
testing sets. During training, the network is adjusted according to its error; during validation,
the network is monitored and training halts when generalization stops improving; in the testing
phase, the performance of the network during and after training is checked.

Figure 3.4 Neural Network Architecture for the Pattern Recognition (input → hidden layer of 10
neurons with weights W and bias b → output layer of 3 neurons with weights W and bias b → output)


Figure 3.4 shows a two-layer feed-forward network with sigmoid hidden and output neurons, which
can classify vectors arbitrarily well given enough neurons in its hidden layer; the output layer
performs the classification.

The architecture shown in Figure 3.4 defines the neural network for pattern recognition, setting
the number of hidden neurons, and its performance is reported using the Mean Squared Error, where
lower values are better and zero means no error. The percent error indicates the fraction of
instances that are misclassified: a value of zero means no misclassifications and 100 indicates
maximum misclassification. Another measure of neural network performance is the confusion plot.
The confusion matrix shows the percentages of correct and incorrect classification. Correct
classifications are shown in the green squares on the matrix diagonal, and incorrect
classifications form the red squares. If the network has learned to classify properly, the
percentages in the red squares should be very small, indicating few misclassifications. The steps
followed in Self-Organizing Map clustering are shown in Table 3.5.

Table 3.5 SOM Clustering Steps


Step 1   A random input is selected.
Step 2   The winner neuron is computed.
Step 3   The neurons are updated.
Step 4   Steps 1-3 are repeated for all input data.
Step 5   Each weight vector then moves to the average position of all the input vectors for which
         it is a winner or for which it is in the neighborhood of a winner.
Step 6   Finally, the input data is classified.
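
A minimal NumPy sketch of these steps is shown below. The 10x10 grid, the Gaussian neighborhood function, the fixed learning rate and the random 6-attribute data are illustrative assumptions; a toolbox implementation would normally also decay the learning rate and neighborhood radius over time.

    import numpy as np

    def train_som(data, grid_h=10, grid_w=10, epochs=50, lr=0.5, sigma=2.0, seed=0):
        # Minimal SOM following Table 3.5: pick an input, find the winner, update neighbors.
        rng = np.random.default_rng(seed)
        data = np.asarray(data, dtype=float)
        weights = rng.random((grid_h, grid_w, data.shape[1]))            # one weight vector per neuron
        grid = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)
        for _ in range(epochs):
            for x in data[rng.permutation(len(data))]:                   # Step 1: random input
                dists = np.linalg.norm(weights - x, axis=2)
                winner = np.unravel_index(dists.argmin(), dists.shape)   # Step 2: winner neuron
                grid_dist = np.linalg.norm(grid - np.array(winner), axis=2)
                h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))         # neighborhood function
                weights += lr * h[..., None] * (x - weights)             # Step 3: update neurons
        return weights

    def classify(weights, x):
        # Step 6: assign an input to the cluster of its best-matching neuron.
        return np.unravel_index(np.linalg.norm(weights - x, axis=2).argmin(), weights.shape[:2])

    data = np.random.rand(100, 6)   # hypothetical 6-attribute records, scaled to [0, 1]
    w = train_som(data)
    print(classify(w, data[0]))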

Table 3.6 Command-Line Functions


plotsomtop(net): Plots the self-organizing map topology.
plotsomnd(net): Plots the self-organizing map neighbor distances and shows how close each
neuron's weight vector is to its neighbors.
plotsomhits(net, inputs): Plots self-organizing map sample hits. Each neuron shows the number of
input vectors that it classifies.
plotsomplanes(net): Plots the self-organizing map weight planes and shows a weight plane for each
of the input features.
plotsomnc(net): Plots the self-organizing map neighbor connections.


The distances between neurons are calculated from their positions with a distance function. Using
the dist function, the distance from a particular neuron to its neighbors is calculated; the dist
function calculates the Euclidean distance from a home neuron to any other neuron. The
command-line functions shown in Table 3.6 can be used to analyze the resulting clusters once the
network has been trained.

The network is trained and the inputs are classified according to the targets. Training multiple
times generates different results, due to different initial conditions and sampling. After
training, the results are shown in the form of cross-entropy and percent error. Minimizing
cross-entropy results in good classification; lower values are better and zero means no error.
The percent error indicates the fraction of samples that are misclassified: a value of 0 means no
misclassifications, while 100 indicates maximum misclassification. Another measure of how well
the neural network has fit the data is the confusion plot.
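
The following sketch shows how these three measures can be computed from one-hot target vectors and predicted class probabilities; the target and probability matrices are hypothetical values used only to illustrate the calculations.

    import numpy as np

    def cross_entropy(targets, probs, eps=1e-12):
        # Mean cross-entropy between one-hot targets and predicted class probabilities.
        probs = np.clip(probs, eps, 1.0)
        return -np.mean(np.sum(targets * np.log(probs), axis=1))

    def percent_error(targets, probs):
        # Fraction of misclassified samples, expressed as a percentage.
        return 100.0 * np.mean(targets.argmax(axis=1) != probs.argmax(axis=1))

    def confusion_matrix(targets, probs, n_classes):
        # Rows: true class, columns: predicted class.
        cm = np.zeros((n_classes, n_classes), dtype=int)
        for t, p in zip(targets.argmax(axis=1), probs.argmax(axis=1)):
            cm[t, p] += 1
        return cm

    # Hypothetical one-hot targets (3 classes) and network output probabilities
    T = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
    P = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
    print(cross_entropy(T, P), percent_error(T, P))
    print(confusion_matrix(T, P, 3))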

3.6 Support Vector Machines

SVM is one of the data mining techniques. It is a supervised learning method used for both
regression and classification, and it is one of the most popular classification algorithms for
handling high-dimensional data [108]. The SVM classifier [27] is known for maximum accuracy and
minimum Root Mean Square Error (RMSE). In the educational sector, SVM classifiers can be used for
predicting the placement of students, since in many cases students focus only on their regular
curriculum of studies rather than on other educational trends that are necessary for their overall
development and their placements. SVM is a discriminant classifier defined by a separating
hyperplane. SVM constructs a hyperplane, which is used for classification and regression [109]. It
finds the data vectors in the training set nearest to the decision boundary, called support
vectors (SV), and it classifies a given new test vector by using only these nearest data vectors
[110]. The steps followed in SVM are described in Table 3.7.

The hyperplane is defined in Equation 3.6:

wT(x) + b = 0        (3.6)

The data vectors which are closest to the hyperplane are known as support vectors.


The support vector machine operator supports various kernel types, including dot, radial,
polynomial, neural, anova, etc. A kernel measures the similarity of vectors in the training
dataset samples [111]. The functions of the kernels are summarized in Table 3.8. SVM takes input
data and predicts which of two possible classes the input belongs to, making the SVM a
non-probabilistic binary linear classifier.

Table 3.7 Steps Followed for SVM


Step 1   Suppose there are two classes, C1 and C2. An unknown feature vector x belongs either to
         class C1 or to class C2.
Step 2   Define the linear discriminant function g(x) = wT(x) + b, where w and x are
         d-dimensional vectors.
Step 3   In a 2-D space, if the feature vector is a 2-D vector, then this linear equation
         represents a straight line, i.e. wT(x) + b = 0.
Step 4   If the input feature vector is a 3-D feature vector, then this linear equation, set
         equal to zero, represents a plane in 3-D.
Step 5   If the dimensionality of the feature vector is more than 3, then it becomes a
         hyperplane.
Step 6   Classification rule: for every feature vector x1, compute the linear function
         g(x1) = wT(x1) + b. If x1 lies on the positive side of the hyperplane, then
         wT(x1) + b > 0        (3.7)
         If x1 lies on the negative (-ve) side of the hyperplane, then
         wT(x1) + b < 0        (3.8)
         If x1 lies on the hyperplane, then
         wT(x1) + b = 0        (3.9)
Step 7   SVM classifies data by finding the hyperplane which separates all data points of one
         class from those of the other class. The hyperplane considered best for an SVM is the
         one having the largest margin between the two classes.
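
A minimal sketch of the classification rule in Steps 6 and 7 is given below; the weight vector w, bias b and test points describe a hypothetical separating line in 2-D, not a trained SVM.

    import numpy as np

    def classify_by_hyperplane(w, b, x):
        # Evaluate g(x) = w.x + b and apply the sign rule from Equations 3.7-3.9.
        g = np.dot(w, x) + b
        if g > 0:
            return "C1"              # positive side of the hyperplane
        if g < 0:
            return "C2"              # negative side of the hyperplane
        return "on hyperplane"

    # Hypothetical separating line in 2-D: x1 + x2 - 5 = 0
    w, b = np.array([1.0, 1.0]), -5.0
    print(classify_by_hyperplane(w, b, np.array([4.0, 4.0])))   # g = 3 > 0, so C1
    print(classify_by_hyperplane(w, b, np.array([1.0, 1.0])))   # g = -3 < 0, so C2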

In support vector machines, classification tasks are categorized as follows:

1. Multiclass Classification

Multiclass classification makes the assumption that each sample is assigned to one label only.

2. Multi-Label Classification

In multi-label classification, each sample is assigned a set of target labels, so several
classification tasks are involved.


Table 3.8 Kernels in SVM

dot: The dot kernel is the inner product of the vectors x and y.

radial: In this kernel, the gamma value is taken as the performance parameter. It is defined by
exp(-g ||x-y||^2), where g is the gamma.

polynomial: The polynomial kernel considers not only the input attributes but also their
combinations when finding the similarity. The polynomial kernel is defined by
k(x, y) = (x*y + 1)^d, where d is the degree of the polynomial.

neural: This kernel is defined by a two-layered neural net, tanh(a(x*y) + b), where a and b are
kernel parameters.

anova: The anova kernel is used for analyzing the variation among different variables and the
dependencies on their subsets. Here the kernel gamma and kernel degree parameters are adjusted.

multiquadric: The multiquadric kernel is defined as the square root of ||x-y||^2 + c^2. It has
kernel sigma and kernel sigma shift parameters, which are applied for Gaussian or multiquadric
combinations.
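
As an illustration of kernel choice, the following sketch trains scikit-learn's SVC with linear, radial (RBF) and polynomial kernels, which correspond to the dot, radial and polynomial kernels of Table 3.8. The two-feature placement data (attendance, CGPA) and the parameter values are hypothetical assumptions.

    from sklearn.svm import SVC

    X = [[85, 8.5], [40, 5.0], [90, 9.0], [55, 6.0], [30, 4.5], [75, 8.0]]   # attendance, CGPA
    y = [1, 0, 1, 0, 0, 1]                                                   # placed / not placed

    for kernel, params in [("linear", {}), ("rbf", {"gamma": 0.1}), ("poly", {"degree": 3})]:
        clf = SVC(kernel=kernel, **params).fit(X, y)
        print(kernel, clf.predict([[60, 6.5]]))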

3.7 Bayes Classifier

Bayesian classification is a supervised learning method as well as a statistical method for
classification. Bayes theorem is used in decision making, and it uses knowledge of prior events
to predict future events. Bayesian networks have been evaluated for representing and detecting
students' learning styles in a web-based educational system [112]. Bayes theorem states that if
the probability of an event B conditional on an event A is to be obtained, we calculate the
probability of both A and B together and divide it by the probability of A. This is stated as
follows: Pr(B | A) = Pr(A and B)/Pr(A), where P(B|A) is the conditional probability, denoting the
probability of B given that A has already occurred [113].

P(A|B) = P(B|A) × P(A) / P(B)        (3.10)

Equation 3.10 states that the probability of A given B equals the probability of B given A times
the probability of A, divided by the probability of B. In Equation 3.10, A is the hypothesis to
be tested and B is the evidence associated with A.
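
A small numerical sketch of Equation 3.10 is given below; the event definitions and probability values are hypothetical and chosen only to show the calculation.

    def bayes(p_b_given_a, p_a, p_b):
        # Equation 3.10: P(A|B) = P(B|A) * P(A) / P(B)
        return p_b_given_a * p_a / p_b

    # Hypothetical example: A = "student fails", B = "attendance below 50%"
    p_fail = 0.2                  # prior P(A)
    p_low_att_given_fail = 0.6    # P(B|A)
    p_low_att = 0.25              # P(B)
    print(bayes(p_low_att_given_fail, p_fail, p_low_att))   # posterior P(A|B) = 0.48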


3.8 K-Nearest Neighbor

3.8.1 Classifying Data Using K-Nearest Neighbor Euclidean Distance Measure

Neighbor-based learning includes both supervised and unsupervised learning methods. Unsupervised
neighbor-based learning includes clustering, e.g. spectral clustering, whereas supervised
neighbor-based learning includes classification and regression. In nearest neighbor methods, a
predefined number of training samples that are closest in distance to a new point are found and
used to predict its label.

In K-nearest neighbor learning, the number of samples taken is a user-defined constant, and the
Euclidean, Minkowski and Chebychev distances are the metric measures. Nearest neighbor
classification [114] is an instance-based classifier: it does not attempt to construct a general
internal model but rather stores instances of the training data, so KNN is considered a lazy
learning algorithm [115]. Classification is decided from the majority of the nearest neighbors of
each point: the data point is assigned the class most represented within its nearest neighbors.
KNN uses uniform weights, and the value assigned is computed from the majority of the nearest
neighbors. The training dataset consists of a set of vectors, and a class label is associated
with each vector; there are positive and negative classes. K-nearest neighbors uses the local
neighbors to make a prediction, and distance functions are used to compare the similarity of
examples [116]. In this technique, a new observation is placed in the class of the observation
from the learning set that is closest to the new observation [117].

The distance measure calculations using the Minkowski and Chebychev metrics are shown in
Equations 3.11 and 3.12, whereas the Euclidean distance used for distance measure calculation is
shown in Equation 3.13.

Classification using KNN has been performed over the educational dataset in the results section
of the K-Nearest Neighbor technique in Chapter 6. There, the 10 closest neighbors in x are found
for each point in y using the Minkowski metric with an exponent p of 5. In the case of the
Minkowski distance, the default exponent p is 2, and a different exponent can be supplied through
the 'p' argument. For the Chebychev distance metric, the maximum coordinate difference is used.


Equation 3.11 shows the Minkowski distance metric, whereas Equation 3.12 shows the Chebychev
distance metric, which can be viewed as a limiting case of the Minkowski metric. The Euclidean
distance in Equation 3.13 is a special case of the Minkowski metric, where p = 2.

d(x, y, p) = ( Σ_j |x_j - y_j|^p )^(1/p)        (3.11)

d(x, y) = max_j |x_j - y_j|        (3.12)

d(x, y) = d(y, x) = sqrt( Σ_j (x_j - y_j)^2 )        (3.13)
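
A minimal sketch of these distance measures and of majority-vote KNN classification is given below; the feature vectors, class labels and the choice of k = 3 are hypothetical.

    import numpy as np

    def minkowski(x, y, p):
        # Equation 3.11; p = 2 gives the Euclidean distance of Equation 3.13.
        return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p)

    def chebychev(x, y):
        # Equation 3.12: maximum coordinate difference.
        return np.max(np.abs(np.asarray(x) - np.asarray(y)))

    def knn_predict(train_X, train_y, query, k=3, dist=lambda a, b: minkowski(a, b, 2)):
        # Assign the majority class among the k nearest training points.
        order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
        nearest = [train_y[i] for i in order[:k]]
        return max(set(nearest), key=nearest.count)

    X = [[1, 1], [2, 1], [8, 9], [9, 8], [1, 2]]   # hypothetical feature vectors
    y = ["positive", "positive", "negative", "negative", "positive"]
    print(knn_predict(X, y, [2, 2], k=3))          # the three nearest neighbors are all positive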

3.9 Conclusion

In this chapter, different types of classification and clustering techniques have been discussed.
Classification techniques such as Decision Trees, Linear Regression, Neural Networks, Support
Vector Machines, Naïve Bayes and statistical methods make use of mathematical techniques, and
with their help data items are classified into groups. Clustering techniques such as K-Means and
Self-Organizing Maps form meaningful or useful clusters of objects having similar
characteristics.

Apart from these, another data mining technique discussed is Association Rule Mining. The
Association Rule Mining technique is used to find frequent patterns, correlations and
associations in datasets in the form of frequent if/then patterns.

