Business Data Mining Week 11

Business Data Mining

Uploaded by

pm6566
Copyright
© © All Rights Reserved

Week 11 - LAQ's

Describe how the CART Algorithm, Artificial Neural Networks, and their associated elements
such as the Model of an Artificial Neuron and Learning Process can be effectively applied to
business data mining. Illustrate your answer with real-life examples.
------------------------------------------------------------------------------------------------------------
CART (Classification And Regression Trees) is a variation of the decision tree algorithm.
It can handle both classification and regression tasks. Scikit-Learn uses the CART
algorithm to train Decision Trees (also called “growing” trees). CART was introduced by
Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984.

CART (Classification And Regression Tree) for Decision Tree


CART is a predictive algorithm used in machine learning that explains how the target
variable’s values can be predicted from other variables. It is a decision tree in which each
fork is a split on a predictor variable and each leaf node at the end holds a prediction for
the target variable.
The term CART serves as a generic term for the following categories of decision trees:
 Classification Trees: The tree is used to determine which “class” the target variable
is most likely to fall into when it is categorical.
 Regression trees: These are used to predict a continuous variable’s value.
In the decision tree, nodes are split into sub-nodes based on a threshold value of an attribute.
The root node holds the whole training set and is split into two by considering the best attribute
and threshold value. The resulting subsets are then split using the same logic. This continues
until every subset is pure or the tree reaches the maximum number of leaves allowed.

CART Algorithm
Classification and Regression Trees (CART) is a decision tree algorithm that is used for both
classification and regression tasks. It is a supervised learning algorithm that learns from
labelled data to predict unseen data.
 Tree structure: CART builds a tree-like structure consisting of nodes and branches. The
nodes represent different decision points, and the branches represent the possible
outcomes of those decisions. The leaf nodes in the tree contain a predicted class label or
value for the target variable.
 Splitting criteria: CART uses a greedy approach to split the data at each node. It
evaluates all possible splits and selects the one that best reduces the impurity of the
resulting subsets. For classification tasks, CART uses Gini impurity as the splitting
criterion. The lower the Gini impurity, the purer the subset is. For regression tasks,
CART uses residual reduction (the drop in squared error produced by a split) as the
splitting criterion. The greater the residual reduction, the better the fit of the model
to the data.

 Pruning: To prevent overfitting, pruning removes the nodes that contribute little to
model accuracy. Cost complexity pruning and information gain pruning are two popular
pruning techniques. Cost complexity pruning adds a penalty for every leaf in the tree
and removes the subtrees whose improvement in error does not justify that penalty.
Information gain pruning involves calculating the information gain of each node and
removing nodes that have a low information gain.
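
The pruning idea can be tried out with scikit-learn, whose decision trees expose cost complexity pruning through the `ccp_alpha` parameter. The following is a minimal sketch rather than a recommended workflow: the data is synthetic (an assumption made purely for illustration), and the pruning strength is picked on a single hold-out set, where cross-validation would normally be the safer choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (assumption for illustration; not a real business dataset)
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow an unpruned tree, then ask for the candidate pruning strengths (alphas)
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha; a larger alpha prunes more aggressively.
# max(alpha, 0.0) guards against tiny negative values from floating-point error.
pruned_trees = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=max(alpha, 0.0)).fit(X_train, y_train)
    for alpha in path.ccp_alphas
]
leaf_counts = [tree.get_n_leaves() for tree in pruned_trees]
test_scores = [tree.score(X_test, y_test) for tree in pruned_trees]

# Pick the tree that does best on the hold-out set
best = max(range(len(pruned_trees)), key=lambda i: test_scores[i])
print("leaves without pruning:", leaf_counts[0])
print("leaves at best alpha:", leaf_counts[best],
      "hold-out accuracy:", round(test_scores[best], 3))
```

Comparing the two leaf counts gives a feel for how much of the fully grown tree was fitting noise.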

How does the CART algorithm work?


The CART algorithm works via the following process:
 Step 1: The best split point of each input is obtained.
 Step 2: Based on the best split points of each input from Step 1, the overall “best” split
point is identified.
 Step 3: Split the chosen input according to the “best” split point.
 Step 4: Continue splitting until a stopping rule is satisfied or no further desirable
splitting is available.

The CART algorithm uses Gini impurity to split the dataset into a decision tree. It does that by
searching for the split that makes the sub-nodes most homogeneous, as measured by the Gini
index criterion.
Gini index/Gini impurity
The Gini index is the metric for classification tasks in CART. It is computed as one minus the
sum of squared probabilities of each class, and it measures the probability that a randomly
chosen element would be wrongly classified if it were labelled at random according to the class
distribution in the node. It is related to, but not the same as, the Gini coefficient. In CART
every split condition has only two outcomes (“success” or “failure”), and hence the algorithm
conducts binary splitting only.
The value of the Gini index varies from 0 towards 1:
 A value of 0 means that all the elements belong to a certain class, or only one class exists
there.

 For n equally likely classes the value is 1 - 1/n, so it approaches 1 as the elements are
spread evenly across more and more classes.

 A value of 0.5 denotes that the elements are uniformly distributed between two classes.
Mathematically, we can write Gini Impurity as follows:

$Gini = 1 - \sum_{i=1}^{n} p_i^2$

where $p_i$ is the probability of an object being classified to a particular class and n is
the number of classes.
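
As a small worked sketch, Gini impurity (one minus the sum of squared class proportions) can be computed directly from a list of class labels. The function names `gini_impurity` and `weighted_gini` are illustrative choices, not part of any library.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

def weighted_gini(left_labels, right_labels):
    """Size-weighted average impurity of the two subsets produced by a binary split."""
    total = len(left_labels) + len(right_labels)
    left_share = len(left_labels) / total
    right_share = len(right_labels) / total
    return left_share * gini_impurity(left_labels) + right_share * gini_impurity(right_labels)

print(gini_impurity(["yes", "yes", "yes", "yes"]))  # one class only -> 0.0
print(gini_impurity(["yes", "yes", "no", "no"]))    # two classes, evenly mixed -> 0.5
print(weighted_gini(["yes", "yes"], ["no", "no"]))  # split that separates the classes -> 0.0
```

CART compares candidate splits by this weighted value and prefers the split with the lowest one.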

CART for Classification


A classification tree is an algorithm where the target variable is categorical. The algorithm is
then used to identify the “class” within which the target variable is most likely to fall.
Classification trees are used when the dataset needs to be split into classes that belong to the
response variable (like yes or no).
CART for classification is a decision tree learning algorithm that creates a tree-like structure to
predict class labels. The tree consists of nodes, which represent different decision points, and
branches, which represent the possible results of those decisions. Predicted class labels are
present at each leaf node of the tree.

How Does CART for Classification Work?


CART for classification works by recursively splitting the training data into smaller and
smaller subsets based on certain criteria. The goal is to split the data in a way that minimizes
the impurity within each subset. Impurity is a measure of how mixed up the data is in a
particular subset. For classification tasks, CART uses Gini impurity.
 Gini Impurity- Gini impurity measures the probability of misclassifying a random
instance from a subset if it were labelled at random according to the class distribution
of that subset. Lower Gini impurity means more purity of the subset.
 Splitting Criteria- The CART algorithm evaluates all potential splits at every node and
chooses the one that best decreases the Gini impurity of the resultant subsets. This process
continues until a stopping criterion is reached, like a maximum tree depth or a minimum
number of instances in a leaf node.

CART for Regression


A Regression tree is an algorithm where the target variable is continuous and the tree is used
to predict its value. Regression trees are used when the response variable is continuous. For
example, if the response variable is the temperature of the day.
CART for regression is a decision tree learning method that creates a tree-like structure to
predict continuous target variables. The tree consists of nodes that represent different
decision points and branches that represent the possible outcomes of those decisions.
Predicted values for the target variable are stored in each leaf node of the tree.

How Does CART Work for Regression?


Regression CART works by splitting the training data recursively into smaller subsets based
on specific criteria. The objective is to split the data in a way that minimizes the residual
error in each subset.
 Residual Reduction- Residual reduction is a measure of how much the average squared
difference between the predicted values and the actual values for the target variable is
reduced by splitting the subset. The greater the residual reduction, the better the model fits
the data.
 Splitting Criteria- CART evaluates every possible split at each node and selects the one
that results in the greatest reduction of residual error in the resulting subsets. This process
is repeated until a stopping criterion is met, such as reaching the maximum tree depth or
having too few instances in a leaf node.
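
To make the regression criterion concrete, here is a minimal sketch that searches one numeric feature for the threshold giving the largest reduction in mean squared error. The data (advertising spend against weekly sales) and the function names are hypothetical, chosen only for illustration.

```python
def mean_squared_error(values):
    """Mean squared deviation of the values from their own mean (the leaf's prediction)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_regression_split(xs, ys):
    """Return (threshold, error_reduction) for the best binary split on one numeric feature."""
    parent_error = mean_squared_error(ys)
    best_threshold, best_reduction = None, 0.0
    distinct = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct feature values
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Size-weighted error of the two children
        child_error = (len(left) * mean_squared_error(left)
                       + len(right) * mean_squared_error(right)) / len(ys)
        reduction = parent_error - child_error
        if reduction > best_reduction:
            best_threshold, best_reduction = threshold, reduction
    return best_threshold, best_reduction

# Hypothetical data: advertising spend (x) versus weekly sales (y)
spend = [1, 2, 3, 10, 11, 12]
sales = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, reduction = best_regression_split(spend, sales)
print("best threshold:", threshold, "error reduction:", round(reduction, 3))
```

Each leaf would then predict the mean of the target values that fall into it.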

Pseudo-code of the CART algorithm


CART model representation
CART models are formed by picking input variables and evaluating split points on those
variables until an appropriate tree is produced.
Steps to create a Decision Tree using the CART algorithm:
 Greedy algorithm: The input space is divided using a greedy method known as recursive
binary splitting. This is a numerical procedure in which all of the values are lined up and
different split points are tried and assessed using a cost function; the split with the
lowest cost is chosen.
 Stopping Criterion: As it works its way down the tree with the training data, the recursive
binary splitting method described above must know when to stop splitting. The most
common stopping rule is to require a minimum number of training instances in every
leaf node. If the count is smaller than the specified threshold, the split is rejected and
the node is taken as a final leaf node.
 Tree pruning: A decision tree’s complexity is defined as the number of splits in the tree.
Trees with fewer branches are preferred as they are simpler to understand and less prone
to overfit the data. The quickest and simplest pruning approach is to work through each
leaf node in the tree and evaluate the effect of removing it using a hold-out test set.
 Data preparation for the CART: No special data preparation is required for the CART
algorithm.
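
The greedy splitting and stopping steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not a full CART implementation: it handles numeric features only, uses Gini impurity as the cost function, stops on a minimum leaf size, and does no pruning. The customer data and all function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, min_samples_leaf):
    """Greedy search over every feature and threshold; returns (feature, threshold) or None."""
    best, best_score = None, gini(labels)  # a split must beat the parent's impurity
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            # Stopping rule: reject splits that leave too few samples in a leaf
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, labels, min_samples_leaf=1):
    """Recursive binary splitting; each leaf stores the majority class."""
    split = best_split(rows, labels, min_samples_leaf)
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], min_samples_leaf),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], min_samples_leaf),
    }

def predict(tree, row):
    """Walk from the root to a leaf by following the split conditions."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical customers: [age, monthly_spend] -> did the customer churn?
rows = [[25, 20], [30, 25], [45, 80], [50, 90], [35, 30], [55, 85]]
labels = ["churn", "churn", "stay", "stay", "churn", "stay"]
tree = build_tree(rows, labels, min_samples_leaf=1)
print(predict(tree, [28, 22]), predict(tree, [52, 88]))  # churn stay
```

Raising `min_samples_leaf` makes the stopping rule stricter and the resulting tree smaller.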
Decision Tree CART Implementations
The code below implements the CART algorithm for classifying fruits based on their color
and size. It first encodes the categorical data using a LabelEncoder and then trains a CART
classifier on the encoded data. Finally, it predicts the fruit type for a new instance and decodes
the result back to its original categorical value.

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder

# Define the features and target variable
features = [
    ["red", "large"],
    ["green", "small"],
    ["red", "small"],
    ["yellow", "large"],
    ["green", "large"],
    ["orange", "large"],
]
target_variable = ["apple", "lime", "strawberry", "banana", "grape", "orange"]

# Flatten the features list for encoding
flattened_features = [item for sublist in features for item in sublist]

# Use a single LabelEncoder for all features and target variable
le = LabelEncoder()
le.fit(flattened_features + target_variable)

# Encode features and target variable
encoded_features = [le.transform(item) for item in features]
encoded_target = le.transform(target_variable)

# Create a CART classifier
clf = DecisionTreeClassifier()

# Train the classifier on the training set
clf.fit(encoded_features, encoded_target)

# Predict the fruit type for a new instance
new_instance = ["red", "large"]
encoded_new_instance = le.transform(new_instance)
predicted_fruit_type = clf.predict([encoded_new_instance])
decoded_predicted_fruit_type = le.inverse_transform(predicted_fruit_type)
print("Predicted fruit type:", decoded_predicted_fruit_type[0])

Output:
Predicted fruit type: apple

Advantages of CART
 Results are simple to interpret.
 Classification and regression trees are nonparametric and nonlinear.
 Classification and regression trees implicitly perform feature selection.
 Outliers in the input variables have little effect on CART.
 It requires minimal supervision and produces easy-to-understand models.

Limitations of CART
 Overfitting.
 High variance: a deep tree has low bias but fits the training data too closely.
 The tree structure may be unstable: small changes in the data can produce a very different
tree.

Applications of the CART algorithm


 For quick Data insights.
 In Blood Donors Classification.
 For environmental and ecological data.
 In the financial sectors.

CART Algorithm in Business Data Mining


Overview
The Classification and Regression Trees (CART) algorithm is a powerful decision tree
technique used for both classification and regression tasks. CART builds binary trees by
splitting the data at each node according to a feature that results in the best split, typically
evaluated using metrics like Gini impurity for classification or mean squared error for
regression.

Applications in Business:
1. Customer Segmentation: By classifying customers into distinct groups based on
behaviors and demographics, businesses can tailor marketing strategies effectively.
2. Credit Risk Assessment: Financial institutions use CART to classify loan applicants into
risk categories, aiding in decision-making for loan approvals.
3. Churn Prediction: Companies predict which customers are likely to leave, allowing
proactive retention efforts.
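
As an illustration of the credit risk use case, the sketch below trains a small scikit-learn tree and prints its rules with `export_text`. The applicant records, feature names, and risk labels are entirely hypothetical and far too few for a real model; they are there only to show how readable the resulting rules are.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical loan applicants: [annual_income_k, debt_to_income_pct, late_payments]
applicants = [
    [85, 15, 0], [60, 20, 0], [40, 45, 3], [30, 50, 4],
    [95, 10, 0], [45, 40, 2], [70, 25, 1], [35, 55, 5],
]
risk = ["low", "low", "high", "high", "low", "high", "low", "high"]

# A shallow tree keeps the rules short enough to explain to stakeholders
model = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
model.fit(applicants, risk)

# Print the learned if/else rules in plain text
feature_names = ["annual_income_k", "debt_to_income_pct", "late_payments"]
print(export_text(model, feature_names=feature_names))

# Score a new (hypothetical) applicant
new_applicant = [[50, 48, 3]]
print("Predicted risk:", model.predict(new_applicant)[0])
```

The printed rules are the kind of output a loan officer can check by hand, which is the interpretability advantage discussed below.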

Advantages:
 Interpretability: Decision trees are easy to understand and interpret, making it simpler to
explain decisions to stakeholders.
 Versatility: Can handle both categorical and numerical data.

Limitations:
 Overfitting: Trees can become overly complex and fit the noise in the training data.
Pruning and cross-validation are necessary to mitigate this.

 Bias: Split selection can be biased toward attributes that have many distinct values.
******************

Artificial Neural Networks

Technology is growing day by day and a large amount of data is produced every second, so
analyzing data is becoming very important: it helps in fraud detection, identifying spam e-mail,
etc. Data mining helps us find hidden patterns and discover knowledge in large datasets.
Artificial Neural Networks (ANN) base their processing of data on the way the human brain
processes information. The brain has neurons that process information in the form of electric
signals.
In the same way, an ANN receives input through several processors that operate in parallel and
are arranged in tiers. The raw data is received by the first tier and processed through
interconnected nodes, each having its own rules and packages of knowledge.
Each processor passes its result on to the next tier as output. Every successive tier receives
the output of its predecessor rather than the raw data. Neural networks are self-learning: they
modify themselves as they process additional information. Each link between nodes is
associated with a weight.
Preference is given to the input stream with the higher weight. The higher the weight of a unit,
the more influence it has on another. The weights are adjusted to reduce prediction errors, and
this is done through a gradient descent algorithm.

Neural Network:
A neural network is an information processing paradigm inspired by the human nervous
system. Just as the human nervous system has biological neurons, neural networks have
artificial neurons, which are mathematical functions modelled on biological neurons. The
human brain is estimated to have tens of billions of neurons (a commonly cited figure is about
86 billion), each connected to thousands of other neurons. Each neuron receives signals through
synapses that control the effects of the signal on the neuron.

How Does an Artificial Neural Network Work?


Suppose that there are n inputs X1, X2, …, Xn to a neuron.
=> The weights connecting the n inputs to the neuron are represented by
[W] = [W1, W2, …, Wn].

=> The function of the summing junction of an artificial neuron is to collect the weighted inputs
and sum them up.

Yin = [X1*W1 + X2*W2 + … + Xn*Wn]

=> The output of the summing junction may sometimes become equal to zero; to prevent such
a situation, a bias of fixed value Bo is added to it.

Yin = [X1*W1 + X2*W2 + … + Xn*Wn] + Bo

// Yin then moves on to the Activation Function.

=> The output Y of a neuron largely depends on its Activation Function (also known as
transfer function).

=> Different types of Activation Function are in use, such as
1. Identity Function
2. Binary Step Function With Threshold
3. Bipolar Step Function With Threshold
4. Binary Sigmoid Function
5. Bipolar Sigmoid Function
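
The neuron model above can be written out as a short sketch: a weighted sum plus bias (Yin), followed by one of the five listed activation functions. The input values, weights, bias, and default thresholds are arbitrary numbers chosen for illustration.

```python
import math

def identity(y_in):
    return y_in

def binary_step(y_in, threshold=0.0):
    return 1 if y_in >= threshold else 0

def bipolar_step(y_in, threshold=0.0):
    return 1 if y_in >= threshold else -1

def binary_sigmoid(y_in):
    return 1.0 / (1.0 + math.exp(-y_in))        # output lies in (0, 1)

def bipolar_sigmoid(y_in):
    return 2.0 / (1.0 + math.exp(-y_in)) - 1.0  # output lies in (-1, 1)

def neuron_output(inputs, weights, bias, activation):
    """Summing junction (Yin = sum of Xi*Wi, plus bias) followed by the activation."""
    y_in = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(y_in)

# Arbitrary example values: Yin works out to roughly 0.5
inputs = [1.0, 0.5, -1.0]
weights = [0.4, -0.2, 0.1]
bias = 0.3

for activation in (identity, binary_step, bipolar_step, binary_sigmoid, bipolar_sigmoid):
    print(activation.__name__, neuron_output(inputs, weights, bias, activation))
```

The same Yin gives very different outputs depending on the activation, which is why the choice of activation function matters for the task at hand.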

Neural Network Architecture:


While numerous neural network architectures have been created by researchers, the most
successful applications of neural networks in data mining have been multilayer feedforward
networks. These are networks in which there is an input layer consisting of nodes that simply
accept the input values, followed by successive layers of nodes that are neurons of the kind
described in the artificial neuron model above. The outputs of neurons in a layer are inputs to
neurons in the next layer. The last layer is called the output layer. Layers between the input
and output layers are known as hidden layers.
Supervised learning comes in two types: regression and classification. In a regression
problem, where the neural network is used to predict a numerical quantity, there is one neuron
in the output layer and its output is the prediction. In a classification problem, on the other
hand, the output layer has as many nodes as the number of classes, and the output layer node
with the largest output value gives the network’s estimate of the class for a given input. In the
special case of two classes, it is common to have just one node in the output layer, the
classification between the two classes being made by applying a cut-off to the output value at
the node.
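
A forward pass through a multilayer feedforward network can be sketched as follows. The layer sizes and weights are hand-picked, hypothetical values (a trained network would learn them), and sigmoid is assumed as the activation in every layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One layer: each neuron takes the weighted sum of all inputs plus its own bias."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the outputs of each layer to the next; `layers` is a list of (weights, biases)."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 2 output nodes (one per class)
hidden = ([[0.5, -0.6], [0.1, 0.8], [-0.3, 0.2]], [0.0, 0.1, -0.1])
output = ([[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]], [0.0, 0.0])

scores = network_forward([1.0, 0.0], [hidden, output])
# Classification: the output node with the largest value gives the predicted class
predicted_class = max(range(len(scores)), key=lambda i: scores[i])
print("output values:", [round(s, 3) for s in scores], "predicted class:", predicted_class)
```

For a regression problem the output layer would instead hold a single neuron whose value is the prediction.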

Why use Neural Network Method in Data Mining?


Neural networks help in mining large amounts of data in various sectors such as retail,
banking (fraud detection), bioinformatics (genome sequencing), etc. Finding useful
information that is hidden in large data sets is very challenging and also very necessary. Data
mining uses neural networks to harvest information from the large datasets held by data
warehousing organizations, which helps the user in decision making.
Some of the Applications of Neural Network In Data Mining are given below:
 Fraud Detection: Fraudsters have been exploiting businesses and banks for their own
financial gain for many years, and the problem is growing because advances in technology
make fraud relatively easy to commit. On the other hand, technology also helps in fraud
detection, and neural networks help a lot in detecting fraud.

 Healthcare: In healthcare, neural networks help in diagnosing diseases. There are many
diseases and large datasets holding records of them. With neural networks and these
records, diseases can be diagnosed at an early stage.

Different Neural Network Method in Data Mining


Neural network methods are used for classification, clustering, feature mining, prediction,
and pattern recognition. The McCulloch-Pitts model is considered to be the first neural network
and the Hebbian learning rule is one of the earliest and simplest learning rules for neural
networks. Neural network models can be broadly divided into the following three types:
 Feed-Forward Neural Networks: In a feed-forward network the output values are never
fed back to the input values. An output is calculated for every input, so there is a forward
flow of information and no feedback between the layers. In simple words, the information
moves in only one direction (forward) from the input nodes, through the hidden nodes
(if any), and to the output nodes.
 Feedback Neural Network: Signals can travel in both directions in a feedback network.
Feedback neural networks are very powerful and can become very complex. Feedback
networks are dynamic. The “states” in such a network are constantly changing until an
equilibrium point is reached. They stay at equilibrium until the input changes and a new
equilibrium needs to be found. Feedback neural network architectures are also known as
interactive or recurrent. Feedback loops are allowed in such networks. They are used for
content addressable memory.

 Self Organization Neural Network: A Self Organizing Neural Network (SONN) is a type
of artificial neural network that is trained using competitive learning rather than the
error-correction learning (e.g., backpropagation with gradient descent) used by other
artificial neural networks. It is an unsupervised learning model, also known as a
Self-Organizing Feature Map or Kohonen Map. It is used to produce a low-dimensional
(typically two-dimensional) representation of a higher-dimensional data set while
preserving the topological structure of the data.

Use of Artificial Neural Networks in Business


Companies now understand that the data they possess can inform their decision making.
Businesses are leveraging neural networks to get the benefits of their data streams.

ANNs have the ability to learn and model non-linear relationships. Unlike many other
prediction techniques, they don’t impose restrictions on the input variables.

Here’s how industries and organizations apply neural networks to gain an advantage:

1. Forecasting of Data
Traditional forecasting models place limitations on the data they can handle, and forecasting
problems are often complex. If applied correctly, ANNs can forecast without such limitations,
as their modeling ability lets them define relationships and extract unseen features.

2. Character – Image Recognition


Since ANN can take a multitude of inputs and can process them in complex non-linear
relationships, this makes them ideally positioned for character recognition, such as
handwriting. This can, in turn, be used as a fraud detector. The same goes for image
recognition – for facial recognition on social media, cancer detection in the field of
healthcare, and satellite imagery for agriculture.

Artificial Neural Networks for Data Mining


Neural networks help in mining data in various sectors such as banking, retail, and
bioinformatics. Finding information that is hidden in the data is challenging but at the same
time, necessary. Data warehousing organizations can use neural networks to harvest
information from data sets.

This helps users to make more informed decisions. ANNs can carry out business tasks with
structured data, ranging from tracking and documenting real-time communications to finding
new leads or potential customers.

As a matter of fact, until recently, decision-makers relied on data extracted from organized
data sets. Even though these are easier to analyze, they don’t offer as much in-depth insight
as unstructured data does.

Neural networks provide information such as the ‘why’ behind a particular customer’s
behavior.

Let’s take a look at real-life examples of Artificial neural network’s applications in Data
Mining:

1. Healthcare
A neural network analyzed 100,000 records of patients who were in the Intensive Care Unit
(ICU) and learned to apply that experience to recommend the ideal course of treatment. 99% of
these recommendations matched, and sometimes improved on, a doctor’s decision.

2. Social Media
LinkedIn, the business and employment-oriented website, uses neural networks to pick up
spam or abusive content. LinkedIn also uses them to understand all kinds of content shared, so
that it can build better recommendation and search features for its members.
Overview
Artificial Neural Networks (ANNs) are computational models inspired by the human brain.
They consist of interconnected layers of nodes (neurons) where each connection has a weight.
ANNs are capable of learning complex patterns through training processes involving
backpropagation and gradient descent.

Learning Process:
1. Initialization: Randomly initialize weights and biases.
2. Forward Propagation: Calculate the output of the network.
3. Loss Calculation: Compute the error using a loss function (e.g., mean squared error for
regression, cross-entropy for classification).
4. Backpropagation: Calculate the gradient of the loss function with respect to each weight
and bias.
5. Gradient Descent: Update the weights and biases to minimize the error.
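
The five steps can be followed in a minimal sketch that trains a single sigmoid neuron on the logical OR function. This is an assumption-laden toy: with only one neuron, backpropagation reduces to a single application of the chain rule, and the learning rate, epoch count, and seed are arbitrary choices. A real network would repeat the same steps across several layers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: logical OR (chosen because a single neuron can learn it)
samples = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

# 1. Initialization: small random weights and bias (fixed seed for repeatability)
rng = random.Random(0)
weights = [rng.uniform(-0.5, 0.5) for _ in range(2)]
bias = rng.uniform(-0.5, 0.5)
learning_rate = 0.5

for epoch in range(2000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    loss = 0.0
    for inputs, target in samples:
        # 2. Forward propagation
        output = sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)
        # 3. Loss calculation: binary cross-entropy
        loss -= target * math.log(output) + (1 - target) * math.log(1 - output)
        # 4. Backpropagation: for sigmoid + cross-entropy, dLoss/dYin = output - target
        delta = output - target
        grad_w = [g + delta * x for g, x in zip(grad_w, inputs)]
        grad_b += delta
    # 5. Gradient descent: step against the average gradient
    n = len(samples)
    weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b / n

predictions = [
    round(sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias))
    for inputs, _ in samples
]
print("final average loss:", round(loss / len(samples), 4))
print("predictions:", predictions)
```

After training, the rounded predictions should match the OR targets, and the average loss printed is the value from the final epoch.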
Applications in Business:
1. Sales Forecasting: ANNs can model complex relationships between sales data and various
factors such as seasonality, promotions, and economic indicators to predict future sales.
2. Fraud Detection: In financial services, ANNs can learn to identify patterns indicative of
fraudulent activity.
3. Customer Sentiment Analysis: Analyzing customer reviews and feedback to gauge
sentiment and improve customer service.
Advantages:
 Accuracy: ANNs can model complex, non-linear relationships and often achieve high
accuracy.

 Flexibility: Can be applied to a wide range of problems from image recognition to natural
language processing.
Limitations:
 Interpretability: ANNs are often considered "black boxes" due to their complexity,
making it hard to interpret the decision process.
 Data Requirement: Require large amounts of data to train effectively.
 Computationally Intensive: Training ANNs can be resource-intensive, requiring
significant computational power.
Effective Application in Business Data Mining
1. Combining Techniques:
o Hybrid Models: Using CART for initial feature selection and understanding, followed by
ANNs for complex pattern recognition.
o Ensemble Methods: Combining multiple CART models or ANNs to improve robustness
and accuracy.
2. Real-World Implementation:
o Customer Insights: Businesses can use CART to segment customers initially and then
apply ANNs to predict customer behavior within each segment.
o Operational Efficiency: Predictive maintenance using ANNs can forecast equipment
failures, allowing businesses to schedule timely maintenance and avoid downtime.
3. Addressing Limitations:
o Overfitting in CART: Apply pruning techniques and cross-validation to create more
generalizable models.
o Interpretability of ANNs: Use techniques like SHAP (SHapley Additive exPlanations) or
LIME (Local Interpretable Model-agnostic Explanations) to interpret ANN predictions.

********************************

Conclusion
Both CART and ANNs are potent tools in business data mining, each with unique strengths.
CART's interpretability and simplicity make it ideal for initial data exploration and feature
selection. In contrast, ANNs excel in capturing complex patterns and making accurate
predictions, albeit with higher computational demands and lower interpretability. By
leveraging these techniques effectively, businesses can gain valuable insights, optimize
operations, and make data-driven decisions.
