Module4 DS PPT
What Is a Decision Tree?, Entropy, The Entropy of a Partition, Creating a Decision Tree,
Putting It All Together, Random Forests, Neural Networks, Perceptrons, Feed-Forward Neural
Networks, Back propagation, Example: Fizz Buzz, Deep Learning, The Tensor, The Layer
Abstraction, The Linear Layer, Neural Networks as a Sequence of Layers, Loss and
Optimization, Example: XOR Revisited, Other Activation Functions, Example: Fizz Buzz
Revisited, Softmaxes and Cross-Entropy, Dropout, Example: MNIST, Saving and Loading
Models, Clustering, The Idea, The Model, Example: Meetups, Choosing k, Example:
Clustering Colors, Bottom-Up Hierarchical Clustering
Text Book : Chapters 17, 18, 19 and 20
Decision tree
• A decision tree is a popular machine learning algorithm used for both classification and
regression tasks.
• It works by recursively splitting the data into subsets based on certain conditions, resulting
in a tree-like structure.
• A decision tree uses a tree structure to represent a number of possible decision paths and an
outcome for each path.
• Each internal node represents a feature (or attribute), each branch represents a decision rule,
and each leaf node represents the outcome (either a class label for classification or a
continuous value for regression).
[Figure: a decision tree with a root node, internal decision nodes, and leaf nodes]
Working of decision tree
1. Splitting: The algorithm splits the dataset at each node based on a feature that provides the
best separation of the classes. This is often determined using metrics like:
• Entropy: A measure of the impurity or disorder of the labels at a node.
• Gini Impurity: Another measure of impurity or disorder.
• Information Gain: The reduction in entropy or impurity achieved by splitting the dataset.
2. Recursive Partitioning: The process of splitting continues recursively, creating more
branches and nodes until one of the stopping criteria is met, such as:
• A maximum depth of the tree is reached.
• A minimum number of samples per leaf is reached.
• No further information gain can be achieved.
3. Prediction: Once the tree is built, making predictions involves traversing the tree from the
root to a leaf node based on the feature values of the input data. The class or value at the leaf
node is the prediction.
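A minimal sketch of these steps using scikit-learn (an assumed library choice; the toy animal data and feature encoding below are made up for illustration):

```python
# Hypothetical toy example: fit a small classification tree and traverse it for a prediction.
from sklearn.tree import DecisionTreeClassifier

# Features: [number_of_legs, is_delicious (0/1)]
X = [[8, 0], [2, 1], [4, 1], [4, 0]]
y = ["spider", "chicken", "cow", "echidna"]

# With criterion="entropy", splits are chosen by information gain.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)

# Prediction traverses the tree from the root to a leaf using the feature values.
print(tree.predict([[4, 0]]))   # ['echidna']
```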
• There are many decision tree algorithms, such as ID3, C4.5, CART, GUIDE, and CTREE.
• The most commonly used are ID3 (Iterative Dichotomiser 3), C4.5 (an improved version of
ID3), and CART (Classification and Regression Trees).
Example 1
Example 2:
If you have ever played the game of Twenty Questions, then you are familiar with decision trees.
• “I am thinking of an animal.”
• “Does it have more than five legs?”
• “No.”
• “Is it delicious?”
• “No.”
• “Does it appear on the back of the Australian five-cent coin?”
• “Yes.”
• “Is it an echidna?”
• “Yes, it is!”
This corresponds to the path:
“Not more than 5 legs” → “Not delicious” → “On the 5-cent coin” → “Echidna!”
A “guess the animal” decision tree
The decision tree for hiring
Entropy
• Entropy is a key concept used to measure the impurity or disorder of a
dataset.
• It is used as part of the information gain calculation when determining how
to split nodes in a decision tree.
• The goal is to choose splits that reduce entropy, making the child nodes
more "pure" (i.e., containing mostly one class).
• For a dataset with multiple classes, the entropy is calculated using the
formula:
$$H(S) = -\sum_{i=1}^{C} p_i \log_2(p_i)$$
where $p_i$ is the proportion of instances in class $i$ and $C$ is the total number of classes.
The entropy will be small when every $p_i$ is close to 0 or 1 (i.e., when most of the data is in a
single class), and it will be larger when many of the $p_i$'s are not close to 0 (i.e., when the
data is spread across multiple classes).
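A small from-scratch sketch of this formula in Python (the example label lists are made up):

```python
import math
from collections import Counter
from typing import List

def entropy(labels: List[str]) -> float:
    """H(S) = -sum of p_i * log2(p_i) over the classes present in labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

print(entropy(["yes"] * 9 + ["no"] * 5))   # ~0.940: mixed classes, high entropy
print(entropy(["yes"] * 10))               # 0.0: a pure set has no disorder
```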
A graph of -p log p
Random Forest
• An ensemble method that builds multiple decision trees and merges their outputs to improve
accuracy and reduce overfitting.
• Each tree in a random forest provides a “vote” for a predicted outcome, and the final result
is based on the majority vote for classification or the average prediction for regression.
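A minimal sketch using scikit-learn's RandomForestClassifier (an assumed library choice; the dataset and parameters are illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with a random subset of features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# For classification, each tree votes and the majority class wins.
print("accuracy:", forest.score(X_test, y_test))
```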
Activation function
• The summation result is passed through an activation function to produce the final output y.
• An activation function determines the output of each neuron by introducing non-linearity,
allowing the network to learn complex patterns and relationships within the data.
• The Perceptron typically uses the step function as its activation function, which outputs:
+1 (or class 1) if the weighted sum of inputs ≥0.
−1 (or class 0) otherwise.
Other common activation functions are:
Sigmoid: Compresses outputs to a range between 0 and 1.
ReLU (Rectified Linear Unit): Outputs zero for negative values and passes positive values
unchanged. It’s efficient and commonly used in hidden layers.
Softmax: Used in the output layer of classification models to produce probabilities for each
class.
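A short sketch of these activation functions with NumPy (the example scores are made up):

```python
import numpy as np

def step(x):        # Perceptron activation: 1 if x >= 0, else 0
    return np.where(x >= 0, 1, 0)

def sigmoid(x):     # Squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):        # Zero for negative values, identity for positive values
    return np.maximum(0, x)

def softmax(x):     # Converts a vector of scores into probabilities that sum to 1
    exps = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]
```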
Perceptron algorithm
1. Initialize weights and bias randomly, typically small values close to zero.
2. Loop over each data point in the training set:
• Calculate the output of the perceptron using the current weights and inputs.
• Calculate the error as the difference between the actual label and the
predicted label.
• Update weights and bias based on the error:
$w_i = w_i + \Delta w_i$
$b = b + \Delta b$
where
$\Delta w_i = \alpha \, (y_{\text{true}} - y_{\text{pred}}) \, x_i$
$\Delta b = \alpha \, (y_{\text{true}} - y_{\text{pred}})$
Here, 𝛼 is the learning rate, a hyperparameter that controls how much the
weights adjust with each update.
3. Repeat the process for a specified number of iterations or until convergence,
i.e., when there are no more errors.
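A from-scratch sketch of this training loop (the AND data and hyperparameters below are illustrative assumptions):

```python
import numpy as np

def train_perceptron(X, y, alpha=0.1, epochs=20):
    """X is (n_samples, n_features); y holds labels in {0, 1}."""
    w = np.zeros(X.shape[1])     # weights initialized to small values (here zero)
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_pred = 1 if np.dot(w, xi) + b >= 0 else 0   # step activation
            error = yi - y_pred                            # y_true - y_pred
            w += alpha * error * xi                        # delta w_i = alpha * error * x_i
            b += alpha * error                             # delta b   = alpha * error
    return w, b

# Learning the AND function (linearly separable, so the perceptron converges).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(train_perceptron(X, y))
```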
Feed-Forward Neural Network (FNN)
• It is one of the simplest types of artificial neural networks. In a feed-forward network,
information flows in one direction only: from the input layer, through any hidden layers, to
the output layer.
• There are no cycles or loops in this structure, which makes it easier to understand and
implement compared to other types of neural networks.
• Just like in the perceptron, each (noninput) neuron has a weight corresponding to each of its
inputs and a bias.
• As with the perceptron, for each neuron we sum up the products of its inputs and its
weights. But here, rather than applying the step function to that sum, we apply the
sigmoid function.
Why use sigmoid instead of the simpler step_function?
• In order to train a neural network, we need to use calculus, and in order to use calculus,
we need smooth functions.
• step_function isn’t even continuous, and sigmoid is a good smooth approximation of it.
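A small from-scratch sketch of a sigmoid feed-forward pass, in the spirit of the textbook's approach; the hand-tuned XOR weights are illustrative:

```python
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def neuron_output(weights, inputs):
    # weights includes the bias as its last element; a constant 1 is appended to inputs
    return sigmoid(sum(w * x for w, x in zip(weights, inputs + [1.0])))

def feed_forward(network, input_vector):
    """network is a list of layers; each layer is a list of neuron weight vectors.
    Returns the outputs of every layer; the last entry is the network's output."""
    outputs = []
    for layer in network:
        output = [neuron_output(neuron, input_vector) for neuron in layer]
        outputs.append(output)
        input_vector = output          # this layer's output feeds the next layer
    return outputs

# A hand-tuned XOR network: one hidden layer of two neurons, one output neuron.
xor_network = [[[20.0, 20.0, -30.0],    # "AND" neuron
                [20.0, 20.0, -10.0]],   # "OR" neuron
               [[-60.0, 60.0, -30.0]]]  # "OR but not AND" neuron
for x in [0, 1]:
    for y in [0, 1]:
        print(x, y, round(feed_forward(xor_network, [x, y])[-1][0]))   # XOR truth table
```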
Layer Composition
Neural networks are built by stacking various types of layers in a specific order.
The most common layer types include:
Input Layer: The first layer that receives the raw data input.
Hidden Layers: Intermediate layers where most of the computation and feature extraction
occur. They can be fully connected (dense) layers, convolutional layers, recurrent layers, etc.
Output Layer: The final layer that produces the network's prediction or classification result.
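A minimal sketch of this layer stacking using Keras (an assumed framework choice; the layer sizes are illustrative):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                    # input layer: a flattened 28x28 image
    keras.layers.Dense(128, activation="relu"),   # hidden (fully connected) layer
    keras.layers.Dense(64, activation="relu"),    # another hidden layer
    keras.layers.Dense(10, activation="softmax"), # output layer: class probabilities
])
model.summary()
```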
Loss and Optimization
To train a neural network, you need a loss function to measure the difference between the
model's predictions and the true values, and an optimization algorithm to update the model
parameters to minimize this loss.
1. Loss Functions
The loss function quantifies how well or poorly the model is performing. For most neural
networks, you’ll use one of the following:
•Mean Squared Error (MSE): Common for regression tasks.
•Cross-Entropy Loss: Typically used for classification tasks.
2. Optimization Algorithms
An optimizer adjusts the model’s weights based on the gradients from backpropagation.
Common optimizers include:
•Stochastic Gradient Descent (SGD): A simple and effective method that adjusts
weights by taking small steps proportional to the negative gradient.
•Adam: An advanced version of gradient descent that adapts the learning rate for each
parameter, often leading to faster convergence.
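A small NumPy sketch of these ideas (the function names and sample numbers are illustrative assumptions):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared difference, common for regression
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def cross_entropy(y_true_onehot, y_prob):
    # Cross-entropy: heavily penalizes confident wrong predictions, common for classification
    return -np.mean(np.sum(y_true_onehot * np.log(y_prob + 1e-12), axis=1))

def sgd_step(params, grads, learning_rate=0.01):
    # One SGD update: move each parameter a small step against its gradient
    return [p - learning_rate * g for p, g in zip(params, grads)]

print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))   # small errors -> small loss (0.02)
```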
Other activation functions
• Activation functions are essential in neural networks because they introduce non-linearity,
enabling networks to learn complex patterns and approximate non-linear functions.
• Without activation functions, a neural network composed of linear layers would be
equivalent to a single-layer model, limiting its capacity to solve complex tasks.
1. Sigmoid Function
2. Tanh (Hyperbolic Tangent)
3. ReLU (Rectified Linear Unit)
4. Leaky ReLU
5. Softmax
Softmaxes and Cross-Entropy
• Softmax and Cross-Entropy are two important concepts in machine learning, often used
together, especially in classification tasks.
• The Softmax function takes a vector of values (like scores for each class) and converts them
into probabilities. It’s used in the output layer of a neural network to interpret the results as
probabilities for each class.
• Cross-Entropy is a loss function used to measure the difference between two probability
distributions – the true distribution (the actual label) and the predicted distribution (output
from the Softmax function).
How They Work Together
• When Softmax is applied to the neural network’s output, it transforms these raw scores
into probabilities.
• Then, Cross-Entropy is used to measure how well these probabilities align with the
actual class labels, penalizing the model more if it assigns low probability to the correct
class.
• This combination is widely used in multi-class classification tasks because Softmax
ensures the model outputs probabilities.
• Cross-Entropy penalizes incorrect predictions, guiding the model to improve its
accuracy.
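A short NumPy sketch of how the two fit together (the example scores are made up):

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return exps / exps.sum()

def cross_entropy(probs, true_class):
    # The penalty is large when the probability assigned to the correct class is small.
    return -np.log(probs[true_class])

scores = np.array([2.0, 1.0, 0.1])          # raw network outputs ("logits")
probs = softmax(scores)                     # roughly [0.66, 0.24, 0.10]
print(cross_entropy(probs, true_class=0))   # small loss: correct class got high probability
print(cross_entropy(probs, true_class=2))   # large loss: correct class got low probability
```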
Dropout
In machine learning, dropout is a regularization technique primarily used in neural
networks to reduce the risk of overfitting during training.
Random Neuron Deactivation: During each forward pass, dropout temporarily "drops
out" (sets to zero) a random subset of neurons within a layer.
This forces the network to not rely on specific neurons for prediction, distributing the
learning across multiple neurons.
Typically, this is controlled by a probability (called the "dropout rate") that defines the
fraction of neurons to be dropped out.
Applied During Training: Dropout is only used during the training phase, and the
complete network is used during inference (testing/validation) by scaling neuron
activations to account for the missing connections in training.
Benefits:
• Prevents Overfitting: Dropout helps prevent the model from overfitting by ensuring that
the neural network does not become overly reliant on certain neurons.
• Improves Generalization: By making the network more robust to noise and forcing it to
learn more varied representations, dropout improves the network's ability to generalize
on unseen data.
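The training log below is sample Keras output for an MNIST classifier with dropout. A minimal sketch of the kind of model that could produce such output (the slides do not show the code; the architecture and hyperparameters here are assumptions):

```python
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0     # scale pixel values to [0, 1]

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),                         # randomly zero 20% of activations during training
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])

model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
loss, acc = model.evaluate(x_test, y_test)
print(f"Test Loss: {loss:.4f}, Test Accuracy: {acc:.4f}")
```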
Epoch 9/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - loss: 0.1807 -
sparse_categorical_accuracy: 0.9489 - val_loss: 0.1731 -
val_sparse_categorical_accuracy: 0.9503
Epoch 10/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - loss: 0.1672 -
sparse_categorical_accuracy: 0.9542 - val_loss: 0.1651 -
val_sparse_categorical_accuracy: 0.9525
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 0.1923 -
sparse_categorical_accuracy: 0.9439
Test Loss: 0.1651, Test Accuracy: 0.9525
Clustering
• Clustering is an unsupervised machine learning technique that involves grouping data
points into clusters, or groups, such that data points in the same group are more similar
to each other than to those in other groups.
• This similarity is usually measured using metrics like Euclidean distance, Manhattan
distance, or cosine similarity.
• Clustering is a core concept in data analysis and machine learning, especially when
dealing with unlabeled data, as it allows us to reveal hidden patterns and relationships
within a dataset.
• Different types of clustering methods are
1. K-means Clustering
2. Hierarchical Clustering
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
4. Model-Based Clustering (Gaussian Mixture Models (GMM))
5. Grid-Based Clustering
K-means Clustering:
• One of the simplest clustering methods is k-means, in which the number of clusters k is
chosen in advance.
• The goal is to partition the inputs into sets S1, ..., Sk in a way that minimizes the total
sum of squared distances from each point to the mean of its assigned cluster.
• Divides data into 𝐾 clusters by minimizing the variance within clusters. It randomly
initializes centroids and assigns each data point to the closest centroid.
• The centroids are then updated iteratively.
• An iterative algorithm that usually finds a good clustering:
1. Start with a set of k-means, which are points in d-dimensional space.
2. Assign each point to the mean to which it is closest.
3. If no point’s assignment has changed, stop and keep the clusters.
4. If some point’s assignment has changed, recompute the means and return to
step 2.
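A from-scratch sketch of these four steps (the toy 2-D points are made up for illustration):

```python
import random
from typing import List

Vector = List[float]

def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vector_mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def k_means(points: List[Vector], k: int) -> List[Vector]:
    means = random.sample(points, k)                     # 1. start with k means
    assignments = [None] * len(points)
    while True:
        new_assignments = [min(range(k), key=lambda i: squared_distance(p, means[i]))
                           for p in points]              # 2. assign each point to its closest mean
        if new_assignments == assignments:               # 3. nothing changed: stop
            return means
        assignments = new_assignments
        for i in range(k):                               # 4. otherwise recompute the means
            cluster = [p for p, a in zip(points, assignments) if a == i]
            if cluster:
                means[i] = vector_mean(cluster)

random.seed(0)
points = [[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]]
print(k_means(points, k=2))
```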
Choosing k
There are various ways to choose a k. One that’s reasonably easy to understand involves
plotting the sum of squared errors (between each point and the mean of its cluster) as a
function of k and looking at where the graph “bends”:
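A sketch of this "elbow" plot using scikit-learn's inertia_ (the total within-cluster squared error) and matplotlib, both assumed libraries; the points are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

points = [[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [8, 8], [9, 9]]

ks = range(1, 7)
errors = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_ for k in ks]

plt.plot(ks, errors, marker="o")
plt.xlabel("k")
plt.ylabel("total squared error")
plt.title("Look for the bend (elbow) in the curve")
plt.show()
```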
Bottom-Up Hierarchical Clustering
• Bottom-up hierarchical clustering is also known as agglomerative hierarchical
clustering.
• It is a type of clustering approach that builds a hierarchy of clusters by starting with
each data point as an individual cluster and progressively merging clusters based on
their similarity.
• Here we “grow” clusters from the bottom up.
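A minimal sketch using SciPy's hierarchical clustering routines (an assumed library choice; the points and linkage method are illustrative):

```python
from scipy.cluster.hierarchy import linkage, fcluster

points = [[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5]]

# Every point starts as its own cluster; "single" linkage repeatedly merges the two
# clusters whose closest members are nearest to each other.
merges = linkage(points, method="single")

# Cut the resulting hierarchy so that exactly 2 clusters remain.
print(fcluster(merges, t=2, criterion="maxclust"))
```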