ML Document-1 - Merged
FORWARD SELECTION:
Cross-Validation:
In cross-validation, the data is divided into two groups: a training set and
a testing set. The data is then separated into a number of subsets called
folds, each containing about the same amount of data. The number of folds
used depends on factors like the dataset size, data type, and model.
For example, if you separate your data into 10 subsets or folds, you
would use nine as the training group and one as the testing group.
1. Partition the data
Divide the data set into 10 subsets or folds, each containing an equal
proportion of the data.
2. Train and test model
Train and test the model 10 times, using a different fold as the test
set in each iteration. In the first iteration, the first fold is the test set
and the model is trained on the remaining nine folds; in the second
iteration, the second fold is the test set; the process continues this way
until all 10 folds have served as the test set.
3. Calculate performance metrics
After each iteration, calculate your model's performance metrics
based on the model’s predictions on the test set.
4. Aggregate results
The performance metrics gathered in each iteration are then
aggregated (for example, averaged) to produce an overall assessment of the
model's performance (see the sketch below).
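As a concrete illustration of the four steps above, here is a minimal 10-fold cross-validation sketch in Python using scikit-learn; the synthetic dataset and the choice of LogisticRegression are assumptions made only to keep the example self-contained:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

# Synthetic data standing in for a real dataset (an assumption for the sketch)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

kf = KFold(n_splits=10, shuffle=True, random_state=42)  # step 1: partition
scores = []
for train_idx, test_idx in kf.split(X):                 # step 2: 10 iterations
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    scores.append(accuracy_score(y[test_idx], preds))   # step 3: metric per fold

print(np.mean(scores))                                  # step 4: aggregate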
Decision Tree Algorithm:
Step-1: Begin the tree with the root node S, which contains the complete
dataset.
Step-2: Find the best attribute in the dataset using Attribute Selection
Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best
attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where the
nodes cannot be classified further; these final nodes are called leaf nodes.
The two popular techniques for ASM are:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of the change in entropy after a
dataset is segmented on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the
decision tree.
o A decision tree algorithm always tries to maximize the value of information
gain; the node/attribute having the highest information gain is split first.
It can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)
Where,
o S = total number of samples
o P(yes) = probability of yes
o P(no) = probability of no
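To make the formula concrete, here is a small Python sketch that computes entropy and information gain for a candidate split; the 9-yes/5-no node and its two child subsets are hypothetical example data:

from collections import Counter
import math

def entropy(labels):
    # Entropy(S) = -sum over classes c of P(c) * log2 P(c)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, children):
    # IG = Entropy(S) - weighted average entropy of the child subsets
    total = len(parent)
    weighted = sum(len(c) / total * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ["yes"] * 9 + ["no"] * 5          # hypothetical node with 14 samples
children = [["yes"] * 6 + ["no"] * 2,      # split produced by some attribute
            ["yes"] * 3 + ["no"] * 3]
print(information_gain(parent, children))  # higher values mean a better split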
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision
tree in the CART(Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a
high Gini index.
o It creates only binary splits; the CART algorithm uses the Gini index to
produce them.
o Gini index can be calculated using the below formula:
Gini Index = 1 − Σⱼ (Pⱼ)²
where Pⱼ is the proportion of samples belonging to class j.
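Likewise, a short Python sketch of the Gini index, with a helper that weights child nodes so candidate binary splits can be compared (lower is better); the class counts are hypothetical:

from collections import Counter

def gini_index(labels):
    # Gini = 1 - sum over classes j of (P_j)^2
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(children):
    # Weighted Gini of a candidate split; CART picks the split minimizing this
    total = sum(len(c) for c in children)
    return sum(len(c) / total * gini_index(c) for c in children)

print(gini_index(["yes"] * 9 + ["no"] * 5))                 # about 0.459
print(weighted_gini([["yes"] * 8, ["yes"] + ["no"] * 5]))   # about 0.119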
Learning Process
1. Training: Feed the network labelled data (inputs and desired outputs).
2. Forward Propagation: Inputs flow through the network, generating
predictions.
3. Error Calculation: Compare predictions with actual outputs, calculating
the error.
4. Backpropagation: Adjust weights and biases to minimize the error.
5. Optimization: Repeat steps 2-4 until convergence or a stopping criterion
is met (a sketch follows this list).
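The five steps above can be condensed into a tiny NumPy sketch; the XOR data, the 2-3-1 network size, the learning rate, and the sigmoid activations are all assumptions chosen only to keep the example self-contained and runnable:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-3-1 network trained on XOR (step 1: labelled data)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
eta = 0.5  # learning rate

for epoch in range(5000):                      # step 5: repeat until done
    h = sigmoid(X @ W1 + b1)                   # step 2: forward propagation
    out = sigmoid(h @ W2 + b2)
    err = out - y                              # step 3: error calculation
    delta_out = err * out * (1 - out)          # step 4: backpropagation
    delta_h = (delta_out @ W2.T) * h * (1 - h)
    W2 -= eta * (h.T @ delta_out); b2 -= eta * delta_out.sum(axis=0)
    W1 -= eta * (X.T @ delta_h);   b1 -= eta * delta_h.sum(axis=0)

print(out.round(2))  # predictions should approach [0, 1, 1, 0]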
Examples
1. Image Classification:
Input: Images of animals (e.g., dogs, cats, birds).
Output: Labels indicating the type of animal.
Neural network learns to identify patterns in images and classify them
correctly.
2. Speech Recognition:
Input: Audio recordings of spoken words.
Output: Transcribed text.
Neural network learns to recognize patterns in audio signals and transcribe
them into text.
3. Predicting Stock Prices:
Input: Historical stock price data.
Output: Predicted future stock prices.
Neural network learns to identify patterns in stock price fluctuations and
make predictions.
4. Sentiment Analysis:
Input: Text reviews or comments.
Output: Sentiment labels (positive, negative, neutral).
Neural network learns to identify patterns in language and determine
sentiment.
• For each hidden unit ‘h’, the training error ‘𝛿’ can be calculated by the
formula below, which takes into account the training errors of the output
units to which the hidden unit is connected (assuming sigmoid activations):
𝛿_h = o_h (1 − o_h) Σ_{k ∈ outputs} w_kh 𝛿_k
where o_h is the output of hidden unit h and w_kh is the weight from h to
output unit k.
• The weight w_ji between the jth node and the ith node is updated using the
formula below, in which ‘η’ is the learning rate, ‘𝛿’ is the training error,
and ‘x’ is the input for the given node:
w_ji ← w_ji + η 𝛿_j x_ji
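As a numeric check of these two formulas, here is a short Python snippet; every number in it is made up purely for illustration:

# delta_h = o_h * (1 - o_h) * sum_k (w_kh * delta_k), assuming sigmoid units
o_h = 0.7                                   # hidden unit's output (illustrative)
downstream = [(0.4, 0.05), (-0.2, -0.10)]   # (w_kh, delta_k) per output unit k
delta_h = o_h * (1 - o_h) * sum(w * d for w, d in downstream)

# w_ji <- w_ji + eta * delta_j * x_ji
eta, x_ji, w_ji = 0.1, 0.9, 0.25
w_ji += eta * delta_h * x_ji
print(delta_h, w_ji)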
Termination Criteria for Multilayer Networks
The above algorithm is applied repeatedly over all the data points until a
termination criterion is met, which can be implemented in any of these three
ways:
• Training the network for a fixed number of epochs (iterations).
• Setting an error threshold: if the error falls below the given threshold,
we can stop training the neural network.
• Creating a validation sample of the data: after every iteration, we
validate the model on this sample, and the iteration with the highest
validation accuracy is taken as the final model.
The first criterion might not yield the best results; the third is the most
recommended, since it tracks the model's accuracy on held-out data throughout
training (see the sketch below).
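Here is a minimal sketch of the third (recommended) criterion; train_step and val_accuracy are hypothetical stand-ins for a real training step and validation metric, not any specific library's API:

import random

def train_step(weights):               # placeholder for steps 2-4 of training
    return [w + random.gauss(0, 0.01) for w in weights]

def val_accuracy(weights):             # placeholder validation metric
    return random.random()

weights = [0.0] * 4
best_acc, best_weights = float("-inf"), list(weights)
for epoch in range(100):               # fixed epoch budget (first criterion)
    weights = train_step(weights)
    acc = val_accuracy(weights)        # validate after every iteration
    if acc > best_acc:                 # snapshot the best-so-far model
        best_acc, best_weights = acc, list(weights)
weights = best_weights                 # final model: highest validation accuracy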
Conclusion:
That summarizes the mathematics behind multilayer neural networks. Multilayer neural
networks, with their multiple layers and nonlinear activations, excel at capturing complex
patterns in data. Backpropagation is the key training algorithm, which propagates errors
backward from the output to the input layer, allowing for weight adjustments using gradient
descent. This process enables the network to learn effectively. Together, these techniques
form the backbone of modern deep learning, leading to significant advancements in areas like
computer vision, natural language processing, and beyond.