ML Notes

The document provides an overview of key concepts in machine learning, including definitions and comparisons of artificial intelligence and machine learning, advantages and disadvantages of algorithms like KNN and Naive Bayes, and the processes involved in feature engineering. It also explains various learning paradigms such as supervised, unsupervised, and reinforcement learning, along with techniques like logistic regression, decision trees, and neural networks. Additionally, it discusses performance metrics, ensemble methods like bagging and boosting, and applications of machine learning across different industries.

Compare Artificial Intelligence vs. Machine Learning
Artificial Intelligence (AI): The broader concept of machines performing tasks that typically
require human intelligence, such as reasoning, problem-solving, and decision-making. AI
encompasses fields like natural language processing, robotics, and expert systems.
Machine Learning (ML): A subset of AI focusing on algorithms that enable systems to learn
patterns from data and improve performance over time without explicit programming. ML is
data-driven and forms the basis of many AI applications.
Key Differences: AI is the overall goal, while ML is a specific approach to achieve AI. AI
includes ML, expert systems, and reasoning algorithms; ML focuses solely on learning from
data.

Give Advantages and Disadvantages of KNN


Advantages:
1. Simple and easy to implement.
2. Non-parametric, meaning it does not assume any underlying data distribution.
3. Effective for small datasets with low-dimensional feature space.
Disadvantages:
1. Computationally expensive for large datasets.
2. Sensitive to irrelevant features and noisy data.
3. Requires careful selection of the value of k.

Describe the Process of Feature Engineering in Machine Learning


1. Understanding Data: Analyze and preprocess the raw dataset to understand its
features and relationships.
2. Feature Extraction: Identify and create relevant features from raw data using domain
knowledge (e.g., extracting time-related features from timestamps).
3. Feature Transformation: Apply mathematical transformations (e.g., normalization,
scaling, encoding categorical data).
4. Feature Selection: Identify and retain the most important features using methods like
correlation analysis, mutual information, or recursive feature elimination.

Explain the Concept of Overfitting in Machine Learning


Overfitting occurs when a model learns the noise and details in the training data to the extent
that it performs well on the training set but poorly on unseen data. It happens due to excessive
complexity in the model, such as using too many parameters or training for too long.

How Does Reinforcement Learning Differ from Supervised and Unsupervised Learning?
Reinforcement Learning (RL): Learns by interacting with the environment and receiving
feedback in the form of rewards or penalties. Focuses on sequential decision-making.
Supervised Learning: Requires labeled data to train a model to predict an output given an
input.
Unsupervised Learning: Works on unlabeled data to find patterns or groupings, such as
clustering.
Key Difference: RL focuses on learning through trial-and-error, unlike supervised learning
(label-based) and unsupervised learning (pattern-based).

Explain the k-Means Algorithm with an Example


1. Initialization: Select k initial centroids.
2. Assignment: Assign each data point to the nearest centroid based on the distance.
3. Update: Recompute centroids as the mean of all points in each cluster.
4. Repeat: Iterate until centroids no longer change significantly.
Example: For a dataset of points, k-means groups them into k clusters by
minimizing intra-cluster variance.
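
A minimal NumPy sketch of these four steps (the toy 2-D points and k = 2 are assumptions for illustration, not from the notes):

import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Cluster X (n_samples x n_features) into k clusters."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k random points as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assignment: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of its points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Repeat until centroids no longer change significantly.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = k_means(X, k=2)
print(labels)     # two clear groups
print(centroids)  # roughly (1, 1) and (8, 8)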

Advantages and Disadvantages of Naive Bayes Learning Algorithm


Advantages:
1. Simple and fast to implement.
2. Performs well with large datasets.
3. Works well for categorical data.
Disadvantages:
1. Assumes independence between features, which may not always hold.
2. Struggles with numerical data without proper scaling.
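
As a small illustration, scikit-learn's GaussianNB applies exactly the independence assumption described above; the toy feature values below are made up:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: two features, binary labels.
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
              [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()   # treats features as conditionally independent Gaussians
model.fit(X, y)
print(model.predict([[1.1, 2.0]]))        # -> [0]
print(model.predict_proba([[1.1, 2.0]]))  # class probabilities via Bayes theorem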

1.
(a) What are Training and Test Data: Training data is used to teach the model, enabling it
to learn patterns. Test data evaluates the model's performance on unseen data.
(b) Bayes Theorem and Its Significance:
Bayes theorem states that P(A|B) = P(B|A) P(A) / P(B).
Significance: It calculates conditional probabilities and is widely used in probabilistic models
like Naive Bayes.
(c) Differences Between Linear and Logistic Regression:
Linear regression predicts continuous outcomes; logistic regression predicts probabilities for
classification. Linear regression fits a line; logistic regression uses the sigmoid function to
model probabilities.

2.
(a) Differences Between Classification and Regression:
Classification predicts discrete labels (e.g., spam or not spam); regression predicts continuous
values (e.g., house price).
(b) Steps for Building a Decision Tree:
1. Identify the best split based on criteria like Gini index or entropy.
2. Divide data into subsets based on the split.
3. Repeat the process recursively until a stopping condition is met (e.g., maximum
depth).

3.
(a) Ensemble Modeling: Combines multiple models (e.g., bagging, boosting) to improve
predictive performance.
(b) Recurrent Networks: RNNs have feedback loops allowing information to persist,
making them suitable for sequential data like time series and natural language processing.
(c) Concept of a Perceptron: A single-layer neural network with weights, bias, and an
activation function. It classifies linearly separable data by adjusting weights based on input.
[Attach a diagram: shows input, weights, summation, and activation output.]

4.
(a) ANN: Artificial Neural Networks mimic biological neurons for complex tasks like image
recognition.
(b) Deep Learning: A subset of ML using multi-layered ANNs to learn from large datasets,
achieving state-of-the-art results in vision and language.
(c) Hierarchical Agglomerative Clustering: A bottom-up clustering technique that merges
similar data points into clusters iteratively.
(d) PCA: Principal Component Analysis reduces dimensionality by transforming data to a
new coordinate system.
(e) Multilayer Networks and Backpropagation: Multilayer networks contain multiple
hidden layers; backpropagation adjusts weights by minimizing the error between predicted
and actual outputs using gradient descent.

Explain Logistic Regression with Example


Logistic regression is a supervised learning algorithm used for binary classification problems.
It predicts the probability of an outcome belonging to a particular class using the sigmoid
function, based on study hours (X). If X=5 hours, logistic regression provides the probability
of passing.
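
A short scikit-learn sketch of this pass/fail example (the study-hours data is hypothetical):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied (X) vs. pass (1) / fail (0).
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)
# Probability of passing after 5 hours of study.
print(model.predict_proba([[5]])[0, 1])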

Write Short Note on Artificial Neural Networks (ANN)


ANNs mimic biological neural networks to perform tasks like pattern recognition,
classification, and regression. They consist of interconnected layers of nodes (neurons),
including an input layer, hidden layers, and an output layer. Each neuron processes inputs
using weights, bias, and an activation function to produce outputs. ANNs are widely used in
computer vision, speech recognition, and natural language processing.

Explain Linear Discriminant Analysis (LDA) with Example


LDA is a dimensionality reduction technique used in classification tasks. It projects data onto
a lower-dimensional space while maximizing class separability.
Example: For a dataset with two classes (e.g., cats and dogs), LDA finds a linear boundary to
differentiate them by maximizing the ratio of inter-class variance to intra-class variance.

Applications of Regression and Classification in Real-World Scenarios


Regression: Used in forecasting (e.g., stock prices, weather), analyzing relationships (e.g.,
sales vs. marketing spend), and predicting continuous values (e.g., house prices).
Classification: Used in spam detection, medical diagnosis (e.g., cancer detection), fraud
detection in banking, and image classification tasks.

Compare Logistic Regression with Linear Regression


 Logistic Regression: Used for classification, uses the sigmoid function to model
probabilities, and outputs values between 0 and 1.
 Linear Regression: Used for regression tasks, predicts continuous outcomes, and fits
a straight line to data.
Use Cases: Logistic regression is used in binary outcomes (e.g., pass/fail), while
linear regression is used in predictions like sales or temperature.
Assumptions: Linear regression assumes a linear relationship, while logistic
regression assumes the log-odds of the dependent variable follow a linear relationship.

What is Bias and Variance?


 Bias: Error introduced due to simplifying assumptions in the model, leading to
underfitting.
 Variance: Error introduced by the model’s sensitivity to small changes in training
data, leading to overfitting.
A good model strikes a balance between bias and variance.

Major Applications of Machine Learning


1. Healthcare: Disease diagnosis and drug discovery.
2. Finance: Fraud detection, stock price prediction.
3. Retail: Recommendation systems.
4. Transportation: Autonomous vehicles, route optimization.
5. Manufacturing: Predictive maintenance.

Compare Biological Neuron and Artificial Neuron


 Biological Neuron: Processes inputs via dendrites, integrates signals in the soma, and
outputs through the axon.
 Artificial Neuron: Processes weighted inputs, applies an activation function, and
produces an output. Biological neurons are more complex and interconnected,
whereas artificial neurons are mathematical abstractions.

Relate Entropy and Information Gain


Entropy: Measures the impurity or randomness in a dataset. Lower entropy indicates more
homogeneity.
Information Gain: Measures the reduction in entropy achieved by splitting data based on an
attribute. It helps in selecting the best feature for decision tree splits.

Explain Noisy Data and Pruning


 Noisy Data: Refers to data containing errors, inconsistencies, or irrelevant
information that can affect model accuracy.
 Pruning: A technique in decision trees to remove branches with little significance,
reducing overfitting and improving generalization.

Write Short Notes on Regression and Correlation


 Regression: Analyzes the relationship between dependent and independent variables,
predicting continuous outcomes.
 Correlation: Measures the strength and direction of the linear relationship between
two variables. It ranges from -1 to +1.

Advantages and Disadvantages of SVM


Advantages:
1. Effective in high-dimensional spaces.
2. Robust to overfitting in smaller datasets.
Disadvantages:
1. Computationally expensive for large datasets.
2. Struggles with overlapping classes and noisy data.

Expectation Maximization (EM) for Soft Clustering


EM is an iterative algorithm used for soft clustering by estimating the probability distribution
of data points. It alternates between:
1. Expectation Step (E): Assigning probabilities to clusters based on current
parameters.
2. Maximization Step (M): Updating parameters to maximize the likelihood of
observed data.
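
For illustration, scikit-learn's GaussianMixture fits a mixture model with exactly this E/M alternation; the synthetic 1-D data below is assumed:

import numpy as np
from sklearn.mixture import GaussianMixture

# Two overlapping 1-D Gaussian blobs (synthetic, for illustration).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0)  # fitted via EM
gmm.fit(X)
# Soft clustering: each row gives the point's probability of belonging to each cluster.
print(gmm.predict_proba(X[:3]))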

What is Bagging, and How Does it Work?


Bagging (Bootstrap Aggregating) is an ensemble method that improves stability and accuracy
by training multiple models on different subsets of data and combining their predictions (e.g.,
majority voting for classification). It reduces variance and mitigates overfitting.

What is Boosting, and How Does it Differ from Bagging?


Boosting is an ensemble method that builds models sequentially, where each model corrects
errors of the previous one. Unlike bagging, boosting focuses on reducing bias and works
iteratively, often leading to better performance on complex datasets.

Common Performance Metrics for Machine Learning Models


1. Accuracy: Proportion of correct predictions out of total predictions.
2. Precision: Proportion of true positives out of predicted positives.
3. Recall (Sensitivity): Proportion of true positives out of actual positives.
4. F1 Score: Harmonic mean of precision and recall.
5. ROC-AUC: Measures the trade-off between sensitivity and specificity.
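
A quick sketch computing all five metrics with scikit-learn on made-up labels and scores:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true   = [0, 0, 1, 1, 1, 0, 1, 0]                   # actual labels (assumed)
y_pred   = [0, 1, 1, 1, 0, 0, 1, 0]                   # hard predictions
y_scores = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]   # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_scores))  # uses scores, not hard labels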

Explain Multilayer Networks


Multilayer networks consist of an input layer, multiple hidden layers, and an output layer.
Each layer processes inputs using weights and activation functions. Backpropagation adjusts
weights iteratively to minimize error, enabling the network to learn complex patterns.

Bayes Net and Markov Nets


 Bayes Net: Represents probabilistic relationships among variables using directed
acyclic graphs.
 Markov Nets: Represents dependencies using undirected graphs, focusing on local
interactions.

Explain Backpropagation with Example


Backpropagation is a supervised learning algorithm for neural networks that adjusts weights
by propagating the error backward through the network. Example: In a network predicting
house prices, backpropagation minimizes the difference between predicted and actual prices
using gradient descent.

Explain Multiple Linear Regression with Example


Multiple linear regression predicts a dependent variable based on multiple independent
variables, modeled as Y = β0 + β1X1 + β2X2 + ... + βnXn.
Example: Predicting house price (Y) based on size (X1) and location (X2). By fitting the data
to this model, we predict Y for new inputs.
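
A minimal scikit-learn sketch of this house-price example (all numbers are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: size in sq. ft (X1), location score (X2) -> price (Y).
X = np.array([[1000, 3], [1500, 4], [1200, 2], [2000, 5], [1700, 3]])
y = np.array([200000, 320000, 210000, 450000, 330000])

model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_)   # fitted β0 and (β1, β2)
print(model.predict([[1600, 4]]))      # predicted price for a new house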

Supervised, Unsupervised, and Semi-Supervised Learning


 Supervised Learning: The model learns from labeled data, where both input and
corresponding output are provided. Example: Predicting house prices using features
like size and location.
 Unsupervised Learning: The model identifies patterns or structures in unlabeled
data. Example: Clustering customers based on their purchase history.
 Semi-Supervised Learning: Combines a small amount of labeled data with a large
amount of unlabeled data to improve learning accuracy. Example: Speech recognition
systems with limited transcriptions.

Define Classification and Regression with Example


 Classification: Predicts categorical outcomes (e.g., classifying emails as spam or not
spam). Example: A binary classification model predicting whether a customer will
churn.
 Regression: Predicts continuous numerical outcomes. Example: Predicting house
prices based on square footage and location.

Basic Structure and Training of a Neural Network


A neural network consists of:
 Input Layer: Accepts the input features for processing.
 Hidden Layers: Perform computations using weights, biases, and activation
functions to extract patterns.
 Output Layer: Produces the final prediction or classification.
Training Process:
1. Forward Propagation: Compute output using input weights and activation functions.
2. Loss Calculation: Compare the predicted output with the actual label using a loss
function.
3. Backward Propagation: Adjust weights and biases by propagating the error
backward using gradient descent to minimize the loss.

Working of Support Vector Machine (SVM) for Binary Classification


SVM separates two classes by finding the hyperplane that maximizes the margin between the
closest data points (support vectors) of both classes.
1. Optimal Decision Boundary: Identifies the hyperplane w·x + b = 0, where w is the weight
vector and b is the bias.
2. Maximizing Margin: Ensures the distance between the hyperplane and the support
vectors is maximized.
3. Kernel Trick: Transforms data into higher dimensions when the classes are not
linearly separable.
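
A brief sketch with scikit-learn's SVC on toy, linearly separable points (the data is assumed):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")   # swap in kernel="rbf" when classes are not linearly separable
clf.fit(X, y)
print(clf.support_vectors_)  # the points that define the maximum-margin hyperplane
print(clf.predict([[4, 4]]))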

Short Notes
 Principal Component Analysis (PCA): A dimensionality reduction technique that
transforms data into a lower-dimensional space while retaining maximum variance. It
uses eigenvectors and eigenvalues of the covariance matrix.
 Logistic Regression: A classification algorithm predicting probabilities using the
sigmoid function. It is used for binary outcomes.
 Artificial Neural Network (ANN): A network of interconnected nodes (neurons)
designed to simulate human learning, widely used in image recognition and NLP.
 Decision Tree and Pruning: A tree-based algorithm for classification and regression.
Pruning removes unnecessary branches to prevent overfitting and improve
generalization.
 Multiple Linear Regression: Models the relationship between a dependent variable
and multiple independent variables, expressed as Y = β0 + β1X1 + ... + βnXn.

k-Nearest Neighbor Algorithm (KNN)


KNN is a non-parametric algorithm that classifies a data point based on the majority class of
its k nearest neighbors. Steps:
1. Calculate the distance (e.g., Euclidean) between the query point and all other points.
2. Identify the k nearest points.
3. Assign the class based on majority voting. Example: Predicting if a fruit is an apple or
orange based on size and color.
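
A minimal NumPy implementation of these three steps, using hypothetical fruit measurements:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # 1. Euclidean distance from the query point to all training points.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # 2. Indices of the k nearest points.
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among their labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical data: (size, color score) -> "apple" or "orange".
X_train = np.array([[7.0, 0.20], [7.2, 0.30], [6.8, 0.25],
                    [9.5, 0.80], [9.8, 0.75], [9.2, 0.85]])
y_train = np.array(["apple", "apple", "apple", "orange", "orange", "orange"])
print(knn_predict(X_train, y_train, np.array([7.1, 0.3])))  # -> "apple"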

Concepts of Entropy and Information Gain


 Entropy: Measures the impurity or randomness in data. Lower entropy indicates
purer data. Formula: H(S) = −Σ p_i log2(p_i), where p_i is the proportion of class i in S.

 Information Gain (IG): Measures the reduction in entropy after a dataset is split on
an attribute: IG(S, A) = H(S) − Σ_v (|S_v| / |S|) H(S_v), summed over the values v of A.

Example: In a decision tree, IG helps choose the attribute that best splits the dataset.
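
These formulas translate directly into a few lines of Python; the toy "outlook"/"play" data is assumed for illustration:

import numpy as np

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute):
    """IG = H(S) minus the weighted average entropy of the subsets after the split."""
    weighted = 0.0
    for v in np.unique(attribute):
        subset = labels[attribute == v]
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted

play    = np.array(["yes", "yes", "no", "no", "yes", "no"])
outlook = np.array(["sun", "sun", "rain", "rain", "sun", "rain"])
print(entropy(play))                    # 1.0: maximally impure (3 yes, 3 no)
print(information_gain(play, outlook))  # 1.0: the split yields pure subsets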

Bagging and Boosting Algorithm


 Bagging (Bootstrap Aggregating): Reduces variance by training multiple models on
random subsets of data and aggregating their predictions (e.g., majority voting).
Example: Random Forest.
 Boosting: Sequentially trains models where each model corrects the errors of the
previous one. It reduces bias and variance. Example: AdaBoost.
Differences: Bagging focuses on parallel training to reduce variance, while boosting is
sequential and reduces bias.
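
A brief sketch contrasting the two with scikit-learn, using Random Forest as the bagging example and AdaBoost as the boosting example, on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)  # parallel trees on bootstrap samples
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)     # sequential error-correcting learners

for name, model in [("Random Forest (bagging)", bagging), ("AdaBoost (boosting)", boosting)]:
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))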
Short Notes
1. Principal Component Analysis (PCA):
PCA is a dimensionality reduction technique that identifies the directions (principal
components) in which data varies the most. It transforms data into a lower-dimensional space
while retaining as much variance as possible. The principal components are the eigenvectors
of the covariance matrix, and their corresponding eigenvalues indicate the amount of variance
captured. (A minimal code sketch appears after these notes.)
2. Decision Tree:
A decision tree is a flowchart-like structure used for classification and regression tasks.
Nodes represent features, branches represent decision rules, and leaves represent outcomes. It
uses metrics like entropy and information gain to determine the best split at each node.
Pruning is used to reduce overfitting.
3. Multiple Linear Regression:
Multiple linear regression models the relationship between a dependent variable and multiple
independent variables.
It is used in applications like sales prediction, risk assessment, and forecasting.
4. Artificial Neural Network (ANN):
ANNs are computational models inspired by biological neural networks. They consist of an
input layer, hidden layers, and an output layer. Each neuron applies weights, biases, and
activation functions to inputs, propagating results forward. Backpropagation is used for
training by adjusting weights to minimize errors.
5. Logistic Regression:
Logistic regression is a classification algorithm that predicts probabilities using the sigmoid
function. It is commonly used for binary outcomes, such as determining whether an email is
spam.
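
A minimal PCA sketch (referenced in note 1 above) using scikit-learn; the correlated synthetic data is an assumption for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)  # third feature mostly redundant

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)      # project onto the top 2 principal components
print(pca.explained_variance_ratio_)  # share of variance each component retains
print(X_reduced.shape)                # (200, 2)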

Differentiation: Simple Linear Regression vs. Multiple Linear Regression


 Simple Linear Regression: Models the relationship between one dependent variable
and one independent variable (Y = β0 + β1X).
 Multiple Linear Regression: Models the relationship between one dependent
variable and multiple independent variables (Y = β0 + β1X1 + ... + βnXn).

Explain Least Square Gradient Descent


Least squares gradient descent minimizes the error (the difference between predicted and
actual values) by iteratively updating model parameters. The gradient of the loss function is
calculated with respect to each parameter, and each parameter is updated in the opposite
direction of its gradient.
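
A minimal NumPy sketch of these updates for simple least squares (the learning rate, iteration count, and the noisy y = 3x + 1 data are assumptions):

import numpy as np

def least_squares_gd(X, y, lr=0.1, n_iters=2000):
    """Fit y ~ X @ w + b by gradient descent on the mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        # Gradients of MSE = mean(error^2) with respect to w and b.
        grad_w = 2 / n * X.T @ error
        grad_b = 2 / n * error.sum()
        # Update parameters in the direction opposite the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(100, 1))
y = 3 * X[:, 0] + 1 + rng.normal(scale=0.1, size=100)
w, b = least_squares_gd(X, y)
print(w, b)  # approximately [3.0] and 1.0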

Hierarchical and Agglomerative Clustering


 Hierarchical Clustering: Builds a tree-like structure (dendrogram) to represent
nested groupings of data points.
 Agglomerative Clustering: A bottom-up approach where each data point starts as a
single cluster, and clusters are iteratively merged based on similarity.

Training a Neural Network Using Backpropagation


1. Forward Propagation: Compute outputs for each layer using weights, biases, and
activation functions.
2. Loss Calculation: Compare predicted outputs with actual labels using a loss function.
3. Backward Propagation: Compute the gradient of the loss function with respect to
weights and biases using the chain rule.
4. Weight Updates: Adjust weights and biases using gradient descent to minimize loss.
5. Repeat: Iterate through multiple epochs until convergence.
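
A compact NumPy sketch of these five steps on toy XOR data (the architecture, learning rate, and epoch count are assumptions, not from the notes):

import numpy as np

rng = np.random.default_rng(0)

# Toy XOR data (assumed for illustration).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 4 sigmoid units.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # 1. Forward propagation through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # 2. Loss gradient at the output (mean squared error, chain rule through sigmoid).
    d_out = (out - y) * out * (1 - out)
    # 3. Backward propagation of the error to the hidden layer.
    d_h = (d_out @ W2.T) * h * (1 - h)
    # 4. Weight updates via gradient descent.
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

# 5. After enough epochs the outputs typically approach [[0], [1], [1], [0]].
print(out.round(2))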

Filter and Wrapper Methods for Feature Selection


 Filter Methods: Use statistical techniques to evaluate the relevance of features
independently of the model. Example: Correlation, Chi-square test.
 Wrapper Methods: Use the performance of a predictive model to select features.
Example: Forward selection, backward elimination.
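
A short scikit-learn sketch of one filter method and one wrapper method, using the Iris dataset purely for illustration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: score each feature independently with the chi-square test.
filt = SelectKBest(chi2, k=2).fit(X, y)
print("Filter keeps features:", filt.get_support())

# Wrapper method: recursive feature elimination driven by a model's performance.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("Wrapper keeps features:", wrap.get_support())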

K-Means Algorithm
K-Means is a clustering algorithm that partitions data into k clusters:
1. Initialize k centroids.
2. Assign each data point to the nearest centroid.
3. Update centroids as the mean of points in each cluster.
4. Repeat steps 2 and 3 until centroids stabilize.

Linear Discriminant Analysis (LDA)


LDA is a dimensionality reduction technique that projects data onto a lower-dimensional
space while maximizing class separability. It works by finding linear combinations of
features that best separate classes, minimizing intra-class variance and maximizing inter-class
variance.
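
A minimal scikit-learn sketch on the Iris dataset (chosen here only for illustration):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)   # supervised projection, unlike PCA
print(X_proj.shape)                # (150, 2): at most (n_classes - 1) components
print(lda.score(X, y))             # LDA can also classify directly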

Basic Architecture of a Neural Network


A neural network consists of:
 Input Layer: Receives input features.
 Hidden Layers: Process inputs using weights, biases, and activation functions.
 Output Layer: Produces the final prediction or classification.

Activation Functions and Their Importance


Activation functions introduce non-linearity into neural networks, allowing them to learn
complex patterns. Common activation functions include:
1. ReLU: f(z) = max(0, z)
2. Sigmoid: σ(z) = 1 / (1 + e^(−z)), which maps inputs to (0, 1)
3. Tanh: f(z) = tanh(z), which maps inputs to (−1, 1)
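
The three functions in a few lines of NumPy:

import numpy as np

def relu(z):
    return np.maximum(0, z)      # f(z) = max(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)            # squashes to (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), sigmoid(z), tanh(z))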

Semi-Supervised and Reinforcement Learning


 Semi-Supervised Learning: Combines a small amount of labeled data with a large
amount of unlabeled data. Example: Speech recognition with limited transcriptions.
 Reinforcement Learning: Learns optimal actions by interacting with an environment
and receiving rewards or penalties. Example: Training AI for chess.

Decision Tree Construction


1. Start with the entire dataset as the root.
2. Split the data based on the attribute that provides the highest information gain.
3. Repeat the process for each subset until a stopping condition is met (e.g., pure nodes,
maximum depth).
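
A brief scikit-learn sketch of this procedure (criterion="entropy" makes the splits use information gain; max_depth=3 is an arbitrary stopping condition, and the Iris dataset is used only for illustration):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
print(export_text(tree))  # the learned splits, one rule per branch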

Role of Entropy and Information Gain


 Entropy: Measures the impurity or randomness of a dataset.
 Information Gain: Reduction in entropy achieved by splitting the dataset on an
attribute.
These metrics are used to select the best attribute for splitting nodes.

Bayes Theorem
P(A|B) = P(B|A) P(A) / P(B)
Bayes theorem calculates the probability of an event A given evidence B. It is widely used in
probabilistic models like Naive Bayes.
Hierarchical Clustering
A clustering technique that builds a tree-like structure (dendrogram) by either merging
clusters (agglomerative) or splitting clusters (divisive). It does not require the number of
clusters to be specified beforehand.

Short Notes
 Supervised Learning: Models learn from labeled data. Example: Predicting housing
prices.
 Unsupervised Learning: Models discover patterns in unlabeled data. Example:
Customer segmentation.
 Agglomerative Clustering: A bottom-up approach to clustering where individual
data points are merged iteratively.
 Overfitting: When a model performs well on training data but poorly on unseen data.
 Support Vector Machine (SVM): A supervised algorithm that finds the optimal
hyperplane for separating classes.

PCA, Supervised, and Unsupervised Learning


 PCA: Dimensionality reduction by retaining maximum variance.
 Supervised Learning: Learns from labeled data for prediction.
 Unsupervised Learning: Identifies patterns in unlabeled data.
