VI Semester
MACHINE LEARNING
Course Code: CA-C27T
Supervised learning
Supervised learning is a type of machine learning where the model learns from labeled data,
meaning each training example consists of input data (features) along with their corresponding
correct output (labels). The goal of supervised learning is to learn a mapping from input to
output based on the available labeled data, such that the model can make accurate predictions
on new, unseen data.
Types of Supervised Learning:
Classification :
• Classification is a type of supervised learning where the goal is to predict the category
or class label of input data.
• The output variable is discrete and categorical, with a finite number of possible values
or classes.
• Example: Email spam detection, sentiment analysis, image classification.
Regression :
• Regression is a type of supervised learning where the goal is to predict a continuous
output variable based on input features.
• The output variable is numerical, representing a quantity or value along a continuous
scale.
• Example: Predicting house prices, stock prices, temperature, or a person's salary (see the sketch below).
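The same workflow covers both settings; only the estimator and the type of target change. Below is a minimal sketch, assuming scikit-learn is installed, using its bundled iris (classification) and diabetes (regression) toy datasets.

```python
# Minimal sketch: classification vs. regression (assumes scikit-learn is installed)
from sklearn.datasets import load_iris, load_diabetes
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: the target is a discrete class label (iris species 0, 1, 2)
X_cls, y_cls = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X_cls, y_cls)
print(clf.predict(X_cls[:3]))   # predicted class labels

# Regression: the target is a continuous number (disease progression score)
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict(X_reg[:3]))   # predicted numeric values
```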
Classification
Binary Classifier:
A binary classifier is a type of classification model that predicts two possible outcomes or
classes for each input instance. The output variable is binary, meaning it has only two
possible values.
Examples:
Logistic Regression : A popular linear model used for binary classification tasks.
Support Vector Machines (SVM) : Effective for separating two classes with a hyperplane in the
feature space.
Decision Trees : Can be used for binary classification by splitting the feature space into
regions corresponding to each class.
Multi-class Classifier:
A multi-class classifier is a type of classification model that predicts multiple possible
outcomes or classes for each input instance. The output variable can have more than two
possible values.
Examples:
1. Random Forest : Can handle multi-class classification tasks by combining multiple decision
trees.
2. K-Nearest Neighbors (KNN): Can be used for multi-class classification by assigning the
majority class among the K nearest neighbors.
Lazy Learner (Instance-based Learning):
Lazy learners, also known as instance-based learners, delay the process of learning until a
new instance needs to be classified. They store the training instances and perform
classification based on similarity measures between the new instance and the stored
instances.
Example:
1. K-Nearest Neighbors (KNN): A lazy learning algorithm that stores all instances of the
training data and classifies new instances based on the majority class among its nearest
neighbors.
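A minimal KNN sketch, assuming scikit-learn: fitting only stores the training instances, and neighbours are looked up at prediction time, which is what makes the learner "lazy". The choice of K = 5 and the iris dataset are illustrative.

```python
# KNN as a lazy learner: fit() only stores the data; the work happens at predict time
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbours
knn.fit(X_train, y_train)                   # just memorizes the training set
print("test accuracy:", knn.score(X_test, y_test))
```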
Eager Learner (Model-based Learning):
Eager learners, also known as model-based learners, construct a generalized model from
the training data during the learning phase. This model is then used to make predictions on
new instances without requiring the entire training data.
Example:
1. Decision Trees : An eager learning algorithm that constructs a tree-based model by
recursively partitioning the feature space based on the training data.
Classification Learning Steps:
1. Data Collection and Preprocessing :
1. Gather labeled data, where each instance is associated with a class label. Preprocess the data by handling missing values, outliers, and scaling features if necessary.
2. Data Splitting :
1. Split the labeled data into training and test (or validation) sets so that the model can later be evaluated on data it has not seen during training.
3. Model Selection :
1. Choose an appropriate classification algorithm based on the nature of the problem, size of the dataset, and computational resources available.
4. Model Training :
1. Fit the selected model to the training data so that it learns the mapping from input features to class labels.
5. Model Evaluation :
1. Evaluate the trained model's performance using metrics such as accuracy, precision, recall, and F1-score on a separate validation set or through cross-validation.
6. Hyperparameter Tuning :
1. Fine-tune the model's hyperparameters to optimize its performance. This may involve grid search, random search, or other optimization techniques. (An end-to-end sketch of these steps is shown below.)
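The following sketch runs through these steps, assuming scikit-learn; the SVM, the scaling step, and the grid of C values are illustrative choices rather than the only options.

```python
# Classification learning steps in one small pipeline (assumes scikit-learn)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Steps 1-2: collect labeled data and split it into training and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 3-4: select a model (scaling + SVM here) and train it
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Step 6: hyperparameter tuning with grid search and cross-validation
grid = GridSearchCV(pipe, {"svm__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Step 5: evaluate on held-out data (accuracy, precision, recall, F1)
print(grid.best_params_)
print(classification_report(y_test, grid.predict(X_test)))
```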
DECISION TREE
Decision trees are a supervised learning algorithm used in machine learning for classification and regression modeling. They can be used to determine whether an event happened or didn't happen, or to predict continuous values based on previous data.
DECISION TREE: IMPORTANT TERMINOLOGIES
1. Root Node :
1. The topmost node of the decision tree, representing the feature that best splits the dataset into subsets. It has no incoming edges.
2. Leaf Node (Terminal Node) :
1. Nodes at the bottom of the decision tree that do not split further. Each leaf node represents a class label (in classification) or a predicted value (in regression).
3. Decision Rule :
1. The rule or condition based on which the dataset is split at each node. It involves comparing the value of a feature with a threshold.
4. Impurity :
1. A measure of the disorder or uncertainty in a dataset. Decision trees aim to minimize impurity at each node to make the splits more informative.
5. Gini Impurity :
1. A measure of impurity used in classification tasks. It calculates the probability of misclassifying a randomly chosen data point if it were labeled according to the distribution of classes in the subset.
6. Entropy :
1. Another measure of impurity used in classification tasks. It quantifies the uncertainty in a dataset's class distribution. (Both impurity measures are computed from class proportions; see the short sketch after this list.)
7. Pruning :
1. The process of removing nodes or branches from the decision tree to prevent overfitting. Pruning helps simplify the tree and improve its generalization ability.
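Both impurity measures can be computed directly from the class proportions in a node. The sketch below assumes NumPy; the toy node labels are made up for illustration.

```python
# Gini impurity and entropy from the class proportions of a node (assumes NumPy)
import numpy as np

def gini(labels):
    # Gini = 1 - sum(p_i^2), where p_i is the proportion of class i in the node
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = ["spam", "spam", "ham", "ham", "ham"]   # toy node with two classes
print(gini(node), entropy(node))               # 0.48 and about 0.971
```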
Working Principle:
1. Splitting Criteria :
1. Decision trees recursively split the data into subsets based on the values of features. The
algorithm selects the best feature and split point that maximizes the homogeneity (or purity) of the
subsets.
2. Tree Construction :
1. Starting from the root node, the algorithm iteratively splits the data at each node based on the
selected splitting criteria until a stopping criterion is met. This process continues until the subsets
are pure (all instances belong to the same class) or the tree reaches a predefined depth.
3. Decision Rules :
1. At each internal node, the decision tree applies a decision rule based on a feature value.
4. Leaf Nodes :
1. When a stopping criterion is met (e.g., maximum depth reached, no further improvement in purity),
the algorithm creates a leaf node representing the majority class (in classification) or the average
value (in regression) of the instances in that subset.
5. Prediction :
1. To make predictions for new instances, the decision tree traverses from the root node down to a
leaf node based on the decision rules defined at each node. The predicted class or value at the
leaf node is assigned to the new instance.
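A minimal sketch of this working principle, assuming scikit-learn and its iris toy dataset: the Gini criterion scores the splits, max_depth=3 acts as the stopping criterion, and export_text prints the learned decision rules.

```python
# Growing and inspecting a small decision tree (assumes scikit-learn)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)                # recursively splits until depth 3 or pure leaves

print(export_text(tree))      # the learned splits as if-then decision rules
print(tree.predict(X[:2]))    # prediction: traverse root to leaf for each instance
```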
Advantages of Decision Trees:
1. Interpretability :
1. Decision trees are easy to understand and interpret, making them suitable for explaining the
decision-making process to non-experts.
2. Handles Mixed Data Types :
1. Decision trees can handle both numerical and categorical data without the need for feature
scaling or one-hot encoding.
3. Implicit Feature Selection :
1. Decision trees perform implicit feature selection by selecting the most informative features for
splitting the data at each node.
4. Handles Missing Values and Outliers :
1. Decision trees can handle missing values and outliers in the data by choosing alternative
splits.
Disadvantages of Decision Trees:
1. Overfitting :
1. Decision trees are prone to overfitting, especially on noisy or high-dimensional data.
2. Instability :
1. Small variations in the data or random noise can lead to different decision tree structures,
making them sensitive to the specific training data used.
3. Limited Expressiveness :
1. Decision trees may fail to capture complex relationships in the data, especially when the
decision boundaries are not axis-aligned.
APPLICATIONS OF DECISION TREES
Decision trees find applications in various domains due to their simplicity and
interpretability:
1. Medical Diagnosis : Decision trees aid in diagnosing diseases based on symptoms,
guiding healthcare professionals in treatment decisions.
2. Credit Scoring : Financial institutions employ decision trees for credit scoring,
assessing creditworthiness and determining loan approvals.
3. Fraud Detection : Decision trees help detect fraudulent activities in banking,
insurance, and e-commerce by identifying suspicious patterns.
4. Product Recommendation : E-commerce platforms utilize decision trees for
personalized product recommendations, enhancing user experience and boosting
sales.
NAIVE BAYES CLASSIFIER
The Naive Bayes classifier is a simple probabilistic machine learning algorithm based
on Bayes' Theorem with an assumption of independence between features. Despite its
simplicity, it is surprisingly effective for classification tasks, especially for text
classification and document categorization.
Working Principle:
1. Bayes' Theorem :
1. Naive Bayes classifier calculates the probability of a class label given a set of features using
Bayes' Theorem.
2. It assumes that the presence of a particular feature is independent of the presence of any
other feature, hence the term "naive".
2. Training :
1. The classifier estimates the probabilities of each class and the conditional probabilities of
each feature given the class labels from the training data.
3. Classification :
1. To classify a new instance, the classifier calculates the probability of each class given the
features using Bayes' Theorem and selects the class with the highest probability.
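A toy sketch of these steps for text classification, assuming scikit-learn: CountVectorizer turns each message into word counts, and MultinomialNB estimates the class priors and per-class word probabilities. The messages and labels are invented for illustration.

```python
# Naive Bayes for a tiny spam/ham example (assumes scikit-learn)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting at 10am tomorrow",
         "free offer, claim your prize", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)    # training: estimate P(class) and P(word | class)

new = ["claim your free offer"]
print(model.predict(new))          # classification: pick the most probable class
print(model.predict_proba(new))    # posterior probabilities from Bayes' Theorem
```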
APPLICATIONS OF NAIVE BAYES CLASSIFIER
1. Text Classification : Naive Bayes classifiers are widely used for document classification tasks
such as spam detection, sentiment analysis, and topic categorization.
2. Email Filtering : Naive Bayes classifiers are employed in email filtering systems to classify
emails as spam or non-spam based on the presence of specific words or features.
3. Medical Diagnosis : Naive Bayes classifiers aid in medical diagnosis by predicting the
likelihood of diseases based on patient symptoms and diagnostic test results.
4. Recommendation Systems : Naive Bayes classifiers contribute to recommendation systems
by predicting user preferences and suggesting relevant items or content.
5. Customer Segmentation : Naive Bayes classifiers help in segmenting customers based on
their behavior, demographics, and preferences, enabling targeted marketing strategies.
6. Fraud Detection : Naive Bayes classifiers assist in fraud detection by identifying anomalous
patterns in financial transactions or user behavior.
7. Image Classification : Naive Bayes classifiers are used in image classification tasks, such as
identifying objects in images or recognizing handwritten digits.
Advantages of Naive Bayes Classifier:
1. Simple and Fast : Naive Bayes is computationally efficient and easy to implement, making
it suitable for large datasets and real-time applications.
2. Handles High-Dimensional Data : Performs well even with high-dimensional data and
irrelevant features due to its assumption of feature independence.
3. Effective with Small Data : Requires a small amount of training data to estimate
parameters accurately, making it suitable for small datasets.
Disadvantages of Naive Bayes Classifier:
1. Assumption of Feature Independence : The assumption of feature independence may not
hold true in real-world datasets, leading to suboptimal performance.
2. Sensitive to Input Data Quality : Naive Bayes can be sensitive to the quality of input data,
especially when features are highly correlated or contain missing values.
3. Limited Expressiveness : Naive Bayes has limited expressive power compared to more
complex models, which may result in lower accuracy for certain tasks.
REGRESSION
SIMPLE LINEAR REGRESSION
Simple Linear Regression is a statistical method used to model the relationship between a single independent variable (predictor) and a continuous dependent variable (response). It assumes a linear relationship between the predictor and the response, which can be represented by a straight line: Y = β0 + β1X, where β0 is the intercept and β1 is the slope.
Working of Simple Linear Regression:
1. Data Collection : Gather a dataset containing observations of both the independent
variable (predictor) and the dependent variable (response).
2. Model Training : Fit a linear regression model to the data by estimating the values of β0 and β1 that minimize the sum of squared errors (SSE) between the observed and predicted values of Y.
3. Model Evaluation : Assess the goodness-of-fit of the model using metrics such as the coefficient of determination (R-squared), mean squared error (MSE), or residual plots.
4. Prediction : Once the model is trained and evaluated, use it to make predictions on new or unseen data by plugging in values of X to estimate corresponding values of Y (see the sketch below).
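A small sketch of these steps, assuming NumPy and scikit-learn; the five data points are invented for illustration and roughly follow Y ≈ 2X.

```python
# Fitting Y = b0 + b1*X by least squares on toy data (assumes scikit-learn, NumPy)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X = np.array([[1], [2], [3], [4], [5]])      # single predictor
y = np.array([2.1, 4.3, 6.0, 8.2, 9.9])      # response, roughly 2 * X

model = LinearRegression().fit(X, y)         # estimates b0 (intercept_) and b1 (coef_)
y_hat = model.predict(X)

print("b0:", model.intercept_, "b1:", model.coef_[0])
print("MSE:", mean_squared_error(y, y_hat), "R^2:", r2_score(y, y_hat))
print(model.predict([[6]]))                  # prediction for a new value of X
```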
Advantages of Simple Linear Regression:
1. Interpretability : The coefficients β0 and β1 have clear interpretations, allowing for easy understanding of the relationship between the variables.
2. Computational Efficiency : Simple linear regression is computationally efficient and can
handle large datasets with ease.
3. Visualization : The linear relationship can be visualized easily using scatter plots and
regression lines.
Disadvantages of Simple Linear Regression:
1. Assumption of Linearity : Simple linear regression assumes a linear relationship between
the predictor and response variables, which may not always hold true in real-world
scenarios.
2. Sensitive to Outliers : Outliers in the data can significantly impact the estimation of
regression coefficients and reduce the accuracy of predictions.
3. Limited Predictive Power : Simple linear regression may not capture complex
relationships between variables, leading to limited predictive power compared to more
sophisticated models.
Logistic Regression
Logistic Regression is a statistical method used for binary classification tasks, where the
outcome variable is categorical with two possible classes (e.g., yes/no, 1/0). Despite its
name, logistic regression is a classification algorithm rather than a regression algorithm. It
models the probability that an instance belongs to a particular class using the logistic
function.
Logistic Function:
The logistic function, also known as the sigmoid function, is the core component of logistic regression. It is defined as:
σ(x) = 1 / (1 + e^(−x))
where x is the linear combination of feature values and model coefficients. The logistic function maps the input x to a value between 0 and 1, representing the probability of belonging to the positive class.
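A brief sketch, assuming NumPy and scikit-learn: the sigmoid is coded directly, and a logistic regression model on the bundled breast-cancer dataset (binary labels) shows the resulting class probabilities. The StandardScaler step is only there so the solver converges; it is not part of logistic regression itself.

```python
# The sigmoid function and a binary logistic regression model (assumes scikit-learn)
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def sigmoid(x):
    # maps any real number to (0, 1), read as P(positive class)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0), sigmoid(3.0), sigmoid(-3.0))    # 0.5, ~0.95, ~0.05

X, y = load_breast_cancer(return_X_y=True)          # binary labels: 0 / 1
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
print(clf.predict_proba(X[:2]))                     # probabilities from the sigmoid
print(clf.predict(X[:2]))                           # classes, thresholded at 0.5
```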
Application:
Logistic regression is widely used in various fields, including:
• Credit scoring
• Disease diagnosis
• Spam filtering
• Market segmentation