BA Notes (End Sem)

Relational analysis – used to examine relationships between variables to determine how they are associated with each other and whether changes in one variable affect another.

Types of relational analysis – correlation & regression

Correlation - a statistical measure that expresses the extent to which two variables are linearly related (i.e., how strongly related two variables are). It indicates whether an increase or decrease in one variable corresponds to an increase or decrease in another variable.

The Pearson product-moment correlation coefficient (r) ranges from -1 to 1

(see from notebook first)
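For reference, r can also be written in terms of covariance and the standard deviations of the two variables:

r = cov(X, Y) / (σX · σY)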

Positive Correlation (+): Both variables move in the same direction.


Example: More study hours lead to higher exam scores, sales & Marketing

Negative Correlation (-): Variables move in opposite directions.


Example: Increase in stress leads to a decrease in productivity, Profit & Cost

No Correlation (0): No relationship between the variables.


Example: Shoe size and intelligence.

Correlation and Covariance :

Covariance is scale-dependent – that is why covariance is not used for relational analysis; correlation is scale-independent, which is why correlation is used for relational analysis.

Aspect | Covariance | Correlation
Definition | Measures the direction of the relationship between two variables. | Measures the strength and direction of the relationship between two variables.
Range | No fixed range; values can be very large or very small. | Always between -1 and 1.
Interpretation | Hard to interpret because it's dependent on the scale of the data. | Easy to interpret due to standardized values.
Scale Dependence | Affected by the units of the variables (e.g., height in cm vs. inches changes covariance). | Unit-free; not affected by measurement scale.
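A quick illustration with NumPy (the study-hours and exam-score numbers are made up): rescaling a variable changes the covariance but leaves the correlation unchanged, which is why correlation is used for relational analysis.

```python
import numpy as np

# Made-up data: study hours and exam scores
hours = np.array([2, 4, 6, 8, 10], dtype=float)
scores = np.array([55, 60, 68, 74, 82], dtype=float)

print("Covariance :", np.cov(hours, scores)[0, 1])       # scale-dependent
print("Correlation:", np.corrcoef(hours, scores)[0, 1])  # always between -1 and 1

# Rescaling hours to minutes changes the covariance but not the correlation
minutes = hours * 60
print("Covariance :", np.cov(minutes, scores)[0, 1])
print("Correlation:", np.corrcoef(minutes, scores)[0, 1])
```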

Regression: Regression is a statistical technique used to model and analyze the relationship between
a dependent variable (Y) and one or more independent variables (X).

Equation: y=mx+c

Where:

 y: Dependent variable (outcome or prediction)

 x: Independent variable (predictor)

 m: Slope of the line (rate of change of y with respect to x)

 c: Intercept (value of y when x = 0)

Types of Regression

1. Simple Linear Regression:

A statistical method used to predict the value of a dependent variable (Y) based on a single
independent variable (X).

Equation:

Y=mx+c

Where:

 Y: Dependent variable (what we want to predict)

 X: Independent variable (predictor)

 m: Slope of the line (effect of X on Y)

 c: Intercept (value of Y when X = 0)

Example: Predicting Salary (Y) based on Years of Experience (X).
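A minimal sketch of fitting such a line with scikit-learn; the experience/salary numbers below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: years of experience (X) and salary in $1000s (Y)
X = np.array([[1], [2], [3], [5], [7], [10]])
y = np.array([35, 40, 47, 58, 70, 90])

model = LinearRegression().fit(X, y)

print("Slope m    :", model.coef_[0])    # change in salary per extra year of experience
print("Intercept c:", model.intercept_)  # predicted salary at 0 years of experience
print("Prediction for 6 years:", model.predict([[6]])[0])
```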


2. Multiple Linear Regression:

A statistical method used to predict the value of a dependent variable (Y) based on two or more
independent variables (X₁, X₂, ... Xₙ).

Equation:

Y = m₁X₁ + m₂X₂ + … + mₙXₙ + c

Where:

 Y: Dependent variable (what we want to predict)

 X₁, X₂, ..., Xₙ: Independent variables (predictors)

 m₁, m₂, ..., mₙ: Regression coefficients (effect of each X on Y)

 c: Intercept (value of Y when all X's are 0)

Example:

 Predicting House Prices (Y) based on Size (X₁), Location (X₂), and Number of Rooms (X₃).

Example: Predicting House Prices using Multiple Linear Regression

Let’s predict House Prices (Y) based on:

 Size (X₁): Measured in square feet.

 Location Score (X₂): A score (1-10) indicating how desirable the location is.

 Number of Rooms (X₃): Total number of rooms in the house.
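A minimal sketch of fitting this model with scikit-learn; the house data below is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: [size in sqft, location score (1-10), number of rooms] -> price in $1000s
X = np.array([
    [1000, 4, 3],
    [1500, 6, 4],
    [2000, 8, 5],
    [1200, 5, 3],
    [1800, 7, 4],
])
y = np.array([200, 320, 450, 250, 400])

model = LinearRegression().fit(X, y)

print("Coefficients m1, m2, m3:", model.coef_)  # effect of each feature on price
print("Intercept c:", model.intercept_)
print("Predicted price for a new house:", model.predict([[1600, 6, 4]])[0])
```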

Start studying 😊 – Residual Standard Error (RSE) and R² (R-squared) are key metrics used to evaluate the performance of a regression model.

Residual Sum of Squares (RSS)

📖 Definition:

The Residual Sum of Squares (RSS) measures the total squared difference between the observed
values and the values predicted by the regression model. It quantifies the amount of variation in
the dependent variable (Y) that remains unexplained by the regression model.
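In symbols, using the notation explained below:

RSS = Σ (yᵢ - ŷᵢ)²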
Not imp just read - ✅ Explanation:

1. yᵢ (Actual Value): The real data point from the dataset.

2. ŷᵢ (Predicted Value): The value predicted by the regression model for the corresponding
independent variable(s).

3. (yᵢ - ŷᵢ): The difference between the actual and predicted values is called the residual.

4. (yᵢ - ŷᵢ)²: Squaring each residual ensures all deviations are positive and penalizes larger
errors more heavily.

5. Σ: Summing up all the squared residuals gives the Residual Sum of Squares (RSS).

📊 Why is RSS Important? RSS should be as low as possible

 Model Fit: A lower RSS indicates a better-fitting model, as the predictions are closer to the
actual values.

 Error Measurement: It helps measure the unexplained variance by the regression model.

 Optimization: In regression analysis, the goal is often to minimize RSS to improve model
accuracy.
🚀 Key Takeaways:

1. Lower RSS: Better model fit.

2. Higher RSS: Poor model fit, indicating larger prediction errors.

3. Minimizing RSS: The primary goal of most regression models to achieve accuracy.

1️⃣ Residual Standard Error (RSE)

📖 Definition:

Residual Standard Error measures the average deviation of the observed values from the regression
line. It quantifies how well the regression model fits the data.

🧠 Formula:
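With n observations and p predictors, the standard definition is:

RSE = √( RSS / (n - p - 1) )

(for simple linear regression with a single predictor, this reduces to √( RSS / (n - 2) )).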

✅ Interpretation:
 Low RSE: The model fits the data well (small average error).

 High RSE: The model doesn’t fit the data well (large average error).

 RSE is measured in the same units as the dependent variable (Y).

Example:
If the RSE is 2.5, it means the average deviation of observed data points from the predicted
regression line is approximately 2.5 units.

2️⃣ R² (R-Squared)

📖 Definition:

R² measures the proportion of variance in the dependent variable (Y) that can be explained by the
independent variables (X). It indicates the goodness of fit of the regression model.
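Using the RSS defined earlier and the total sum of squares, TSS = Σ (yᵢ - ȳ)² (the total variation of Y around its mean):

R² = 1 - RSS / TSS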

✅ Interpretation:

 R² = 1: Perfect fit (model explains 100% of the variance).

 R² = 0: Model explains none of the variance.

 Higher R²: The model fits the data better.

Example:
If R² = 0.85, it means 85% of the variation in the dependent variable (Y) is explained by the
independent variables (X). The remaining 15% is due to random error or unobserved factors.

 RSE tells you how far off your predictions are, on average.

 R² tells you how well your independent variables explain the variability in the dependent
variable.
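A small sketch tying RSS, RSE and R² together for a fitted simple linear regression (illustrative data; the calculations follow the definitions above).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: years of experience -> salary in $1000s
X = np.array([[1], [2], [3], [5], [7], [10]])
y = np.array([35, 40, 47, 58, 70, 90])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

n, p = X.shape                     # number of observations, number of predictors
rss = np.sum((y - y_pred) ** 2)    # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
rse = np.sqrt(rss / (n - p - 1))   # residual standard error (same units as Y)
r2 = 1 - rss / tss                 # proportion of variance explained

print(f"RSS = {rss:.2f}, RSE = {rse:.2f}, R² = {r2:.3f}")
```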

3 Types of Business Analytics: Descriptive, Predictive, and Prescriptive Analytics

1. Descriptive Analytics: Understand the past.

2. Predictive Analytics: Forecast the future.

3. Prescriptive Analytics: Make optimal decisions.

The rest of this part is from the notebook; it is short.


 Predictive analytics uses ML.

 Supervised Learning helps in making predictions or classifications based on labeled data (e.g., predicting fraud or loan defaults).

 Unsupervised Learning is used to find patterns and relationships in data without pre-labeled
outcomes (e.g., customer segmentation and anomaly detection).

 Supervised Learning

In supervised learning, the algorithm is trained on a labeled dataset.

Supervision: The training data (observations, measurements, etc.) are accompanied by labels
indicating the class of the observations

 New data is classified based on the training set

Examples from notebook -

Unsupervised Learning

In unsupervised learning, the algorithm is given data without labels and must find hidden patterns or
relationships in the data.

 The class labels of the training data are unknown

 Given a set of measurements, observations, etc. with the aim of establishing the
existence of classes or clusters in the data

Examples from notebook –

Let's dive deeper into each of K-Nearest Neighbors (KNN), Decision Trees, Naive Bayes, and Random Forest, including how to write rules for predictions and interpret metrics like accuracy, sensitivity, and the confusion matrix.

Classification Techniques:

1. K-Nearest Neighbors (KNN)

KNN is a classification algorithm that assigns a class to a data point based on the majority class of its
nearest neighbors. It assumes that similar data points are located near each other and can be
grouped based on their proximity.

Steps for Prediction (KNN):

1. Select the number of neighbors (k): Choose the number of nearest neighbors (e.g., k = 3).
2. Compute the distance: Calculate the distance between the test point and all training points (using Euclidean distance, Manhattan distance, etc.).

3. Sort the distances.

4. Select the nearest neighbors: Pick the top k nearest neighbors.

5. Vote for the class: The class assigned to the test point is determined by the majority vote. If
there's a tie, you can choose the class that appears first.

6. Make the prediction:

Suppose you're classifying whether a customer will default on a loan based on their credit score and
income.

 Data:

o Customer 1: (Credit Score: 650, Income: 50k) → "No Default"

o Customer 2: (Credit Score: 700, Income: 60k) → "No Default"

o Customer 3: (Credit Score: 750, Income: 80k) → "No Default"

o Customer 4: (Credit Score: 620, Income: 40k) → "Default"

 Test Point: (Credit Score: 630, Income: 45k)

 Stepwise Prediction:

1. Calculate the distance from the test point to all training points.

2. Choose k = 3 (i.e., 3 nearest neighbors).

3. Neighbors for test point: Customer 4 (Default), Customer 1 (No Default), Customer 2
(No Default).

4. Majority vote: "No Default" (2 out of 3 neighbors).

5. Predict: "No Default."
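A minimal sketch of the same KNN prediction with scikit-learn (data from the example above; note that in practice features with very different scales, like credit score vs. income, should usually be standardized first).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data from the example above: [credit_score, income]
X_train = np.array([[650, 50_000],
                    [700, 60_000],
                    [750, 80_000],
                    [620, 40_000]])
y_train = np.array(["No Default", "No Default", "No Default", "Default"])

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3, Euclidean distance by default
knn.fit(X_train, y_train)

test_point = np.array([[630, 45_000]])
print(knn.predict(test_point))  # majority vote among the 3 nearest neighbours

# In practice, scale the features first (e.g. with StandardScaler), otherwise
# the large income values dominate the distance calculation.
```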

2 main requirements, advantages, disadvantages – from notebook

Metrics:

 Confusion Matrix:

 See from notebook


The confusion matrix summarizes the performance of the classification algorithm
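A quick sketch of computing the confusion matrix, accuracy and sensitivity with scikit-learn, using hypothetical actual vs. predicted labels (1 = positive class, e.g. "Default").

```python
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score

# Hypothetical actual vs. predicted labels (1 = Default, 0 = No Default)
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))               # rows = actual, columns = predicted
print("Accuracy   :", accuracy_score(y_true, y_pred))
print("Sensitivity:", recall_score(y_true, y_pred))   # recall for the positive class
```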


2. Decision Trees

A decision tree is a hierarchical model that splits data into subsets based on feature values, forming a
tree structure. Each node represents a feature, and each branch represents a decision rule.

Steps for Prediction (Decision Tree):

1. Select the feature to split on: At each node, select the feature that best splits the data.
Common methods include Gini impurity or Information Gain (Entropy).

2. Split the data: Based on the feature selected, divide the dataset into branches that represent
possible outcomes.

3. Repeat the process: Continue splitting recursively until a stopping condition is met (e.g., all
points belong to the same class, or the tree reaches a predefined depth).

4. Assign class labels: At each leaf node, assign the most frequent class from the data points in
that leaf.

5. Prediction: Starting from the root, follow the decision rules until a leaf node is reached. The
class label at the leaf node is the prediction.

Example:

Consider a decision tree to predict whether a customer will buy insurance, based on age and income.

 Data:

o Customer 1: Age = 30, Income = 50k → "Buy"

o Customer 2: Age = 45, Income = 80k → "Buy"

o Customer 3: Age = 22, Income = 30k → "No Buy"

 Stepwise Prediction:

1. Select a feature to split: "Age" (split at 30 years).

2. For customers with Age ≤ 30, predict "No Buy" (majority class).

3. For customers with Age > 30, split on "Income".

4. If Income ≤ 60k, predict "No Buy".

5. If Income > 60k, predict "Buy".
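A minimal sketch of the same idea with scikit-learn's DecisionTreeClassifier (the three customers above; with such a tiny dataset the learned splits may differ from the hand-built rules).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Training data from the example above: [age, income in $1000s]
X_train = np.array([[30, 50], [45, 80], [22, 30]])
y_train = np.array(["Buy", "Buy", "No Buy"])

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X_train, y_train)

# Print the learned decision rules (may differ from the hand-written splits above)
print(export_text(tree, feature_names=["Age", "Income"]))

# Predict for a new customer: Age = 40, Income = 70k
print(tree.predict([[40, 70]]))
```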

( take from notebook).

3. Naive Bayes
Naive Bayes is a probabilistic classification technique based on Bayes' theorem, assuming independence between features. It is widely used for classification tasks such as spam detection, sentiment analysis, and risk prediction.

Steps for Prediction (Naive Bayes):

1. Calculate prior probabilities


2. Calculate likelihoods:

3. Apply Bayes' Theorem / calculate the posterior probability: Use Bayes' formula to compute the posterior probability for each class, and predict the class with the highest posterior probability.

Example:

For spam email classification, suppose we have the following data:

 Spam: 60%

 Not Spam: 40%

The likelihood of certain words appearing in each type of email (spam or not) is calculated. If the
word "free" appears in a new email, the probability of that email being spam is computed using
Bayes' Theorem.
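A small worked version of this calculation: the priors come from the example above, while the word likelihoods (P("free" | spam) = 0.30, P("free" | not spam) = 0.05) are assumed values used only for illustration.

```python
# Illustrative Bayes' theorem calculation for the spam example.
p_spam, p_not_spam = 0.60, 0.40
p_free_given_spam = 0.30      # assumed: P("free" appears | spam)
p_free_given_not_spam = 0.05  # assumed: P("free" appears | not spam)

# P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_free = p_free_given_spam * p_spam + p_free_given_not_spam * p_not_spam
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(f"P(spam | 'free') = {p_spam_given_free:.3f}")  # = 0.9 with these assumed numbers
```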

Advantages, disadvantages – from PPT

You can do this if you want – the explanation of Sir's example attached in the PPT – read through –
4. Random Forest

Random Forest is an ensemble learning method that builds multiple decision trees and combines
their predictions to improve accuracy.

Steps for Prediction (Random Forest):

1. Bootstrap Sampling: Create multiple datasets by randomly sampling from the original
training set (with replacement).

2. Train multiple decision trees: Train a decision tree on each of the bootstrap datasets.

3. Predict using individual trees: Each tree makes a prediction for a test point.

4. Combine the predictions: For classification, use a majority vote to decide the final class
prediction.

5. Prediction: The class chosen by the majority of the trees is the predicted class.

Example:

Similar to the decision tree example above, but you would train multiple trees on different subsets of
data. Each tree might make slightly different predictions, but the overall prediction is the majority
vote.
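A minimal sketch with scikit-learn's RandomForestClassifier; the loan-default style data below is an assumption made up for illustration, not taken from the notes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up training data: [credit_score, income] -> default label
X_train = np.array([[650, 50_000], [700, 60_000], [750, 80_000],
                    [620, 40_000], [600, 35_000], [720, 70_000]])
y_train = np.array(["No Default", "No Default", "No Default",
                    "Default", "Default", "No Default"])

# 100 trees, each trained on a bootstrap sample with random feature subsets at each split
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Final prediction is the majority vote across the trees
print(forest.predict([[630, 45_000]]))
print(forest.predict_proba([[630, 45_000]]))  # averaged class probabilities across the trees
```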

The slide provides an overview of Random Forest, a popular ensemble learning method introduced
by Leo Breiman in 2001. Below is a detailed explanation of its key concepts:

What is Random Forest?

Random Forest is an ensemble learning technique that builds multiple decision tree classifiers and
aggregates their predictions to improve accuracy and reduce overfitting. The main idea behind
Random Forest is:

1. Each decision tree in the ensemble is built using a random subset of attributes at each node.

2. During classification, each tree provides a vote, and the majority class is chosen as the final
prediction.

Methods to Construct Random Forest

There are two main methods to build a Random Forest:


1. Forest-RI (Random Input Selection)

o At each node, a random subset of F attributes is chosen as candidates for splitting.

o The CART (Classification and Regression Trees) methodology is used to grow the
decision trees.

o The trees are fully grown (maximum depth) without pruning.

2. Forest-RC (Random Linear Combinations)

o Instead of using the original attributes directly, this method creates new attributes.

o These new attributes are linear combinations of existing ones.

o This reduces the correlation between decision trees, making the model more diverse
and robust.

Advantages of Random Forest

 Accuracy & Robustness: Comparable to AdaBoost, but more resistant to noise and outliers.

 Efficiency: Works well even if many attributes are used, and it is faster than bagging or
boosting.

 Feature Selection: Insensitive to the number of attributes chosen for each split, making it
efficient in high-dimensional datasets.

In summary, Random Forest is a powerful and widely used machine learning algorithm that balances
accuracy, robustness, and efficiency while reducing overfitting compared to single decision trees.

Clustering and K-Means Algorithm

What is Clustering?

Clustering is an unsupervised learning technique used to group a set of objects or data points
into clusters (groups) based on similarity. The goal is to ensure that data points within the same
cluster are similar to each other, while data points from different clusters are as dissimilar as
possible.

K-Means Clustering Algorithm

K-Means is one of the most popular and widely used clustering algorithms. It divides data points
into K distinct clusters based on their features.

The K-Means Clustering Method

 K-Means Algorithm Steps (from PPT):

1. Divide data into k subsets.

2. Compute centroids (mean points of each cluster).

3. Assign each data point to the closest centroid.

4. Repeat until assignments remain unchanged.

Example of K-Means Clustering:

Suppose we have a dataset with customer information, such as age and income, and we want to
segment customers into 3 groups for targeted marketing.

Dataset:

Customer ID | Age | Income (in 1000s)
1 | 25 | 50
2 | 45 | 60
3 | 30 | 45
4 | 50 | 90
5 | 35 | 55
6 | 40 | 85

Step-by-Step K-Means Algorithm:

1. Choose K = 2 (we want to segment customers into two groups).

2. Initialize Centroids: Let's say the initial centroids are randomly chosen as (25, 50) and (45,
90).

3. Assign Points to Nearest Centroid: Calculate the Euclidean distance between each
customer’s data point and the centroids:

o Customer 1 is closer to centroid 1 (25, 50), so it's assigned to cluster 1.

o Customer 4 is closer to centroid 2 (45, 90), so it's assigned to cluster 2.

o Repeat for all customers: customers 1, 2, 3 and 5 end up in cluster 1; customers 4 and 6 in cluster 2.

4. Update Centroids: After assignment, update the centroids by calculating the mean of all
points in each cluster:

o Cluster 1: Average of (25, 50), (45, 60), (30, 45), (35, 55) → New Centroid (33.75, 52.5)

o Cluster 2: Average of (50, 90), (40, 85) → New Centroid (45, 87.5)

5. Repeat: Reassign points to the nearest centroid and recalculate centroids until the centroids
do not change.

Final Clusters:

 Cluster 1: Customers with Centroid (33.75, 52.5) → Likely younger, lower-income customers.

 Cluster 2: Customers with Centroid (45, 87.5) → Likely older, higher-income customers.
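A minimal sketch of the same segmentation with scikit-learn, seeding K-Means with the example's initial centroids (K = 2, same customer data as above).

```python
import numpy as np
from sklearn.cluster import KMeans

# Customer data from the example: [age, income in $1000s]
X = np.array([[25, 50], [45, 60], [30, 45], [50, 90], [35, 55], [40, 85]])

# Seed K-Means with the initial centroids used in the example
init_centroids = np.array([[25, 50], [45, 90]], dtype=float)
kmeans = KMeans(n_clusters=2, init=init_centroids, n_init=1).fit(X)

print("Cluster labels :", kmeans.labels_)
print("Final centroids:\n", kmeans.cluster_centers_)
print("WCSS (inertia) :", kmeans.inertia_)
```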

Choosing the Right Value of K (K-Value):

Selecting the optimal number of clusters (K) can be tricky. A common method for determining K is the Elbow Method:

1. Plot the WCSS (within-cluster sum of squares) for different values of K (e.g., from 1 to 10).

2. Look for the "elbow" point in the graph, where the reduction in WCSS begins to level off. This
is often considered the optimal value for K.
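A short sketch of the elbow method on the same small customer dataset (with only 6 points, K is varied from 1 to 5 here rather than 1 to 10).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[25, 50], [45, 60], [30, 45], [50, 90], [35, 55], [40, 85]])

# Compute WCSS (inertia) for each K and look for the "elbow" in the plot
ks = range(1, 6)
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (within-cluster sum of squares)")
plt.title("Elbow method")
plt.show()
```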

9. Comments on K-Means Method

 Strengths:

o Efficient: Complexity is O(tkn) (where t = iterations, k = clusters, n = data points).


o Faster than other methods (e.g., PAM, CLARA).

 Weaknesses:

o Works only on continuous numerical data.

o Sensitive to noise and outliers.

o Requires k in advance.

o Cannot detect non-convex cluster shapes.

Problems with K-Means

 Sensitive to Outliers:

o Extreme values can distort cluster distribution.

o Solution: Use K-Medoids instead, as it selects actual data points (medoids) instead
of mean values.

12. PAM: A Typical K-Medoids Algorithm

 K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster

 Partition Around Medoids (PAM) Steps:

1. Select k medoids randomly.

2. Assign each object to the closest medoid.

3. Swap medoids to reduce clustering cost.

4. Stop when no better swap is found.


 K-Medoids Clustering: Find representative objects (medoids) in clusters

 PAM (Partitioning Around Medoids, Kaufmann & Rousseeuw 1987)

 Starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of the
resulting clustering

 PAM works effectively for small data sets, but does not scale well for large
data sets (due to the computational complexity)
15. Hierarchical Clustering

 Unlike partition-based clustering (K-Means, K-Medoids), hierarchical clustering builds a tree-like structure (dendrogram).

 Two main approaches:

o Agglomerative (Bottom-Up):

 Start with individual data points as separate clusters.

 Iteratively merge the closest clusters until only one cluster remains.

o Divisive (Top-Down):

 Start with one large cluster containing all points.

 Recursively split clusters until each point is its own cluster.

AGNES (Agglomerative Nesting) – first points from PPT, then:

 Merging Process:

 It starts with each data point as its own cluster.

 The two clusters with the least dissimilarity (most similarity) are merged.

 This process continues in a non-descending fashion until all points belong to the same
cluster.
 Illustration:

 The three images at the bottom show how the clustering progresses:

o Left Image: Many small individual clusters.

o Middle Image: Some clusters have merged.

o Right Image: Even larger clusters have formed as the process continues.

Dendrogram:

 Dendrogram as a tree of clusters:

o The hierarchical structure of the clustering process is represented as a tree.

o Data objects (points) are decomposed into different levels of nested partitions.

 Clustering by cutting the dendrogram:

o At each level of the tree, clusters merge based on their similarity.

o To obtain the final clusters, we can cut the dendrogram at a desired level.

o The connected components (subtrees) after cutting form the final clusters.

Illustration:

 The dendrogram diagram at the bottom visually represents the merging process.

 Each green circle represents a data point.

 As we move up the tree, clusters merge.

 If we cut at a specific horizontal level, the remaining branches form separate clusters
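A minimal sketch (using SciPy, on the same customer data as the K-Means example) of agglomerative, AGNES-style clustering: build the dendrogram bottom-up, then cut it to obtain the final clusters.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Same customer data: [age, income in $1000s]
X = np.array([[25, 50], [45, 60], [30, 45], [50, 90], [35, 55], [40, 85]])

# Agglomerative (bottom-up) clustering: repeatedly merge the two closest clusters
Z = linkage(X, method="average")  # average-linkage dissimilarity between clusters

# Plot the dendrogram (tree of clusters)
dendrogram(Z, labels=[f"Cust {i}" for i in range(1, 7)])
plt.ylabel("Merge distance")
plt.show()

# "Cut" the dendrogram into 2 clusters
print(fcluster(Z, t=2, criterion="maxclust"))
```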

Example views below – the rest is not important –

Summary of K-Means Algorithm:

Aspect | Details
Type of Algorithm | Unsupervised Learning (Clustering)
Goal | Group similar data points into K clusters
Key Steps | Choose K, Initialize Centroids, Assign Points, Update Centroids, Repeat until Convergence
Distance Metric | Euclidean Distance (default)
Common Use Cases | Customer segmentation, Image compression, Anomaly detection
Advantages | Simple, efficient, scalable
Disadvantages | Sensitive to initialization, requires choosing K, struggles with non-spherical clusters
Evaluation | Elbow method to choose K, WCSS (Within-cluster sum of squares)


Business Scenarios classification examples:

Example 1: Insurance Claim Fraud Detection (Business Classification)

Business Problem:

An insurance company wants to predict whether an insurance claim is fraudulent or not, based on
various features of the claim and the claimant. This helps reduce the number of fraudulent claims
the company processes.

Features:

 Claim Amount: The total amount being claimed by the customer

 Customer Age: Age of the person making the claim

 Policy Type: Type of insurance policy held by the customer (e.g., health, auto)

 Claim History: Number of claims made by the customer in the past

 Geographic Location: Location of the customer (can help detect regional fraud trends)

 Claim Time: The time it took to process the claim (a large delay might indicate suspicious
behavior)

Target Class:

 Fraudulent Claim: Whether the claim is fraudulent (binary: Fraud, Legitimate)

Classification Technique: Random Forest

Steps for Prediction:

1. Prepare the data: Collect data from past claims, where the fraudulent claims are labeled.
The data should contain information like claim amount, policy type, claim history, etc.

2. Train the model: Use random forest, which is an ensemble of decision trees, to train the
model. Each decision tree will make a classification (fraud or legitimate), and the final
prediction will be the majority vote from all the trees.
3. Predict new claims: For new claims, the trained random forest model will predict whether
the claim is likely fraudulent or legitimate.

Interpretation:

 Accuracy: The proportion of claims correctly classified as fraudulent or legitimate.

 Sensitivity (Recall): The proportion of actual fraudulent claims correctly identified.

 Confusion Matrix: Will show how many fraudulent claims were detected, how many
legitimate claims were mistakenly identified as fraudulent, etc.

Example 2: Customer Churn Prediction (Business Classification in Telecom)

Business Problem:

A telecom company wants to predict whether a customer is likely to churn (i.e., leave the company)
based on their account activity. This allows the company to take preventive action, such as offering
promotions or discounts.

Features:

 Customer Tenure: How long the customer has been with the company

 Monthly Charges: The amount the customer is charged monthly for their telecom services

 Service Type: Whether the customer uses mobile, internet, or bundle services

 Customer Support Calls: The number of customer support calls made by the customer in the
past month

 Payment Method: Whether the customer pays via direct debit, credit card, or other methods

 Contract Type: Whether the customer is on a one-year, two-year, or month-to-month contract

Target Class:

 Churn: Whether the customer will churn (binary: Churn, Stay)

Classification Technique: Decision Tree

Steps for Prediction:

1. Prepare the data: Gather data from past customers, including those who churned and those
who stayed. Label each customer accordingly.

2. Train the model: Use a decision tree algorithm to learn patterns in the features that are
correlated with churn. The tree will split customers into different branches based on features
like tenure, service type, etc.

3. Prediction: For new customers, the decision tree will predict whether they are likely to churn
based on the features in their profile.

Interpretation:
 Accuracy: The percentage of correct predictions made by the decision tree (i.e., correctly
identifying churners and non-churners).

 Sensitivity: How well the model detects customers who will churn.

 Confusion Matrix: Helps to understand how well the model is distinguishing between
customers who churn and those who stay.

Here’s a clear comparison in tabular format outlining the fundamental differences between
Classification vs Regression, Supervised vs Unsupervised Learning, and Labeled vs
Unlabeled Data:

Aspect | Classification | Regression
Goal | Predict a categorical label or class. | Predict a continuous value or a real number.
Output | Discrete classes (e.g., "Spam", "Not Spam", "Default", "No Default") | Continuous value (e.g., house price, temperature, salary)
Type of Problem | Categorical problem. | Numerical problem.
Example | Email spam detection, loan default prediction. | House price prediction, stock price prediction.
Evaluation Metric | Accuracy, Precision, Recall, F1-Score, Confusion Matrix. | Mean Absolute Error (MAE), Mean Squared Error (MSE), R-squared.
Model Types | Logistic Regression, Decision Trees, SVM, Random Forest, Neural Networks. | Linear Regression, Decision Trees, Random Forest, Support Vector Regression (SVR), Neural Networks.

Aspect | Supervised Learning | Unsupervised Learning
Definition | The algorithm is trained on labeled data (input-output pairs). | The algorithm works with unlabeled data, finding patterns and structures.
Data Type | Labeled data (input features + target labels). | Unlabeled data (only input features, no labels).
Goal | Learn a mapping between input and output to make predictions. | Discover hidden patterns, structures, or groups in data.
Output | Known labels or values for new, unseen data. | Data grouped into clusters or reduced dimensions, patterns identified.
Example | Classification (spam detection, fraud detection), Regression (price prediction, temperature forecasting). | Clustering (customer segmentation, market basket analysis), Dimensionality reduction (PCA, t-SNE).
Evaluation Metric | Accuracy, Precision, Recall, F1-Score, Confusion Matrix (for classification); MAE, MSE, R-squared (for regression). | Silhouette Score, Davies-Bouldin Index, Inertia, Elbow Method.
Model Types | Linear Regression, Logistic Regression, SVM, Decision Trees, Neural Networks. | K-means, DBSCAN, Hierarchical Clustering, Principal Component Analysis (PCA).

Aspect | Labeled Data | Unlabeled Data
Definition | Data that includes both input features and corresponding output labels (target variable). | Data that contains only input features, without any associated target labels.
Usage | Used in supervised learning for both classification and regression tasks. | Used in unsupervised learning for clustering, dimensionality reduction, etc.
Data Example | Email spam detection dataset (emails with labels "Spam" or "Not Spam"), loan approval data (customers with labels "Approved" or "Rejected"). | Customer data without labels, e.g., raw transaction data, social media posts, or sensor data.
Process | The model is trained using the labeled data to learn the relationship between inputs and outputs. | The model tries to infer the structure of the data by identifying patterns or similarities in the input data.
Key Algorithms | Decision Trees, SVM, KNN, Linear/Logistic Regression, Neural Networks. | K-means, DBSCAN, Hierarchical Clustering, PCA, t-SNE.
Example | Customer churn prediction (with "Churn" or "No Churn" as labels). | Customer segmentation (no labels, but grouped based on behavior or features).

Summary:
 Classification vs Regression: Classification deals with categorical outputs, while regression
predicts continuous numerical values.

 Supervised vs Unsupervised Learning: Supervised learning involves labeled data to train models for prediction, while unsupervised learning works with unlabeled data to discover patterns or structures.

 Labeled vs Unlabeled Data: Labeled data has target labels associated with it, whereas
unlabeled data lacks target labels and is often used in unsupervised learning methods.
