
Machine Learning Detailed Notes (Unit 1 to Unit 6)

### Unit 1: Supervised Learning (Regression/Classification)

1. Distance-Based Methods

Distance-based methods are classification or regression techniques that rely on measuring the similarity or difference between data points in terms of distance metrics, such as Euclidean, Manhattan, or Minkowski distances.

Key Idea: Data points closer in distance are assumed to have similar labels or values.

Applications: Used in clustering, nearest-neighbor classification, and anomaly detection.

Advantages: Simple to implement and interpret.

Disadvantages: Sensitive to feature scaling and irrelevant features; requires normalization.

Example: Predicting house prices by finding houses nearby with similar features.

Formula:

Euclidean Distance = sqrt(sum((x_i - y_i)^2))
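
As a quick illustration (a minimal Python sketch; the helper names and sample points are illustrative, not from the notes), all three metrics follow directly from their definitions:

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute coordinate differences
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def minkowski(x, y, p):
    # Generalizes both: p = 1 gives Manhattan, p = 2 gives Euclidean
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

a, b = (1.0, 2.0), (4.0, 6.0)
print(euclidean(a, b))     # 5.0
print(manhattan(a, b))     # 7.0
print(minkowski(a, b, 2))  # 5.0, same as Euclidean
```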

2. Nearest Neighbor Methods (k-NN)

Definition: k-Nearest Neighbors (k-NN) is a simple, non-parametric algorithm that classifies data points based on the labels of their k closest neighbors.

Working:

1. Calculate the distance between the test point and all training points.

2. Sort the distances and select the k nearest neighbors.

3. Assign the most common class label among the neighbors to the test point.

Applications:

- Image recognition.

- Recommender systems.

Advantages: Easy to understand and implement.

Disadvantages:

- High memory usage.

- Slower for large datasets.

Example: Classifying a fruit based on features such as color, size, and shape (see the sketch below).
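
A from-scratch sketch of the three working steps above (plain Python; the toy fruit data and feature scales are hypothetical):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    # Step 1: distance from the query to every training point
    dists = [math.dist(query, p) for p in train_points]
    # Step 2: indices of the k nearest neighbors
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Step 3: majority vote among the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy fruit data: (color_score, size_cm) -> label
points = [(0.9, 7.0), (0.8, 7.5), (0.2, 9.5), (0.3, 10.0)]
labels = ["apple", "apple", "banana", "banana"]
print(knn_classify(points, labels, (0.85, 7.2), k=3))  # apple
```

Because every prediction scans the whole training set, this naive version makes the memory and speed disadvantages above easy to see.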

3. Decision Trees

Definition: A decision tree is a flowchart-like structure where nodes represent decisions or tests on attributes, branches represent outcomes, and leaves represent class labels or values.

Key Concepts:

- Entropy: Measures impurity in data.

- Information Gain: Reduction in entropy after a split.

Working:

1. Select the attribute with the highest information gain to split the data.

2. Repeat recursively for each subset.

3. Stop when all data is classified or a stopping criterion is met.

Advantages:

- Intuitive and easy to visualize.

- Handles both categorical and numerical data.

Disadvantages:

- Prone to overfitting.

- Unstable with small changes in data.

Example: Loan approval systems.
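
To make entropy and information gain concrete, here is a short sketch (plain Python; the loan-style toy split is hypothetical):

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions p_i
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    # Entropy before the split minus the weighted entropy after it
    n = len(parent)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - after

# Toy loan data: splitting 8 applicants on a binary attribute
parent = ["approve"] * 4 + ["reject"] * 4           # H = 1.0 bit
left = ["approve", "approve", "approve", "reject"]  # attribute = yes
right = ["approve", "reject", "reject", "reject"]   # attribute = no
print(information_gain(parent, [left, right]))      # ~0.189
```

The tree-building loop in step 1 would evaluate this gain for every candidate attribute and split on the largest value.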

4. Naive Bayes

Definition: A probabilistic classifier based on Bayes' theorem, assuming independence between features.

Bayes' Theorem:

P(A|B) = [P(B|A) * P(A)] / P(B)

Steps:

1. Compute prior probabilities (P(C)) for each class.

2. Compute likelihood (P(x|C)) for each feature given the class.

3. Compute posterior probability and assign the class with the highest posterior.

Applications:

- Spam detection.

- Sentiment analysis.

Advantages:

- Fast and efficient.

- Works well with high-dimensional data.

Disadvantages: Assumes feature independence, which is rarely true in real-world data.

Example: Classifying emails as spam or not spam.
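
A minimal sketch of the three steps for word features (plain Python; the Laplace (+1) smoothing is a standard refinement not stated in the notes, and the toy documents are illustrative):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    # Step 1: priors P(C); the counts below feed step 2's likelihoods P(x|C)
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    word_counts, totals, vocab = defaultdict(Counter), Counter(), set()
    for words, c in zip(docs, labels):
        word_counts[c].update(words)
        totals[c] += len(words)
        vocab.update(words)
    return priors, word_counts, totals, len(vocab)

def classify_nb(model, words):
    priors, word_counts, totals, v = model
    scores = {}
    for c in priors:
        # Step 3: log posterior (logs avoid underflow; +1 is Laplace smoothing)
        score = math.log(priors[c])
        for w in words:
            score += math.log((word_counts[c][w] + 1) / (totals[c] + v))
        scores[c] = score
    return max(scores, key=scores.get)

docs = [["win", "cash", "now"], ["cash", "prize"],
        ["meeting", "notes"], ["project", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(classify_nb(model, ["cash", "now"]))  # spam
```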

### Unit 2: Unsupervised Learning

1. Clustering: K-means

Definition: A clustering algorithm that partitions data into k clusters, where each data point belongs to the cluster with the nearest mean.

Steps:

1. Initialize k cluster centroids randomly.

2. Assign each point to the nearest centroid.

3. Update centroids as the mean of assigned points.

4. Repeat until centroids stabilize.

Applications:

- Customer segmentation.

- Document clustering.

Advantages:

- Easy to implement.

- Scales well with large datasets.

Disadvantages:

- Sensitive to outliers.

- Requires specifying k beforehand.

Example: Grouping customers based on purchasing behavior.
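
A compact sketch of the four steps (plain Python; the toy 2-D points and the fixed seed are illustrative):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    random.seed(seed)
    # Step 1: pick k random points as the initial centroids
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Step 2: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Step 3: recompute each centroid as the mean of its cluster
        new = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        # Step 4: stop once the centroids no longer move
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Two obvious groups in 2-D
pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, groups = kmeans(pts, k=2)
print(centers)
```

Random initialization is also why k-means can converge to different clusterings on different runs; fixing the seed here just makes the sketch reproducible.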

... (Rest of the content for Units 2, 3, 4, 5, and 6 follows) ...
