ML Module
2. Decision Tree Algorithm in Machine Learning:
Key Points:
1. Structure:
- A decision tree is made up of nodes:
- Root Node: The starting point of the tree.
- Decision Nodes: Nodes that split the data based on a feature's value.
- Leaf Nodes: Final nodes representing the predicted class or value.
2. Working:
- The algorithm selects the best feature to split on using a criterion such as Gini impurity or
information gain (for classification) or variance reduction (for regression).
- The data is split recursively until all data points are classified or a stopping condition is
met (e.g., maximum depth or minimum samples per leaf); a short fitting sketch follows the
example below.
3. Advantages:
- Easy to interpret and visualize.
- Can handle both numerical and categorical data.
- Non-linear relationships can be captured.
Example:
A tree might first split on a feature such as age (young vs. old) and then on another such as
income (low vs. high), with each leaf giving the predicted class for that combination.
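A short fitting sketch with scikit-learn; the toy age/income data below is a made-up illustration, not from the notes:

```python
# Minimal decision-tree sketch (scikit-learn); the age/income data is invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [age, income]; labels: 1 = buys, 0 = does not buy
X = [[22, 25000], [25, 32000], [47, 54000], [52, 61000], [46, 30000], [56, 42000]]
y = [0, 0, 1, 1, 0, 1]

# Gini impurity is the split criterion; max_depth acts as a stopping condition
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=["age", "income"]))  # inspect the learned splits
print(clf.predict([[30, 40000]]))                         # classify a new point
```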
3. Support Vector Machine (SVM) Algorithm in Machine Learning:
1. Hyperplane:
- A hyperplane is a decision boundary that separates different classes in the feature space.
- For binary classification, this boundary is a line in 2D, a plane in 3D, and a hyperplane in
higher dimensions.
2. Support Vectors:
- Support vectors are the data points that are closest to the hyperplane and are critical in
defining the position and orientation of the hyperplane.
3. Margin:
- The margin is the distance between the hyperplane and the support vectors.
- SVM seeks to maximize the margin, ensuring the best possible separation between the
classes, which leads to better generalization on unseen data.
4. Kernel Trick:
- SVM can perform non-linear classification using the kernel trick.
- Common kernels include the linear kernel, polynomial kernel, and radial basis function
(RBF) kernel.
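A brief scikit-learn sketch of these ideas; the two-blob data is an assumption made for illustration:

```python
# SVM sketch (scikit-learn): linear vs. RBF kernel on invented 2-D data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)    # maximum-margin linear hyperplane
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)  # kernel trick for non-linear boundaries

# The support vectors are the training points closest to the decision boundary
print(linear_svm.support_vectors_.shape)
print(rbf_svm.predict([[1.5, 1.5]]))
```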
Mathematical Formulation
For a set of training data {(x_i, y_i)}, where x_i ∈ R^n are the feature vectors and
y_i ∈ {+1, −1} are the class labels, SVM tries to find the following:
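The objective itself is cut off in the notes; the standard hard-margin formulation at this point is to find the weight vector w and bias b of the maximum-margin hyperplane:

```latex
% Hard-margin SVM: maximize the margin 2/||w|| by minimizing ||w||^2
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n
```

For data that is not linearly separable, the soft-margin variant adds slack variables weighted by a penalty parameter C.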
4. Sparse Learning:
• Sparse learning techniques are frequently used for feature selection in high-dimensional
datasets, such as text data, genomics, and finance. In these cases, the number of features can
easily exceed the number of samples, making feature selection crucial to avoid overfitting.
Applications:
1. Text Classification:
• Sparse models keep only the most relevant keywords (e.g., for spam detection or sentiment
analysis).
2. Genomic Data Analysis:
• In genomics the number of measured genes (features) far exceeds the number of samples, so
sparse models are used to identify the genes most relevant to the outcome under study.
3. Financial and Economic Modeling:
• Sparse learning is applied to financial and economic modeling, where the goal is to select
the most relevant variables (e.g., stock prices, market indicators) from a large set of
potential predictors.
• Example: In credit scoring, sparse models can help select the most significant financial
factors that predict the likelihood of a customer defaulting on a loan.
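A hedged sketch of sparse feature selection using L1-regularized (Lasso) regression; the synthetic data and the alpha value are illustrative assumptions:

```python
# Sparse feature selection sketch: the L1 penalty (Lasso) drives irrelevant coefficients to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # 50 candidate features, only 3 of them actually matter
y = 2 * X[:, 0] - 3 * X[:, 5] + 1.5 * X[:, 12] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of features kept with non-zero weight
print(selected)                         # the informative features 0, 5, 12 should be among them
```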
5. The K-Nearest Neighbors (KNN) Algorithm:
1. Input Data:
• A labeled training dataset with feature vectors X = {x_1, x_2, …, x_n}, where each x_i
represents a data point.
• The target variable (class label or continuous value), y = {y_1, y_2, …, y_n}.
• A query point x_q that we want to classify or predict.
2. Choose the Value of K:
• Select the number of nearest neighbors k that you want to consider. Common choices for k
are odd numbers (to avoid ties in classification).
• The choice of k influences the model's performance. A small k makes the model sensitive to
noise, while a large k can smooth out the decision boundary.
3. Calculate the Distance:
• Measure the distance between the query point x_q and all points in the training dataset.
The most common distance metric is Euclidean distance, but other metrics such as Manhattan or
Minkowski distance can also be used.
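A small scikit-learn sketch of these three steps; the training points, the query point, and k = 3 are assumptions for illustration:

```python
# KNN sketch: pick k, measure distances to the query point, vote among the k nearest neighbors.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]]  # labeled training points
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")  # k = 3, Euclidean distance
knn.fit(X_train, y_train)

x_q = [[5, 5]]               # query point to classify
print(knn.predict(x_q))      # majority class among its 3 nearest neighbors
print(knn.kneighbors(x_q))   # distances to, and indices of, those neighbors
```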
1. Types of Clustering Algorithms (4th Unit):
Clustering algorithms can be broadly categorized into several types, based on their approach
to grouping the data. Here are the main types of clustering:
1. Partitioning Clustering:
• Partitioning methods divide the dataset into k clusters where k is predefined. Each
data point is assigned to exactly one cluster.
• These algorithms aim to minimize a certain criterion (e.g., the sum of squared
distances between points and the center of their clusters).
2. Hierarchical Clustering:
• Hierarchical methods build a tree of clusters (a dendrogram), either bottom-up
(agglomerative, by repeatedly merging the closest clusters) or top-down (divisive, by
repeatedly splitting clusters).
• Application example: clustering is used to partition an image into regions with similar
pixel values, which can then be analyzed for specific features or objects.
2. Partitioning Methods in Machine Learning
Partitioning methods are a type of clustering algorithm in machine learning where the dataset
is divided into distinct groups or clusters based on certain criteria. Each data point is assigned
to exactly one cluster. These methods aim to partition the data in such a way that points
within a cluster are more similar to each other than to those in other clusters.
Key Characteristics of Partitioning Methods:
• The number of clusters k is usually predefined.
• Each point is assigned to exactly one cluster.
• The algorithm iteratively refines the clusters to minimize an objective function (like
intra-cluster variance).
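A minimal K-Means run illustrating these characteristics; the two-blob data and k = 2 are assumptions for illustration:

```python
# Partitioning sketch (scikit-learn K-Means): k is fixed in advance, intra-cluster variance is minimized.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)  # the number of clusters k is predefined
labels = kmeans.fit_predict(X)   # each point is assigned to exactly one cluster
print(kmeans.cluster_centers_)   # cluster centers (means of the assigned points)
print(kmeans.inertia_)           # objective: sum of squared distances to the centers
```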
Types of Partitioning Methods
Here are the most common partitioning methods in machine learning:
1. K-Means Clustering
• Description: K-Means is the most widely used partitioning algorithm. It assigns each point
to its nearest centroid and recomputes each centroid as the mean of its assigned points,
repeating until the assignments stop changing.
2. K-Medoids Clustering
• Description: K-Medoids works like K-Means but uses actual data points (medoids) as cluster
centers. Its steps are:
1. Initialization Step:
• Choose k initial medoids randomly from the dataset. These are the starting points for the
clusters.
2. Assignment Step:
• Assign each data point to the closest medoid. The "closeness" is typically
measured using a distance metric such as Euclidean distance or Manhattan
distance. Each data point is assigned to the medoid that minimizes the distance.
3. Update Step:
• For each cluster, find the new medoid. The new medoid is the point within the
cluster that minimizes the total distance to all other points in that cluster.
Essentially, you compute the sum of distances between every point in the cluster
and each other point, and pick the point that minimizes this sum.
4. Repeat:
• Repeat the assignment and update steps until the medoids no longer change or a
predefined number of iterations is reached.
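A compact NumPy sketch of these four steps (illustrative only; in practice a library implementation such as the KMedoids class in scikit-learn-extra would be used, and the toy data here is invented):

```python
# K-Medoids sketch following steps 1-4 above (not optimized; the toy data is invented).
import numpy as np

def k_medoids(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)  # 1. choose k initial medoids at random
    for _ in range(max_iter):
        # 2. Assignment: each point joins the cluster of its closest medoid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: the new medoid is the member that minimizes total distance within its cluster
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            intra = np.linalg.norm(X[members][:, None, :] - X[members][None, :, :], axis=2)
            new_medoids[c] = members[intra.sum(axis=1).argmin()]
        # 4. Repeat until the medoids no longer change
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return medoids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
medoid_idx, labels = k_medoids(X, k=2)
print(X[medoid_idx], labels)
```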
Advantages of K-Medoids:
1. Robust to Outliers: Unlike K-Means, K-Medoids is less sensitive to outliers because the
medoid is an actual data point, not an average.
2. Flexibility with Distance Metrics: K-Medoids can use a variety of distance metrics,
making it suitable for non-Euclidean spaces.
Disadvantages of K-Medoids:
1. Computationally Expensive: K-Medoids can be more computationally intensive than
K-Means, especially when dealing with large datasets, because it involves checking all
points within a cluster to determine the medoid.
2. Scalability: K-Medoids is less scalable than K-Means, especially for very large datasets,
because it requires calculating the sum of distances for every possible pair of points in
each cluster.
3. Hierarchical Clustering
Hierarchical clustering builds a nested hierarchy of clusters (a dendrogram) rather than a
single flat partition.
Advantages of Hierarchical Clustering:
1. No Need to Specify k: Unlike K-Means or K-Medoids, hierarchical clustering
doesn’t require you to specify the number of clusters in advance. You can decide the
number of clusters by cutting the dendrogram at a desired level.
2. Flexible Distance Measures: Hierarchical clustering allows you to use various distance
metrics (e.g., Euclidean, Manhattan, cosine similarity), making it adaptable to different
types of data.
Disadvantages of Hierarchical Clustering:
1. Sensitivity to Noise: Hierarchical clustering can be sensitive to noise and outliers,
particularly in agglomerative clustering, because once a point is merged into a cluster,
it cannot be reassigned.
2. Difficult for Large Datasets: While hierarchical clustering is great for small to medium
datasets, its computational cost makes it less suitable for very large datasets.
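A short SciPy sketch of agglomerative clustering, including cutting the dendrogram at a chosen number of clusters; the three-blob data is an assumption for illustration:

```python
# Agglomerative (hierarchical) clustering sketch: build the linkage tree, then cut it into clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (15, 2)),
               rng.normal(3, 0.5, (15, 2)),
               rng.normal([0, 4], 0.5, (15, 2))])

Z = linkage(X, method="ward")                    # bottom-up merging of the closest clusters
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the dendrogram into 3 flat clusters
print(labels)
```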