ML Unit-4-1

Artificial Intelligence and Data Science (AI & DS)


Machine Learning [B20AD3201]


Syllabus
Unit-1

Introduction- Artificial Intelligence, Machine Learning, Deep Learning, Types of Machine Learning Systems, Main Challenges of Machine Learning. Statistical Learning: Introduction,
Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in Statistical
Learning, Estimating Risk Statistics, Sampling distribution of an estimator, Empirical Risk
Minimization.

Unit-2

Supervised Learning (Regression/Classification): Basic Methods: Distance-based Methods, Nearest Neighbours, Decision Trees, Naive Bayes, Linear Models: Linear Regression, Logistic
Regression, Generalized Linear Models, Support Vector Machines, Binary Classification:
Multiclass/Structured outputs, MNIST, Ranking.

Unit-3

Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and
Pasting, Random Forests, Boosting, Stacking. Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.

Unit-4

Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for Image Segmentation, Using Clustering for Preprocessing, Using Clustering for
Semi-Supervised Learning, DBSCAN, Gaussian Mixtures. Dimensionality Reduction: The
Curse of Dimensionality, Main Approaches for Dimensionality Reduction, PCA, Using Scikit-
Learn, Randomized PCA, Kernel PCA.

Unit-5

Neural Networks and Deep Learning: Introduction to Artificial Neural Networks with Keras,
Implementing MLPs with Keras, Installing TensorFlow 2, Loading and Preprocessing Data with TensorFlow.

Unit-4

Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for Image Segmentation, Using Clustering for Preprocessing, Using Clustering for
Semi-Supervised Learning, DBSCAN, Gaussian Mixtures. Dimensionality Reduction: The
Curse of Dimensionality, Main Approaches for Dimensionality Reduction, PCA, Using Scikit-
Learn, Randomized PCA, Kernel PCA.

Clustering
Cluster analysis or simply clustering is the process of partitioning a set of data objects (or
observations) into subsets. Each subset is a cluster, such that objects in a cluster are similar to
one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a
cluster analysis can be referred to as a clustering. In this context, different clustering methods
may generate different clusterings on the same data set. The partitioning is not performed by
humans, but by the clustering algorithm. Hence, clustering is useful in that it can lead to the
discovery of previously unknown groups within the data.
Cluster analysis has been widely used in many applications such as business intelligence,
image pattern recognition, Web search, biology, and security.
• In business intelligence, clustering can be used to organize a large number of customers
into groups, where customers within a group share strong similar characteristics. This
facilitates the development of business strategies for enhanced customer relationship
management.
• In image recognition, clustering can be used to discover clusters or “subclasses” in
handwritten character recognition systems. Suppose we have a data set of handwritten
digits, where each digit is labeled as 1, 2, 3, and so on. Note that there can be a large variance in the way people write the same digit. Take the number 2, for example. Some people may write it with a small circle at the bottom left part, while
some others may not. We can use clustering to determine subclasses for “2,” each of
which represents a variation on the way in which 2 can be written. Using multiple
models based on the subclasses can improve overall recognition accuracy.
• Clustering has also found many applications in Web search. For example, a keyword
search may often return a very large number of hits (i.e., pages relevant to the search)

due to the extremely large number of web pages. Clustering can be used to organize the
search results into groups and present the results in a concise and easily accessible way.

A cluster is a collection of data objects that are similar to one another within the cluster and dissimilar to objects in other clusters; because of this, a cluster of data objects can be treated as an implicit class. In this sense, clustering is sometimes called automatic classification. Again, a critical
difference here is that clustering can automatically find the groupings. This is a distinct
advantage of cluster analysis.
Clustering is also called data segmentation in some applications because clustering partitions
large data sets into groups according to their similarity. Clustering can also be used for outlier
detection, where outliers (values that are “far away” from any cluster) may be more interesting
than common cases. Applications of outlier detection include the detection of credit card fraud
and the monitoring of criminal activities in electronic commerce.

Types of Clustering:
1. Partitioning methods.
• k-Means: A Centroid-Based Technique
• k-Medoids: A Representative Object-Based Technique
• CLARANS (Clustering Large Applications based upon RANdomized Search)
2. Hierarchical methods.
• Agglomerative versus Divisive Hierarchical Clustering.
• Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH).
• Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling.
3. Density-based methods
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
• OPTICS: Ordering Points to Identify the Clustering Structure.
4. Grid-based methods
• STING: STatistical INformation Grid
• CLIQUE (CLustering In QUEst)

k-Means
Suppose a data set, D, contains n objects in Euclidean space. Partitioning methods distribute the objects in D into k clusters, C1, ..., Ck. An objective function is used to assess the partitioning quality so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. That is, the objective function aims for high intracluster similarity and low intercluster similarity.
A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster. Conceptually, the centroid of a cluster is its center point. The centroid can be defined in various ways, such as by the mean or medoid of the objects (or points) assigned to the cluster. The difference between an object p ∈ Ci and ci, the representative of the cluster, is measured by dist(p, ci), where dist(x, y) is the Euclidean distance between two points x and y. The quality of cluster Ci can be measured by the within-cluster variation, which is the sum of squared error between all objects in Ci and the centroid ci, defined as

E = Σ_{i=1}^{k} Σ_{p ∈ Ci} dist(p, ci)²

where E is the sum of the squared error for all objects in the data set; p is the point in space representing a given object; and ci is the centroid of cluster Ci.

k-means algorithm:
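The algorithm itself appears as a pseudocode figure in the original notes. As a rough sketch only (not the figure's exact pseudocode), the iterative relocation described above can be written in NumPy as follows; the data X, the value of k, and the helper name kmeans are illustrative assumptions.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: arbitrarily choose k objects as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each object to the nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update each center to the mean of the objects assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: stop when no center moves (no further reassignment will occur)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical 2-D data: two well-separated blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)
print(centers)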

Example: Consider a set of objects located in 2-D space, as depicted in Figure (a). Let k =3,
that is, the user would like the objects to be partitioned into three clusters.

Figure: Clustering of a set of objects using the k-means method; in (b) the cluster centers are updated and objects are reassigned accordingly (the mean of each cluster is marked by a +).

We arbitrarily choose three objects as the three initial cluster centers, where cluster centers are
marked by a +. Each object is assigned to a cluster based on the cluster center to which it is the
nearest. Such a distribution forms silhouettes encircled by dotted curves, as shown in Figure
(a).
Next, the cluster centers are updated. That is, the mean value of each cluster is recalculated
based on the current objects in the cluster. Using the new cluster centers, the objects are
redistributed to the clusters based on which cluster center is the nearest. Such a redistribution
forms new silhouettes encircled by dashed curves, as shown in Figure (b).
This process iterates, leading to Figure (c). The process of iteratively reassigning objects to
clusters to improve the partitioning is referred to as iterative relocation. Eventually, no
reassignment of the objects in any cluster occurs and so the process terminates. The resulting
clusters are returned by the clustering process.

• The k-means method is not guaranteed to converge to the global optimum and often
terminates at a local optimum. The results may depend on the initial random selection
of cluster centers.
• To obtain good results in practice, it is common to run the k-means algorithm multiple
times with different initial cluster centers.

Limits of K-Means:
1. Sensitivity to Initial Conditions
• K-means is sensitive to initial conditions. The algorithm randomly initializes the
cluster centroids at the beginning, and the final clustering results can vary
depending on these initial positions.
• Different initializations can lead to different local optima, resulting in different clustering outcomes. This makes the K-means algorithm less reliable and less reproducible.
2. Difficulty in Determining the Number of Clusters (K)
• One of the drawbacks of the K-means algorithm is that we have to set the number
of clusters (K) in advance. Choosing an incorrect number of clusters can lead to
inaccurate results. Various methods are available to estimate the optimal K, such as
the silhouette analysis or elbow method, but they may not always provide a clear-
cut answer.
• If we choose too small a K, we'll get overly broad clusters.
• It may require multiple runs to find the most suitable value of K, which can be time-
consuming and resource-consuming.
3. Inability to Handle Categorical Data
• The algorithm works with numerical data, where distances between data points can
be calculated. However, categorical data doesn’t have a natural notion of distance
or similarity.
• When categorical data is used with the K-means algorithm, it requires converting
the categories into numerical values, such as using one-hot encoding.
• One shortcoming of using one-hot encoding is that it treats each feature
independently and can degrade performance since it can significantly increase data
dimensionality.
4. Time Complexity
• The time complexity of the algorithm is O(n * K * M * D), where K is the number
of clusters, n is the number of data points, D is the number of dimensions, and M is
the number of iterations.

Using Clustering for Image Segmentation
Image segmentation is the process of dividing an image into distinct regions or segments,
where each region has some common characteristic. The goal is to simplify or change the
representation of an image, making it more meaningful and easier to analyze. Each segment
can represent different objects, boundaries, textures, or regions of interest.
Types of Image Segmentation
1. Thresholding: Pixels are grouped based on intensity values (e.g., foreground vs.
background).
2. Edge Detection: Segmentation based on identifying edges in the image (using
techniques like Sobel, Canny).
3. Region-based Segmentation: Dividing the image based on regions with similar
properties, such as color or texture.
4. Clustering-based Segmentation: Using clustering algorithms to group pixels with
similar features.
K-Means Clustering for Image Segmentation
K-Means is a partitional clustering algorithm that divides data into k clusters, where k is
predefined. It assigns each data point to the nearest cluster center and then iteratively updates
the centers based on the points assigned to them. It’s an unsupervised learning algorithm
because it doesn’t require labeled data.
K-Means Algorithm Steps:
1. Choose k cluster centers (centroids): These can be initialized randomly or by some
other method (like K-Means++).
2. Assign each point to the nearest centroid: Compute the Euclidean distance between
each point and the centroids and assign each pixel to the closest centroid.
3. Update the centroids: After assignment, update the centroids by calculating the mean
of all the points in each cluster.
4. Repeat: Repeat steps 2 and 3 until the centroids no longer change significantly
(convergence) or a specified number of iterations is reached.
Using K-Means for Image Segmentation
In the context of image segmentation, K-Means is applied to group similar pixels based on
color, intensity, or other features, making it easier to segment objects, textures, or regions in an
image.

Process of K-Means for Image Segmentation:
1. Flatten the Image: The image, which is typically a 3D array of shape (height, width,
channels), is reshaped into a 2D array of shape (num_pixels, channels) where each row
represents a pixel’s RGB (or grayscale) value.
2. Clustering: K-Means is applied to the pixel data. Each pixel is assigned to one of the
k clusters, where k is the number of desired segments or regions.
3. Reconstructing the Image: The result of K-Means clustering is a set of centroids (the
average color of each cluster). The image is then reconstructed by replacing each pixel's
original color with the corresponding centroid color.
4. Output: The output is a segmented image with k regions, each having similar pixel
values. These regions could represent objects or different parts of an image.

Why Use K-Means for Image Segmentation?


1. Unsupervised Learning: K-Means doesn't need labeled data, which makes it easy to use when you don't have predefined categories for the segments.
2. Simplicity and Efficiency: K-Means is relatively simple to implement and works efficiently for segmentation tasks, especially for images with clear regions of similar colors or intensities.
3. Color-Based Segmentation: K-Means is particularly effective when you want to segment based on color because it groups similar colors together into segments.
4. Versatility: It can be applied to both grayscale and color images. For color images, each pixel's RGB values are used as features, while for grayscale images, only intensity values are considered.

Applications of K-Means in Image Segmentation


1. Medical Imaging: Segmenting different regions (e.g., tumors, organs) from medical
scans like MRIs and CT scans.
2. Object Recognition: Segmentation helps in recognizing and classifying objects by
isolating them from the background.
3. Satellite Imagery: Segmenting different land types (water, forests, urban areas) in
satellite images.
4. Autonomous Vehicles: K-Means can be used to segment roads, vehicles, pedestrians,
and other objects from camera feeds.

Example code:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Load the image
image = cv2.imread('2.jpg')                       # Replace with your image path
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)    # Convert BGR to RGB
plt.imshow(image)
plt.title("Original Image")
plt.axis("off")
plt.show()

# Step 2: Reshape image for clustering
pixel_values = image.reshape((-1, 3))       # Flatten the image to (num_pixels, 3)
pixel_values = np.float32(pixel_values)     # Convert to float32 for k-means

# Step 3: Apply K-Means clustering
k = 3  # Number of clusters
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)
_, labels, centers = cv2.kmeans(pixel_values, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Convert centers back to uint8 and rebuild the segmented image
centers = np.uint8(centers)
segmented_image = centers[labels.flatten()]
segmented_image = segmented_image.reshape(image.shape)

# Step 4: Display the segmented image
plt.imshow(segmented_image)
plt.title("Segmented Image with K-Means (k=3)")
plt.axis("off")
plt.show()

Using Clustering for Preprocessing

Using clustering in data preprocessing can enhance the quality and efficiency of your machine
learning model by grouping similar data points and creating new features, reducing
dimensionality, or addressing data issues like noise and outliers.
Clustering is an unsupervised machine learning technique where data points are grouped based
on similarities. The idea is that points in the same group (or cluster) share certain characteristics
and should be closer to each other in some feature space than to points in other clusters.
Common clustering algorithms include:
• K-means
• DBSCAN (Density-Based Spatial Clustering)
• Hierarchical Clustering
• Gaussian Mixture Models (GMM)
Each of these algorithms has its strengths and is chosen based on the data's nature (e.g., the expected shape of the clusters, noise levels, etc.).
Why Use Clustering in Preprocessing?
Clustering is used in preprocessing to simplify, structure, and enhance the data. Here's how
it can be applied at various stages:
1. Feature Engineering
Clustering can be used to create new features that help machine learning models identify
patterns or relationships in the data more easily. By labeling each data point with the cluster it
belongs to, you can add a cluster label as an additional feature, which often improves model
performance.
Example: Suppose you have a dataset of customer information (age, income, spending
behavior) and want to predict customer churn. By clustering the customers into groups based
on their spending patterns, the cluster label can serve as a feature to help the model better
understand the behavior of customers who are likely to churn.
Steps:
1. Apply a clustering algorithm like K-means to the customer data.
2. Assign each customer to a cluster (label).
3. Use the cluster label as a feature for a classification model (e.g., predicting whether a
customer will churn).
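A minimal sketch of these steps, assuming made-up customer features and churn labels (the dataset, the 4-cluster choice, and the classifier are illustrative, not from the notes):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Hypothetical customer data: [age, income, spending_score] and churn labels
rng = np.random.RandomState(42)
X = rng.rand(200, 3)
y = rng.randint(0, 2, size=200)   # 1 = churned, 0 = stayed (made up)

# Steps 1-2: cluster the customers and get a cluster label for each one
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)

# Step 3: append the cluster label as an additional feature
X_with_cluster = np.hstack([X, cluster_labels.reshape(-1, 1)])

# Train a classifier (e.g., churn prediction) on the augmented features
clf = RandomForestClassifier(random_state=42).fit(X_with_cluster, y)
print(X_with_cluster.shape)  # (200, 4): original 3 features + cluster label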

2. Dimensionality Reduction
High-dimensional datasets (i.e., datasets with many features) can suffer from the curse of
dimensionality, where the complexity of the model increases exponentially as the number of
features grows. Clustering helps to reduce the number of features by grouping similar data
points together.
How Clustering Helps: After clustering, you can use the cluster centroids (the average of all
points in a cluster) as a summarized representation of that group. This reduces the number of
unique data points the model needs to process.
Example: For customer data with many variables (age, income, education, etc.), you can apply
K-means clustering and represent each customer by the distance to their assigned cluster's
centroid, reducing the complexity of the dataset.
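A minimal sketch of this idea, assuming made-up customer data: KMeans.transform in scikit-learn returns each point's distance to every centroid, which can serve as the reduced representation.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical high-dimensional customer data (e.g., 20 attributes per customer)
rng = np.random.RandomState(0)
X = rng.rand(500, 20)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# transform() gives the distance from each customer to each of the 5 centroids,
# reducing the representation from 20 features to 5 cluster-distance features
X_reduced = kmeans.transform(X)
print(X.shape, "->", X_reduced.shape)  # (500, 20) -> (500, 5)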

3. Outlier Detection and Noise Removal


In any dataset, there will often be points that do not fit well with the majority of the data —
these are outliers. Clustering can help identify outliers, as data points that do not belong to any
meaningful cluster are often considered outliers.
How Clustering Helps: DBSCAN, for example, classifies points that do not belong to any
cluster as noise (often assigned a label of -1). After applying clustering, outliers can be removed
or treated, improving the performance of machine learning models.
Example: In a dataset of transactions, a few very high-value transactions might not belong to
any customer segment. These can be flagged and removed or treated differently by the model.
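A small sketch of this approach with made-up transaction-like data (the eps and min_samples values are arbitrary): DBSCAN labels noise points as -1, and those rows can then be dropped or treated separately.

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(1)
# Hypothetical transaction features: one dense group plus a few extreme values
X = np.vstack([rng.normal(loc=50, scale=5, size=(200, 2)),
               np.array([[500, 480], [650, 700], [900, 20]])])  # outliers

labels = DBSCAN(eps=5, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1
X_clean = X[labels != -1]
print("removed", np.sum(labels == -1), "points as noise; kept", len(X_clean))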

4. Data Balancing
Many real-world datasets are imbalanced, meaning that some classes are underrepresented
compared to others. Clustering can be used to balance data within clusters, which can improve
model performance, especially in classification tasks.
How Clustering Helps: After clustering the data, you can ensure that each cluster contains a balanced representation of different classes by using techniques like oversampling or undersampling within each cluster.
Example: In a medical dataset, the disease class might be underrepresented. After clustering
the data by patient characteristics, you can apply SMOTE (Synthetic Minority Oversampling
Technique) within each cluster to balance the classes before training a model.

5. Reducing Complexity in Model Training
When training models, having smaller, more homogeneous groups (clusters) can reduce the
complexity of the training process. Instead of training on the entire dataset, you can train on
clusters that represent more homogeneous data points. This can make models more
interpretable and faster to train.
How Clustering Helps: Clustering helps segment the data into groups that can be trained
independently, making it easier to fit models and analyze the relationships within smaller
subsets of data.
Example: For large datasets with customer behavior data, clustering helps group customers
with similar buying patterns. You can then train a classifier separately for each cluster, leading
to more efficient training and better performance.

6. Creating Synthetic Data


Clustering can be combined with techniques like SMOTE (Synthetic Minority Over-sampling
Technique) to create synthetic data. After clustering, you can apply SMOTE within each cluster
to generate synthetic samples, which can help balance classes in imbalanced datasets.
How Clustering Helps: After clustering, SMOTE can generate synthetic data for minority
class clusters, ensuring that the synthetic data is meaningful and representative of the cluster's
characteristics.
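The sketch below illustrates the idea behind sections 4 and 6, assuming the third-party imbalanced-learn (imblearn) package is available and using made-up data; SMOTE is applied inside each k-means cluster that has enough samples of both classes.

import numpy as np
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE

# Made-up imbalanced two-class data: 180 majority points, 20 minority points
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (180, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 180 + [1] * 20)

# Cluster first, then oversample the minority class inside each cluster
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

X_parts, y_parts = [], []
for c in np.unique(clusters):
    Xc, yc = X[clusters == c], y[clusters == c]
    minority = np.bincount(yc, minlength=2).min()
    if minority >= 2:  # SMOTE needs both classes and a few minority samples
        sm = SMOTE(k_neighbors=min(5, minority - 1), random_state=0)
        Xc, yc = sm.fit_resample(Xc, yc)
    X_parts.append(Xc)
    y_parts.append(yc)

X_balanced, y_balanced = np.vstack(X_parts), np.concatenate(y_parts)
print(np.bincount(y), "->", np.bincount(y_balanced))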

Using Clustering for Semi-Supervised Learning
Semi-supervised learning (SSL) is a type of machine learning where a model is trained using
a combination of:
• Labeled data (data with known outputs or labels),
• Unlabeled data (data where the labels are unknown).
In real-world scenarios, labeled data can be scarce, expensive, or time-consuming to obtain,
while unlabeled data is abundant. Semi-supervised learning aims to leverage both labeled and
unlabeled data to improve model performance. Clustering is a powerful technique in SSL, as
it can help make use of the large amount of unlabeled data.
Clustering in Semi-Supervised Learning
Clustering in SSL involves grouping unlabeled data points into clusters based on their features.
These clusters represent different subgroups or patterns in the data. The key idea is to use the
structure of the unlabeled data to guide the learning process and improve predictions on new
or unseen data points.

How Clustering Helps in Semi-Supervised Learning

1. Label Propagation Through Clusters:


Cluster labeling: The basic idea is that once we have a few labeled data points, we can
propagate these labels to other points that belong to the same cluster. Since data points in the
same cluster are similar, we assume that they are likely to have similar labels.
Example: Suppose you have labeled data for cats and dogs, and a large set of unlabeled data
with images of various animals. By clustering the unlabeled data into groups of similar images
(e.g., based on features like shape, texture, etc.), you can propagate the known labels ("cat" or
"dog") to the members of each cluster, assuming the unlabeled data in each cluster will have
the same label.
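A minimal sketch of cluster-based label propagation; the digits dataset, the 50-cluster choice, and the assumption that only the first 100 points are labeled are illustrative stand-ins, not from the original notes. Each "unlabeled" point receives the most common known label in its cluster.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

# Pretend only the first 100 points are labeled; the rest are "unlabeled"
n_labeled = 100
y_known = y[:n_labeled]

clusters = KMeans(n_clusters=50, n_init=10, random_state=42).fit_predict(X)

# Propagate: give every point the majority known label of its cluster
y_propagated = np.full(len(X), -1)
for c in np.unique(clusters):
    mask_known = (clusters[:n_labeled] == c)
    if mask_known.any():
        majority = np.bincount(y_known[mask_known]).argmax()
        y_propagated[clusters == c] = majority

coverage = np.mean(y_propagated != -1)
acc = np.mean(y_propagated[y_propagated != -1] == y[y_propagated != -1])
print(f"covered {coverage:.0%} of points, propagated-label accuracy {acc:.2f}")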

2. Improving Label Quality:


The labels assigned through clustering can help improve the overall label quality of the dataset.
By clustering similar unlabeled data together, the clustering algorithm may help identify
mislabeling in the training data.
Example: if certain data points from the labeled set are outliers, clustering can group them
with other similar points, making the training process more robust.

3. Using Clusters as Pseudo-Labels:
In SSL, pseudo-labelling is a method where you use the predicted label from an unsupervised
learning model (like clustering) as a "pseudo-label" for training the model. By assigning labels
to unlabelled data based on their cluster assignment, we treat them as if they were labeled.
Example: If a clustering algorithm divides a dataset into 5 clusters, we can assume that all the
data points in the same cluster share the same label and train the model accordingly.
4. Data Exploration and Disambiguation:
Clustering can also help in data exploration. By grouping similar data points together,
clustering can reveal hidden structure and relationships in the data. This may help in making
better decisions about how to assign labels or refine training data, especially when there's
ambiguity in labelling.

DBSCAN
Clustering is an unsupervised learning technique where we try to group the data points based
on specific characteristics. There are various clustering algorithms with K-
Means and Hierarchical being the most used ones. Some of the use cases of clustering
algorithms include:
• Document Clustering
• Recommendation Engine
• Image Segmentation
• Market Segmentation
• Search Result Grouping
• and Anomaly Detection.
K-Means and Hierarchical Clustering both fail to create clusters of arbitrary shapes. They are
not able to form clusters based on varying densities. That’s why we need DBSCAN clustering.

Density-based spatial clustering of applications with noise (DBSCAN) is a clustering algorithm used in machine learning to partition data into clusters based on their distance to other points. It's effective at identifying and removing noise in a data set, making it useful for data cleaning
and outlier detection. Unlike other clustering algorithms (such as K-means), DBSCAN does
not require the number of clusters to be predefined and can discover clusters of arbitrary shapes.
DBSCAN is particularly useful for datasets with noise and can identify clusters with varying
shapes and densities. It is widely used in applications like geographic data analysis, anomaly
detection, and image segmentation.
Example
Let’s try to understand it with an example. Here we have data points densely present in the
form of concentric circles:

We can see three different dense clusters in the form of concentric circles with some noise here.
Now, let’s run K-Means and Hierarchical clustering algorithms and see how they cluster these
data points.

You might be wondering why there are four colors in the graph. As I said earlier, this data
contains noise, too. Therefore, I have taken noise as a different cluster, which is represented by
the purple color. Sadly, both of them failed to cluster the data points. Also, they were not able
to detect the noise present in the dataset properly. Now, let’s take a look at the results from
DBSCAN clustering.

Awesome! DBSCAN not only clusters the data points correctly but also detects the noise in the dataset.

DBSCAN Algorithm:
“How does DBSCAN find clusters?” Initially, all objects in a given data set D are marked as “unvisited.” DBSCAN randomly selects an unvisited object p, marks p as “visited,” and checks whether the ε-neighborhood of p contains at least MinPts objects. If not, p is marked as a noise point. Otherwise, a new cluster C is created for p, and all the objects in the ε-neighborhood of p are added to a candidate set, N. DBSCAN iteratively adds to C those objects in N that do not belong to any cluster. In this process, for an object p1 in N that carries the label “unvisited,” DBSCAN marks it as “visited” and checks its ε-neighborhood. If the ε-neighborhood of p1 has at least MinPts objects, those objects are added to N. DBSCAN continues adding objects to C until C can no longer be expanded, that is, until N is empty. At this point, cluster C is complete and is output. To find the next cluster, DBSCAN randomly selects an unvisited object from the remaining ones. The clustering process continues until all objects are visited.
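As a short illustration of the procedure above (the eps and min_samples values are arbitrary choices), scikit-learn's DBSCAN can be run on the two-moons data like this:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Nonlinear, noisy data that K-Means handles poorly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)  # eps ~ neighborhood radius, min_samples ~ MinPts
labels = db.labels_

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = np.sum(labels == -1)
print("clusters found:", n_clusters, "| noise points:", n_noise)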

Reachability and Connectivity
These are the two concepts that you need to understand before moving further. Reachability
states if a data point can be accessed from another data point directly or indirectly, whereas
Connectivity states whether two data points belong to the same cluster or not. In terms of
reachability and connectivity, two points in DBSCAN can be referred to as:
• Directly Density-Reachable
• Density-Reachable
• Density-Connected
Density-reachability and density-connectivity: Consider the figure below for a given ε, represented by the radius of the circles, and, say, let MinPts = 3.
The labeled points m, p, o, and r are core objects because each has an ε-neighborhood containing at least three points. Object q is directly density-reachable from m. Object m is directly density-reachable from p and vice versa.
Object q is (indirectly) density-reachable from p because q is directly density-reachable from m and m is directly density-reachable from p. However, p is not density-reachable from q because q is not a core object. Similarly, r and s are density-reachable from o, and o is density-reachable from r. Thus, o, r, and s are all density-connected.

Density-reachability and density-connectivity in density-based clustering.

Advantages of the DBSCAN Algorithm


• DBSCAN does not require the number of centroids to be known beforehand, as is the case with the K-Means algorithm.
• It can find clusters with any shape.
• It can also locate clusters that are not connected to any other group or cluster. It can work well with noisy data.
• It is robust to outliers.

Disadvantages of the DBSCAN Algorithm
• It does not work well when clusters have widely varying densities.
• It cannot easily be parallelized, since the data set cannot simply be partitioned.
• It may not find the right clusters if the dataset is sparse.
• It is sensitive to the parameters epsilon (ε) and minPoints.

Applications of DBSCAN
• It is used in satellite imagery analysis.
• It is used in X-ray crystallography.
• Anomaly detection, e.g., in temperature data.

Gaussian Mixtures
A Gaussian Mixture Model (GMM) is a probabilistic model used to represent the presence of
subpopulations (clusters) within a larger population, where each subpopulation follows a
Gaussian (normal) distribution. GMM is considered a soft clustering technique because each
data point can belong to multiple clusters with different probabilities.
GMM assumes that:
• The data points are generated by a mixture of several Gaussian distributions.
• Each distribution is defined by its mean vector (μ) and covariance matrix (Σ).

Mathematical Representation
A GMM with K components is represented as:

p(x) = Σ_{k=1}^{K} πk · N(x | μk, Σk),  where Σ_{k=1}^{K} πk = 1

and N(x | μk, Σk) denotes the Gaussian (normal) density of component k.

Parameters of GMM
The GMM has three key sets of parameters for each Gaussian component k:
• Mixing weight (πk): the probability that a data point is generated by component k.
• Mean vector (μk): the center of component k.
• Covariance matrix (Σk): the shape and spread of component k.
Parameter Estimation: Expectation-Maximization (EM) Algorithm
To fit a GMM to data, we use the Expectation-Maximization (EM) algorithm, which alternates two steps until the parameters converge:
• E-step: using the current parameters, compute for each data point the responsibility (posterior probability) of each Gaussian component.
• M-step: re-estimate the mixing weights, means, and covariances from these responsibilities.
Advantages of GMM
• Soft clustering: Points can belong to multiple clusters with varying probabilities.
• Flexibility: Can model clusters with different shapes and densities by adjusting the
covariance structure.
• Probabilistic interpretation: Provides insights into the uncertainty of cluster
assignments.

Limitations of GMM
• Sensitive to Initialization: Poor initialization can lead to local optima.
• Assumption of Gaussian Distribution: May not perform well if the data doesn't fit a
Gaussian distribution.
• Number of Components: Requires the number of components (K) to be specified
beforehand.

Applications
• Anomaly Detection: Identifying outliers based on low probability density.
• Image Processing: Segmenting regions with different textures or intensities.
• Customer Segmentation: Grouping customers based on purchasing behavior.
• Speech Recognition: Modeling acoustic feature distributions.

Dimensionality Reduction

Dimensionality reduction is a technique used in machine learning and data science to reduce
the number of input features while preserving as much important information as possible. This
helps in overcoming computational inefficiencies and improving model performance.

The Curse of Dimensionality:


The curse of dimensionality refers to the problems that arise when working with high-
dimensional data. As the number of dimensions (features) increases, data points become
sparser, making it harder to analyze patterns.
Problems Caused by High Dimensionality
1. Increased Computational Cost: More dimensions mean more computations, making
models slower.
2. Data Sparsity: In high dimensions, data points are spread far apart, making clustering
and classification difficult.
3. Overfitting: More features can lead to models capturing noise instead of useful
patterns.
4. Distance Measure Distortion: Many machine learning algorithms rely on distance
metrics (e.g., Euclidean distance). In high dimensions, all points tend to appear
equidistant.
Example of Curse of Dimensionality
Consider a 1D space where points are randomly distributed between 0 and 1. If we have 100
data points, they are likely well distributed.
Now, if we increase the dimensions:
• 2D space: The same 100 points are spread in a unit square (0,1) × (0,1).
• 3D space: The points are spread in a unit cube (0,1) × (0,1) × (0,1).
• 100D space: The points are in a unit hypercube, and the data density becomes extremely
sparse.
This causes problems because:
• Most machine learning models struggle to learn useful patterns.
• Distance-based methods like k-NN become ineffective.
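The following small experiment is only an illustration of the distance-distortion point above (the sample size and dimensions are arbitrary choices): as the dimensionality grows, the gap between the nearest and farthest pairwise distances shrinks relative to the average distance.

import numpy as np

rng = np.random.RandomState(0)
n_points = 50

for d in [1, 2, 10, 100, 1000]:
    X = rng.rand(n_points, d)                      # points in the unit hypercube
    # pairwise Euclidean distances
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    dists = dists[np.triu_indices(n_points, k=1)]  # unique pairs only
    spread = (dists.max() - dists.min()) / dists.mean()
    print(f"d={d:5d}  relative spread of distances = {spread:.2f}")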

Let’s take an example to explain this better:
Imagine you are building a machine learning model to predict house prices based on features
like the number of bedrooms, square footage, location, age of the house, number of bathrooms,
and so on. If you have too many features like additional ones for each room’s condition,
flooring type, or neighborhood amenities, your dataset can become very large and complex.
With too many features, your model may become slow to train, and it might also pick up
unnecessary details or noise. For example, suppose the flooring type doesn’t significantly
impact house prices. In that case, it might lead the model to make less accurate predictions,
especially when the data is noisy or when there are many irrelevant features.

How Does Dimensionality Reduction Work?

Let's understand how dimensionality reduction works with the help of the figure described below:

On the left, data points exist in a 3D space (X, Y, Z), but the Z-dimension appears unnecessary
since the data primarily varies along the X and Y axes. The goal of dimensionality reduction is
to remove less important dimensions without losing valuable information.

On the right, after reducing the dimensionality, the data is represented in lower-dimensional
spaces. The top plot (X-Y) maintains the meaningful structure, while the bottom plot (Z-Y)
shows that the Z-dimension contributed little useful information. This process makes data
analysis more efficient, improving computation speed and visualization while minimizing
redundancy.

Main Approaches for Dimensionality Reduction

There are two main approaches to dimensionality reduction:

1. Feature Selection.
2. Feature Extraction
1. Feature Selection (Selecting Relevant Features):
Feature selection chooses the most relevant features from the dataset without altering them.
It helps remove redundant or irrelevant features, improving model efficiency. There are
several methods for feature selection including filter methods, wrapper methods, and
embedded methods.

1. Filter Methods: Use statistical tests to rank features by importance.
Example Techniques: Correlation, Mutual Information, Chi-Square Test.
2. Wrapper Methods: Train models iteratively with different feature subsets and evaluate
performance.
Example Techniques: Recursive Feature Elimination (RFE), Forward Selection,
Backward Elimination
3. Embedded Methods: Feature selection is integrated into the model training process.
Example Techniques: LASSO Regression (L1 regularization), Decision Tree Feature
Importance.
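As a small illustrative sketch (the synthetic dataset and the choice of keeping 5 features are assumptions), a filter method and a wrapper method can be compared in scikit-learn like this:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

# 20 features, only 5 of which are actually informative
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# Filter method: rank features by mutual information, keep the top 5
X_filter = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper method: Recursive Feature Elimination around a model
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
X_wrapper = X[:, rfe.support_]

print(X.shape, "->", X_filter.shape, "and", X_wrapper.shape)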
2. Feature Extraction (Transforming Features):
Feature extraction creates new, lower-dimensional features from the original ones. These
methods combine or project features while preserving important patterns.
A. Linear Methods
1. Principal Component Analysis (PCA): Finds new axes that maximize
variance, transforming correlated features into uncorrelated components.
Best Used for High-dimensional, correlated data.
2. Linear Discriminant Analysis (LDA): Maximizes class separation while
reducing dimensions. Best Used for Classification problems.
3. Singular Value Decomposition (SVD): Factorizes data matrices into simpler
structures. Best Used for Text data (Latent Semantic Analysis in NLP).

B. Non-Linear Methods
1. t-SNE (t-Distributed Stochastic Neighbor Embedding): Maps high-
dimensional data to a lower-dimensional space while preserving local relationships.
Best Used for Visualization
2. UMAP (Uniform Manifold Approximation and Projection): Similar to t-
SNE but faster and better at preserving global structure. Best Used for Large datasets
and Visualization.
3. Autoencoders (Neural Networks): Uses an encoder-decoder architecture to
learn compact representations. Best Used for Deep Learning Applications.

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a linear dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving as much
variance (information) as possible.
• It finds a new set of orthogonal (uncorrelated) axes, called principal components,
that maximize the variance in the data.
• These components are ranked, with the first principal component (PC1) capturing the
most variance, the second (PC2) capturing the second most, and so on.
• PCA is widely used in machine learning, data visualization, and noise reduction.

PCA is useful in many scenarios, such as:


• Reducing dimensionality to improve computational efficiency.
• Avoiding overfitting in machine learning models.
• Visualizing high-dimensional data in 2D or 3D.
• Decorrelating features (removing redundancy).

PCA Working:
Step 1: Standardize the Data
Since PCA is sensitive to feature scaling, we first standardize the data by converting each feature into a zero-mean, unit-variance form:

z = (x − μ) / σ

where:
• μ is the mean of each feature.
• σ is the standard deviation of each feature.

Step 2: Compute the Covariance Matrix

The covariance matrix captures relationships between features. For the standardized data matrix X with n samples, it is computed as:

C = (1 / (n − 1)) XᵀX

where:
• C is the d × d covariance matrix.
• XᵀX is the dot product of the transposed data matrix with the data matrix.

Example covariance matrix for 3 features:

        | Var(x1)      Cov(x1, x2)   Cov(x1, x3) |
    C = | Cov(x2, x1)  Var(x2)       Cov(x2, x3) |
        | Cov(x3, x1)  Cov(x3, x2)   Var(x3)     |

A high covariance between two features means they are correlated.

Step 3: Compute Eigenvalues and Eigenvectors


We solve the eigenvalue decomposition problem: Cv = λv
where:
• v are eigenvectors (principal components).
• λ are eigenvalues (amount of variance explained).
• The largest eigenvalue corresponds to the most important principal component.
• The number of components chosen depends on the percentage of variance
explained.

We solve for the eigenvalues λ from: det(C − λI) = 0

Eigenvectors satisfy: (C − λI)v = 0

Step 4: Select Top k Principal Components


We choose the top k eigenvectors (those with the highest eigenvalues) to form a projection matrix: W = [v1, v2, ..., vk]
where:
• W is the d × k transformation matrix.
• k is the number of components retained.

Step 5: Transform Data into the New Space

We project the original data onto the new principal component space: X_reduced = X · W
This gives us a lower-dimensional representation of the data.
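The five steps above can be sketched directly in NumPy; this is an illustration only, reusing the small 4 × 2 dataset that appears in the Scikit-Learn example later in this unit:

import numpy as np

X = np.array([[2, 3], [3, 5], [5, 8], [7, 10]], dtype=float)

# Step 1: standardize (zero mean, unit variance per feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data
C = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of C (eigh: C is symmetric)
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: pick the top k components (largest eigenvalues first)
order = np.argsort(eigvals)[::-1]
k = 1
W = eigvecs[:, order[:k]]          # d x k projection matrix

# Step 5: project the data into the new space
X_reduced = X_std @ W
print("explained variance ratio:", eigvals[order[:k]] / eigvals.sum())
print(X_reduced)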

Example of PCA

(In the worked example that follows, a small 2-D dataset is used; the data, mean-centering, and covariance-matrix computations appear as figures in the original notes and give the eigenvalues λ1 = 10.8224 and λ2 = 0.1151 used below.)
For Eigenvectors:

Substitute Eigenvalue λ1=10.8224

Solve the System of Equations

Normalize the Eigenvector

Solve for the Eigenvector of λ2 = 0.1151

Solve the System of Equations

Normalize the Eigenvector

Step 6: Select the Principal Component
Since we want to reduce our 2D data to 1D, we select the eigenvector corresponding to the
largest eigenvalue, which captures the most variance.
Our eigenvalues are λ1 = 10.8224 and λ2 = 0.1151. Since λ1 is the largest eigenvalue, its corresponding eigenvector will be our principal component:

Step 7: Transform the Data


We project our centered data onto the principal component using:
X_transformed = X_centered · v1

Final Result of PCA Transformation in Matrix Format (Reducing 2D to 1D)

The final transformed data after applying PCA (projecting onto the principal component) is:

These are the projections of the original points onto the principal component (the direction
of maximum variance).

Using Scikit-Learn

Scikit-Learn (or sklearn) is a popular Python machine learning library used for:
1. Data Preprocessing (e.g., Standardization, Normalization)
2. Dimensionality Reduction (e.g., PCA, t-SNE)
3. Machine Learning Algorithms (e.g., SVM, Decision Trees, K-Means)
4. Model Evaluation (e.g., Accuracy, Precision-Recall)

Install and Import Scikit-Learn

pip install scikit-learn

Example: Scikit-Learn provides a simple way to apply Principal Component Analysis.

Program

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Step 1: Define the 2D dataset (column 1 = Feature 1, column 2 = Feature 2)
X = np.array([
    [2, 3],
    [3, 5],
    [5, 8],
    [7, 10]])

# Step 2: Standardize the data (mean = 0, variance = 1).
# Before applying PCA, it's crucial to standardize the dataset so all features
# share the same scale. This is Z-score normalization: it centers the data
# around mean = 0 and scales it to unit variance.
scaler = StandardScaler()            # StandardScaler normalizes the data
X_scaled = scaler.fit_transform(X)

# Step 3: Apply PCA (reduce from 2D to 1D)
pca = PCA(n_components=1)            # PCA(n_components=k) reduces the dimensions to k
X_pca = pca.fit_transform(X_scaled)  # computes principal components and projects the data

# Step 4: Print results
print("Original Data (2D):\n", X)
print("\nTransformed Data (1D):\n", X_pca)
print("\nPrincipal Component:\n", pca.components_)

# Step 5: Plot the original data and the principal component
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], label="Original Data", color='blue')

# Get the principal component direction
vector = pca.components_[0]   # Eigenvector
origin = np.mean(X, axis=0)   # Mean of original data

# Plot principal component (red arrow)
plt.quiver(origin[0], origin[1], vector[0], vector[1], scale=3,
           color='red', angles='xy', scale_units='xy', label="Principal Component")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.title("PCA: Original Data & Principal Component")
plt.grid()
plt.show()

Randomized PCA

Randomized PCA is a faster approximation of standard PCA, particularly useful for large
datasets with high dimensions. Instead of computing all eigenvalues and eigenvectors, it
uses a stochastic algorithm to estimate the most important components efficiently.

Advantages:
• Faster than Standard PCA for high-dimensional data.
• Good Approximation of the principal components.
• Works Well with Sparse Data.
• Useful for Large Image and Text Datasets

Example: Compare Standard PCA vs. Randomized PCA with a 2D dataset.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X = np.array([
[2, 3],
[3, 5],
[5, 8],
[7, 10]
])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca_standard = PCA(n_components=1)  # Reduce to 1D
X_pca_standard = pca_standard.fit_transform(X_scaled)
print("Standard PCA Result:\n", X_pca_standard)

pca_randomized = PCA(n_components=1, svd_solver="randomized")
X_pca_randomized = pca_randomized.fit_transform(X_scaled)
print("Randomized PCA Result:\n", X_pca_randomized)

print("Standard PCA Components:\n", pca_standard.components_)
print("Randomized PCA Components:\n", pca_randomized.components_)

plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], label="Original Data", color='blue')

# Get the principal component direction


vector = pca_randomized.components_[0] # Eigenvector
origin = np.mean(X, axis=0) # Mean of original data

# Plot principal component (red arrow)


plt.quiver(origin[0], origin[1], vector[0], vector[1], scale=3,
color='red', angles='xy', scale_units='xy', label="Principal Component")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.title("Randomized PCA: Original Data & Principal Component")
plt.grid()
plt.show()

Difference Between Standard PCA and Randomized PCA

Both Standard PCA and Randomized PCA are used for dimensionality reduction, but they differ in computation, speed, and use cases.

• Computation: Standard PCA computes exact eigenvalues and eigenvectors; Randomized PCA uses an approximate randomized SVD.
• Speed: Standard PCA is slow for large datasets; Randomized PCA is fast for high-dimensional data.
• Accuracy: Standard PCA is precise; Randomized PCA gives a good approximation.
• Scalability: Standard PCA is not efficient for big data; Randomized PCA scales well for large datasets.
• Memory Usage: Standard PCA is high (stores the full covariance matrix); Randomized PCA is low (uses a subset of the data for approximation).
• Best For: Standard PCA suits small to medium datasets where precision is needed; Randomized PCA suits large datasets (big data, images, NLP) where speed is needed.
• SVD Solver: Standard PCA uses svd_solver="full" (exact decomposition); Randomized PCA uses svd_solver="randomized" (fast approximation).

Kernel PCA

Kernel PCA (KPCA) is an extension of Principal Component Analysis (PCA) that allows us to
find nonlinear patterns in data. While standard PCA only works with linear transformations,
Kernel PCA uses the kernel trick to map data into a higher-dimensional space, where it can
find nonlinear principal components.

Why Use Kernel PCA?


• Handles Nonlinear Data: Standard PCA only captures linear patterns, while KPCA
can map data to a higher-dimensional space where patterns become linear.
• Uses the Kernel Trick: Instead of explicitly computing higher dimensions, KPCA
applies kernels to find nonlinear relationships efficiently.
• Useful for Complex Datasets: Works well for datasets with curved or nonlinear
structures, such as image processing, NLP, and clustering.

Mathematical Explanation:
1. Standard PCA finds principal components by computing eigenvectors of the covariance
matrix.
2. Kernel PCA applies a nonlinear transformation Φ(X) to map data into a higher-
dimensional space.
3. Instead of computing this higher-dimensional transformation explicitly, KPCA uses the
kernel trick: K(xi,xj)=Φ(xi)⋅Φ(xj) where K is a kernel function.
4. KPCA finds eigenvectors of the kernel matrix K, allowing for dimensionality reduction
in the transformed space.

Kernel PCA in Scikit-Learn

Let's apply Kernel PCA to a nonlinear dataset.

#Step 1: Import Required Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Step 2: Create a Nonlinear Dataset (Moons Dataset)

X, y = make_moons(n_samples=100, noise=0.1, random_state=42)

# Step 3: Standardize the Data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 4: Apply Standard PCA

from sklearn.decomposition import PCA


pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

#Step 5: Apply Kernel PCA with RBF Kernel

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)


X_kpca = kpca.fit_transform(X_scaled)

#Step 6: Visualize the Results

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Standard PCA
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="coolwarm")
axes[0].set_title("Standard PCA")

# Kernel PCA
axes[1].scatter(X_kpca[:, 0], X_kpca[:, 1], c=y, cmap="coolwarm")
axes[1].set_title("Kernel PCA (RBF)")
plt.show()

Differences Between PCA, Randomized PCA, and Kernel PCA

• Approach: Standard PCA computes exact eigenvalues and eigenvectors of the covariance matrix; Randomized PCA uses randomized Singular Value Decomposition (SVD) for approximation; Kernel PCA applies the kernel trick to map data into a higher-dimensional space.
• Handles Nonlinear Data: Standard PCA and Randomized PCA capture only linear patterns; Kernel PCA captures nonlinear structures.
• Speed: Standard PCA is slow for large datasets; Randomized PCA is fast and optimized for high-dimensional data; Kernel PCA is moderate and depends on the kernel choice.
• Memory Usage: Standard PCA is high (stores the full covariance matrix); Randomized PCA is low (works with a subset of the data); Kernel PCA is high (stores the kernel matrix).
• Scalability: Standard PCA is poor for big data; Randomized PCA is efficient for large datasets; Kernel PCA can be expensive for large datasets.
• Kernel Trick: Only Kernel PCA uses kernels (e.g., RBF, polynomial).
• Works Best For: Standard PCA suits small to medium datasets with linear relationships; Randomized PCA suits large, high-dimensional datasets; Kernel PCA suits nonlinear datasets with complex patterns.
• Example Use Cases: Standard PCA for image compression, eigenfaces, and financial data analysis; Randomized PCA for big data applications (e.g., NLP, genomics); Kernel PCA for clustering, face recognition, and non-Euclidean data.
• Implementation in Scikit-Learn: PCA(n_components=k); PCA(n_components=k, svd_solver="randomized"); KernelPCA(n_components=k, kernel="rbf").
