3C's of Machine Learning PDF
March 4, 2023
Jimma, Ethiopia
3C’s of Machine Learning
Both classification and clustering are used to categorize objects into one or more classes based on their features. The two processes look similar, but they differ in one basic respect: in classification, each input instance has a predefined label assigned according to its properties, whereas in clustering those labels are missing.
Classification
As the name suggests, classification is the task of “classifying things” into sub-categories. In machine learning, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of observations whose category membership is known. A classifier learns to assign a class label to an input from these labeled training data, and the goal is to build a model that accurately predicts the class label of new, unseen data. Classification is therefore a form of supervised learning. Because classification relies on labels, it needs a training dataset to build the model and a testing dataset to verify it; clustering, which has no labels, needs no such split. Classification is also generally considered more complex than clustering, because it involves several stages (such as training, testing, and prediction), whereas clustering only groups the data.
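The supervised workflow described above (build a model from labeled training data, then verify it on a separate test set) can be sketched in plain Python. The sketch uses a nearest-centroid classifier, which is not one of the methods named in this article; it is swapped in only because it fits in a few lines. The toy data and function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(X, y):
    """Training: compute the mean feature vector (centroid) of each class label."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[label][i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, x):
    """Prediction: assign x the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def accuracy(centroids, X, y):
    """Testing: fraction of held-out instances whose label is predicted correctly."""
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y)

# Hypothetical labeled data: two features per instance.
X_train = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
y_train = ["small", "small", "small", "large", "large", "large"]
X_test = [[0.9, 1.1], [4.2, 4.1]]
y_test = ["small", "large"]

model = fit_centroids(X_train, y_train)
print(predict(model, [1.0, 1.0]))       # small
print(accuracy(model, X_test, y_test))  # 1.0
```

The test set is never used while computing the centroids, which is what makes the accuracy figure a check on unseen data.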
Examples include logistic regression, the naive Bayes classifier, and support vector machines.
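Logistic regression, the first classifier listed above, can be sketched in plain Python as batch gradient descent on the mean log-loss. This is a minimal illustration, not a production implementation; the learning rate, epoch count, 0.5 decision threshold, and toy data are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = p - target  # derivative of the log-loss w.r.t. the logit
            for i in range(n_features):
                grad_w[i] += error * x[i]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def predict_label(w, b, x, threshold=0.5):
    return 1 if predict_proba(w, b, x) >= threshold else 0

# Hypothetical one-feature data: class 1 when the feature is positive.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict_label(w, b, [-1.5]), predict_label(w, b, [1.5]))  # 0 1
```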
Clustering
Clustering is a type of unsupervised learning whose goal is to partition a set of objects into groups called clusters. Objects are grouped according to their similarities and differences: objects with similar attributes come together, and objects with different attributes are kept apart. Clustering therefore groups instances by resemblance without using class labels. It does not require a labeled training dataset and is less complex than classification. Clustering is a structure-finding task, typically applied to dense data of low or moderate dimension in a continuous space. It is essentially defined by a distance function between data points, and it is often carried out with an algorithm in the style of expectation-maximization (EM).
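k-means is a standard instance of the ideas in this paragraph: clusters are defined by a distance function (Euclidean here, an assumption), and fitting alternates two steps in the spirit of EM, an assignment step and an update step. This plain-Python sketch seeds the centroids with the first k points for reproducibility, which is a simplification; practical implementations usually use smarter seeding such as k-means++.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means with Euclidean distance; returns (labels, centroids)."""
    # Simplification: seed with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step (E-like): each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step (M-like): move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical 2-D points forming two well-separated groups (no labels given).
points = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.9], [0.9, 1.1], [4.8, 5.1]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Note that the cluster numbers 0 and 1 are arbitrary group identifiers, not predefined class labels as in classification.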
Examples include the k-means algorithm, the fuzzy c-means algorithm, and Gaussian mixture clustering fitted with the EM algorithm.
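For Gaussian clustering with EM, the last example listed above, here is a minimal one-dimensional sketch of a Gaussian mixture fitted by expectation-maximization. Unlike k-means, the E-step gives every point a soft responsibility for each cluster rather than a hard assignment. The starting means, unit starting variances, iteration count, variance floor, and toy data are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_em_1d(data, init_means, iterations=50):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances, resp)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point (soft assignment).
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, resp

# Hypothetical unlabeled 1-D data drawn around two centres.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.7, 5.1]
weights, means, variances, resp = gmm_em_1d(data, init_means=[0.0, 6.0])
print([round(m, 3) for m in means])  # [1.025, 5.025]
```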
Collaboration
A collaborative filtering system can identify whether a user enjoyed a product in two ways: it can ask the user to give a numerical rating (explicit feedback), or it can assume that the user likes whatever products they use (implicit feedback). Once user interests have been established, recommendations can be made. Collaborative filtering is, in general, a ranking problem. Depending on how you look at it, the data are sparse, high-dimensional, and in a continuous space: there is one dimension per product, and each user has expressed a preference for only a few of them. The task amounts to inferring which missing dimension (a product the user has not yet rated or used) has the highest value. It typically proceeds via a matrix completion algorithm such as low-rank matrix factorization.
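The matrix-completion view can be made concrete with a small low-rank factorization sketch: each observed rating R[u][i] is approximated by the dot product of a user vector P[u] and an item vector Q[i], the vectors are fitted by stochastic gradient descent on the observed entries only, and the missing entries for a user are then ranked by predicted score. The rank, learning rate, regularization strength, epoch count, seed, and toy ratings are all assumptions, and the function names are hypothetical.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.02, epochs=3000, seed=0):
    """Fit R ~ P Q^T by SGD over observed (user, item, rating) triples only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(rank))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # L2-regularized gradient step on the squared error
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def recommend(P, Q, ratings, user, top_n=2):
    """Rank the items this user has not rated by their predicted score."""
    rated = {i for u, i, _ in ratings if u == user}
    rank = len(P[user])
    scores = {i: sum(P[user][f] * Q[i][f] for f in range(rank))
              for i in range(len(Q)) if i not in rated}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical sparse ratings on a 1-5 scale: (user, item, rating); 11 of 16 cells known.
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1),
           (1, 0, 4), (1, 2, 1), (1, 3, 1),
           (2, 1, 1), (2, 2, 5), (2, 3, 4),
           (3, 0, 1), (3, 2, 4)]
P, Q = factorize(ratings, n_users=4, n_items=4)
print(recommend(P, Q, ratings, user=3))  # user 3's unrated items, best first
```

Keeping the rank small forces users with similar observed ratings to share similar vectors, which is what lets the model fill in the cells nobody has rated.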