Machine Learning Examples With R
Machine Learning Examples With R
with R
What is Machine Learning?
CAR
CAR
BIKE
It is a CAR
BIKE
Let us ask the same question to him
? What is this object?
Let us make him learn
CAR
show him
CAR
BIKE
BIKE
What is this object?
CAR
CAR
BIKE
BIKE
Past experience
CAR What is this object?
CAR
CAR
BIKE
BIKE
What about a Machine ?
We can ask a machine
• Addition
• Multiplication
• Division
Price in 2025?
+ + +
Dataset
In summary, what is machine learning?
Given a machine learning problem
• Identify and create the appropriate dataset
• Unsupervised Learning
• Reinforcement learning
Machine Learning Algorithms
What is Supervised Learning?
CAR
Classification
CAR
+ BIKE
= Training Dataset 𝑓( , )= CAR
BIKE
Training Set: This is a subset of the dataset that is used to train a
machine learning model.
Samples Labels
Test Set: Another subset of the dataset that is used to evaluate
the performance of the trained model
Regression
• Regression
•𝑓( , )= 20500.50
Dataset
What is Unsupervised Learning
CAR
CAR
BIKE
BIKE
Clustering
Dataset
What is Unsupervised Learning
Dataset
What is Reinforcement Learning
Supervised Learning Example : K-Nearest
Neighbors
• K-Nearest Neighbors (KNN) is a supervised machine learning model
that can be used for both regression and classification tasks. The
algorithm is non-parametric, which means that it doesn't make any
assumption about the underlying distribution of the data.
• The KNN algorithm predicts the labels of the test dataset by looking
at the labels of its closest neighbors in the feature space of the
training dataset. The “K” is the most important hyperparameter that
can be tuned to optimize the performance of the model.
How does KNN Classification work?
• The KNN classification algorithm works by finding K neighbors (closest
data points) in the training dataset to a new data point. Then, it assigns
the label of the majority class among neighbors to new data points.
Distance Measures
• Euclidean is probably the most intuitive one and
represents the shortest distance between two points.
• That’s how you can imagine that the K value has a powerful effect on
KNN performance.
Then how to select the optimal K value?
• There are no pre-defined statistical methods to find the most
favorable value of K.
• Initialize a random K value and start computing.
• Choosing a small value of K leads to unstable decision boundaries.
• The substantial K value is better for classification as it leads to
smoothening the decision boundaries.
• Derive a plot between error rate and K denoting values in a defined
range. Then choose the K value as having a minimum error rate.
KNN in R - data
Unupervised Learning Example : K-Means
• K-means is a popular unsupervised machine learning technique that
allows the identification of clusters (similar groups of data points)
within the data.
• Clustering models aim to group data into distinct “clusters” or groups.
This can be used an analysis by itself, or can be used as a feature in a
supervised learning algorithm.
K-Means
The k-means algorithm
Kmeans example - Airbnb Listings Dataset
Components of Kmeans in R
In R, when you perform KMeans clustering using the `kmeans()` function, several components are
available in the resulting KMeans object. These components provide various information about the
clustering result.
1. cluster: A vector indicating the cluster to which each point is assigned.
2. centers: A matrix containing the final cluster centers.
3. totss: The total sum of squares.
4. withinss: A vector of the within-cluster sum of squares for each cluster.
5. tot.withinss: The total within-cluster sum of squares.
6. betweenss: The between-cluster sum of squares.
7. size: The number of points in each cluster.
8. iter: The number of iterations the algorithm took to converge.
You can access these components using the `$` operator. For example, `kmeans_obj$cluster` gives
you the cluster assignments, and `kmeans_obj$centers` gives you the cluster centers.