WEEK 5 Machine Learning

Machine learning involves algorithms that perform tasks without explicit programming, combining statistics and computing. It is categorized into supervised learning, which uses labeled data to train models, and unsupervised learning, which identifies patterns in unlabeled data. Key techniques include classification, clustering, and dimensionality reduction, with decision trees being a common method for making predictions based on input attributes.


Machine Learning

• Machine learning refers to algorithms that are able to perform certain tasks without being explicitly programmed.

• It combines statistics and computing to enable computers to carry out a given task without being programmed to do so.

• Machine learning algorithms tend to improve at their tasks as they process more data.

• Machine learning can be divided into supervised and unsupervised learning.

Supervised Learning

• Supervised learning algorithms are trained on labelled data.

• Labelled data – data for which the target answer is known. For example, if you are shown a picture of a cat and told it is a cat, that is labelled data.

• Unlabelled data – data for which the target answer is not known. For example, you are shown an image but given no information describing it.

Input and Output:

 The input is the data we want to learn from (e.g., pictures of animals).

 The output is the correct answer or label (e.g., "dog", "cat").

Training Process:

 The algorithm tries to learn the relationship between inputs and outputs by adjusting its internal parameters.

 It gets feedback on how well it is doing by comparing its predicted outputs with the actual labels.

Goal:

 The goal is to generalize well, so the model can make accurate predictions on new, unseen data.

Example: Suppose you want to teach a computer to recognize spam emails:

 You give it a dataset of emails (inputs) that are labelled as "spam" or "not spam" (outputs).

 The model learns patterns in the data that distinguish spam from non-spam.

 Once trained, it can predict whether a new email is spam.

This is how classification works: from the labelled spam examples, the model learns which attributes distinguish spam from normal emails, so it can later decide which new emails are spam.
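The spam workflow above can be sketched in a few lines of Python. This is a minimal from-scratch Naive Bayes classifier; the training emails and labels are made-up illustrations, not a real dataset or any particular library's API.

```python
# Toy sketch of supervised spam classification with a tiny Naive Bayes
# classifier. The training emails and labels below are made up.
from collections import Counter
import math

train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule today", "not spam"),
    ("project meeting notes", "not spam"),
]

# Training: count how often each word appears under each label.
word_counts = {"spam": Counter(), "not spam": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())
vocab_size = len({w for counter in word_counts.values() for w in counter})

def predict(text):
    """Return the label whose words best explain the text."""
    scores = {}
    for label in label_counts:
        # Log prior: fraction of training emails with this label.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Laplace (+1) smoothing so unseen words don't zero the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("win cheap money"))   # spam-like words dominate
print(predict("project meeting"))   # looks like a normal email
```

Once trained on the word counts, the classifier scores any new email against both labels and picks the more likely one, exactly as in the bullet points above.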

As another example, suppose we want to build a vertebrate classifier.

The table shows sample data for classifying vertebrates into mammals, reptiles, birds, fish, and amphibians. The attribute set includes characteristics of a vertebrate such as its body temperature, skin cover, and ability to fly. The data set can also be used for a binary classification task such as mammal classification, by grouping the reptiles, birds, fish, and amphibians into a single category called non-mammals.

Unsupervised Learning

Unsupervised learning is a type of machine learning where the model is not given any labels. It only sees the input data and tries to find patterns, structures, or groupings on its own.

Key Concepts:

1. No Labeled Output:

o The data has no predefined labels.

o The model must explore the data and find hidden patterns.

2. Goal:

o The goal is to discover structure in the data, such as grouping similar items together or reducing its complexity.

Common Tasks:

1. Clustering – grouping similar items.

o Example: grouping customers into segments based on buying behavior.

o Algorithms: K-Means, Hierarchical Clustering, DBSCAN

2. Dimensionality Reduction – simplifying data by reducing the number of features.

o Example: reducing a 100-feature dataset to 2 or 3 features for visualization.

o Algorithms: PCA (Principal Component Analysis), t-SNE
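Dimensionality reduction can be sketched with a minimal PCA built on numpy's singular value decomposition; the 50×5 matrix below is random toy data, purely for illustration.

```python
# Minimal PCA sketch via SVD: project 5-feature data down to 2 features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))      # 50 samples, 5 features (toy data)

Xc = X - X.mean(axis=0)           # centre each feature at zero
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T                # keep only the top 2 principal components

print(X2.shape)                   # (50, 2)
```

The rows of `Vt` are the principal directions; projecting onto the first two keeps as much of the data's variance as any 2-feature summary can.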

Example:

Imagine you have a big pile of customer data (age, purchase history,
website visits), but you don’t know anything about them. You want to
group similar customers together to send them tailored marketing
emails.

You give this data to an unsupervised learning algorithm, and it finds 3 natural customer groups:

 Group 1: Young, low spenders

 Group 2: Middle-aged, frequent buyers

 Group 3: Older, high-value customers

You didn’t tell the algorithm what to look for—it found those
patterns by itself.
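A grouping like this can be reproduced with a small hand-rolled K-Means. The (age, monthly spend) pairs and the three seed centroids below are illustrative assumptions, not real customer data.

```python
# Hand-rolled K-Means sketch on made-up (age, monthly spend) pairs.
import math

customers = [(22, 15), (25, 20), (24, 10),     # young, low spenders
             (45, 80), (48, 90), (44, 85),     # middle-aged, frequent buyers
             (63, 200), (66, 180), (70, 210)]  # older, high-value

def kmeans(points, centroids, iters=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [tuple(sum(v) / len(pts) for v in zip(*pts)) if pts
                     else centroids[i]
                     for i, pts in enumerate(clusters)]
    return clusters

groups = kmeans(customers, centroids=[(25, 15), (45, 85), (65, 195)])
for g in groups:
    print(g)
```

The algorithm is never told what the groups mean; it simply pulls each centroid toward the points nearest to it until the three segments stabilize.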
Rule of thumb for when to use supervised vs. unsupervised learning: use supervised learning when labelled examples are available, and unsupervised learning when you only have raw, unlabelled data.

Decision Tree

Decision tree learning example: if all the examples reaching a branch are negative, that branch is classified as N; if they are all positive, it is classified as Y.

Which attribute is best to choose? We want to pick the best attribute at each split because we want the tree to be as short as possible, so it does not grow too large.

Values from a branch:

In Sunny, for example, we have [2+, 3−] because:

 Yes (positive): D9, D11 → 2

 No (negative): D1, D2, D8 → 3

So the branch counts are [2+, 3−].
To measure this uncertainty we use entropy. For example, Overcast is [4+, 0−], so it is completely certain (100% one class); pure branches like this are good.

Entropy formula:

Entropy(S) = −p+ · log2(p+) − p− · log2(p−)

where p+ and p− are the proportions of positive and negative examples in S. Entropy tells us how mixed a set of examples is. If a set is pure (all yes or all no), e.g. 10Y, 0N, the entropy is 0. If it is 50/50 (e.g. 5Y, 5N), the entropy is 1, the maximum uncertainty.

We computed it like this: with 6 positive and 2 negative examples, p+ = 6/8 and p− = 2/8 (the denominator is 8 because 6 + 2 = 8), and then we follow the formula.
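The entropy calculation can be checked with a short Python sketch, taking the [positive, negative] counts of a branch:

```python
# Entropy of a branch from its positive/negative counts.
import math

def entropy(pos, neg):
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            e -= p * math.log2(p)
    return e

print(entropy(10, 0))            # pure set -> 0.0
print(entropy(5, 5))             # 50/50 -> 1.0 (maximum)
print(round(entropy(6, 2), 3))   # the 6-positive, 2-negative example
```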

You get three branches when the attribute being split on has three possible values (e.g. Outlook = Sunny, Overcast, Rain).


When you have now all the GAIN values the best attribute is one with highest value
and worst lowest value.

Gini Index

The Gini index is another impurity measure, like entropy, used to decide how to split an attribute in a decision tree: Gini(S) = 1 − Σi (pi)², where pi is the proportion of examples of class i in the node.
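A minimal check of the Gini formula, again using class counts from a node:

```python
# Gini index of a node: 1 minus the sum of squared class proportions.
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([4, 0]))             # pure node -> 0.0
print(gini([5, 5]))             # 50/50 -> 0.5 (maximum for two classes)
print(round(gini([2, 3]), 2))   # the [2+, 3-] Sunny branch
```

Like entropy, Gini is 0 for a pure node and largest for an even mix; splits are then compared by the weighted Gini of their branches.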
