
[CreativeProgramming]Lecture14_Machine Learning

The document provides an overview of machine learning, contrasting it with traditional programming, and introduces key concepts such as supervised and unsupervised learning. It discusses the importance of feature representation and distance measures in classifying data, particularly in the context of classifying reptiles. An exercise is included to develop a reptile classification model using Euclidean distance to measure similarities between animal features.

Uploaded by allrounderguno
Copyright © All Rights Reserved

Creative Programming

Spring 2025
CUL1122 Lecture #14
Statistical Problems:

Introduction to Machine Learning


Today

❖Machine Learning
▪ Traditional Programming vs. Machine Learning: A Comparison
▪ The Basic Paradigm of Machine Learning
❖Understanding Distance Measures
❖Feature Representation and Engineering
❖Exercise: Classifying Reptiles

Machine Learning

❖A computer program that ‘automatically learns’ something.


❖Early definition of machine learning:
▪ “A field of study that gives computers the
ability to learn without being explicitly
programmed.” – Arthur Samuel (1959)
❖Arthur Samuel, a computer pioneer, wrote the first self-learning program, which played checkers and learned from experience.
❖His checkers program was developed for the IBM 701 computer and was demonstrated to the public on television in 1956.
Traditional Programming vs. Machine Learning

❖In traditional programming, a programmer provides instructions to the computer.
❖A program consists of a series of commands that tell the computer what to do and in what order.

Traditional Programming vs. Machine Learning

❖Machine learning is an automated process that enables computers to solve problems by learning from data rather than by following preset programs.
❖In machine learning, we provide a sample set of input-output pairs, from which the system learns a way to map inputs to the correct outputs, in effect creating the program itself.
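To make the contrast concrete, here is a minimal sketch assuming a toy task: converting Celsius to Fahrenheit. In the traditional version the programmer writes the formula; in the machine learning version, a least-squares line fit (a simple regression technique, chosen here for illustration and not specified in this lecture) infers the mapping from four exact input-output pairs. The names `fit_line` and `to_fahrenheit` are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to input-output pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factor cancels out
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Traditional programming: the programmer writes the rule explicitly
def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: the rule is inferred from example input-output pairs
celsius = [0, 10, 20, 30]
fahrenheit = [32, 50, 68, 86]
slope, intercept = fit_line(celsius, fahrenheit)

print(slope, intercept)          # learned parameters
print(slope * 100 + intercept)   # prediction for an input not in the examples
print(to_fahrenheit(100))        # the hand-written rule, for comparison
```

On these pairs the fit recovers a slope of 1.8 and an intercept of 32, so the learned program agrees with the hand-written one. With noisy real-world data, a learned rule would usually only approximate the true mapping.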

Basic Paradigm of Machine Learning

❖Observe a set of samples known as training data.
❖Infer something about the process that generated the data.
❖Use this inference to make predictions on previously unseen test data.

Basic Paradigm of Machine Learning

❖Variations on the Paradigm:
▪ 1) Supervised Learning: Given a set of feature-label pairs, the goal is to find a rule that predicts the label associated with a previously unseen input.
▪ 2) Unsupervised Learning: Given a set of feature vectors without labels, the objective is to group them into “natural clusters.”
(Figure: supervised learning, with labels, vs. unsupervised learning, without labels.)
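As an illustration of unsupervised learning, the sketch below groups four animal feature vectors without ever looking at their labels. It uses k-means, a common clustering algorithm that this lecture does not cover, and borrows the 0/1 feature vectors (egg-laying, scales, poisonous, cold-blooded, has legs) from the reptile example later in the lecture. `k_means` is a hypothetical helper, and the starting centroids are an arbitrary choice.

```python
import math


def k_means(points, initial_centroids, max_iters=10):
    """Minimal k-means: alternately assign points to the nearest centroid and recompute centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignment = []
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for every point (ties go to the lower index)
        assignment = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment


names = ["rattlesnake", "boa constrictor", "dart frog", "alligator"]
vectors = [[1, 1, 1, 1, 0], [0, 1, 0, 1, 0], [1, 0, 1, 0, 1], [1, 1, 0, 1, 1]]

# No reptile labels are used: start from two of the points as centroids
labels = k_means(vectors, initial_centroids=[vectors[0], vectors[2]])
for name, cluster in zip(names, labels):
    print(name, "-> cluster", cluster)
```

With these starting centroids, the three reptiles end up in cluster 0 and the dart frog alone in cluster 1. k-means results depend on the initial centroids, so a different start can produce different groupings.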

Basic Paradigm

❖Examples of Two Variations in Machine Learning Techniques:

Supervised Learning

❖1) Classification: Predict a discrete value (label) associated with a feature vector.
❖2) Regression: Predict a continuous value (real number) associated with a feature vector.

How Should We Classify the Data?

❖We aim to determine the “similarity” of examples, with the goal of predicting the label associated with a previously unseen input.
❖Similarity refers to the quality or state of being similar, characterized by likeness or resemblance, such as a similarity of features.
❖Although similarity is hard to define precisely, in machine learning it is usually quantified with a distance measure: the smaller the distance between two examples, the more similar they are.

Defining Distance Measures
Definition: Let O1 and O2 be two objects from the universe of possible objects.
The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).

(Figure: two objects, gene1 and gene2, with candidate distance values 0.23, 3, and 342.7.)

Defining Distance Measures

❖Euclidean distance
❖Minkowski distance
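For two n-dimensional feature vectors p and q, the usual definitions are: Euclidean distance = sqrt(sum over i of (p_i - q_i)^2), and Minkowski distance of order r = (sum over i of |p_i - q_i|^r)^(1/r). Setting r = 1 gives the Manhattan distance and r = 2 gives the Euclidean distance. A minimal sketch in plain Python follows; `minkowski` is a hypothetical helper, while `math.dist` (Python 3.8+) is the standard-library Euclidean distance that the exercise at the end of this lecture also uses.

```python
import math


def minkowski(p, q, r):
    """Minkowski distance of order r between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    # Sum of absolute coordinate differences raised to the power r, then the r-th root
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)


rattlesnake = [1, 1, 1, 1, 0]
dart_frog = [1, 0, 1, 0, 4]

print(round(minkowski(rattlesnake, dart_frog, 1), 3))  # Manhattan distance: 6.0
print(round(minkowski(rattlesnake, dart_frog, 2), 3))  # Euclidean distance: 4.243
print(round(math.dist(rattlesnake, dart_frog), 3))     # standard-library check: 4.243
```

Larger values of r give more weight to the single largest coordinate difference, which is one reason the choice of distance measure matters.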

Attribute-Based Labeling through Distance Measure

❖For example, in the following scenario, similarity is determined based on ear shape and nose size.

Feature Representation

❖Features never fully describe a situation.
▪ For example, ear shape and nose size alone cannot fully describe dogs or cats.
❖Feature engineering involves choosing how to represent examples as feature vectors so that what is learned generalizes well to new examples.
❖For instance, suppose you want to use 100 existing samples to predict which students will receive an A in this course.
▪ Some features are undoubtedly helpful, such as GPA and prior programming experience (though neither is a perfect predictor).
▪ Others, such as birth month or eye color, are likely to lead to overfitting: the model may fit accidental patterns in the 100 samples that do not hold for new students.

An Example Process of Feature Representation

❖Initial model with 5 features

Name             Egg-laying  Scales  Poisonous  Cold-blooded  # legs  Reptile (label)
Cobra            True        True    True       True          0       Yes
Rattlesnake      True        True    True       True          0       Yes
Boa constrictor  False       True    False      True          0       Yes
Chicken          True        True    False      False         2       No
Alligator        True        True    False      True          4       Yes
Dart frog        True        False   True       False         4       No
Salmon           True        True    False      True          0       No
Python           True        True    False      True          0       Yes

▪ The boa constrictor does not fit this model.
An Example Process of Feature Representation

❖Refined model with 3 features: scales, cold-blooded, 0 legs

Name             Egg-laying  Scales  Poisonous  Cold-blooded  # legs  Reptile (label)
Cobra            True        True    True       True          0       Yes
Rattlesnake      True        True    True       True          0       Yes
Boa constrictor  False       True    False      True          0       Yes
Chicken          True        True    False      False         2       No
Alligator        True        True    False      True          4       Yes
Dart frog        True        False   True       False         4       No
Salmon           True        True    False      True          0       No
Python           True        True    False      True          0       Yes

▪ The alligator does not fit this model.
An Example Process of Feature Representation

❖Refined model with 3 features: scales, cold-blooded, 0 or 4 legs

Name             Egg-laying  Scales  Poisonous  Cold-blooded  # legs  Reptile (label)
Cobra            True        True    True       True          0       Yes
Rattlesnake      True        True    True       True          0       Yes
Boa constrictor  False       True    False      True          0       Yes
Chicken          True        True    False      False         2       No
Alligator        True        True    False      True          4       Yes
Dart frog        True        False   True       False         4       No
Salmon           True        True    False      True          0       No
Python           True        True    False      True          0       Yes

▪ There is no (easy) way to tell the salmon from the python: they have identical feature values but different labels.
An Example Process of Feature Representation

❖Current model: scales, cold-blooded; not perfect, but no false negatives

Name             Egg-laying  Scales  Poisonous  Cold-blooded  # legs  Reptile (label)
Cobra            True        True    True       True          0       Yes
Rattlesnake      True        True    True       True          0       Yes
Boa constrictor  False       True    False      True          0       Yes
Chicken          True        True    False      False         2       No
Alligator        True        True    False      True          4       Yes
Dart frog        True        False   True       False         4       No
Salmon           True        True    False      True          0       No
Python           True        True    False      True          0       Yes

▪ Anything classified as “not reptile” is correctly labeled (no false negatives).
▪ Some animals, such as the salmon, may be incorrectly labeled as reptiles (false positives).
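The four models above can be written as simple rules and checked against the table. The sketch below is a hypothetical encoding of each slide's model as a Python function (for example, it treats the initial model as requiring all five features, with zero legs); it reports false negatives (reptiles the rule misses) and false positives (non-reptiles the rule accepts).

```python
# Rows from the table: (egg-laying, scales, poisonous, cold-blooded, # legs, reptile?)
ANIMALS = {
    "cobra": (True, True, True, True, 0, True),
    "rattlesnake": (True, True, True, True, 0, True),
    "boa constrictor": (False, True, False, True, 0, True),
    "chicken": (True, True, False, False, 2, False),
    "alligator": (True, True, False, True, 4, True),
    "dart frog": (True, False, True, False, 4, False),
    "salmon": (True, True, False, True, 0, False),
    "python": (True, True, False, True, 0, True),
}

# Hypothetical encodings of the four models as rules over the five features
MODELS = {
    "initial (all 5 features)":
        lambda egg, scales, poison, cold, legs: egg and scales and poison and cold and legs == 0,
    "scales, cold-blooded, 0 legs":
        lambda egg, scales, poison, cold, legs: scales and cold and legs == 0,
    "scales, cold-blooded, 0 or 4 legs":
        lambda egg, scales, poison, cold, legs: scales and cold and legs in (0, 4),
    "scales, cold-blooded":
        lambda egg, scales, poison, cold, legs: scales and cold,
}


def evaluate(rule):
    """Return (false negatives, false positives) of a rule over the table."""
    false_negatives, false_positives = [], []
    for name, (*features, is_reptile) in ANIMALS.items():
        predicted = bool(rule(*features))
        if is_reptile and not predicted:
            false_negatives.append(name)
        elif predicted and not is_reptile:
            false_positives.append(name)
    return false_negatives, false_positives


for model_name, rule in MODELS.items():
    missed, accepted = evaluate(rule)
    print(f"{model_name}: false negatives {missed}, false positives {accepted}")
```

Under this encoding, the initial model misses the boa constrictor (and, because neither is poisonous, the alligator and python as well), the 0-legs model misses the alligator, and the last two models have no false negatives but accept the salmon as a reptile.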
Feature Engineering

❖We need a way to measure the distance between feature vectors.
❖This involves deciding which features to include and identifying those that may only add noise to the classifier.
❖Additionally, we must define how to measure distances between training examples, and later between the classifier and new instances.
❖Furthermore, we need to decide how to weigh the relative importance of different dimensions of the feature vector, as this affects the definition of distance.
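One way to express the relative importance of each dimension is a weighted Euclidean distance, which multiplies each squared coordinate difference by a weight before summing. In this sketch, `weighted_euclidean` is a hypothetical helper, the feature vectors are those of the reptile example later in the lecture, and the weight of 1/16 on "# legs" is an illustrative choice: it is equivalent to dividing the leg count by 4 so that this dimension spans 0 to 1 like the others.

```python
import math


def weighted_euclidean(p, q, weights):
    """Euclidean distance where each squared coordinate difference is scaled by a weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


# Features: egg-laying, scales, poisonous, cold-blooded, # legs
alligator = [1, 1, 0, 1, 4]
dart_frog = [1, 0, 1, 0, 4]
boa = [0, 1, 0, 1, 0]

equal = [1, 1, 1, 1, 1]
legs_down = [1, 1, 1, 1, 1 / 16]  # hypothetical: a 0-to-4 leg difference now counts like 0-to-1

for weights in (equal, legs_down):
    # Distance from the alligator to the boa constrictor, then to the dart frog
    print(round(weighted_euclidean(alligator, boa, weights), 3),
          round(weighted_euclidean(alligator, dart_frog, weights), 3))
```

With equal weights, the alligator is closer to the dart frog (about 1.732) than to the boa constrictor (about 4.123); after down-weighting the legs dimension, the boa constrictor becomes closer (about 1.414).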

Measuring Distance between Animals

❖We can consider our animal samples as consisting of four binary features and one integer feature.

Name             Egg-laying  Scales  Poisonous  Cold-blooded  # legs  Reptile (label)
Rattlesnake      True        True    True       True          0       Yes
Boa constrictor  False       True    False      True          0       Yes
Dart frog        True        False   True       False         4       No

Rattlesnake = [1,1,1,1,0]
Boa constrictor = [0,1,0,1,0]
Dart frog = [1,0,1,0,4]

Euclidean Distance between Animals

❖One way to distinguish between reptiles and non-reptiles is to measure the distance between pairs of samples and, for unlabeled data, cluster nearby samples into a common class.
❖For example, using Euclidean distance, a rattlesnake and a boa constrictor are much closer to each other than either is to a dart frog.

Rattlesnake = [1,1,1,1,0]
Boa constrictor = [0,1,0,1,0]
Dart frog = [1,0,1,0,4]

                 Rattlesnake  Boa constrictor  Dart frog
Rattlesnake      -            1.414            4.243
Boa constrictor  1.414        -                4.472
Dart frog        4.243        4.472            -
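The distances in this table can be reproduced directly from the True/False rows of the animal table. A minimal sketch, assuming the rows are stored in a plain dictionary (a hypothetical layout, separate from the `Animal` class used in the exercise): Python's `bool` is a subclass of `int`, so `int(True)` is 1, and `itertools.combinations` visits each unordered pair of animals exactly once.

```python
import itertools
import math

# Rows from the table; features in column order:
# egg-laying, scales, poisonous, cold-blooded, # legs
table = {
    "rattlesnake": (True, True, True, True, 0),
    "boa constrictor": (False, True, False, True, 0),
    "dart frog": (True, False, True, False, 4),
}

# Encode each row as a numeric feature vector (True -> 1, False -> 0, integers unchanged)
vectors = {name: [int(value) for value in row] for name, row in table.items()}

# Euclidean distance for every unordered pair of animals
for (name1, v1), (name2, v2) in itertools.combinations(vectors.items(), 2):
    print(f"{name1} vs {name2}: {math.dist(v1, v2):.3f}")
```

It prints 1.414, 4.243, and 4.472 for the three pairs, matching the table.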

Add an Alligator

alligator = Animal('alligator', [1, 1, 0, 1, 4])
animals.append(alligator)
compareAnimals(animals, 3)

Add an Alligator

❖The alligator is closer to the dart frog than to the snakes. Why?
▪ The alligator differs from the dart frog in three features, whereas it differs from the boa constrictor in only two features.
▪ However, the “legs” feature ranges from 0 to 4, while every other feature ranges only from 0 to 1.
▪ As a result, the “legs” dimension is disproportionately large: a difference of 4 legs adds 16 to the squared distance, while each binary mismatch adds only 1.

                 Rattlesnake  Boa constrictor  Dart frog  Alligator
Rattlesnake      -            1.414            4.243      4.123
Boa constrictor  1.414        -                4.472      4.123
Dart frog        4.243        4.472            -          1.732
Alligator        4.123        4.123            1.732      -
Using Binary Features

❖Now, the alligator is closer to the snakes than it is to the dart frog.
❖This highlights the importance of feature engineering!

Original (# legs)                Binary (has legs)
Rattlesnake = [1,1,1,1,0]        Rattlesnake = [1,1,1,1,0]
Boa constrictor = [0,1,0,1,0]    Boa constrictor = [0,1,0,1,0]
Dart frog = [1,0,1,0,4]          Dart frog = [1,0,1,0,1]
Alligator = [1,1,0,1,4]          Alligator = [1,1,0,1,1]

                 Rattlesnake  Boa constrictor  Dart frog  Alligator
Rattlesnake      -            1.414            1.732      1.414
Boa constrictor  1.414        -                2.236      1.414
Dart frog        1.732        2.236            -          1.732
Alligator        1.414        1.414            1.732      -

Exercise: Classifying Reptiles
Exercise #1: Reptile Classification

❖Develop a reptile classification model.
▪ Create a script that categorizes animals as reptiles or non-reptiles based on five features: egg-laying, scales, poisonous, cold-blooded, and number of legs.
▪ Use Euclidean distance to calculate the similarity between animals, and display the results in a table.

Exercise #1: 1) Define the Animal Class

❖Define the Animal class, which should include a feature vector and a method for measuring the distance between feature vectors.

import math
import numpy

class Animal(object):
    def __init__(self, name, features):
        # Assume name is a string; features is a list of numbers
        self.name = name
        self.features = numpy.array(features)

    def getName(self):
        return self.name

    def getFeatures(self):
        return self.features

    def distance(self, other):
        # Return the Euclidean distance between the feature vectors of self and other
        return math.dist(self.getFeatures(), other.getFeatures())

Exercise #1: 2) Calculate Similarity between Animals

❖Define a function that computes the similarity between animals.

def compareAnimals(animals, precision):
    # Assume animals is a list of Animal objects and precision is an int >= 0
    tableVals = []  # One row of distance strings per animal

    # Get distances between pairs of animals
    for a1 in animals:  # For each row
        row = []
        for a2 in animals:  # For each column
            if a1 == a2:
                row.append('--')
            else:
                distance = a1.distance(a2)
                row.append(str(round(distance, precision)))
        tableVals.append(row)
    # …. (rest of the function elided on the slide)
Exercise #1: 3) Add Samples

❖Provide animal samples to calculate their similarities.


rattlesnake = Animal('rattlesnake', [1, 1, 1, 1, 0])
boa = Animal('boa_constrictor', [0, 1, 0, 1, 0])
dartFrog = Animal('dart frog', [1, 0, 1, 0, 4])
animals = [rattlesnake, boa, dartFrog]
compareAnimals(animals, 3)

alligator = Animal('alligator', [1, 1, 0, 1, 4])


animals.append(alligator)
compareAnimals(animals, 3)

Exercise #1: 4) Improve Features

❖Change “Number of Legs” to a boolean indicating leg presence, and recalculate the animal similarities.
rattlesnake = Animal('rattlesnake', [1, 1, 1, 1, 0])
boa = Animal('boa_constrictor', [0, 1, 0, 1, 0])
dartFrog = Animal('dart frog', [1, 0, 1, 0, 1])
alligator = Animal('alligator', [1, 1, 0, 1, 1])
animals = [rattlesnake, boa, dartFrog, alligator]
compareAnimals(animals, 3)
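The exercise code computes distances but stops short of assigning labels. A minimal sketch of one way to do that is 1-nearest-neighbor classification: give an unseen animal the label of its closest labeled sample. The function name `nearest_neighbor_label` is hypothetical; the snakes and the dart frog serve as labeled training data, and the alligator is treated as the unseen input.

```python
import math


def nearest_neighbor_label(training, query):
    """Return the label of the training example closest to query (1-nearest neighbor)."""
    # training is a list of (features, label) pairs; min() keeps the first example on ties
    _, label = min(training, key=lambda example: math.dist(example[0], query))
    return label


# Labeled samples: (feature vector, is_reptile)
raw_training = [
    ([1, 1, 1, 1, 0], True),   # rattlesnake
    ([0, 1, 0, 1, 0], True),   # boa constrictor
    ([1, 0, 1, 0, 4], False),  # dart frog, number of legs as an integer
]
binary_training = [
    ([1, 1, 1, 1, 0], True),   # rattlesnake
    ([0, 1, 0, 1, 0], True),   # boa constrictor
    ([1, 0, 1, 0, 1], False),  # dart frog, legs present as 0/1
]

print(nearest_neighbor_label(raw_training, [1, 1, 0, 1, 4]))     # alligator: False
print(nearest_neighbor_label(binary_training, [1, 1, 0, 1, 1]))  # alligator: True
```

With the leg count as an integer, the alligator's nearest neighbor is the dart frog, so it is labeled a non-reptile; with the binary legs feature, its nearest neighbor is a snake (the rattlesnake and boa constrictor tie at about 1.414), and it is correctly labeled a reptile. This mirrors the change between the two distance tables earlier in the lecture.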

Thank you for your hard work!
