
Ver.

2023/10/04

Introduction to Machine Learning

Jinyoung Yeo
Department of Artificial Intelligence
Yonsei University

AAI2250: Introduction to Artificial Intelligence


Outline

1. What is Machine Learning?

2. Type of Learning

3. Framing a Learning Problem

4. Clustering

5. Application Example

6. Linear Regression

7. Multivariable Linear Regression

8. Logistic Regression and Multinomial Classification

9. Application Example

2
What is Machine Learning?

3
What is Machine Learning?

• “Learning is any process by which a system improves


performance from experience.”
- Herbert Simon

• Definition by Tom Mitchell (1998):


• Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
• A well-defined learning task is given by <P, T, E>.

4
Traditional Programming

Data + Program → Computer → Output

Machine Learning

Data + Output → Computer → Program

5
Some more examples of tasks that are best solved by using a
learning algorithm

• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates

6
Sample Applications

• Web search
• Computational biology
• Finance
• E-commerce
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]

7
Sample Applications

• Image Classification
• Document Categorization
• Speech Recognition
• Protein Classification
• Spam Detection
• Branch Prediction
• Fraud Detection
• Natural Language Processing
• Playing Games
• Computational Advertising
8
Machine Learning is Changing the World
“Machine learning is the hot new thing”
(John Hennessy, President, Stanford)

“A breakthrough in machine learning would be worth ten Microsofts”
(Bill Gates, Microsoft)

“Web rankings today are mostly a matter of machine learning”


(Prabhakar Raghavan, VP Engineering at Google)

9
Defining the Learning Task
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words


P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors


P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.

T: Categorize email messages as spam or legitimate.


P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
10
Type of Learning

11
Types of Learning

• Supervised (inductive) learning


– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions

12
Supervised Learning: Regression

• Given (x1, y1), (x2, y2), ..., (xn, yn)

• Learn a function f(x) to predict y given x
– y is real-valued == regression

[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020]
13
Supervised Learning: Classification

• Given (x1, y1), (x2, y2), ..., (xn, yn)

• Learn a function f(x) to predict y given x
– y is categorical == classification

[Plot: Breast Cancer (Malignant / Benign): label 1 (Malignant) or 0 (Benign) vs. Tumor Size]

14
Supervised Learning: Classification

• x can be multi-dimensional
– Each dimension corresponds to an attribute

– e.g., Age, Tumor Size, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, …

17
Supervised Learning: Classification

• Decide which emails are spam and which are important.

Supervised classification: emails labeled “spam” / “not spam”

Goal: use emails seen so far to produce a good prediction rule for
future data.
18
Supervised Learning: Classification
Represent each message by features. (e.g., keywords, spelling, etc.)

Reasonable RULES:
– Predict SPAM if unknown AND (money OR pills)
– Predict SPAM if 2·money + 3·pills − 5·known > 0

[Scatter plot: labeled examples (+ spam, − not spam) that are linearly separable]
19
Supervised Learning: Classification

Handwritten digit recognition


(convert hand-written digits to
characters 0..9)

Face Detection and Recognition

20
Supervised Learning: Classification
• Weather prediction

• Medicine:
– diagnose a disease
• input: from symptoms, lab measurements, test results, DNA tests, …
• output: one of a set of possible diseases, or “none of the above”
• examples: audiology, thyroid cancer, diabetes, …
– or: response to chemo drug X
– or: will patient be re-admitted soon?

• Computational Economics:
– predict if a stock will rise or fall
– predict if a user will click on an ad or not
• in order to decide which ad to show
21
Supervised Learning: Regression

Stock market

Weather prediction: predict the temperature (e.g., 72° F) at any given location

22
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering

23
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Figure: gene-expression heatmap, Genes × Individuals, grouped by genetic similarity]
24
Unsupervised Learning

Clustering: discovering structure in data (only unlabeled data)


• E.g., cluster users of social networks by interest (community detection).
[Figures: Facebook network, Twitter network]

• Other examples: social network analysis, market segmentation, astronomical data analysis


25
Reinforcement Learning

• Given a sequence of states and actions with


(delayed) rewards, output a policy
– Policy is a mapping from states → actions that tells
you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand

26
The Agent-Environment Interface

Agent and environment interact at discrete time steps: t = 0, 1, 2, ...

Agent observes state at step t:   s_t ∈ S
produces action at step t:        a_t ∈ A(s_t)
gets resulting reward:            r_{t+1} ∈ ℝ
and resulting next state:         s_{t+1}

... s_t, a_t → r_{t+1}, s_{t+1}, a_{t+1} → r_{t+2}, s_{t+2}, a_{t+2} → r_{t+3}, s_{t+3}, a_{t+3} → ...
27
Reinforcement Learning

https://www.youtube.com/watch?v=4cgWya-wjgY 28
Types of Learning

• Supervised learning
– Decision tree
– Linear regression
– Logistic regression
– Support vector machines & kernel methods
– Model ensembles
– Neural networks & deep learning

• Unsupervised learning
– Clustering
– Dimensionality reduction

• Reinforcement learning
– Q learning
29
Framing a Learning Problem

30
Developing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e., the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target function from the
experience

[Diagram: Environment/Experience → Training data → Learner → Knowledge → Performance Element; Testing data is fed to the Performance Element]

• We generally assume that the training and test examples


are independently drawn from the same overall
distribution of data
– We call this “i.i.d” which stands for “independent and identically
distributed” 31
Developing a Learning System

• Understand domain, prior knowledge, and goals


• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge 32
Developing a Learning System

• Every ML algorithm has three components:


– Representation
– Optimization
– Evaluation
33
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks

34
Various Search/Optimization Algorithms

• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution

35
Evaluation

• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.

36
Clustering

37
Example: Clusters & Outliers

[Scatter plot: points forming several groups (labeled Cluster), plus one isolated point labeled Outlier]

38
The Problem of Clustering

• Given a set of points, with a notion of distance between points, group the points into
some number of clusters, so that
✓ Members of a cluster are close/similar to each other
✓ Members of different clusters are dissimilar
• Usually:
✓ Points are in a high-dimensional space
✓ Similarity is defined using a distance measure
▪ Euclidean, Cosine, Jaccard, edit distance, …

39
Clustering is a hard problem!

40
Why is it hard?

• Clustering in two dimensions looks easy


• Clustering small amounts of data looks easy
• And in most cases, looks are not deceiving

• Many applications involve not 2, but 10 or 10,000 dimensions


• High-dimensional spaces look different: Almost all pairs of points are at about the
same distance

41
High Dimensional Data

• Given a cloud of data points we want to understand its structure

42
Clustering Problem: Documents

Finding topics:
• Represent a document by a vector
(x1, x2, …, xk), where xi = 1 iff the i-th word
(in some order) appears in the document
✓ It actually doesn’t matter if k is infinite; i.e., we don’t limit the set of words

• Documents with similar sets of words may be about the same topic

43
Cosine, Jaccard, and Euclidean

• We have a choice when we think of documents as sets of words:


✓ Sets as vectors: Measure similarity by the cosine distance
✓ Sets as sets: Measure similarity by the Jaccard distance
✓ Sets as points: Measure similarity by Euclidean distance

44
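As a rough illustration (not from the slides), all three measures can be computed directly on small bag-of-words representations; the toy documents below are made up:

```python
# Minimal sketch: cosine, Jaccard, and Euclidean on two toy "documents".
import numpy as np

def cosine_distance(a, b):
    # 1 - cos(angle) between the two count vectors
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def jaccard_distance(set_a, set_b):
    # 1 - |intersection| / |union| of the two word sets
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

doc1 = {"machine", "learning", "clustering"}
doc2 = {"machine", "learning", "regression"}
vocab = sorted(doc1 | doc2)
v1 = np.array([1.0 if w in doc1 else 0.0 for w in vocab])
v2 = np.array([1.0 if w in doc2 else 0.0 for w in vocab])

print(cosine_distance(v1, v2), jaccard_distance(doc1, doc2), euclidean_distance(v1, v2))
```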
Overview: Methods of Clustering

• Hierarchical:
✓ Agglomerative (bottom up):
▪ Initially, each point is a cluster
▪ Repeatedly combine the two
“nearest” clusters into one
✓ Divisive (top down):
▪ Start with one cluster and
recursively split it

• Point assignment:
✓ Maintain a set of clusters
✓ Points belong to “nearest” cluster

45
Hierarchical Clustering

• Key operation:
Repeatedly combine
two nearest clusters

• Three important questions:


1) How do you represent a cluster of more than one point?
2) How do you determine the “nearness” of clusters?
3) When to stop combining clusters?

46
Hierarchical Clustering

• Key operation: Repeatedly combine two nearest clusters


• (1) How to represent a cluster of many points?
✓ Key problem: As you merge clusters, how do you represent the “location” of each
cluster, to tell which pair of clusters is closest?
• Euclidean case: each cluster has a
centroid = average of its (data)points
• (2) How to determine “nearness” of clusters?
✓ Measure cluster distances by distances of centroids

47
Example: Hierarchical clustering

[Figure: data points (o) at (0,0), (1,2), (2,1), (4,1), (5,0), (5,3); centroids (x) at (1,1), (1.5,1.5), (4.5,0.5), (4.7,1.3); the merge order is shown as a dendrogram]
48
And in the Non-Euclidean Case?

What about the Non-Euclidean case?


• The only “locations” we can talk about are the points themselves
✓ i.e., there is no “average” of two points

• Approach 1:
✓ (1) How to represent a cluster of many points?
clustroid = (data)point “closest” to other points
✓ (2) How do you determine the “nearness” of clusters? Treat clustroid as if it were
centroid, when computing inter-cluster distances

49
“Closest” Point?

• (1) How to represent a cluster of many points?


clustroid = point “closest” to other points
• Possible meanings of “closest”:
✓ Smallest maximum distance to other points
✓ Smallest average distance to other points
✓ Smallest sum of squares of distances to other points
▪ For distance metric d, the clustroid c of cluster C is:  argmin_c Σ_{x∈C} d(x, c)²

Centroid is the avg. of all (data)points in the cluster. This means the centroid is an “artificial” point.
Clustroid is an existing (data)point that is “closest” to all other points in the cluster.
50
Defining “Nearness” of Clusters

• (2) How do you determine the “nearness” of clusters?


✓ Approach 2:
Intercluster distance = minimum of the distances between any two points, one from
each cluster
✓ Approach 3:
Pick a notion of “cohesion” of clusters, e.g., maximum distance from the clustroid
▪ Merge clusters whose union is most cohesive

51
Cohesion

• Approach 3.1: Use the diameter of the merged cluster = maximum distance between
points in the cluster
• Approach 3.2: Use the average distance between points in the cluster
• Approach 3.3: Use a density-based approach
✓ Take the diameter or avg. distance, e.g., and divide by the number of points in the
cluster

52
Implementation

• Naïve implementation of hierarchical clustering:


✓ At each step, compute pairwise distances between all pairs of clusters, then merge
✓ O(N³)

• Careful implementation using a priority queue can reduce time to O(N² log N)
✓ Still too expensive for really big datasets that do not fit in memory

53
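For comparison, a minimal sketch using SciPy's standard agglomerative implementation (an assumption, the slides do not prescribe a library), run on the six points from the earlier hierarchical-clustering example:

```python
# Centroid-linkage hierarchical clustering on the example points.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0, 0], [1, 2], [2, 1], [4, 1], [5, 0], [5, 3]], dtype=float)

# 'centroid' linkage measures cluster distance by centroid distance (Euclidean case)
Z = linkage(points, method="centroid")

# Cut the dendrogram into 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```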
K–means Algorithm(s)

• Assumes Euclidean space/distance

• Start by picking k, the number of clusters

• Initialize clusters by picking one point per cluster


✓ Example: Pick one point at random, then k-1 other points, each as far away as
possible from the previous points

54
Populating Clusters

• 1) For each point, place it in the cluster whose current centroid it is nearest

• 2) After all points are assigned, update the locations of centroids of the k clusters

• 3) Reassign all points to their closest centroid


✓ Sometimes moves points between clusters

• Repeat 2 and 3 until convergence


✓ Convergence: Points don’t move between clusters and centroids stabilize

55
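A minimal NumPy sketch of exactly this assign/update/repeat loop (assuming Euclidean distance and random initialization; this is not the lecture's own code):

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: pick k of the points at random as the starting centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # 1) assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2) update each centroid to the mean of its assigned points
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 3) stop when the centroids stabilize (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

points = np.random.default_rng(1).normal(size=(60, 2))
labels, centroids = kmeans(points, k=3)
print(labels)
```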
Example: Assigning Clusters

[Scatter plot: data points (x) and centroids; clusters after round 1]
56
Example: Assigning Clusters

[Scatter plot: data points (x) and centroids; clusters after round 2]
57
Example: Assigning Clusters

[Scatter plot: data points (x) and centroids; clusters at the end]
58
Getting the k right

How to select k?
• Try different k, looking at the change in the average distance to centroid as k
increases
• The average falls rapidly until the right k, then changes little

[Plot: average distance to centroid vs. k; the best value of k is at the elbow of the curve]

59
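A hedged sketch of this heuristic using scikit-learn's KMeans (an assumption; any k-means implementation would do), printing the average distance to the centroid for several k:

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.random.default_rng(0).normal(size=(200, 2))

for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    # average distance from each point to its assigned centroid
    avg_dist = np.mean(np.linalg.norm(points - km.cluster_centers_[km.labels_], axis=1))
    print(k, round(avg_dist, 3))
# Pick the k where the curve stops falling rapidly (the "elbow").
```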
Example: Picking k

Too few; many long distances to centroid.

[Scatter plot: the point cloud partitioned into too few clusters]
60
Example: Picking k

Just right; distances rather short.

[Scatter plot: the point cloud partitioned into the right number of clusters]

61
Example: Picking k

Too many; little improvement in average distance.

[Scatter plot: the point cloud partitioned into too many clusters]

62
Application Example: Landmark Engine

63
Landmark Engine

• Google developed a landmark recognition engine that identifies specific landmarks by


clustering different images of the same landmark.

64
Why photos?

• People tend to take many photos when they visit popular places.
• As user-generated content (UGC), photos carry various metadata: location taken (geo-coordinates), time taken, scene (visual features), tags, …

[Figure: an example photo with its UGC metadata: mention, tag, taken time, location]

65
Why photos?

• Diverse information of venues can be extracted from photos.


• As about 1% of venues are always newly emerging places, we need automatic completion of the
knowledge base.

66
Google Data

• Meta-info: a photo is a tuple containing the unique photo ID, tagged GPS coordinates
in terms of latitude and longitude, text tag, and uploader id.
• World-scale: 2240 landmarks from 812 cities in 104 countries
• Large-scale: 21.4 million potential landmark images

67
Our Dataset

• Meta-info: a photo is a tuple containing the unique photo ID, tagged GPS coordinates
in terms of latitude and longitude, text tag, and uploader id.
• World-scale: 2240 landmarks from 812 cities in 104 countries
• Large-scale: 21.4 million potential landmark images

68
Our Dataset

• City-scale and intermediate-scale sets of photos that we can handle on a PC.


• We will use thousands of photos taken in Seattle!

69
Basic Method

• Pipeline
✓ Geographical clustering: The photos taken in the same landmark are likely to be
geographically close.
✓ Visual clustering: The photos taken in the same landmark are likely to be visually
similar, sharing a similar scene.
✓ Textual clustering or matching: The photos taken in the same landmark are likely to
share similar tags.
✓ You can change this pipeline or add other phases here. This overview is a simple
guide for your warm start.

Pipeline: photos → Geographical clustering → Visual clustering → Textual clustering/matching → Landmark clusters

70
Phase I: Geographical clustering

• The photos taken in the same landmark are likely to be geographically close.
• What are clustering algorithms suitable for this?

71
Phase I: Geographical clustering

• Recommended algorithm: Meanshift


• Does not require the number of clusters as input
• One of the density-based algorithms (e.g., Meanshift, DBSCAN)
• This algorithm tries to track the densest clusters

73
Phase I: Geographical clustering

• Recommended algorithm: Meanshift


• Scikit-learn (Python ML library) provides a tutorial and an implementation

74
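A minimal sketch along these lines, assuming scikit-learn's MeanShift; the coordinates and the bandwidth value are illustrative only, not project data:

```python
import numpy as np
from sklearn.cluster import MeanShift

# photo geo-coordinates: rows of (latitude, longitude)
coords = np.array([
    [47.6205, -122.3493],   # e.g., photos near the Space Needle
    [47.6206, -122.3491],
    [47.6089, -122.3401],   # e.g., photos near Pike Place Market
    [47.6091, -122.3403],
])

# Illustrative radius in degrees; sklearn's estimate_bandwidth(coords) is an alternative.
bandwidth = 0.005
ms = MeanShift(bandwidth=bandwidth).fit(coords)
print(ms.labels_)            # geo-cluster id per photo
print(ms.cluster_centers_)   # densest points tracked by the algorithm
```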
Phase I: Geographical clustering

• Why is K-means not recommended?


✓ If you want, it is also available.
✓ But you have to pre-define the number of clusters, which is a challenging issue.
✓ And note that you don't need to consider all of the outlier photos (maybe these photos
are not landmark photos).

75
Phase I: Geographical clustering

• Limitation of geographical clustering


✓ Despite their similar geo-coordinates, two photos may depict different landmarks
✓ Despite their distant geo-coordinates, three photos may depict the same landmark

(a) Landmark disambiguation is necessary (b) Landmark resolution is necessary

76
Phase II: Visual clustering

• We can group visually-similar photos as visual clusters in each of geo-clusters.


✓ It is computationally efficient compared to the visual clustering on the whole photo set.
✓ It is more accurate compared to the visual clustering on the whole photo set.

• How can we perform visual clustering? (guide)


✓ Compute pairwise visual similarities among photos (in one geo-cluster)
✓ Perform graph clustering (avoiding high-dimensional issue)

77
Phase II: Visual clustering

• How can we perform visual clustering? (guide)


✓ Compute pairwise visual similarities among photos (in one geo-cluster)

Identify shared objects between images, using Microsoft Bundler*


Construct an adjacency graph between images with the shared object.

*Bundler: http://phototour.cs.washington.edu/bundler/

78
Phase II: Visual clustering

• How can we perform visual clustering? (guide)


✓ Compute pairwise visual similarities among photos (in one geo-cluster)
For each image, SIFT* generates a set of key points that describe the image.
Bundler reconstructs 3D structure for the images using key points from SIFT.

[Figure: 3D reconstruction of an object from photos 1–3]

*SIFT: http://www.cs.ubc.ca/~lowe/keypoints/

79
Phase II: Visual clustering

• How can we perform visual clustering? (guide)


✓ Compute pairwise visual similarities among photos (in one geo-cluster)

object
(Eiffel tower??)

80
Phase II: Visual clustering

• How can we perform visual clustering? (guide)


✓ Compute pairwise visual similarities among photos (in one geo-cluster)
✓ Perform graph clustering (avoiding high-dimensional issue)

A set of images: |N|
→ Pairwise similarities (i.e., a graph): up to |N| × |N|
→ Cut-based graph partitioning (optional)
81
https://www.cs.cornell.edu/~snavely/bundler/bundler-v0.4-manual.html
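One possible sketch of the graph-clustering step, assuming a precomputed pairwise-similarity matrix and scikit-learn's SpectralClustering as the partitioning method (the project may use a different cut-based partitioner, and the matrix below is toy data):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# sim[i, j] = visual similarity between photo i and photo j in one geo-cluster
# (e.g., number of shared Bundler/SIFT matches, normalized to [0, 1])
sim = np.array([
    [1.0, 0.9, 0.8, 0.0, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.0],
    [0.8, 0.7, 1.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 1.0, 0.9],
    [0.1, 0.0, 0.0, 0.9, 1.0],
])

sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(sim)   # visual-cluster id per photo
print(labels)
```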
Phase II: Visual clustering

• One geo-cluster (before visual clustering)

82
Phase II: Visual clustering

• Visual clusters in one geo-cluster

83
Phase III: Textual clustering/matching

• The photos taken in the same landmark are likely to share similar tags

[Diagram: the bag of words of each visual cluster is compared, then clustered or merged]

84
Phase III: Textual clustering/matching

• Technique 1: Naïve similarities


✓ Jaccard similarity
✓ Cosine similarity

85
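A minimal sketch of both naive similarities on toy tag lists (the tags are illustrative, not project data):

```python
from collections import Counter
import math

def jaccard(tags_a, tags_b):
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b)

def cosine(tags_a, tags_b):
    ca, cb = Counter(tags_a), Counter(tags_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm

cluster1 = ["spaceneedle", "seattle", "tower", "night"]
cluster2 = ["spaceneedle", "seattle", "observation", "deck"]
print(jaccard(cluster1, cluster2), cosine(cluster1, cluster2))
```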
Phase III: Textual clustering/matching

• Technique 2: Stopword removal


✓ Tags that occur frequently in many clusters negatively affect similarity matching, e.g.:
✓ Travel
✓ People
✓ Seattle
✓…

86
Phase III: Textual clustering/matching

• Technique 3: TFIDF scoring


✓ A typical scoring method for search engines
✓ In our work, document = visual cluster

87
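A possible sketch using scikit-learn's TfidfVectorizer (an assumption), treating the concatenated tags of each visual cluster as one document, as the slide suggests:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# one "document" per visual cluster = all of its photo tags joined together (toy data)
cluster_tags = [
    "spaceneedle seattle tower night skyline",
    "spaceneedle seattle observation deck",
    "pikeplace seattle market fish",
]

vec = TfidfVectorizer()
X = vec.fit_transform(cluster_tags)   # one TF-IDF row per visual cluster
print(cosine_similarity(X))           # cluster-to-cluster similarity matrix
```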
Phase III: Textual clustering/matching

• Technique 4: Word embedding


✓ We can more accurately compute the semantic similarity between two tags

88
Phase III: Textual clustering/matching

• Technique 4: Word embedding


✓ We can more accurately compute the semantic similarity between two tags

https://radimrehurek.com/gensim/models/word2vec.html

89
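A minimal sketch with the gensim Word2Vec API linked above; the tag lists are toy data and a real model would need far more training text:

```python
from gensim.models import Word2Vec

tag_lists = [
    ["spaceneedle", "seattle", "tower", "night"],
    ["spaceneedle", "seattle", "observation", "deck"],
    ["pikeplace", "seattle", "market", "fish"],
] * 50   # repeat the toy corpus so training has something to iterate over

model = Word2Vec(sentences=tag_lists, vector_size=32, window=3, min_count=1, seed=0)
print(model.wv.similarity("spaceneedle", "tower"))   # semantic similarity of two tags
```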
Linear Regression

90
Predicting exam score: regression

x (hours)   y (score)
   10          90
    9          80
    3          50
    2          30
91
Regression (data)

x   y
1   1
2   2
3   3

[Plot: the three data points on X–Y axes]

92
(Linear) Hypothesis

[Plot: candidate hypothesis lines over the data points]

93
(Linear) Hypothesis

H(x) = Wx + b

[Plot: several lines H(x) = Wx + b over the data points]

Which hypothesis is better?

94
Cost Function

• How well does the line fit our (training) data? Measure the error H(x) − y at each point.

[Plot: vertical differences H(x) − y between the hypothesis line and the data points]

95
Cost Function and Optimization

96
Optimization

• H(W)* = argmin_{H(W)} cost(H(W), y)

✓ i.e., make the predictions ŷ close to y
• Update W → W + ΔW only if cost(W + ΔW) < cost(W)
• Finish when cost(W + ΔW) == cost(W)
• How can we find ΔW so that cost(W + ΔW) < cost(W)?
✓ Gradient Descent

97
Optimization

• What does cost(W) look like? (with b = 0, cost(W) = (1/m) Σᵢ (W·xᵢ − yᵢ)²)

x   y
1   1
2   2
3   3

• W = 1, cost(W) = 0
• W = 0, cost(W) = 4.67
• W = 2, cost(W) = 4.67
98
Optimization

99
Gradient descent algorithm

• Minimize cost function


✓ Gradient descent is used in many minimization problems
✓ For a given cost function, cost(W, b), it will find W, b to
minimize the cost
✓ It can be applied to more general functions: cost(w1, w2, …)

How would you find the lowest point? 100


Gradient descent algorithm

101
Gradient descent algorithm

102
Gradient descent algorithm

103
Gradient descent algorithm

104
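Putting the previous slides together, a minimal NumPy sketch of gradient descent on the toy data x = y = (1, 2, 3), assuming H(x) = Wx + b and the mean-squared-error cost; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

W, b = 0.0, 0.0
lr = 0.1
for _ in range(1000):
    pred = W * x + b
    # cost(W, b) = mean((H(x) - y)^2); its gradients w.r.t. W and b:
    grad_W = 2 * np.mean((pred - y) * x)
    grad_b = 2 * np.mean(pred - y)
    W -= lr * grad_W           # step downhill along the gradient
    b -= lr * grad_b

print(W, b)   # should approach W = 1, b = 0, where the cost is 0
```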
Convex function

105
Multivariable Linear Regression

106
Predicting exam score:
regression using one input (x)

one-variable / one-feature

x (hours)   y (score)
   10          90
    9          80
    3          50
    2          60
   11          40
107
Predicting exam score:
regression using three inputs (x1, x2, x3)

multi-variable / multi-feature

x1 (quiz 1)   x2 (quiz 2)   x3 (midterm 1)   Y (final)
    73            80             75             152
    93            88             93             185
    89            91             90             180
    96            98            100             196
    73            66             70             142

Test Scores for General Psychology

108
Hypothesis and Cost Function

109
Multi-variable

110
Matrix Multiplication

111
Matrix Multiplication

x1   x2   x3    Y
73   80   75   152
93   88   93   185
89   91   90   180
96   98  100   196
73   66   70   142
Test Scores for General Psychology

112
Matrix Multiplication

• 5 instances: [5, 3] × [3, 1] → [5, 1]

114
Matrix Multiplication

• n instances, 2 outputs: [n, 3] × [?, ?] → [n, 2]

115
Matrix Multiplication

• n instances, 2 outputs: [n, 3] × [3, 2] → [n, 2]

116
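A minimal NumPy sketch of the hypothesis in matrix form, H(X) = XW + b, on the test-score table; the weight values are illustrative, not learned here:

```python
import numpy as np

# [5, 3] data matrix: each row is (quiz 1, quiz 2, midterm 1)
X = np.array([
    [73, 80, 75],
    [93, 88, 93],
    [89, 91, 90],
    [96, 98, 100],
    [73, 66, 70],
], dtype=float)

W = np.array([[0.7], [0.6], [0.7]])   # [3, 1] weight matrix (one output column)
b = 0.0

H = X @ W + b                         # [5, 3] x [3, 1] -> [5, 1] predictions
print(H.shape, H.ravel())
```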
Logistic Regression and Multinomial Classification

117
Regression: [Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020]

Classification: [Plot: Breast Cancer (Malignant / Benign): label 1 (Malignant) or 0 (Benign) vs. Tumor Size]
118
Sigmoid

y = WᵀX

H(x) = 1 / (1 + e^(−WᵀX))

119
Cost(H(x), y) = −log(H(x))        if y = 1
                −log(1 − H(x))    if y = 0

cost(H(x), y) = −y·log(H(x)) − (1 − y)·log(1 − H(x))
120
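A minimal NumPy sketch that just evaluates these two formulas (it does not train a classifier); the weight and the data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(h, y):
    # cost(H(x), y) = -y*log(H(x)) - (1 - y)*log(1 - H(x)), averaged over examples
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

W = np.array([1.5])            # illustrative weight
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])     # 0 = benign, 1 = malignant

h = sigmoid(X @ W)             # H(x) = 1 / (1 + exp(-W^T x))
print(binary_cross_entropy(h, y))
```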


Softmax

121
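The slide's formula is shown as an image; as a hedged sketch, the standard definition softmax(z)_i = exp(z_i) / Σ_j exp(z_j) can be computed as:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one score per class
print(softmax(scores))               # class probabilities that sum to 1
```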
Regression and Classification Loss Functions
Mean Squared Error (MSE)

Cross Entropy

122
Application Example

123
Thank you!

124
