Machine Learning
• J. Carbonell, R. Michalski, and T. Mitchell, Machine Learning: A Historical and Methodological Analysis, AI Magazine, 1983.
Pioneers of AI
• John McCarthy (MIT and Stanford)
(September 4, 1927 – October 24, 2011)
• LISP (Garbage Collection)
• Idea of using logical reasoning to decide on actions
• Non-Monotonic Reasoning (Circumscription)
Figure: Deep Learning draws on approximate, efficient, and randomized algorithms, intelligence, information theory, and entropy.
Search is Ubiquitous
1. Exact Search
• Popular in databases: finding the answer to a query
• In operating systems: grep
• In compilers: symbol-table lookup
2. In AI/ML/DM, search is used to find an appropriate:
• Solution to a problem (Search for Problem Solving)
• Path to a goal in AI: Heuristic Search
• Representation (Representation Learning (DL) is search)
• Classification Model (Model Estimation is Search; see the sketch below)
• Proximity Measure in Clustering (Constrained Search)
• Right Model for Regression (Model Learning is Search)
• Documents for a Query in Information retrieval (Search Engines)
3. Mathematical Models
• Logic and Probability: Inference is search
• Linear Algebra: Matrix factorization is search
• Optimization and Regularization: Search for a solution
• Information Theory: Search for purity (no entropy)
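As a rough illustration of the claim that model estimation is a search, here is a minimal sketch (with hypothetical data and a made-up fit_threshold helper, not anything from the slides) that searches a grid of candidate thresholds for a 1-D classifier and keeps the one with the fewest training errors.

```python
# A minimal sketch: model estimation as search over candidate thresholds.
def fit_threshold(xs, labels):
    """Search candidate thresholds; return (best_threshold, best_accuracy)."""
    candidates = sorted(set(xs))
    best_t, best_acc = None, -1.0
    for t in candidates:
        # classify x as class 1 if x > t, else class 0
        acc = sum((x > t) == bool(y) for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical training data: heights (in feet) labelled 0 = Chair, 1 = Human
xs = [1.5, 2.0, 2.5, 5.5, 5.8, 6.1]
labels = [0, 0, 0, 1, 1, 1]
print(fit_threshold(xs, labels))   # -> (2.5, 1.0): threshold found by search
```

Gradient descent in optimization, matrix factorization, and document ranking in retrieval can all be read in the same way: a search over candidate solutions.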
Vector-Space Models
• In ML, the input is an abstraction and the output is also an abstraction
• A chair and a human are each represented as a 2-D vector (Height, Weight)
• The input is an abstracted pattern, a vector; a height of 5' 10" is itself an abstraction
• The output could be an abstraction such as a decision tree or a weight vector W
• Lossy representation: we cannot get back the original pattern from the vector
Figure: Chair and Human patterns mapped (lossily) to 2-D (Height, Weight) vectors.
Data Trend
• Software code
• Bug reports
• Multi-media documents
Example: Handwritten Digit Recognition
Figure: sample handwritten digit images (a 3, two 8s, and many more).
Example: Signature Verification
• Preprocessing
• Different Features
• Wavelets
• Moments
• GA-based weights for features
• Combine classifiers
• KNNC
• Threshold based
• More details in Ramesh and Murty, Pattern Recognition, 1999, pp. 217-233.
Results: Genuine: 82%, Simple forgeries: 100%, Skilled forgeries: 75%
(Expertise from SBI and Canara Bank branches at IISc; the machine was comparable on genuine and skilled signatures)
Figure: sample signatures.
Example: Credit Risk Assessment
If Delinquent A/Cs > 2 and Delay in payment > 1 Then Repay loan = No
If Delinquent A/Cs = 0 and ((Income > 50000) or (Years-of-Credit > 3)) Then Repay loan = Yes
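A minimal sketch of the two rules above as a rule-based classifier; the function and argument names are illustrative, not from the slides.

```python
# Rule-based credit-risk assessment following the two slide rules.
def repay_loan(delinquent_accounts, delay_in_payment, income, years_of_credit):
    """Return 'No', 'Yes', or 'Unknown' according to the two rules."""
    if delinquent_accounts > 2 and delay_in_payment > 1:
        return "No"
    if delinquent_accounts == 0 and (income > 50000 or years_of_credit > 3):
        return "Yes"
    return "Unknown"   # neither rule fires

print(repay_loan(3, 2, 40000, 1))   # -> 'No'
print(repay_loan(0, 0, 60000, 2))   # -> 'Yes'
```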
Examples of Applications
• Optical Character Recognition
• Handwritten: sorting letters by postal code, input device for PDAs
• Printed texts: reading machines for blind people, digitization of text documents
• Biometrics
• Face recognition, verification, retrieval
• Fingerprint recognition
• Speech recognition
• Health, Education, Transport, Agriculture, Finance
• Normal/Abnormal
• Easy/Difficult concepts
• Traffic violation
• Crop disease, crop yield
Practical Learning System
Figure: pipeline from the application domain to data acquisition, e.g., a weather table with features Year, Rainfall, Evaporation, Temperature, and Relative Humidity, or Chair/Human patterns plotted over Height with a test pattern.
Matching
• In databases exact match is typically required.
• In ML approximations are the order of the day.
• The area of approximation algorithms is popular.
• Matching is done with the help of WordNet, Wikipedia, Twitter, and Facebook.
• In information retrieval, matching is typically based on tf-idf weights, which act like a band-pass filter (a small sketch follows below).
• Most phrases, like machine learning, pattern recognition, data mining, and Indian Institute of Science, are made of mid-frequency words; both rare words and very frequent words are ignored.
• A fundamental notion here is Zipf’s law.
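A minimal tf-idf sketch on a hypothetical toy corpus (not from the slides), showing how a term that occurs in every document, like "and", gets weight zero while mid-frequency terms keep a useful weight.

```python
# Standard tf-idf weighting on a tiny, made-up corpus.
import math

docs = [
    "machine learning and pattern recognition",
    "data mining and machine learning",
    "pattern recognition and information retrieval",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(term, doc_tokens):
    tf = doc_tokens.count(term) / len(doc_tokens)   # term frequency in this document
    df = sum(term in d for d in tokenized)          # number of documents containing the term
    idf = math.log(N / df)                          # a term in every document gets idf = 0
    return tf * idf

for term in ["and", "learning", "mining"]:
    print(term, [round(tf_idf(term, d), 3) for d in tokenized])
```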
Dimensionality Reduction
Example: documents with class labels Politics, Sports, Sports; music tracks represented as 2-D points: (21, 1) HipHop, (22, 0) Dance, (28, 0) Acoustic.
Representation using Principal Components (PCs)
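A minimal PCA sketch, assuming a small synthetic 2-D data set (the slide's three points plus one extra made-up point): project onto the first principal component and reconstruct, so the loss of the reduced representation can be measured.

```python
# Dimensionality reduction with one principal component.
import numpy as np

X = np.array([[21.0, 1.0], [22.0, 0.0], [28.0, 0.0], [30.0, 2.0]])  # last row is assumed
mean = X.mean(axis=0)
Xc = X - mean                                  # centre the data
cov = Xc.T @ Xc / (len(X) - 1)                 # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                           # first principal component

scores = Xc @ pc1                              # 1-D representation of each pattern
X_rec = np.outer(scores, pc1) + mean           # lossy reconstruction from one PC
print("per-pattern 1-D scores:", np.round(scores, 2))
print("reconstruction error:", round(float(np.linalg.norm(X - X_rec)), 3))
```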
Classification
Learn a method for predicting the instance class from pre-labeled
(classified) instances
• W controls the orientation of the linear decision boundary W^T x + b = 0
• b controls its location (see the sketch below)
Figure: Chair and Human patterns in the Height-Weight space separated by a linear boundary; the test pattern is classified as Human.
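A minimal sketch of such a linear classifier with hypothetical values of W and b (decide "Human" when W^T x + b > 0):

```python
# Linear classifier: W fixes the orientation of the boundary, b its location.
import numpy as np

W = np.array([1.0, 0.0])   # assumed weights: decide using Height alone
b = -4.0                   # assumed bias: boundary at Height = 4 feet

def classify(x):
    """Return 'Human' if W^T x + b > 0, else 'Chair'."""
    return "Human" if W @ x + b > 0 else "Chair"

test_pattern = np.array([5.83, 70.0])   # (Height in feet, Weight in kg)
print(classify(test_pattern))           # -> Human
```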
k-Nearest Neighbor Classifier (KNNC)
We are given:
• A set of training patterns
• A proximity measure
• The value of K: Number of nearest neighbors
Algorithm:
• Find the K nearest neighbors of the test pattern among the training patterns, using the proximity measure.
• Assign to the test pattern the class label that occurs most frequently among these K neighbors (majority vote).
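A minimal KNNC sketch of the algorithm above, assuming a Euclidean proximity measure and made-up (Height, Weight) training patterns:

```python
# K-nearest-neighbor classification by majority vote.
import math
from collections import Counter

def knnc(training, test, k):
    """training: list of (vector, label); return the majority label of the k nearest."""
    dist = lambda a, b: math.dist(a, b)                       # Euclidean proximity measure
    neighbours = sorted(training, key=lambda p: dist(p[0], test))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical (Height, Weight) patterns for Chair and Human
training = [((1.5, 5.0), "Chair"), ((2.0, 7.0), "Chair"),
            ((5.5, 60.0), "Human"), ((6.0, 75.0), "Human")]
print(knnc(training, (5.8, 68.0), k=3))                       # -> 'Human'
```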
Neighborhood Based Learning
Figure: 19 two-dimensional patterns from classes O and X, grouped into 7 clusters. NNC assigns the test pattern T to class O; 3NNC assigns it to class X. Purity of the clustering: 5 of the 7 clusters are pure, the other 2 have impurities 1/3 and 1/4, giving an overall purity of 17/19. The same decisions can also be made on the cluster leaders.
Neighborhood Based Learning
On 7 cluster leaders
Figure: the same 19 patterns replaced by their 7 cluster leaders; using only the leaders, NNC assigns T to class O and 3NNC assigns it to class X.
K-Means Clustering Algorithm
Figures: Cluster 1, Cluster 2, a 2-D plot of the images, and the misclassified images.
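A minimal K-Means sketch on synthetic 2-D blobs (not the digit images from the slides), alternating the assignment step and the centroid-update step:

```python
# K-Means: assign each pattern to its nearest centroid, then move each centroid
# to the mean of its cluster; repeat.
import numpy as np

def k_means(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # initial cluster leaders
    for _ in range(iters):
        # assignment step: each pattern joins its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        # update step: each centroid moves to the mean of its cluster
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, centroids

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (10, 2)),     # blob near (0, 0)
               rng.normal(5.0, 0.5, (10, 2))])    # blob near (5, 5)
labels, centroids = k_means(X, k=2)
print(labels)
print(centroids)   # approximately the two blob centres
```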
Neighborhood Based Learning: Regression
• The data is generated using y = 1 + x + x^2
• Suppose we select a random point in the interval
[0,1]; say 0.16
• We take 3 NNs of 0.16 among the x values; they are
0.2, 0.1, and 0.3; the respective values of y are 1.24, 1.11, and 1.39. The average value, ŷ, is approximately 1.25 (this is the predicted value of y for 0.16, and also for 0.17 and many other nearby x values).
• The target value is 1.1856; there is an error
• If we take the value of K=5, the NNs are 0.2, 0.1,
0.3, 0, 0.4 with y values 1.24, 1.11, 1.39, 1, 1.56
respectively. The average value of y is 1.26
• The target value is 1.1856
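A minimal sketch of this neighbourhood-based regression, reproducing the slide's numbers for the query point x = 0.16:

```python
# k-NN regression: average the y values of the k nearest x values.
def knn_regress(xs, ys, query, k):
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    return sum(ys[i] for i in order) / k

xs = [0.0, 0.1, 0.2, 0.3, 0.4]             # training x values from the slide
ys = [1 + x + x**2 for x in xs]            # generated by y = 1 + x + x^2
print(round(knn_regress(xs, ys, 0.16, k=3), 2))   # -> 1.25
print(round(knn_regress(xs, ys, 0.16, k=5), 2))   # -> 1.26
```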
Weighted Neighborhood Based Learning: Regression
• The data is generated using y = 1 + x + x^2
• Suppose we select a random point in the interval
[0,1]; say 0.16
• We take 3 NNs of 0.16 among the x values; they are
0.2, 0.1, and 0.3; the respective values of y are 1.24, 1.11, and 1.39. The weighted average value, ŷ, is approximately 1.22:
ŷ = (1.24/0.04 + 1.11/0.06 + 1.39/0.14) / 48.8
where each y value is weighted by the inverse of its neighbour's distance to 0.16, and 48.8 ≈ 1/0.04 + 1/0.06 + 1/0.14 is the sum of these weights.
• The target value is 1.1856; there is an error
• Nearer neighbours contribute more and farther
neighbours less.
• In general, W_i = (d_max − d_i) / (d_max − d_min)
• W_1 = 1 and W_k = 0 for the kNNC
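A minimal sketch of the weighted variant, using the inverse-distance weights of the 1.22 example above (not the general (d_max − d_i)/(d_max − d_min) weights):

```python
# Weighted k-NN regression: nearer neighbours contribute more.
def weighted_knn_regress(xs, ys, query, k):
    """Inverse-distance-weighted average of the k nearest neighbours' y values.
    Assumes the query does not coincide exactly with a training x value."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    weights = [1.0 / abs(xs[i] - query) for i in order]   # nearer -> larger weight
    return sum(w * ys[i] for w, i in zip(weights, order)) / sum(weights)

xs = [0.0, 0.1, 0.2, 0.3, 0.4]
ys = [1 + x + x**2 for x in xs]                           # y = 1 + x + x^2
print(round(weighted_knn_regress(xs, ys, 0.16, k=3), 2))  # -> 1.22
```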