
Introduction

CHAPTER 1: MACHINE LEARNING – THEORY & PRACTICE


Artificial Intelligence ( Deep Learning?)

Databases; Man-Machine Interaction

Learning from examples


(supervised learning;
Machine
classification and regression) Pattern
Learning
Learning from observations Recognition
(unsupervised, clustering)

Data Mining (IBM) • Conventional AI


• LOGIC for AI
Rakesh • RULE BASED SYSTEMS
Agrawal • EXPERT SYSTEMS
• LISP, PROLOG
What is Machine Learning?
• Memorizing (reciting multiplication tables, chanting hymns, coaching for JEE) – Rote
Learning (Google and Deep Learning); chess masters spend a lot of their time
memorizing the great games of the past. It is this rote learning that teaches them
how to “think” in chess.
• Deduction (from A and A→B, deduce B). Computers can do only deduction – Learning
by Deduction (game playing)
• Abduction: taking your best guess based upon your current knowledge, observations
and context (from B and A→B, infer A) – Learning by Abduction
• Induction (specifics to general) – Inductive learning
• 1. Learning from Examples – Supervised learning
• 2. Learning from observations – Clustering
• 3. Semi-Supervised Learning 4. Reinforcement Learning – agent, environment, reward
• The label “Machine Learning” was coined in 1959 by Arthur Samuel
• Learning Machines, the book by Nils Nilsson, appeared in 1965
• Machine Learning – first course at IISc, by P. S. Sastry in 1990
• Never Ending Learning – Tom Mitchell’s current lab at CMU
J Carbonell, R. Michalski and T. Mitchell, Machine Learning: A Historical and Methodological Analysis, AI
magazine, 1983.
What is Machine Learning?
• Machine Learning is
• Induction (specifics to general) - Inductive learning
• Learning from Examples- Supervised learning
• Learning from observations - Clustering
• The journal Machine Learning started in 1986; it is a Springer publication
• “Logic is AI and AI is logic” was the slogan!
• Representation of data is crucial

• J Carbonell, R. Michalski and T. Mitchell, Machine Learning: A Historical and Methodological Analysis, AI magazine,
1983.
Pioneers of AI
• John McCarthy (MIT and Stanford)
(September 4, 1927 – October 24, 2011)
• LISP (Garbage Collection)
• Idea of using logical reasoning to decide on
actions
• Non-Monotonic Reasoning (Circumscription)

• Marvin Minsky (MIT)


(August 9, 1927 – January 24, 2016)
• Co-Founder of AI Lab at MIT (CSAIL)
• Co-authored Perceptrons book which
analyses artificial neural networks
• FRAME as a knowledge structure

• Herbert Simon (CMU)


(June 15, 1916 – February 9, 2001)
• Logic Machines and General Problem Solver
• Preferential Attachment that can explain
power law degree distribution of social nets
Pioneers of AI
• Arthur Samuel (MIT, IBM, Stanford)
(December 5, 1901 – July 29, 1990)
• Coined the term Machine Learning in 1959
• AI for Game Playing

• Allen Newell (CMU)


(March 19, 1927 – July 19, 1992)
• Logic Theorist (Logic based Reasoning)
• GPS: Knowledge and Reasoning
• Chess Machine

• Nils Nilsson (Stanford)


• A* Heuristic Search Algorithm
• Planning and Robotics
• Early influential books on ML
Recent Developments
In the early days of ML, the working assumption was that the data fits in main memory!
• More data is generated:
– Web, text, images ...
– Business transactions, calls, ...
– Scientific data: astronomy, biology, etc
• More data is captured:
– Storage technology faster and cheaper
– DBMSs can handle bigger databases
• Data is generated faster than it is consumed!
• Better computing power
• Big Data!
Machine Learning: Literature
• International Conference on Machine Learning (ICML) is the leading conference;
the first one was held in 1980. Authors now need to submit code as well.
• The most recent one was held in July 2021
• Journal of Machine Learning Research (JMLR) was established in 2000; the first
issue came out in Oct 2000.
– The journal was established as an open-access alternative to the
journal Machine Learning; this was started in 1986.
– In 2000, forty editorial board members of Machine Learning resigned, saying
that in the era of the Internet, it was detrimental to publish in expensive journals
with pay-access.
• International Conference on Learning Representations (ICLR), recent one in May
2021: Open Review
– There is a challenge to investigate reproducibility of empirical results submitted to the
conference. Selected reproducibility efforts will be published as a special journal issue.
History
Representation of Datasets
A dataset can be represented in multiple ways:
1. A set of
• Vectors
• Sequences
2. A data matrix, D, of n rows and l columns
• Each row corresponds to a pattern/data point
• Each column corresponds to a feature
3. A data set is a collection of probability distributions
• Each class/group of patterns is a probability distribution
4. A dataset is represented as a network
• In neural networks the weights are learnt; in fact learning is change of weights
• In a social network the datapoints/entities are nodes and a link exists
between a pair of nodes if the pair of entities are related.
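As a small illustration of representation 2 (the data matrix), here is a minimal sketch in Python/NumPy; the feature names and values are made up for illustration and are not from the slide.

```python
import numpy as np

# Hypothetical data matrix D with n = 4 patterns (rows) and l = 2 features (columns);
# the features are assumed to be (Height in cm, Weight in kg).
D = np.array([
    [170.0, 65.0],   # pattern 1
    [165.0, 58.0],   # pattern 2
    [ 90.0, 15.0],   # pattern 3 (a chair, say)
    [ 95.0, 12.0],   # pattern 4 (another chair)
])

n, l = D.shape      # number of patterns, number of features
print(n, l)         # 4 2
print(D[0])         # first pattern (a row of the matrix)
print(D[:, 1])      # second feature across all patterns (a column)
```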
Dependence on Multiple Topics
• We test students on Linear Algebra, Probability, Optimization, and Discrete
Maths before admitting them into INTELLIGENT SYSTEMS at IISc
[Slide diagram: topics that ML and DEEP LEARNING depend on]
• Linear Algebra – clustering
• Logic – theorem proving
• Discrete Structures – group, relation, function, graph
• Probability and Statistics – Bayes, maximum likelihood
• Data Structures – inverted index
• Algorithms – approximation, efficient, randomized
• Optimization – max margin, backpropagation
• Information Theory – entropy
• Artificial Intelligence
Search is Ubiquitous
1. Exact Search
• Popular in Databases: answer to a query
• In Operating Systems: Grep
• Compilers: Symbol table
2. In AI/ML/DM search for getting an appropriate:
• Solution to a problem (Search for Problem Solving)
• Path to a goal in AI: Heuristic Search
• Representation (Representation Learning (DL) is search)
• Classification Model (Model Estimation is Search)
• Proximity Measure in Clustering (Constrained Search)
• Right Model for Regression (Model Learning is Search)
• Documents for a Query in Information retrieval (Search Engines)
3. Mathematical Models
• Logic and Probability: Inference is search
• Linear Algebra: Matrix factorization is search
• Optimization and Regularization: Search for a solution
• Information Theory: Search for purity (no entropy)
Vector-Space Models
• Input is an Abstraction and Output is also an abstraction in ML
• Chair and Human are represented as 2-D vectors (Height, Weight)
• Input is an abstracted Pattern: Vector; height is 5’ 10” is an abstraction
• Output could be an abstraction like a decision tree or a weight vector W.
• Lossy representation: we cannot get back the original pattern from the
vector

Lossy

Chair
Human

• Representation is still an open problem. (Even with DEEP LEARNING)


Representation is Crucial
Height of the Tallest Hillock | Average Height of People | Crop Yield
2 meters                      | 5 feet                   | L
5 cm                          | 5 feet 2 inches          | H
1 meter                       | 6 feet                   | L
10 meters                     | 4 feet 5 inches          | H
7.5 meters                    | 5 feet 1 inch            | L
Average height of people is 4 feet 8 inches; is the yield L or H?


• NNC declares that the class is H, since the nearest neighbour is the fourth pattern (4 feet 5 inches)
• 5-NNC decides that the class is L for any test pattern (three of the five patterns are L)
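A minimal sketch of the NNC and 5-NNC decisions on this toy table, using only the average-height-of-people feature; the conversion to inches and the code structure are my own, for illustration.

```python
import numpy as np

# Average height of people, converted to inches, with the crop-yield labels from the table.
heights = np.array([60, 62, 72, 53, 61])        # 5'0", 5'2", 6'0", 4'5", 5'1"
labels  = np.array(['L', 'H', 'L', 'H', 'L'])

query = 56                                      # 4 feet 8 inches
dist = np.abs(heights - query)                  # one-dimensional distances

# NNC: the nearest pattern is the fourth one (4'5", class H)
print(labels[np.argmin(dist)])                  # -> 'H'

# 5-NNC: majority vote over all five patterns -> 'L'
nearest5 = labels[np.argsort(dist)[:5]]
values, counts = np.unique(nearest5, return_counts=True)
print(values[np.argmax(counts)])                # -> 'L'
```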
Examples of Data
• Handwritten characters
• Bar codes, fingerprints, animal footprints
• Data trends
• Software code
• Bug reports
• Multi-media documents
Example: Handwritten Digit Recognition

Wide variability of the same numeral!

• Handcrafted rules will result in a large number of rules and
exceptions
• Better to have a machine that learns from a large training
set
Classification

[Slide diagram: training examples of the form (image, label) – digits such as 1, 0, 6, 3, 8 and many more – are fed to a Learning Algorithm, which outputs a Classifier; a new example (an image of an 8) is then classified as 8]
Example: Signature Verification
• Preprocessing
• Different Features
• Wavelets
• Moments
• GA based weights for features
• Combine classifiers
• KNNC
• Threshold based
• More details in Ramesh and Murty, Pattern Recognition, 1999, pp. 217–233.

Genuine | Simple forgeries | Skilled forgeries
82%     | 100%             | 75%

(Expertise from SBI and Canara Bank branches at IISc; the machines were
comparable to the bank experts on genuine and skilled forgeries)
[Sample signatures shown on the slide]
Example: Credit Risk Assessment

Rules Learned: ... ...

If Delinquent A/Cs > 2 and Delay in payment > 1, Then Repay loan = No
If Delinquent A/Cs = 0 and ((Income > 50000) or (Years-of-Credit > 3)), Then
Repay loan = Yes
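The two learned rules above can be read directly as a decision procedure. A minimal sketch follows; the attribute names and the fallback behaviour when neither rule fires are my own choices, not from the slide.

```python
def repay_loan(delinquent_accounts, delay_in_payment, income, years_of_credit):
    """Apply the two learned credit-risk rules; returns 'Yes', 'No', or None
    when neither rule fires (the None fallback is an assumption)."""
    if delinquent_accounts > 2 and delay_in_payment > 1:
        return "No"
    if delinquent_accounts == 0 and (income > 50000 or years_of_credit > 3):
        return "Yes"
    return None

print(repay_loan(3, 2, 40000, 1))   # -> 'No'
print(repay_loan(0, 0, 60000, 1))   # -> 'Yes'
```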
Examples of Applications
• Optical Character Recognition
• Handwritten: sorting letters by postal code, input device for PDAs
• Printed texts: reading machines for blind people, digitization of text documents
• Biometrics
• Face recognition, verification, retrieval,
• Fingerprints recognition
• Speech recognition
• Health, Education, Transport, Agriculture, Finance
• Normal/Abnormal
• Easy/Difficult concepts
• Traffic violation
• Crop disease, crop yield
Practical Learning System
[Slide flowchart: stages of a practical learning system]
1. Data Acquisition from the application domain
2. Feature Engineering = Preprocessing + Representation
3. Model Selection: choose a model, using domain knowledge
4. Model Learning: train the model on the training data
5. Model Evaluation: validate the model on evaluation (validation) data
6. Learnt Model
7. Model Explanation: explain the model, using expert feedback
Applications of ML in Agriculture
• Crop Management
• Yield Prediction
• Disease Detection
• Weed Detection
• Crop Quality
• Species Recognition
• Livestock Management
• Water Management
• Soil Management
Soybean Data (Yield in Bhopal District)

Features: Year, Rainfall, Evaporation, Temperature, Relative Humidity

Veenadhari S., Bharat Mishra, C. D. Singh, Soybean Productivity Modelling using
Decision Tree Algorithms, IJCA, 27(7):11–15, 2011.
Soybean Data
Decision Tree Classifier
Music Data (Python Programming with Mosh)
Music Data
Classification
Learn a method for predicting the instance class from pre-labeled (classified) instances.
Many approaches. Exemplar-based learning: learning is achieved by facilitating search
and/or matching – the Nearest Neighbor Classifier (NNC).
[Figure: Chair and Human patterns in the (Height, Weight) plane, with a test pattern]
• In databases exact match is typically required.
• In ML approximations are the order of the day.
• Approximation algorithms area is popular.
• Matching is done with the help of WordNet, Wikipedia,
Twitter and Facebook.
• In information retrieval, matching is typically based on
tf-idf weights, which act like a band-pass filter.
• Most of the phrases like machine learning, Pattern
Recognition, Data Mining, Indian Institute of Science
have mid-frequency words. Both rare words and
frequent words are ignored.
• A fundamental notion here is Zipf’s law.
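As an illustration of tf-idf based matching, here is a minimal sketch using scikit-learn; the toy documents are made up. The max_df/min_df settings prune very frequent and very rare terms, which is the band-pass behaviour on word frequency described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents (made up for illustration).
docs = [
    "machine learning and pattern recognition at the indian institute of science",
    "data mining and machine learning for information retrieval",
    "pattern recognition uses machine learning models",
]

# Terms appearing in (almost) every document get filtered by max_df; min_df can
# drop very rare terms; stop words are removed as well.
vec = TfidfVectorizer(max_df=0.9, min_df=1, stop_words="english")
X = vec.fit_transform(docs)              # document-term matrix of tf-idf weights

print(vec.get_feature_names_out())       # mid-frequency terms that survive
print(X.toarray().round(2))
```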
Dimensionality Reduction
[Slide example: a document–term matrix with class labels Politics and Sports]
• Dimensionality reduced from l to k


• There could be other ways of dimensionality reduction
• One of them is based on Random Projections.
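A minimal sketch of dimensionality reduction by random projection; the sizes n, l, k are arbitrary, and the rows of the data matrix are mapped from l to k dimensions by multiplying with a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, k = 100, 1000, 50                   # arbitrary sizes for illustration

D = rng.random((n, l))                    # data matrix: n patterns, l features

# Random projection matrix R (l x k); the 1/sqrt(k) scaling roughly preserves
# pairwise distances (a Johnson-Lindenstrauss style argument).
R = rng.standard_normal((l, k)) / np.sqrt(k)

D_reduced = D @ R                         # n patterns, now with k features
print(D_reduced.shape)                    # (100, 50)
```

scikit-learn also provides GaussianRandomProjection, which wraps the same idea.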
Classification

(21, 1) – HipHop
(22, 0) – Dance
(28, 0) – Acoustic
Representation using PCs

Variance of each component:


[0.52787257 0.47212743]
Total Variance Explained: 100.0
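The variance figures above look like the explained_variance_ratio_ output of scikit-learn's PCA. Here is a hedged sketch of how such numbers are produced; the music.csv file and its age/gender columns are assumptions based on the Mosh tutorial, not given on the slide.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Assumed file/columns: music.csv with 'age' and 'gender' features and a 'genre' label.
df = pd.read_csv("music.csv")
X = df[["age", "gender"]]

pca = PCA(n_components=2)
X_pc = pca.fit_transform(X)               # representation using principal components

print("Variance of each component:", pca.explained_variance_ratio_)
print("Total Variance Explained:",
      round(100 * pca.explained_variance_ratio_.sum(), 1))
# With 2 components over only 2 features, the total is 100.0, as on the slide.
```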
Iris Flower Dataset
Iris Flower Dataset: Description
Variance of each component:
[0.72770452 0.23030523]
Total Variance Explained: 95.8
Regions using Petal Features
Classification Using the Petal Features
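A hedged sketch of classification using only the two petal features of the Iris dataset; the choice of a k-nearest-neighbour classifier and the train/test split are mine, since the slide only shows the decision regions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, 2:4]                 # petal length and petal width only
y = iris.target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))
```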
Nearest Neighbor Classifier (NNC)
• No training
• Very simple to understand and implement
• Computationally intensive on test patterns. There are data
structures and algorithms to speed up.
• In practice, if there are a large number of training
samples, the performance of nearest neighbor rule is
good. However, computational requirements go up.
• In theory, with an unlimited number of training samples,
the error rate is never worse than twice the optimal error:

E* ≤ E ≤ E* (2 − (c / (c − 1)) E*)

where E is the NNC error rate, E* is the optimal (Bayes) error rate, and c is the number of classes.
Classification
Learn a method for predicting the instance class from pre-labeled
(classified) instances

Many approaches: KNNC
• NNC is not robust to noise
• 3-NNC uses the majority class label from its 3 neighbours
[Figure: Chair and Human patterns in the (Height, Weight) plane, with a test pattern]
Classification
Learn a method for predicting the instance class from pre-labeled
(classified) instances
Many approaches: Perceptron, SVM, NBC
Linear Discriminants: the decision boundary is W·x + b = 0
• W controls orientation
• b controls location
[Figure: Chair and Human patterns in the (Height, Weight) plane; the test pattern is classified as Human]
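A minimal sketch of a linear discriminant on the (Height, Weight) example; the weight vector W and bias b here are hand-picked for illustration, not learned.

```python
import numpy as np

# Decision function g(x) = W . x + b ; W controls the orientation of the
# separating line, b controls its location.
W = np.array([0.5, 1.0])          # (height coefficient, weight coefficient) - assumed
b = -80.0                         # assumed offset

def classify(x):
    return "Human" if W @ x + b > 0 else "Chair"

print(classify(np.array([170.0, 65.0])))   # tall and heavy -> Human
print(classify(np.array([90.0, 15.0])))    # short and light -> Chair
```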
k-Nearest Neighbor Classifier (KNNC)
We are given:
• A set of training patterns
• A proximity measure
• The value of K: Number of nearest neighbors

To classify a test pattern T,


• Compute distance from T to all the training
patterns
• Obtain its K nearest neighbors
• Use the class labels of the K nearest neighbors and take a majority vote
• A change in the value of K may affect the classification; K = 1 gives the NNC
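A minimal sketch of KNNC following the steps above; Euclidean distance is assumed as the proximity measure, and the toy data is made up.

```python
import numpy as np
from collections import Counter

def knnc(train_X, train_y, T, K=3):
    """Classify test pattern T by a majority vote among its K nearest
    training patterns (Euclidean distance assumed)."""
    dists = np.linalg.norm(train_X - T, axis=1)   # distance from T to all patterns
    nearest = np.argsort(dists)[:K]               # indices of the K nearest neighbours
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]             # majority class label

# K = 1 reduces to the NNC; changing K may change the decision.
X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.2, 4.8]])
y = np.array(['O', 'O', 'X', 'X'])
print(knnc(X, y, np.array([1.1, 1.0]), K=3))      # -> 'O'
```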
What is Clustering?
Clustering is grouping of data; clustering is data organization.
[Slide example: 10 binary (0/1) patterns, grouped once by rows and once by columns]
• Similarity is based on the number of matching 1s and 0s
• Rows 1, 3, and 7 are grouped together
• Similarly, columns 1, 3, and 5 are in the same cluster
• Clustering amounts to a reorganization of the rows and columns
• In the figure, matching is based on colour; matching can be more general!
What is Clustering?
Leader Clustering Algorithm
Patterns in the vicinity of each other are assigned to the same cluster;
this is based on a threshold 𝜃 on distance

Algorithm:
1. Make the first pattern a leader (it starts the first cluster).
2. For every subsequent pattern, assign it to the cluster of the nearest leader if its
distance to that leader is at most 𝜃; otherwise the pattern becomes a new leader.
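A minimal sketch of the leader algorithm, assuming Euclidean distance and that the patterns are processed in their given order.

```python
import numpy as np

def leader_clustering(patterns, theta):
    """Single scan: the first pattern becomes a leader; every subsequent pattern
    joins the cluster of the nearest leader within distance theta, otherwise it
    becomes a new leader itself."""
    leaders, clusters = [], []
    for x in patterns:
        if leaders:
            d = [np.linalg.norm(x - L) for L in leaders]
            j = int(np.argmin(d))
            if d[j] <= theta:
                clusters[j].append(x)
                continue
        leaders.append(x)                 # x starts a new cluster as its leader
        clusters.append([x])
    return leaders, clusters

X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.3, 4.9]])
leaders, clusters = leader_clustering(X, theta=1.0)
print(len(leaders))                       # -> 2 clusters
```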
Neighborhood Based Learning
[Figure: 19 two-dimensional patterns from classes O and X, grouped into 7 clusters, with a test pattern T]
• NNC: T is in O; 3NNC: T is in X
• Purity of clustering: 5 of the 7 clusters are pure
• 2 clusters have impurity 1/3 and 1/4
• Overall purity: 17/19
Neighborhood Based Learning

[Figure: the 7 cluster leaders of the 19 patterns, with the test pattern T]
• Classifying T using only the 7 cluster leaders
• NNC: T is in O; 3NNC: T is in X
K-Means Clustering Algorithm: Cluster 1
K-Means Clustering Algorithm: Cluster 2
2-d Plot of the Images
K-Means Clustering Algorithm: Misclassified

These are clustered wrongly into Cluster 1

Danny Varghese, https://towardsdatascience.com/image-clustering-using-transfer-learning-df5862779571

Clustering is Matrix Factorization

• K-Means clustering can be characterized by matrix


factorization as shown in the above example.
• Outputs of several other clustering algorithms can be
represented using matrix factorization.
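A hedged sketch of the matrix-factorization view of K-Means, with made-up data: the data matrix D (n × l) is approximated as B × C, where B (n × K) is the 0/1 cluster-membership matrix and C (K × l) holds the cluster centroids.

```python
import numpy as np
from sklearn.cluster import KMeans

D = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]])   # made-up data

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(D)

# B: n x K membership matrix (a single 1 per row), C: K x l centroid matrix.
B = np.zeros((D.shape[0], 2))
B[np.arange(D.shape[0]), km.labels_] = 1.0
C = km.cluster_centers_

print(np.round(B @ C, 2))     # the factorization B C approximates D
```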

Neighborhood Based Learning: Regression
• The data is generated using y = 1 + x + x 2
• Suppose we select a random point in the interval
[0,1]; say 0.16
• We take the 3 NNs of 0.16 among the x values; they are
0.2, 0.1, and 0.3, and the respective values of y are
1.24, 1.11, and 1.39. The average value, ŷ, is
approximately 1.25 (this is the predicted value of y
for 0.16, and also for 0.17 and many more nearby points)
• The target value is 1.1856; there is an error
• If we take the value of K=5, the NNs are 0.2, 0.1,
0.3, 0, 0.4 with y values 1.24, 1.11, 1.39, 1, 1.56
respectively. The average value of y is 1.26
• The target value is 1.1856
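A minimal sketch of the 3-NN and 5-NN regression steps above, assuming the training x values form the grid 0, 0.1, ..., 1 with y = 1 + x + x² (the grid spacing is inferred from the neighbours listed on the slide).

```python
import numpy as np

x = np.arange(0, 1.01, 0.1)          # assumed training grid: 0, 0.1, ..., 1.0
y = 1 + x + x**2                     # the generating function from the slide

def knn_regress(x_query, K):
    """Predict y as the (unweighted) mean of the K nearest y values."""
    idx = np.argsort(np.abs(x - x_query))[:K]
    return y[idx].mean()

print(round(knn_regress(0.16, K=3), 2))   # -> 1.25 (target 1.1856)
print(round(knn_regress(0.16, K=5), 2))   # -> 1.26
```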
Weighted Neighborhood Based Learning: Regression
• The data is generated using y = 1 + x + x 2
• Suppose we select a random point in the interval
[0,1]; say 0.16
• We take the 3 NNs of 0.16 among the x values; they are
0.2, 0.1, and 0.3, and the respective values of y are
1.24, 1.11, and 1.39. The weighted average value, ŷ, is
approximately 1.22 = (1.24/0.04 + 1.11/0.06 + 1.39/0.14) / 48.8,
where 48.8 = 1/0.04 + 1/0.06 + 1/0.14 (inverse-distance weights)
• The target value is 1.1856; there is an error
• Nearer neighbours contribute more and farther
neighbours less.
• In general, W_i = (d_max − d_i) / (d_max − d_min)
• W_1 = 1 and W_k = 0 for kNNC
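A minimal sketch of the inverse-distance weighting used in the 3-NN example above; the (d_max − d_i)/(d_max − d_min) weights would be substituted in the same place.

```python
import numpy as np

x = np.arange(0, 1.01, 0.1)                # assumed training grid, as before
y = 1 + x + x**2

def weighted_knn_regress(x_query, K):
    """Predict y as a weighted mean of the K nearest y values,
    with inverse-distance weights 1/d_i (as in the slide's example)."""
    d = np.abs(x - x_query)
    idx = np.argsort(d)[:K]
    w = 1.0 / d[idx]                       # nearer neighbours contribute more
    return np.sum(w * y[idx]) / np.sum(w)

print(round(weighted_knn_regress(0.16, K=3), 2))   # -> 1.22 (target 1.1856)
```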
