23ECE205 FoDS 13 Introduction To ML
23ECE205 Foundations of Data Science
Introduction to Machine Learning
Dr. Binoy B. Nair
Introduction
Why Machine Learning?
What Is Machine Learning?
• Alternative names:
Knowledge discovery (mining) in databases (KDD),
data mining, knowledge extraction, data/pattern
analysis, data archeology, data dredging, information
harvesting, business intelligence, etc.
Machine Learning Workflow
In other words
Types of ‘Learning’ in Machine Learning
• Supervised Learning: Classification, Regression
• Unsupervised Learning: Clustering, Dimensionality Reduction, Anomaly Detection
• Reinforcement Learning
Supervised Learning
Types of Supervised Learning
Classification Example: Predict survival on the Titanic [3]
• Each row is called an observation or a sample.
• Attributes can be numeric, logical, ordinal, nominal or one of several
other types.
Classification Example: Survival on the Titanic
What a ‘clean’ dataset might look like
• Sample No. is a serial number (not a feature).
• Input features: sepal length, sepal width, petal length, petal width.
• Output class: species (the class label for the features).

Sample No.   Sepal length   Sepal width   Petal length   Petal width   Species (class label)
1            5              3.5           6              1.5           Setosa
2            5              3.2           6              1.5           Setosa
3            4              4             5.5            1.4           Setosa
4            7.4            9             2              2             Setosa
5            7              9             2              5             Versicolor
6            7              8             1.2            5             Versicolor
7            8.6            6             1.5            6             Versicolor
8            8              7             2.5            6.4           Versicolor
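The split between input features and class label in the table above can be sketched in Python. This is a minimal sketch under stated assumptions: the column order (sepal length, sepal width, petal length, petal width) is read off the slide's header, the names `rows`, `X`, `y` and `predict_1nn` are illustrative, and the 1-nearest-neighbour rule is just one simple classifier chosen for the example, not a method named on the slide.

```python
import math

# Rows copied from the slide's table: (serial no., sepal length, sepal width,
# petal length, petal width, species). These are the slide's illustrative
# numbers, not measurements from the real Iris dataset.
rows = [
    (1, 5.0, 3.5, 6.0, 1.5, "Setosa"),
    (2, 5.0, 3.2, 6.0, 1.5, "Setosa"),
    (3, 4.0, 4.0, 5.5, 1.4, "Setosa"),
    (4, 7.4, 9.0, 2.0, 2.0, "Setosa"),
    (5, 7.0, 9.0, 2.0, 5.0, "Versicolor"),
    (6, 7.0, 8.0, 1.2, 5.0, "Versicolor"),
    (7, 8.6, 6.0, 1.5, 6.0, "Versicolor"),
    (8, 8.0, 7.0, 2.5, 6.4, "Versicolor"),
]

# Drop the serial number (not a feature): the four measurements form the
# feature vectors X, and the species column is the class label y.
X = [list(r[1:5]) for r in rows]
y = [r[5] for r in rows]


def predict_1nn(query, X, y):
    """Return the label of the training sample closest to `query` (Euclidean)."""
    distances = [math.dist(query, x) for x in X]
    return y[distances.index(min(distances))]


# Hypothetical new observation with the same four features.
label = predict_1nn([8.2, 6.5, 2.0, 6.1], X, y)
```

Keeping the serial number out of `X` matters: it carries no information about the species, and a distance-based classifier would otherwise treat it as a measurement.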
Unsupervised Learning
Key Characteristics of Unsupervised Learning
Common Tasks in Unsupervised Learning
1. Clustering: The process of grouping data points based on their
similarity.
• Example: Grouping customers into different market segments based on
purchasing behavior.
• Algorithms: k-means, hierarchical clustering, DBSCAN.
2. Dimensionality Reduction: Reducing the number of input
variables while retaining the most important information.
• Example: Reducing the number of features in an image dataset for
visualization or speeding up computation.
• Algorithms: PCA (Principal Component Analysis), t-SNE.
3. Anomaly Detection: Identifying rare or unusual data points
that don't fit the general pattern.
• Example: Detecting fraudulent transactions in banking data.
4. Association: Finding rules that describe relationships between
variables in the data.
• Example: Market basket analysis, where you identify items frequently
bought together in a store.
• Algorithms: Apriori, FP-Growth (which builds an FP-tree).
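The clustering task from item 1 can be sketched with k-means, one of the algorithms listed there. This is a minimal, standard-library-only sketch, not a production implementation: the `customers` data is made up, the function name `kmeans` is illustrative, and taking the first `k` points as initial centroids is a simplifying assumption (real code usually uses random or k-means++ initialisation).

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Hypothetical customers: (visits per month, average spend per visit).
customers = [(1, 10), (9, 95), (2, 12), (1, 8), (10, 100), (8, 90)]
centroids, assignment = kmeans(customers, k=2)
```

No labels are supplied anywhere: the two market segments emerge only from the similarity (here, Euclidean distance) between the data points.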
Association Rule Mining Example: Market Basket Analysis
Association Rule Mining
Rules derived will typically be of the form:
Soy milk => Orange Juice
Here, Soy milk is the rule antecedent and Orange Juice is the rule consequent.
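How strongly the data backs a rule such as Soy milk => Orange Juice is commonly measured by its support and confidence. These two measures are standard in association rule mining but are not defined on the slide, so treat the sketch below as an illustration under that assumption. The `transactions` list is made up, and the function names are hypothetical.

```python
# Hypothetical market baskets; the items are made up for illustration.
transactions = [
    {"soy milk", "orange juice", "bread"},
    {"soy milk", "orange juice"},
    {"soy milk", "lettuce"},
    {"orange juice", "bread"},
    {"soy milk", "orange juice", "lettuce"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule: {soy milk} => {orange juice}
rule_support = support({"soy milk", "orange juice"}, transactions)
rule_confidence = confidence({"soy milk"}, {"orange juice"}, transactions)
```

In this toy data the two items appear together in 3 of 5 baskets, and 3 of the 4 baskets containing soy milk also contain orange juice. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.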
Unsupervised Learning Applications
Example: Text document clustering
Popular Unsupervised Learning Algorithms
• Partitional, Hierarchical, Density Based Clustering
• Principal Component Analysis (PCA)
• Autoencoders
Reinforcement Learning
How Reinforcement Learning Works
1. Interaction: The agent interacts with the environment by observing
the current state, choosing an action, and receiving feedback in the
form of a reward.
2. Feedback: After taking an action, the environment moves to a new
state, and the agent receives a reward (positive or negative) based on
the action's outcome.
3. Learning: The agent updates its understanding of the environment
(typically by updating the value function or policy) based on the
reward and the new state.
4. Exploration vs. Exploitation: The agent must balance exploring new
actions (to find better strategies) with exploiting known actions that
give good rewards. This balance is crucial for maximizing long-term
rewards.
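The four steps above can be sketched with tabular Q-learning and an epsilon-greedy action rule. Q-learning is one common way of "updating the value function" and is an assumption here, since the slides do not name a specific algorithm. The corridor environment, the hyperparameter values and all names (`step`, `train`, `q_table`) are hypothetical.

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters


def step(state, action):
    """Environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Exploration vs. exploitation: pick a random action with
            # probability EPSILON (or when the two estimates are tied),
            # otherwise pick the action with the higher estimated value.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            # Interaction and feedback: act, then observe new state and reward.
            next_state, reward, done = step(state, ACTIONS[a])
            # Learning: move the estimate toward reward + discounted best
            # value of the next state (the Q-learning update).
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states: the best-valued action in each.
greedy_policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
```

After training, the greedy policy moves right in every non-terminal state, which is the shortest route to the reward. Setting `EPSILON` to 0 would remove exploration entirely; the tie-breaking rule is then the only source of randomness the agent has for discovering the goal.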
Applications of Reinforcement Learning
Machine Learning: On What Kinds of Data?
Database-oriented data sets and applications
• Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
• Data streams and sensor data
• Time-series data, temporal data, sequence data
(incl. bio-sequences)
• Heterogeneous databases and legacy databases
• Spatial data and spatiotemporal data
• Multimedia database
• Text databases
• The World-Wide Web
Machine Learning: Confluence of Multiple Disciplines
• Statistics
• High-Performance Computing
Why Confluence of Multiple Disciplines?
• High-dimensionality of data
• Micro-array may have tens of thousands of dimensions
Applications: Actual Story So Far
Midjourney: overview shot of three dutch happy 40-year-old woman chatting in a
Recommended Reference Books
1. J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.
2. Is free will a matter of being a conscious outlier?, Available online: https://baldscientist.wordpress.com/2013/02/02/is-free-will-a-matter-of-being-a-conscious-outlier/, Last accessed: Jan 1, 2016.
3. Hermann Mucke, Data Mining in Drug Development and Translational Medicine Overview, Data Mining in Drug Development and Translational Medicine, Available online: http://www.insightpharmareports.com/data_mining/, Last accessed: Jan 1, 2016.
4. Peter Bajcsy, Introduction to Data Mining, Available online: http://www.slideshare.net/p2045i/introduction-to-data-mining, Last accessed: Jan 1, 2016.
5. Machine learning and Data Mining - Association Analysis with Python, Available online: http://aimotion.blogspot.in/2013/01/machine-learning-and-data-mining.html, Last accessed: Jan 1, 2016.
6. Titanic dataset, Available online: https://www.kaggle.com/c/titanic/data, Last accessed: Jan 1, 2016.
7. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
Questions?