23ECE205 FoDS 13 Introduction To ML


Om

23ECE205 Foundations of Data Science

Introduction to Machine Learning
Dr. Binoy B. Nair

Introduction

• Why Machine Learning?
• What Is Machine Learning?
• What Kind of Data Can Be Mined?
• What Technologies Are Used?
• What Kinds of Applications Are Targeted?

Why Machine Learning?

• The explosive growth of data: from terabytes to petabytes
  • Data collection and data availability
    • Automated data collection tools, database systems, the Web, computerized society
  • Major sources of abundant data
    • Business: Web, e-commerce, transactions, stocks, …
    • Science: remote sensing, bioinformatics, scientific simulation, …
    • Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• "Necessity is the mother of invention": machine learning provides automated analysis of massive data sets

What Is Machine Learning?

• Machine Learning (knowledge discovery from data)
  • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Alternative names:
  Knowledge discovery (mining) in databases (KDD), data mining, knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Machine Learning Workflow
[Figure: machine learning workflow]

In other words
[Figure]

Types of 'Learning' in Machine Learning

Machine Learning
• Supervised Learning
  • Classification
  • Regression
• Unsupervised Learning
  • Clustering
  • Dimensionality Reduction
  • Anomaly Detection
  • Association Rule Mining
• Reinforcement Learning

Supervised Learning

Supervised learning is a type of machine learning where a model is trained on a labeled dataset.

In this context, "labeled" means that each training example has input data (features) and the corresponding correct output (label or target).

The goal of supervised learning is to learn a mapping from inputs to outputs, so the model can predict the correct output for new, unseen inputs.

Key Components of Supervised Learning

1. Training Data: A dataset that includes both input features (e.g., Age, Income) and their corresponding labels (e.g., 'Fiction' or 'NonFiction').
2. Model: A mathematical function or algorithm that maps inputs to outputs.
3. Loss Function: Measures how well the model's predictions match the actual labels. The model is trained to minimize this loss.
4. Optimization Algorithm: Adjusts the model's parameters (weights) to improve predictions by minimizing the loss function, commonly using methods like gradient descent.

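The four components can be seen working together in a very small example. The sketch below is only illustrative: the data points, the linear model, the learning rate and the number of steps are all assumptions chosen so that the example runs quickly, not values from the course material.

```python
# Minimal sketch of the four components of supervised learning.
# All data values and hyperparameters are made up for illustration.

# 1. Training data: input feature x and its label y (here y = 2x + 1 exactly).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# 2. Model: a linear function with two parameters (weight w, bias b).
def predict(x, w, b):
    return w * x + b

# 3. Loss function: mean squared error between predictions and labels.
def mse_loss(w, b):
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 4. Optimization algorithm: batch gradient descent on the loss.
def train(learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train()
print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b):.6f}")
```

Because the toy labels follow y = 2x + 1 exactly, the learned parameters should end up close to w = 2 and b = 1, with a loss close to zero.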
Types of Supervised Learning

Most common types of supervised learning:

1. Classification: Predicts a discrete label (e.g., classifying emails as 'spam' or 'not spam').
2. Regression: Predicts a continuous value (e.g., predicting house prices based on features like size, location).

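The difference between the two types is easiest to see in the kind of value each one returns. The following sketch uses a 1-nearest-neighbour rule for classification and a least-squares line for regression; both techniques and all numbers (email features, house sizes and prices) are assumptions picked for illustration.

```python
# Toy data, invented for illustration only.

# Classification: predict a discrete label with a 1-nearest-neighbour rule.
emails = [  # (number of links, number of exclamation marks) -> label
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((7, 5), "spam"),
    ((9, 8), "spam"),
]

def classify(features):
    """Return the label of the closest training example (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = min(emails, key=lambda example: dist2(example[0], features))
    return nearest[1]

# Regression: predict a continuous value with a least-squares line.
sizes = [50.0, 80.0, 100.0, 120.0]      # house size (hypothetical units)
prices = [150.0, 240.0, 300.0, 360.0]   # house price (hypothetical units)

def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    return slope * size + intercept

print(classify((8, 6)))      # a label from a fixed set of classes
print(predict_price(90.0))   # a real number
```

The classifier can only ever answer with one of the labels it has seen, while the regressor can return any real number, including values that never appeared in the training data.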
Classification: Example- Predict survival on the Titanic [3]

• On April 15, 1912, during her maiden voyage, RMS Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.
• Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive.
• We try to use classification to predict which passengers are more likely to survive such future tragedies (assuming that we are still living in 1912 and that our hero has booked the next boat ticket to the USA via the same route).

Classification: Example- Titanic dataset

• Each row is called an observation or a sample.
• Each column is called a feature or attribute.
• Attributes can be numeric, logical, ordinal, nominal or one of several other types.
• The classification dataset will always have the observed class. Here we have two classes denoted by 0 and 1.
• Issues like missing data are very common.

Classification: Example- Survival on Titanic

• The result of classification using a decision tree classifier (we will learn how this is obtained later) is given alongside.
• Let's now check the possible outcome for our hero Jack, given that he is a 20-year-old male with no parent or child accompanying him, by looking at the rule generated by the classifier:
  Jack might wind up dead.

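To give a feel for how a classifier can produce such a rule, here is a minimal sketch of a one-level decision tree (a "decision stump"). It is not the tree from the slide and it does not use the Titanic data: the records, the two features and the error-count splitting criterion are all assumptions made for illustration.

```python
# Hypothetical toy records (NOT the Titanic data): (is_male, age) -> survived (0 or 1).
records = [
    ((1, 22), 0), ((1, 35), 0), ((1, 4), 1), ((1, 54), 0),
    ((0, 26), 1), ((0, 38), 1), ((0, 2), 1), ((0, 58), 1),
]

def majority(labels):
    """Most common label in a list of 0/1 labels (ties go to 1)."""
    return 1 if sum(labels) * 2 >= len(labels) else 0

def fit_stump(data):
    """Try every (feature, threshold) split and keep the one with the fewest training errors."""
    best = None
    for feature in range(len(data[0][0])):
        for threshold in sorted({x[feature] for x, _ in data}):
            left = [y for x, y in data if x[feature] <= threshold]
            right = [y for x, y in data if x[feature] > threshold]
            if not left or not right:
                continue  # a split must send examples to both sides
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def predict(stump, x):
    """Apply the learned rule: go left if the feature value is <= threshold."""
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label

stump = fit_stump(records)
print(stump)                     # (feature index, threshold, label if <=, label if >)
print(predict(stump, (1, 20)))   # a 20-year-old male in this toy data
```

On this made-up table the best single split is on the is_male feature, so the rule predicts class 0 for a 20-year-old male. A full decision tree repeats the same search inside each branch.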
How a 'clean' dataset might look

Sample No.   Sepal length   Sepal width   Petal length   Petal width   Species
1            5              3.5           6              1.5           Setosa
2            5              3.2           6              1.5           Setosa
3            4              4             5.5            1.4           Setosa
4            7.4            9             2              2             Setosa
5            7              9             2              5             Versicolor
6            7              8             1.2            5             Versicolor
7            8.6            6             1.5            6             Versicolor
8            8              7             2.5            6.4           Versicolor

• Input features: sepal length, sepal width, petal length, petal width
• Output (o/p) class: Species, the class label for the features
• Sample No. (Sl. No.) is not a feature

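In code, such a table is usually split into a feature matrix and a label vector before training. The sketch below copies the eight rows of the table, assuming the column order sample number, sepal length, sepal width, petal length, petal width, species; the variable names X, y and y_encoded are conventions chosen here, not names from the slides.

```python
# Rows of the table: (sample no., sepal length, sepal width,
# petal length, petal width, species). Column order is an assumption.
rows = [
    (1, 5.0, 3.5, 6.0, 1.5, "Setosa"),
    (2, 5.0, 3.2, 6.0, 1.5, "Setosa"),
    (3, 4.0, 4.0, 5.5, 1.4, "Setosa"),
    (4, 7.4, 9.0, 2.0, 2.0, "Setosa"),
    (5, 7.0, 9.0, 2.0, 5.0, "Versicolor"),
    (6, 7.0, 8.0, 1.2, 5.0, "Versicolor"),
    (7, 8.6, 6.0, 1.5, 6.0, "Versicolor"),
    (8, 8.0, 7.0, 2.5, 6.4, "Versicolor"),
]

# Drop the serial number (not a feature) and keep the four measurements as inputs.
X = [list(row[1:5]) for row in rows]

# The last column is the output class.
y = [row[5] for row in rows]

# Many algorithms expect numeric labels, so map each class name to an integer.
class_names = sorted(set(y))
class_index = {name: i for i, name in enumerate(class_names)}
y_encoded = [class_index[label] for label in y]

print(X[0], y[0], y_encoded[0])
```

Keeping the serial number out of X matters: it carries no information about the flower, yet a model could still latch onto it if it were left in.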
Unsupervised Learning

Unsupervised learning is a type of ML where the model is trained on data that does not have labeled outputs.

Unsupervised learning focuses on finding hidden patterns, structures, or relationships within the input data without the guidance of known labels.

Key Characteristics of Unsupervised Learning

• No Labels: The dataset contains only input data (features) without corresponding outputs or labels.
• Pattern Discovery: The model's objective is to learn the underlying structure or distribution in the data, such as grouping similar examples together or finding relationships between features.
• Exploratory: Unsupervised learning is often used for exploratory analysis to understand the data better or as a preprocessing step for other tasks.

Common Tasks in Unsupervised Learning

1. Clustering: The process of grouping data points based on their similarity.
   • Example: Grouping customers into different market segments based on purchasing behavior.
   • Algorithms: k-means, hierarchical clustering, DBSCAN.
2. Dimensionality Reduction: Reducing the number of input variables while retaining the most important information.
   • Example: Reducing the number of features in an image dataset for visualization or to speed up computation.
   • Algorithms: PCA (Principal Component Analysis), t-SNE.
3. Anomaly Detection: Identifying rare or unusual data points that don't fit the general pattern.
   • Example: Detecting fraudulent transactions in banking data.
4. Association: Finding rules that describe relationships between variables in the data.
   • Example: Market basket analysis, where you identify items frequently bought together in a store.
   • Algorithms: Apriori, FP-Growth (which builds an FP-tree).

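As an illustration of the clustering task, here is a minimal sketch of k-means on a handful of made-up customers. The customer values, the choice of k = 2 and the use of the first k points as initial centroids are all simplifying assumptions; practical implementations usually initialise the centroids randomly.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and updating centroids."""
    # Initialise centroids with the first k points (a simplifying assumption).
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (9, 95), (2, 12), (1, 14), (10, 90), (8, 100)]
centroids, assignments = kmeans(customers, k=2)
print(assignments)
print(centroids)
```

No labels are given to the algorithm. The two market segments (occasional low spenders and frequent high spenders) emerge only from the distances between the points.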
Association Rule Mining Example: Market Basket Analysis

Fig. MB analysis [4]
• What the store wants to know from your purchase list
• What it does with the mined associations

Association Rule Mining

A typical transaction database from a shop [5]

• Each row is called a transaction.
• Assume that each transaction no. denotes one purchase session.
• Rules derived will typically be of the form:
  Soy milk => Orange Juice
  where "Soy milk" is the rule antecedent and "Orange Juice" is the rule consequent.

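Rules of this form are usually scored by their support and confidence. The sketch below computes both for the rule Soy milk => Orange Juice; the five transactions are invented for illustration and are not the table from [5].

```python
# Hypothetical transactions, invented for illustration (not the table from [5]).
transactions = [
    {"soy milk", "lettuce"},
    {"lettuce", "diapers", "wine", "chard"},
    {"soy milk", "diapers", "wine", "orange juice"},
    {"lettuce", "soy milk", "diapers", "wine"},
    {"lettuce", "soy milk", "diapers", "orange juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

rule_support = support({"soy milk", "orange juice"})
rule_confidence = confidence({"soy milk"}, {"orange juice"})
print(rule_support, rule_confidence)
```

In this toy data, soy milk and orange juice appear together in 2 of 5 transactions, and orange juice appears in 2 of the 4 transactions that contain soy milk. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.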
Unsupervised Learning Applications

• Customer Segmentation in marketing
• Anomaly Detection in security or fraud detection
• Data Compression or feature extraction
• Recommender Systems based on association rules

Example: Text document clustering
[Figure]

Popular Unsupervised Learning Algorithms

• Partitional, Hierarchical, Density Based Clustering
• Principal Component Analysis (PCA)
• Autoencoders

Reinforcement Learning

• RL is a type of machine learning where an agent learns to make decisions by interacting with an environment in order to maximize some notion of cumulative reward.
• Unlike supervised learning, where the correct input-output pairs are provided, or unsupervised learning, where the goal is to find hidden patterns in data, reinforcement learning focuses on learning from the consequences of actions taken within an environment.

How Reinforcement Learning Works

1. Interaction: The agent interacts with the environment by observing the current state, choosing an action, and receiving feedback in the form of a reward.
2. Feedback: After taking an action, the environment moves to a new state, and the agent receives a reward (positive or negative) based on the action's outcome.
3. Learning: The agent updates its understanding of the environment (typically by updating the value function or policy) based on the reward and the new state.
4. Exploration vs. Exploitation: The agent must balance exploring new actions (to find better strategies) with exploiting known actions that give good rewards. This balance is crucial for maximizing long-term rewards.

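The four steps can be traced in a small tabular Q-learning example. Q-learning is one common RL algorithm, chosen here as an assumption; the five-state corridor environment, the reward of 1 at the goal and the hyperparameters are likewise made up for illustration.

```python
import random

N_STATES = 5            # states 0..4 on a line; state 4 is the goal (terminal)
ACTIONS = [-1, +1]      # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning rate, discount, exploration rate

def step(state, action_index):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q_row, rng):
    """Epsilon-greedy: explore at random (also on ties), otherwise exploit the best action."""
    if rng.random() < EPSILON or q_row[0] == q_row[1]:
        return rng.randrange(len(ACTIONS))
    return 0 if q_row[0] > q_row[1] else 1

def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # value estimate per (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], rng)            # 1. interaction (4. explore vs. exploit)
            next_state, reward, done = step(state, action)   # 2. feedback
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])   # 3. learning
            state = next_state
            if done:
                break
    return q

q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy read from the table should move right in every non-terminal state, since that is the only way to reach the rewarded goal.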
Applications of Reinforcement Learning

• Game Playing: RL has been used to train agents to play games like chess (AlphaZero), Go, and Atari video games.
• Robotics: Training robots to perform tasks like walking, grasping objects, or flying drones.
• Self-Driving Cars: Learning to navigate in complex environments with various inputs and outcomes.

Machine Learning: On What Kinds of Data?

Database-oriented data sets and applications
• Relational database, data warehouse, transactional database

Advanced data sets and advanced applications
• Data streams and sensor data
• Time-series data, temporal data, sequence data (incl. bio-sequences)
• Heterogeneous databases and legacy databases
• Spatial data and spatiotemporal data
• Multimedia database
• Text databases
• The World-Wide Web

Machine Learning: Confluence of Multiple Disciplines

Machine learning draws on:
• Statistics
• Database Technology
• Visualization
• High-Performance Computing

Why Confluence of Multiple Disciplines?

• Tremendous amount of data
  • Algorithms must be highly scalable to handle data volumes such as terabytes
• High dimensionality of data
  • A micro-array may have tens of thousands of dimensions
• High complexity of data
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data
  • Structured data, graphs, social networks and multi-linked data
  • Heterogeneous databases and legacy databases
  • Spatial, spatiotemporal, multimedia, text and Web data
  • Software programs, scientific simulations

Applications- Actual Story so Far
[Figure]

Midjourney: overview shot of three dutch happy 40-year-old woman chatting in a

Recommended Reference Books

1. J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.
2. Is free will a matter of being a conscious outlier?, Available online: https://baldscientist.wordpress.com/2013/02/02/is-free-will-a-matter-of-being-a-conscious-outlier/, Last accessed: Jan 1, 2016.
3. Hermann Mucke, Data Mining in Drug Development and Translational Medicine Overview, Data Mining in Drug Development and Translational Medicine, Available online: http://www.insightpharmareports.com/data_mining/, Last accessed: Jan 1, 2016.
4. Peter Bajcsy, Introduction to Data Mining, Available online: http://www.slideshare.net/p2045i/introduction-to-data-mining, Last accessed: Jan 1, 2016.
5. Machine learning and Data Mining - Association Analysis with Python, Available online: http://aimotion.blogspot.in/2013/01/machine-learning-and-data-mining.html, Last accessed: Jan 1, 2016.
6. Titanic dataset, Available online: https://www.kaggle.com/c/titanic/data, Last accessed: Jan 1, 2016.
7. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.

Questions??
