
Machine Learning Overview

Course: Artificial Intelligence Fundamentals
Instructor: Marco Bonzanini


Machine Learning vs Programming

Programming:      Rules + Data → Answers
Machine Learning: Data + Answers → Rules

Ref: Deep Learning with Python, F. Chollet, 2017.


Examples of ML Applications
• Filtering Emails (Spam Detection)

• Automatic Trading

• Fraud Detection

• Self-driving cars

• Playing chess/poker/go

• Recommending products / items / services


Machine Learning Tasks

                  Supervised                   Unsupervised
Discrete data     Classification               Clustering
                  (predict a label)            (group similar items)
Continuous data   Regression                   Dimensionality Reduction
                  (predict a quantity)         (reduce the number of variables)

Machine Learning Process

• Exercise:
  — Search “machine learning stages” (or steps, or process) on Google
  — Find dozens of “The X stages of Machine Learning” articles

• No standard process?!
Recap: CRISP-DM
(diagram of the CRISP-DM cycle)
Machine Learning Process

• What’s the problem you’re trying to solve? (identify the ML task)

• What ML algorithms are available for this task?

• What does the data set look like? (enough data? need labelled data? need pre-processing?)
ML Modelling

• Step 1: Learning (a.k.a. Training)
  — Batch process (could take hours/days)
  — “Learn” from the data
  — Output: your “model”

• Step 2: Prediction (a.k.a. Testing)
  — Given a trained model, make a prediction on new, unseen data
  — Output: depends on the task

Example: classification task

Ref: Mastering Social Media Mining with Python, M. Bonzanini, 2016.
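
The figure from the book is not reproduced here; as a stand-in, this is a minimal sketch of the two modelling steps on a classification task, using scikit-learn. The iris data set and the Naive Bayes classifier are illustrative assumptions, not taken from the slides:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Step 1: Learning (a.k.a. Training) -- output: the "model"
    model = GaussianNB()
    model.fit(X_train, y_train)

    # Step 2: Prediction (a.k.a. Testing) -- for classification,
    # the output is a predicted label for each new, unseen item
    predictions = model.predict(X_test)
    print(predictions[:5])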


ML Terminology

• Item or Sample: the “objects” we’re dealing with

• Item representation (e.g. a vector)

• Features: the attributes of an item (e.g. the elements of a vector)
Item Representation

• We can use any type of attribute

• Numerical features

• Categorical features → one-hot encoding

• Text → bag-of-words (see the sketch below)
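To make the last bullet concrete, here is a minimal bag-of-words sketch; scikit-learn’s CountVectorizer and the toy corpus are my assumptions, not part of the slides:

    # Bag-of-words: each document becomes a vector of word counts.
    from sklearn.feature_extraction.text import CountVectorizer

    corpus = ["Rome is in Italy", "Paris is in France"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(corpus)        # sparse matrix: one row per document

    print(vectorizer.get_feature_names_out())   # the vocabulary = the features
    print(X.toarray())                          # word counts per document
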
One-hot Encoding

Rome = [1, 0, 0, 0, 0, 0, …, 0]
Paris = [0, 1, 0, 0, 0, 0, …, 0]
Italy = [0, 0, 1, 0, 0, 0, …, 0]
France = [0, 0, 0, 1, 0, 0, …, 0]
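
The same encoding in code, as a minimal sketch; scikit-learn’s OneHotEncoder is one common implementation (an assumption here, the slides don’t name a library):

    # One-hot encoding: each categorical value maps to a vector with a single 1.
    from sklearn.preprocessing import OneHotEncoder

    cities = [["Rome"], ["Paris"], ["Italy"], ["France"]]
    encoder = OneHotEncoder()
    vectors = encoder.fit_transform(cities).toarray()

    print(encoder.categories_)  # column order is alphabetical here,
    print(vectors)              # so the 1s sit in different positions than above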
Feature Engineering

• Using domain knowledge of the data to create features that make ML algorithms work (see the sketch below)

• Fundamental, difficult, expensive, time-consuming

• Quality and quantity of features can have a big impact on the final result
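
As a concrete illustration of using domain knowledge (my example, not the slides’): deriving features from a raw timestamp that a model could actually exploit, sketched with pandas:

    # Feature engineering sketch: turn a raw timestamp into usable features.
    import pandas as pd

    df = pd.DataFrame({"timestamp": pd.to_datetime(["2017-01-06 09:30",
                                                    "2017-01-07 23:10"])})
    df["hour"] = df["timestamp"].dt.hour                   # time of day
    df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5   # Saturday/Sunday flag
    print(df)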
Feature Selection

• Dimensionality! How many words are in the English vocabulary? How many unique tokens on the Web?

• Using millions of features is not feasible for some classifiers

• Reduces training time

• Can improve generalisation, e.g. eliminate noise, avoid overfitting
Feature Selection

• Define a utility function A(f, c): for a given class c, compute A(f, c) for every feature f, and keep only the k features with the highest utility

• Example: Term Frequency
  — Discard words that appear in many documents
  — Discard words that appear in a very small number of documents
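
The slides leave A(f, c) abstract; one common concrete choice is the chi-squared statistic. A sketch of “keep the k features with the highest utility” using scikit-learn’s SelectKBest (the utility function and the data set are assumptions):

    # Keep only the k features with the highest utility A(f, c).
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2

    X, y = load_iris(return_X_y=True)
    selector = SelectKBest(score_func=chi2, k=2)  # chi2 plays the role of A(f, c)
    X_reduced = selector.fit_transform(X, y)

    print(selector.scores_)   # one utility score per feature
    print(X_reduced.shape)    # (150, 2): only the top-k features remain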
Feature Scaling

• a.k.a. data normalisation

• Different features may have different ranges of values

• Many algorithms use a concept of “distance”, so features with a broad range will dominate

• After scaling, features will contribute equally to the distance
Feature Scaling (2)

• Many options for scaling

• “Standardisation”: zero mean and unit variance
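
A minimal sketch of standardisation, i.e. rescaling each feature to x' = (x - mean) / std; scikit-learn’s StandardScaler is one implementation (the tiny data set is made up):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Two features with very different ranges: without scaling, the second
    # one would dominate any distance computation.
    X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
    X_scaled = StandardScaler().fit_transform(X)

    print(X_scaled.mean(axis=0))  # ~0 for each feature
    print(X_scaled.std(axis=0))   # ~1 for each feature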


Overfitting and Underfitting

• Symptom: your ML model doesn’t perform well outside of your test environment

• Possible cause: generalisation is hard!

• More precisely:
  — Overfitting
  — Underfitting
Overfitting

• Your model learns the details of the training data set “too well”

• Good performance on the given data set, but not on new data sets

• Noise and random fluctuations in your training data are treated as important information

• Possible solution: cross-validation (see the sketch below)
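
A sketch of cross-validation as a check on overfitting: estimate generalisation by averaging performance over k train/test splits instead of trusting a single one. The data set and classifier are illustrative assumptions:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # 5-fold cross-validation: train on 4/5 of the data, test on the rest,
    # rotating the held-out fold each time.
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

    print(scores)         # one accuracy score per fold
    print(scores.mean())  # a less optimistic estimate than training accuracy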


Underfitting

• Less discussed (it is obvious from the start)

• Your model performs badly on the given data set, and doesn’t generalise to new data

• Possible solution: move on (change the feature engineering, the feature selection, or the ML algorithm altogether)
Questions?
