Helsinki - Intro To ML

This document provides an introduction to a course on machine learning, describing what machine learning is through examples like playing tic-tac-toe, spam filtering, and face recognition. It discusses how machine learning algorithms can learn from large amounts of data to improve performance on tasks like predicting search queries, ranking search results, detecting credit card fraud, and enabling self-driving cars. The course will cover basic machine learning concepts and techniques and their application to real-world problems.

582631 — 5 credits

Introduction to Machine Learning

Lecturer: Teemu Roos


Assistant: Ville Hyvönen

Department of Computer Science


University of Helsinki

material created by Patrik Hoyer, Jyrki Kivinen and others

November 1st–December 16th 2016

Introduction

- What is machine learning? Motivation & examples
  - Definition
  - Relation to other fields
  - Examples
- Course outline and related courses
- Practical details of the course
  - Lectures
  - Exercises
  - Exam
  - Grading

What is machine learning?

- Definition:
  - machine = computer, computer program (in this course)
  - learning = improving performance on a given task, based on experience / examples
- In other words:
  - instead of the programmer writing explicit rules for how to solve a given problem, the programmer instructs the computer how to learn from examples
  - in many cases the computer program can even become better at the task than the programmer is!

Example 1: tic-tac-toe

- How to program the computer to play tic-tac-toe?
- Option A: The programmer writes explicit rules, e.g. ‘if the opponent has two in a row, and the third is free, stop it by placing your mark there’, etc. (lots of work, difficult, not at all scalable!)
- Option B: Go through the game tree and choose optimally (for non-trivial games, this must be combined with some heuristics to restrict the size of the tree)
- Option C: Let the computer try out various strategies by playing against itself and others, noting which strategies lead to winning and which to losing (= ‘machine learning’)
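Tic-tac-toe is small enough that Option B can be carried out exactly, without heuristics. The following is a minimal sketch of an exhaustive minimax search in Python; the board encoding (a 9-character string) and the function names are our own assumptions, not part of the course material.

```python
from functools import lru_cache

# A board is a 9-character string read row by row: 'X', 'O' or ' ' per cell.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Search the whole game tree below `board` with `player` to move.

    Returns (value, move): value is +1 if X wins, -1 if O wins and 0 for a
    draw, assuming both sides play optimally; move is the best cell index
    for `player` (None if the game is already over).
    """
    w = winner(board)
    if w is not None:
        return (1 if w == 'X' else -1), None
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0, None  # full board and nobody won: draw
    opponent = 'O' if player == 'X' else 'X'
    best_value, best_move = None, None
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        value, _ = minimax(child, opponent)
        # X picks the maximum value, O picks the minimum
        if (best_value is None
                or (player == 'X' and value > best_value)
                or (player == 'O' and value < best_value)):
            best_value, best_move = value, m
    return best_value, best_move


print(minimax(' ' * 9, 'X')[0])         # 0: perfect play ends in a draw
print(minimax('XX OO' + ' ' * 4, 'X'))  # (1, 2): X completes the top row
```

The cache (`lru_cache`) keeps the search fast because many move orders lead to the same position. For games like chess the tree is far too large for this, which is exactly where the heuristics mentioned in Option B, or learning as in Option C, come in.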

- Arthur Samuel (1950s and 1960s):
  - Computer program that learns to play checkers
  - The program plays against itself thousands of times and learns which positions are good and which are bad (i.e. which lead to winning and which to losing)
  - The computer program eventually becomes much better than the programmer.

Example 2: spam filter

- Programmer writes rules: “If it contains ‘viagra’ then it is spam.” (difficult, not user-adaptive)
- The user marks which mails are spam and which are legit, and the computer learns by itself which words are predictive
  - X is the set of all possible emails (strings)
  - Y is the set {spam, non-spam}
- Examples (email x, label y):
  - From: [email protected]; Subject: viagra; “cheap meds...” → spam
  - From: [email protected]; Subject: important information; “here’s how to ace the test...” → non-spam
  - ...
  - From: [email protected]; Subject: you need to see this; “how to win $1,000,000...” → ? (to be predicted)
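One standard way to let the computer “learn by itself which words are predictive” is a naive Bayes classifier over word counts; the slides do not say which method the course uses, so treat this as an illustrative swap-in. The sketch below assumes whitespace tokenization and that the training data contain at least one example of each class; the toy emails and function names are hypothetical.

```python
import math
from collections import Counter

LABELS = ('spam', 'non-spam')  # the set Y


def tokenize(text):
    """Very crude feature extraction: lower-case and split on whitespace."""
    return text.lower().split()


def train(examples):
    """examples: (email_text, label) pairs with label in LABELS.
    Count emails per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return class_counts, word_counts


def classify(text, class_counts, word_counts):
    """Return the label with the highest naive Bayes log-score, using
    add-one (Laplace) smoothing for the word probabilities."""
    vocab = set().union(*word_counts.values())
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in LABELS:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # log prior P(y)
        for w in tokenize(text):
            # log P(word | y), smoothed so that unseen words never give log(0)
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("viagra cheap meds", "spam"),
    ("win $1,000,000 now", "spam"),
    ("important information about the test", "non-spam"),
    ("here is how to ace the test", "non-spam"),
]
model = train(training)
print(classify("cheap viagra", *model))         # spam
print(classify("how to ace the test", *model))  # non-spam
```

Because the model is retrained from whatever the user has marked, it adapts to each user, which is what the hand-written rule cannot do.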

Example 3: face recognition

- Face recognition is hot (Facebook, Apple; security; ...)
- Programmer writes rules: “If short dark hair, big nose, then it is Mikko” (impossible! how do we judge the size of the nose?!)
- The computer is shown many (image, name) example pairs, and the computer learns which features of the images are predictive (difficult, but not impossible)

[Figure: face images labelled patrik, antti, doris, patrik, ..., followed by a new image labelled “?” whose name is to be predicted]

Problem setup

- One definition of machine learning: a computer program improves its performance on a given task with experience (i.e. examples, data).
- So we need to separate:
  - Task: What is the problem that the program is solving?
  - Performance measure: How is the performance of the program (when solving the given task) evaluated?
  - Experience: What is the data (examples) that the program is using to improve its performance?
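To make the three ingredients concrete, here is a toy sketch in Python under made-up assumptions: the task is deciding whether an integer x in 0..99 is at least 50, the performance measure is accuracy on a fixed test set, and the experience is a growing list of labelled examples. The learner (a threshold halfway between the classes) is a hypothetical choice for illustration only.

```python
def learn_threshold(examples, lo=0, hi=100):
    """Experience: (x, y) pairs, y = 1 if x counts as 'large', else 0.
    Put the threshold halfway between the largest x labelled 0 and the
    smallest x labelled 1 (assumes the two classes do not overlap)."""
    neg = [x for x, y in examples if y == 0]
    pos = [x for x, y in examples if y == 1]
    if neg:
        lo = max(neg)
    if pos:
        hi = min(pos)
    return (lo + hi) / 2


def accuracy(threshold, test_set):
    """Performance measure: fraction of test points classified correctly."""
    correct = sum((x >= threshold) == (y == 1) for x, y in test_set)
    return correct / len(test_set)


# Task: decide whether an integer x in 0..99 is at least 50.
test_set = [(x, int(x >= 50)) for x in range(100)]
stream = [0, 70, 40, 64, 49, 51]  # training inputs in order of arrival
for n in (2, 4, 6):
    experience = [(x, int(x >= 50)) for x in stream[:n]]
    t = learn_threshold(experience)
    print(n, t, accuracy(t, test_set))
# prints: "2 35.0 0.85", then "4 52.0 0.98", then "6 50.0 1.0"
```

Here performance improves as experience grows, but that is a property of this particular data stream; with an unlucky sequence of examples, a single extra example can temporarily make this learner worse.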

Related scientific disciplines (1)

- Artificial Intelligence (AI)
  - Machine learning can be seen as ‘one approach’ towards implementing ‘intelligent’ machines (or at least machines that behave in a seemingly intelligent way).
- Artificial neural networks, computational neuroscience
  - Inspired by and trying to mimic the function of biological brains, in order to make computers that learn from experience. Modern machine learning really grew out of the neural networks boom in the 1980s and early 1990s.
- Pattern recognition
  - Recognizing objects and identifying people in controlled or uncontrolled settings, from images, audio, etc. Such tasks typically require machine learning techniques.

Availability of data

- These days it is very easy to
  - collect data (sensors are cheap, much information is digital)
  - store data (hard drives are big and cheap)
  - transmit data (essentially free on the internet).
- The result? Everybody is collecting large quantities of data.
  - Businesses: shops (market-basket data), search engines (web pages and user queries), financial sector (stocks, bonds, currencies, etc.), manufacturing (sensors of all kinds), social networking sites (Facebook, Twitter), anybody with a web server (hits, user activity)
  - Science: genomes sequenced, gene expression data, experiments in high-energy physics, images of remote galaxies, global ecosystem monitoring data, drug research and development, public health data
- But how to benefit from it? Analysis is becoming key!


Big Data

- One definition: data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges (Oxford English Dictionary)
- 3V: volume, velocity, and variety (Doug Laney, 2001)
- A database may be able to handle a lot of data, but you can’t implement a machine learning algorithm as an SQL query
- On this course we do not consider technical issues relating to extremely large data sets
- The basic principles of machine learning still apply, but many algorithms may be difficult to implement efficiently

Related scientific disciplines (2)

- Data mining
  - Trying to identify interesting and useful associations and patterns in huge datasets
  - Focus on scalable algorithms
  - Example: shopping basket analysis
- Statistics
  - historically, introductory courses on statistics tend to focus on hypothesis testing and some other basic problems
  - however, there’s a lot more to statistics than hypothesis testing
  - there is a lot of interaction between research in machine learning, data mining and statistics
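A classic way to formalise shopping basket analysis is association rules scored by support and confidence. The sketch below is a brute-force version restricted to single-item rules {a} → {b}; scalable data-mining algorithms such as Apriori prune this search instead of counting every pair. The baskets and thresholds are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Find rules lhs -> rhs between single items, where
    support(a, b)    = fraction of baskets containing both a and b
    confidence(a->b) = support(a, b) / fraction of baskets containing a
    Returns sorted (lhs, rhs, support, confidence) tuples."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # try both directions
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]
for lhs, rhs, s, c in association_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these baskets, every basket containing butter also contains milk, so butter → milk gets confidence 1.00, whereas milk → butter gets only 0.67: association rules are not symmetric.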

Example 4

- Prediction of search queries
  - The programmer provides a standard dictionary (but words and expressions change!)
  - Previous search queries are used as examples!
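Using previous queries as examples can be as simple as counting them and suggesting the most frequent past queries that start with what the user has typed so far. A minimal sketch, assuming a plain in-memory list of past queries (the log below is hypothetical):

```python
from collections import Counter


def build_completer(past_queries):
    """Count how often each past query was issued (the 'examples')."""
    return Counter(q.strip().lower() for q in past_queries)


def suggest(counts, prefix, k=3):
    """Return up to k past queries starting with `prefix`,
    most frequent first (ties broken alphabetically)."""
    prefix = prefix.lower()
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:k]]


log = ["helsinki weather", "helsinki airport", "helsinki weather",
       "help", "helsinki weather", "helsinki airport", "hello world"]
counts = build_completer(log)
print(suggest(counts, "hel"))         # ['helsinki weather', 'helsinki airport', 'hello world']
print(suggest(counts, "helsinki a"))  # ['helsinki airport']
```

Unlike a fixed dictionary, the counts change as new queries arrive, so new words and expressions are picked up automatically. A linear scan over all queries is only for clarity; a large system would likely index the queries (e.g. in a trie).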

Example 5

- Ranking search results:
  - Various criteria for ranking results
  - What do users click on after a given search? Search engines can learn what users are looking for by collecting queries and the resulting clicks.
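As one simple illustration of learning from clicks (not a description of how any particular search engine works), results for a query can be ranked by a smoothed click-through rate. The prior values below (2 clicks in 20 impressions, i.e. 10%) are arbitrary assumptions, as are the result names and counts.

```python
def smoothed_ctr(clicks, impressions, prior_clicks=2, prior_views=20):
    """Click-through rate shrunk towards a prior guess of
    prior_clicks / prior_views (here 10%), so that a result shown only a
    few times cannot jump to the top on a single lucky click."""
    return (clicks + prior_clicks) / (impressions + prior_views)


def rank_results(stats):
    """stats: {result: (clicks, impressions)} collected for one query.
    Return the results ordered by smoothed click-through rate, best first."""
    return sorted(stats, key=lambda r: smoothed_ctr(*stats[r]), reverse=True)


stats = {
    "page_a": (30, 200),  # raw CTR 15%, plenty of evidence
    "page_b": (1, 2),     # raw CTR 50%, but only 2 impressions
    "page_c": (5, 100),   # raw CTR 5%
}
print(rank_results(stats))  # ['page_a', 'page_b', 'page_c']
```

Ranking by the raw rate would put `page_b` first on the strength of a single click; the smoothing encodes the judgement that two impressions are too little experience to trust.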

Example 6

- Detecting credit card fraud
  - Credit card companies typically end up paying for fraud (stolen cards, stolen card numbers)
  - Useful to try to detect fraud, for instance large transactions
  - Important to be adaptive to the behaviors of customers, i.e. learn from existing data how users normally behave, and try to detect ‘unusual’ transactions
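One very simple way to “learn how users normally behave” is to compare each new amount with the customer’s own history using a z-score. This is a sketch under strong assumptions (amount is the only feature, the threshold of 3 standard deviations is arbitrary); a realistic fraud detector would presumably use many more features such as merchant, location and time.

```python
import statistics


def is_unusual(history, amount, threshold=3.0):
    """Flag `amount` if it is more than `threshold` standard deviations
    above this customer's own mean transaction amount (one-sided z-score)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample std; needs >= 2 past amounts
    if stdev == 0:
        return amount > mean
    return (amount - mean) / stdev > threshold


# Hypothetical histories: the same amount can be normal for one customer
# and unusual for another.
small_spender = [12.5, 40.0, 23.0, 8.0, 35.0, 19.5]
big_spender = [800.0, 1200.0, 950.0, 1100.0, 700.0]
print(is_unusual(small_spender, 900.0))  # True
print(is_unusual(big_spender, 900.0))    # False
```

Because the statistics are computed per customer, a 900 euro purchase is flagged for one customer and not for the other, which is the adaptivity the slide asks for; a fixed global “large transaction” rule could not make that distinction.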

Example 7

- Self-driving cars:
  - Sensors (radars, cameras) superior to humans
  - How to make the computer react appropriately to the sensor data?

Example 8

- Character recognition:
  - Automatically sorting mail (handwritten characters)
  - Digitizing old books and newspapers into easily searchable format (printed characters)

Example 9

- Recommendation systems (‘collaborative filtering’):
  - Amazon: “Customers who bought X also bought Y”...
  - Netflix: “Based on your movie ratings, you might enjoy...”
  - Challenge: One million dollars ($1,000,000) prize money recently awarded!

[Figure: a user-by-movie rating matrix (rows include Linda, Jack, Bill, Lucy and John; columns are movies; ratings from 1 to 5) with many entries missing; the missing ratings, marked “?” for Bill, are the ones to be predicted]
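A basic form of collaborative filtering predicts a missing rating from the ratings of users with similar tastes. The sketch below uses user-based filtering with cosine similarity over co-rated movies; the users, movies and ratings are hypothetical and are not the ones in the figure.

```python
import math

# Hypothetical ratings (1-5); a missing key means "not rated yet".
ratings = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "ben": {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "cat": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def cosine_similarity(r1, r2):
    """Cosine similarity between two users over the movies both have rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[m] * r2[m] for m in common)
    norm1 = math.sqrt(sum(r1[m] ** 2 for m in common))
    norm2 = math.sqrt(sum(r2[m] ** 2 for m in common))
    return dot / (norm1 * norm2)


def predict(ratings, user, movie):
    """Predict `user`'s rating of `movie` as a similarity-weighted average of
    the ratings other users gave it (None if nobody else rated it)."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or movie not in r:
            continue
        sim = cosine_similarity(ratings[user], r)
        num += sim * r[movie]
        den += abs(sim)
    return num / den if den else None


print(predict(ratings, "ann", "m4"))  # about 3.6, pulled towards ben's 5
```

Ann’s ratings resemble Ben’s more than Cat’s, so Ben’s 5 gets more weight than Cat’s 1. A common refinement is to subtract each user’s mean rating before comparing, since cosine similarity on raw 1-5 ratings is always positive.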

Example 10

- Machine translation:
  - Traditional approach: dictionary and explicit grammar
  - More recently, statistical machine translation based on example data is increasingly being used

Example 11

- Online store website optimization:
  - What items to present, what layout?
  - What colors to use?
  - Can significantly affect sales volume
  - Experiment, and analyze the results! (lots of decisions on how exactly to experiment and how to ensure meaningful results)
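“Experiment, and analyze the results” often means an A/B test: visitors are randomly shown layout A or layout B, and the purchase rates are compared. A minimal sketch of the analysis with a two-proportion z-test, assuming random assignment, independent visitors and a sample size fixed in advance; the counts are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of page variants A and B.

    conv_*: visitors who bought something, n_*: visitors shown that variant.
    Returns (z, p_value) for the two-sided test of 'both variants have the
    same conversion rate', using the normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate if A and B are equal
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical experiment: 5000 randomly assigned visitors per layout.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here layout B converts at 5.2% versus 4.0% for A, and the small p-value suggests the difference is unlikely to be pure chance. Checking the p-value repeatedly while the experiment is still running, and stopping as soon as it looks good, would invalidate this reasoning: that is one of the “decisions on how exactly to experiment”.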

Example 12

- Mining chat and discussion forums
  - Breaking news
  - Detecting outbreaks of infectious disease
  - Tracking consumer sentiment about companies / products

Example 13

- Real-time sales and inventory management
  - Picking up quickly on new trends (what’s hot at the moment?)
  - Deciding on what to produce or order

Example 14

- Prediction of friends on Facebook, or prediction of who you’d like to follow on Twitter.
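A classic baseline for friend prediction is the “common neighbours” heuristic: suggest the people with whom you share the most friends. This is not a claim about what Facebook or Twitter actually do; the small friendship graph below is hypothetical.

```python
from collections import Counter


def suggest_friends(graph, user, k=3):
    """Rank people who are not yet friends of `user` by the number of
    friends they have in common with `user`.
    graph: dict mapping each person to the set of their friends (undirected)."""
    counts = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                counts[candidate] += 1
    # most friends in common first; ties broken alphabetically
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


graph = {
    "aino":  {"eero", "lauri"},
    "eero":  {"aino", "lauri", "mikko"},
    "lauri": {"aino", "eero", "mikko", "sanna"},
    "mikko": {"eero", "lauri"},
    "sanna": {"lauri"},
}
print(suggest_friends(graph, "aino"))  # [('mikko', 2), ('sanna', 1)]
```

A learned model would typically combine counts like these with other signals and fit their weights from examples of friendships that actually formed.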

What about privacy?

- Users are surprisingly willing to sacrifice privacy to obtain useful services and benefits
- Regardless of what position you take on this issue, it is important to know what can and what cannot be done with various types of information (i.e. what the dangers are)
- ‘Privacy-preserving data mining’
  - What type of statistics/data can be released without exposing sensitive personal information? (e.g. government statistics)
  - Developing data mining algorithms that limit exposure of user data (e.g. ‘Collaborative filtering with privacy’, Canny 2002)

Course outline

- Introduction
- Ingredients of machine learning
  - task
  - models
  - data
- Supervised learning
  - classification
  - regression
  - evaluation and model selection
- Unsupervised learning
  - clustering
  - matrix decompositions

Related courses

- Various advanced CS courses (Spring 2017):
  - Probabilistic Models (period III, plus optional project)
  - Project in Practical Machine Learning (period III)
  - Advanced Course in Machine Learning (period IV)
  - Data Mining (self study, plus optional project)
  - Big Data Frameworks (period IV)
  - Seminar of Reinforcement Learning and Information Retrieval (period III)
- A number of other specialized courses at the CS department
- A number of courses at maths+stats
- Lots of courses at Aalto as well

Practical details (1)

- Lectures:
  - November 1st (today) – December 16th
  - Tuesdays and Fridays at 10:15–12:00 in Exactum CK112
  - Lecturer: Teemu Roos (Exactum A322, [email protected])
  - Language: English
  - Based on parts of the course textbook (next slide)
  - (previous instances of this course have used different textbooks)

Practical details (2)

- Textbook:
  - authors: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
  - title: An Introduction to Statistical Learning – with Applications in R
  - publisher: Springer (2013, first edition)
  - web page: www-bcf.usc.edu/~gareth/ISL/
- We’ll probably cover the whole book except splines and generalized additive models (GAMs), and include some additional Bayesian material

Practical details (3)

- Lecture material
  - this set of slides (by Hoyer/Kivinen/Roos) is intended for use as part of the actual lectures, together with the blackboard etc.
  - we will cover some topics in more detail than the textbook (and some in less)
  - in particular, some additional detail is needed for the homework problems
  - both the selected parts of the textbook and the additional material indicated on the course homepage are required material for the exam

Practical details (4)

- Exercises (Thu 14:15, Fri 12:15):
  - Two kinds:
    - mathematical exercises (pen-and-paper)
    - computer exercises (support given in R, but Python is a good choice too)
  - Problem set handed out every Friday, focusing on topics from that week’s lectures
  - Solutions returned at the exercise sessions
  - For those without an exercise group: crash the party in either group until we either create a new group or a vacancy opens up otherwise
  - If necessary, solutions can be turned in by email ([email protected]), but pity the poor TA who checks piles of exercises weekly!
  - Language of exercise sessions: English
  - Exercise points make up 40% of your total grade; you must get at least half the points to be eligible for the course exam.

Practical details (5)

- Exercises this week:
  - The exercise sessions this week are voluntary R “tutorials”.
  - Instruction on R and the features of R used on this course
  - Voluntary, no points awarded. Recommended for everyone not previously familiar with R.
  - Bring your own laptop, with R (and possibly RStudio) installed.

Practical details (6)

- Course exam:
  - December 20th at 8:00am (sorry, not my choice)
  - Constitutes 60% of your course grade
  - You must get a minimum of half the points of the exam to pass the course
  - Pen-and-paper problems, similar in style to the exercises (also ‘essay’ or ‘explain’ problems)
- (Note: To be eligible to take a ‘separate exam’ you need to first complete some programming assignments. These will be available on the course web page a bit later. However, since you are here at the lecture, this probably does not concern you.)
- You may also answer exam problems in Finnish or Swedish.
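The weighting and thresholds stated for the exercises and the exam can be summarised in a few lines. This sketch assumes points are compared as fractions of each component’s maximum; the slides do not say how the combined score maps to a final grade, so the function stops at the combined fraction, and the point totals in the usage are hypothetical.

```python
def course_result(exercise_points, exercise_max, exam_points, exam_max):
    """Combine exercise and exam points: exercises 40%, exam 60%.
    At least half the exercise points are needed to take the exam, and at
    least half the exam points are needed to pass.
    Returns (passed, combined score as a fraction of the maximum)."""
    ex_frac = exercise_points / exercise_max
    exam_frac = exam_points / exam_max
    if ex_frac < 0.5:
        return False, None  # not eligible for the course exam
    combined = 0.4 * ex_frac + 0.6 * exam_frac
    return exam_frac >= 0.5, combined


print(course_result(30, 40, 36, 48))  # passes, combined score about 0.75
print(course_result(15, 40, 40, 48))  # (False, None): too few exercise points
```

Note that the two thresholds are separate gates: a strong exam cannot make up for too few exercise points, because without them you cannot take the exam at all.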

Practical details (8)

- Prerequisites:
  - Mathematics: basics of probability theory and statistics, linear algebra (i.e., vectors and matrices) and real analysis (i.e., derivatives, etc.)
  - Computer science: good programming skills (but no previous familiarity with R necessary)

Practical details (9)

- Course material:
  - Webpage (public information about the course): www.cs.helsinki.fi/en/courses/582631/2016/s/k/1
- NB: You should have signed up on the department registration system
- Help?
  - Ask the assistants/lecturer at exercises/lectures
  - Contact assistants/lecturer separately

We’re in this together. Let’s do it!
