Lecture 1.2 Introduction to Machine Learning
What is Learning?
•“Learning denotes changes in a system that ... enable a system to
do the same task … more efficiently the next time.” - Herbert Simon
•“Learning is constructing or modifying representations of what is
being experienced.” - Ryszard Michalski
•“Learning is making useful changes in our minds.” - Marvin Minsky
What We Talk About When We Talk About "Learning"
•Learning general models from data of particular examples
•Data is cheap and abundant (data warehouses, data marts);
knowledge is expensive and scarce.
•Example in retail: Customer transactions to consumer behavior:
People who bought “Da Vinci Code” also bought “The Five
People You Meet in Heaven” (www.amazon.com)
•Build a model that is a good and useful approximation to the
data.
So What Is Machine Learning?
•Automating automation
•Getting computers to program themselves
•Writing software is the bottleneck
•Let the data do the work instead!
What is Machine Learning?
•Machine Learning
•Study of algorithms that improve their performance at some
task with experience
•Optimize a performance criterion using example data or
past experience.
•Role of Statistics: Inference from a sample
•Role of Computer science: Efficient algorithms to
•Solve the optimization problem
•Represent and evaluate the model for inference
Why Machine Learning?
•No human experts
•industrial/manufacturing control
•mass spectrometer analysis, drug design, astronomic discovery
•Black-box human expertise
•face/handwriting/speech recognition
•driving a car, flying a plane
•Rapidly changing phenomena
•credit scoring, financial modeling
•diagnosis, fraud detection
•Need for customization/personalization
•personalized news reader
•movie/book recommendation
Traditional Programming
•Data + Program → Computer → Output
Machine Learning
•Data + Output → Computer → Program
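To make the contrast concrete, here is a minimal sketch (not from the slides; the temperature-conversion example and NumPy usage are my own illustration): the traditional program hard-codes the rule, while the "learned program" is fit from example inputs and outputs.

# Minimal sketch (illustrative): traditional programming vs. machine learning.
import numpy as np

# Traditional programming: the rule is written by hand.
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32          # rule supplied by the programmer

# Machine learning: the rule is inferred from (input, output) pairs.
celsius = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
observed_f = celsius * 9 / 5 + 32        # example outputs (noise-free here)
slope, intercept = np.polyfit(celsius, observed_f, deg=1)

print(fahrenheit(25.0))                  # 77.0, from the hand-written program
print(slope * 25.0 + intercept)          # ~77.0, from the learned program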
Growth of Machine Learning
•Machine learning is the preferred approach to
•Speech recognition, Natural language processing
•Computer vision
•Medical outcomes analysis
•Robot control
•Computational biology
•This trend is accelerating
•Improved machine learning algorithms
•Improved data capture, networking, faster computers
•Software too complex to write by hand
•New sensors / IO devices
•Demand for self-customization to user, environment
•It turns out to be difficult to extract knowledge from human experts; this contributed to the failure of expert systems in the 1980s.
Related Fields
•Machine learning sits at the intersection of: data mining, control theory, statistics, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, and neuroscience
•Combinatorial optimization
•E.g.: Greedy search
•Convex optimization
•E.g.: Gradient descent
•Constrained optimization
•E.g.: Linear programming
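As an illustration of the convex case above, here is a minimal gradient descent sketch on a simple quadratic objective (the objective, step size, and iteration count are illustrative assumptions, not from the slides).

# Minimal sketch (illustrative): gradient descent on the convex objective
# f(w) = (w - 3)^2, whose minimum is at w = 3.
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)                # step against the gradient
    return w

grad_f = lambda w: 2 * (w - 3)           # derivative of (w - 3)^2
print(gradient_descent(grad_f, w0=0.0))  # converges to ~3.0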
Types of Learning
Face Recognition
[Slide images: training examples of a person vs. test images]
Supervised Learning: Uses
Example: decision trees, tools that create rules
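A minimal sketch of the decision-tree example, assuming scikit-learn (the slides do not name a library): a tree is fit to labeled examples and printed back as the if-then rules it created.

# Minimal sketch, assuming scikit-learn: a supervised decision tree whose
# learned structure can be read back as if-then rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)   # features X, labels y
print(export_text(tree))                               # the rules the tree created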
Unsupervised Learning
Reinforcement Learning
•Topics:
•Policies: what actions should an agent take in a particular situation
•Utility estimation: how good is a state (used by policy)
•No supervised output but delayed reward
•Credit assignment problem (what was responsible for the outcome)
•Applications:
•Game playing
•Robot in a maze
•Multiple agents, partial observability, ...
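A minimal sketch of these ideas, using tabular Q-learning on a made-up 5-cell corridor "maze" (the environment, reward, and hyperparameters are illustrative assumptions, not from the slides): only the final cell gives a reward, so the update rule must propagate credit back to earlier states, and the learned table defines the policy.

# Minimal sketch (illustrative): tabular Q-learning on a 5-cell corridor.
# The agent starts in cell 0; only reaching cell 4 yields a (delayed) reward,
# and bootstrapped Q-values pass credit back to earlier state-action pairs.
import random

n_states, actions = 5, [-1, +1]           # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):                      # episodes
    s = 0
    while s != n_states - 1:
        if random.random() < epsilon:     # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: credit flows backward through the value estimates
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states)}
print(policy)   # +1 (move right) in every non-terminal state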
ML in Practice
1. Understanding domain, prior knowledge, and goals
2. Data integration, selection, cleaning, pre-processing,
etc.
3. Learning models
4. Interpreting results
5. Consolidating and deploying discovered knowledge
6. Loop
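A minimal end-to-end sketch of steps 2-4, assuming scikit-learn and a toy dataset (both are my assumptions, not from the slides): pre-process the data, learn a model, and interpret a held-out score.

# Minimal sketch, assuming scikit-learn: steps 2-4 of the loop above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)              # step 3: learn the model
print(model.score(X_test, y_test))       # step 4: interpret held-out accuracy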
Summary
•Introductory course that covers a wide range of machine learning
techniques—from basic to state-of-the-art.
• You will learn about methods you may have heard of: Naïve Bayes, regression,
nearest-neighbor (kNN), decision trees, support vector machines, learning ensembles,
over-fitting, regularization, dimensionality reduction & PCA, parameter estimation,
mixture models, model comparison, density estimation, clustering (K-means, EM, and
DBSCAN), and active and reinforcement learning.
•Covers algorithms, theory and applications
What We’ll Cover
•Supervised learning
•Decision tree induction
•Naïve Bayes
•Regression
•Nearest-neighbor (kNN)
•Over-fitting, regularization
•Parameter estimation
•Comparing models
•Neural networks
•Model ensembles
•Unsupervised learning
•Clustering: K-means, EM, and DBSCAN
•Dimensionality reduction & PCA
•Active and reinforcement learning