ML Basic Concepts

Uploaded by

pooja
Machine Learning

Basic Concepts

[Figure: scatter plot of Feature 1 vs. Feature 2 with a decision boundary]
Terminology
Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery.
Data everywhere!
1. Google: processes 24 petabytes of data per day.

2. Facebook: 10 million photos uploaded every hour.

3. YouTube: 1 hour of video uploaded every second.

4. Twitter: 400 million tweets per day.

5. Astronomy: Satellite data is in hundreds of PB.

6. ...

7. “By 2020 the digital universe will reach 44 zettabytes...”
   The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014.

That’s 44 trillion gigabytes!
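The conversion behind that figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, so 1 zettabyte is 10^21 bytes and 1 gigabyte is 10^9 bytes. The constant names are illustrative only.

```python
# Assumption: decimal (SI) prefixes, not binary ones.
BYTES_PER_ZETTABYTE = 10**21
BYTES_PER_GIGABYTE = 10**9

zettabytes = 44
gigabytes = zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE

# 44 * 10^12 gigabytes, i.e. 44 trillion.
print(f"{gigabytes:,} GB")
```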
Data types
Data comes in different sizes and flavors (types):

• Texts
• Numbers
• Clickstreams
• Graphs
• Tables
• Images
• Transactions
• Videos
• Some or all of the above!


Smile, we are ‘DATAFIED’!
• Wherever we go, we are “datafied”.

• Smartphones are tracking our locations.

• We leave a data trail in our web browsing.

• We leave traces through our interactions in social networks.

• Privacy is an important issue in Data Science.


The Data Science process
[Figure: the data science process, driven by research questions and domain expertise]

1. Data collection: static data, databases (DB), data collected over time.
2. Data preparation: data cleaning, feature/variable engineering.
3. EDA: descriptive statistics, clustering.
4. Machine learning: classification, scoring, predictive models, clustering, density estimation, etc. A model (f) outputs a predicted class/risk (e.g. Yes / 90%).
5. Visualization: dashboard, application deployment, data-driven decisions.
Applications of ML
• We all use machine learning on a daily basis. Examples:
• Spam filtering
• Credit card fraud detection
• Digit recognition on checks, zip codes
• Detecting faces in images
• MRI image analysis
• Recommendation systems
• Search engines
• Handwriting recognition
• Scene classification
• etc...
Interdisciplinary field

[Figure: ML at the center of related fields: Statistics, Visualization, Economics, Databases, Signal processing, Engineering, Biology]
ML versus Statistics
Statistics:
• Hypothesis testing
• Experimental design
• ANOVA
• Linear regression
• Logistic regression
• GLM
• PCA

Machine Learning:
• Decision trees
• Rule induction
• Neural networks
• SVMs
• Clustering methods
• Association rules
• Feature selection
• Visualization
• Graphical models
• Genetic algorithms

http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf
Machine Learning definition

“How do we create computer programs that improve with experience?”
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Tom Mitchell, Machine Learning, 1997.
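The roles of E, T and P in this definition can be made concrete with a toy learner. The sketch below is an illustration under stated assumptions, not something taken from Mitchell's book. The task T is labeling a number as -1 or +1, the experience E is a list of labeled numbers, and the performance measure P is accuracy on a fixed test set. The ground-truth rule, the data and all function names are hypothetical.

```python
def true_label(x):
    # Hypothetical ground truth for the toy task: +1 at or above 0.3.
    return 1 if x >= 0.3 else -1

def learn_threshold(examples):
    """Learn from experience E: place the threshold midway between the
    largest negative example and the smallest positive example."""
    largest_negative = max(x for x, y in examples if y == -1)
    smallest_positive = min(x for x, y in examples if y == 1)
    return (largest_negative + smallest_positive) / 2

def accuracy(threshold, examples):
    """Performance measure P: fraction of examples labeled correctly."""
    correct = sum(1 for x, y in examples if (1 if x >= threshold else -1) == y)
    return correct / len(examples)

# Fixed test set for task T: ten evenly spaced points in (0, 1).
test_set = [((i + 0.5) / 10, true_label((i + 0.5) / 10)) for i in range(10)]

# Two amounts of experience E.
little_experience = [(x, true_label(x)) for x in (0.0, 1.0)]
more_experience = [(x, true_label(x)) for x in (0.0, 0.2, 0.4, 1.0)]

p_little = accuracy(learn_threshold(little_experience), test_set)
p_more = accuracy(learn_threshold(more_experience), test_set)
print(p_little, p_more)  # 0.8 1.0
```

With only two examples the learned threshold is 0.5 and two test points are mislabeled. With four examples it moves to about 0.3 and every test point is labeled correctly, so P improves with E on this toy problem.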
Supervised vs. Unsupervised
Given: Training data: $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ and $y_i$ is the label.

example $x_1$ →   $x_{11}$   $x_{12}$   ...   $x_{1d}$     $y_1$ ← label
...               ...        ...        ...   ...          ...
example $x_i$ →   $x_{i1}$   $x_{i2}$   ...   $x_{id}$     $y_i$ ← label
...               ...        ...        ...   ...          ...
example $x_n$ →   $x_{n1}$   $x_{n2}$   ...   $x_{nd}$     $y_n$ ← label
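The layout above maps directly onto a list of feature rows plus a list of labels. The following is a minimal sketch with made-up numbers. The names X, y, n and d mirror the notation on the slide, and the values are purely hypothetical.

```python
# Hypothetical toy data: n = 4 examples, d = 2 features each.
X = [
    [1.0, 2.0],  # x_1 = (x_11, x_12)
    [1.5, 1.8],  # x_2
    [5.0, 8.0],  # x_3
    [6.0, 9.0],  # x_4
]
y = [-1, -1, 1, 1]  # one label y_i per example x_i

n = len(X)     # number of examples
d = len(X[0])  # number of features per example

# Pair each example with its label: (x_1, y_1), ..., (x_n, y_n).
training_data = list(zip(X, y))

for features, label in training_data:
    print(features, "->", label)
```

A supervised method would consume `training_data`, while an unsupervised method would only see `X`.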
Supervised vs. Unsupervised

Unsupervised learning:
Learning a model from unlabeled data.

Supervised learning:
Learning a model from labeled data.
Unsupervised Learning
Training data: “examples” x.

$x_1, \ldots, x_n$, with $x_i \in X \subset \mathbb{R}^d$

• Clustering/segmentation:

$f : \mathbb{R}^d \longrightarrow \{C_1, \ldots, C_k\}$ (set of clusters).

Example: Find clusters in a population, in fruits, or in species.


Unsupervised learning

[Figure: scatter plot of unlabeled examples, Feature 1 vs. Feature 2]
Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.
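As an illustration of the first method listed, here is a minimal K-means sketch (Lloyd's algorithm) in plain Python. It makes simplifying assumptions that a real implementation would not: the first k points serve as initial centroids, and the loop runs for a fixed number of iterations instead of testing for convergence. The function name and the toy points are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Plain K-means on a list of equal-length tuples.

    Returns the centroids and, for each point, the index of its cluster
    from the final assignment step.
    """
    # Assumption: deterministic but naive initialisation with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments

# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 6.0), (2.0, 1.0), (6.0, 5.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The returned assignments play the role of the mapping f from examples to clusters: each point is sent to one of k cluster indices.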
Supervised learning
Training data: “examples” x with “labels” y.

$(x_1, y_1), \ldots, (x_n, y_n)$, with $x_i \in \mathbb{R}^d$

• Classification: y is discrete. To simplify, $y \in \{-1, +1\}$.

$f : \mathbb{R}^d \longrightarrow \{-1, +1\}$; f is called a binary classifier.

Example: Approve credit yes/no, spam/ham, banana/orange.
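One simple way to picture a binary classifier f is as a rule that checks which side of a line (a decision boundary) an example falls on. The sketch below uses a linear rule purely as an illustration; the slides do not prescribe this form. The weights and bias are hypothetical values chosen by hand, not learned from data.

```python
def make_linear_classifier(w, b):
    """Return f(x) = +1 if w . x + b >= 0, else -1."""
    def f(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if score >= 0 else -1
    return f

# Hypothetical hand-picked parameters: the boundary is the line x1 + x2 = 6.
f = make_linear_classifier(w=[1.0, 1.0], b=-6.0)

print(f((1.0, 2.0)))  # score -3.0, so -1
print(f((5.0, 5.0)))  # score  4.0, so +1
```

A learning method would choose `w` and `b` from the labeled training data instead of fixing them by hand.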


Supervised learning

[Figure: scatter plot of labeled examples, Feature 1 vs. Feature 2, with a decision boundary]
Methods: Support Vector Machines, neural networks, decision
trees, K-nearest neighbors, naive Bayes, etc.
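To illustrate one of the listed methods, here is a minimal K-nearest-neighbors sketch in plain Python for labels in {-1, +1}. It assumes Euclidean distance and an odd k, so the majority vote cannot tie. The function name and the toy training set are hypothetical.

```python
from math import dist

def knn_predict(train, query, k=3):
    """Binary K-NN: majority vote among the k nearest labeled examples.

    `train` is a list of (features, label) pairs with label in {-1, +1}.
    Assumption: k is odd, so the vote sum is never zero.
    """
    nearest = sorted(train, key=lambda example: dist(example[0], query))[:k]
    vote = sum(label for _, label in nearest)
    return 1 if vote > 0 else -1

# Hypothetical toy training set: two groups in two dimensions.
train = [
    ((1.0, 1.0), -1), ((1.0, 2.0), -1), ((2.0, 1.0), -1),
    ((5.0, 5.0), 1), ((5.0, 6.0), 1), ((6.0, 5.0), 1),
]

print(knn_predict(train, (1.5, 1.5)))  # -1
print(knn_predict(train, (5.5, 5.5)))  # 1
```

Unlike the other methods listed, K-NN has no separate training phase: the labeled examples themselves act as the model.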
Supervised learning
Classification:

[Figure: scatter plot of labeled examples, Feature 1 vs. Feature 2]