Intro ML 1 Day

The document provides an overview of machine learning including definitions of key concepts like supervised and unsupervised learning. It also discusses machine learning models and algorithms like linear regression and decision trees. Types of errors in machine learning like bias, variance and overfitting are also explained.

Uploaded by

ravinyse
Copyright © All Rights Reserved

Machine Learning

Written by Abhishek Kaushik (Abhi)

Material is for educational purposes for the specific audience.


Distribution in any form is not allowed.
OVERVIEW
● About Me
● Data Incorporate AI
● Data Description Techniques
● Types of Learning
● ML Models
● Workflow Models
● Errors
● My Workflow
● Evaluation
● Application
● Conclusion
ABOUT ME
● Qualification
PhD track (Ireland), M.Sc. (Germany) and Bachelor of Technology (India)
● Teaching & Industrial Experience
Dublin Business School, Dublin City University and TU Dublin, National College of Ireland and American College Dublin
Adapt Centre AI Labs (Ireland), EagleBurgmann (Germany), Siemens AG (Germany), and start-up consulting and TSYS (India)
● Research Interests
Information retrieval, information-seeking behaviour, chatbots, machine learning, deep learning and conversational information retrieval
● Supervision
Exchange interns (France), ICT and master's dissertations
MY DBS PROFILE
● Subjects
Big Data Visualisation
Research Methods in Computing
Research Methods in FinTech
Machine Learning
Research Methods in Analytics
● Seven students represented DBS at the IPRC conference under my supervision.
● Supervision (master's dissertations & ICT)
50 (completed) + 5 (in progress)
● Paper Publications
4 Published
1 Accepted
2 In Progress
Papers Published
List of Papers Published (with DBS Students)

● Kaur, G., Kaushik, A. and Sharma, S., 2019. Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach. Big Data and Cognitive Computing, 3(3), p.37.
● Das, J., Sharma, S. and Kaushik, A., 2019. Views of Irish Farmers on Smart Farming Technologies: An Observational Study. AgriEngineering, 1(2), pp.164-187.
● Nair, S., Kaushik, A. and Dhoot, H., 2019. Conceptual framework of a skill-based interactive employee engaging system: In the Context of Upskilling the present IT organization. Applied Computing and Informatics.
● Ajumi, O. and Kaushik, A., 2019. Exchange Rates Prediction via Deep Learning and Machine Learning: A Literature Survey on Currency Forecasting.
● Sentiment Analysis on Google Play Store Data using Deep Learning. Accepted in Springer, 2019.
Evaluation
• Two assignments (30% each)
– Handed out in weeks 4 and 8
– Due two weeks later
• Main exam (40%)
• Mix of:
– Implementing machine learning algorithms
– Applying them to real datasets
– Exercises
Source Materials

● Material provided by me
● Material provided in class by the online instructor
● Bishop, Christopher M. Pattern recognition and machine learning. Springer, 2006.
● Witten, Ian H., et al. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2016.
● Zhang, Cha, and Yunqian Ma, eds. Ensemble machine learning: methods and applications. Springer Science & Business Media, 2012.
● Brownlee, Jason. "Machine learning mastery." URL: http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it (2014).
A Few Quotes
DATA INCORPORATE AI
● Artificial Intelligence (AI)
Reproducing human intelligence in machines, especially computer systems, through learning, reasoning and self-correction.

● Machine Learning (ML)
Machine learning (ML) is a set of statistical tools for learning from data, e.g.
$$\text{Model} = \text{Algorithm}(\text{Data}) \tag{1}$$
$$\text{Output} = \text{Model}(\text{New Data}) \tag{2}$$
Source: Facebook Developer Circles Lagos
● Deep Learning (DL)
Data passes through multiple non-linear transformations to produce an output.
● Data Science (DS)
Data science intersects with artificial intelligence but is not a subset of it. It involves processing, analyzing and visualizing data to extract meaning that informs business strategy.
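Equations (1) and (2) can be sketched in Python, assuming a deliberately tiny learning algorithm: a one-threshold classifier (a "decision stump"). The function name `stump_algorithm` and the hours-studied data are hypothetical illustrations, not part of the original material; the point is only that the algorithm consumes data and returns a model, and the model alone is then applied to new data.

```python
from typing import Callable, List, Tuple


def stump_algorithm(data: List[Tuple[float, int]]) -> Callable[[float], int]:
    """Hypothetical toy learning algorithm (equation 1: Model = Algorithm(Data)).

    Picks the threshold t that best separates class 0 (x <= t) from
    class 1 (x > t) on the labelled training data.
    """
    xs = sorted(x for x, _ in data)
    # Candidate thresholds: midpoints between consecutive distinct inputs.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]

    def errors(t: float) -> int:
        # Number of training points the threshold t would misclassify.
        return sum((x > t) != bool(label) for x, label in data)

    best_t = min(candidates, key=errors)
    # The model only needs best_t; the training data is no longer used.
    return lambda x: int(x > best_t)


# Hypothetical data: hours studied -> failed (0) or passed (1).
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
model = stump_algorithm(train)

# Equation (2): Output = Model(New Data)
new_data = [2.5, 6.5]
outputs = [model(x) for x in new_data]
print(outputs)  # [0, 1]
```

Any other learning algorithm could be swapped in for the stump; only the two-step shape (fit once, then predict many times) matters here.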

DATA INCORPORATE AI
Data + Program → Computer → Output
Data + Output → Computer → Program
Magic?
Sample Applications
ML in a Nutshell
Representation
Evaluation
Optimization
Types of Learning
Inductive Learning

What We'll Cover*

* This may vary from time to time depending on external factors.
MACHINE LEARNING (1)
● Machine learning is divided into three major types
● Supervised
All data is labelled, and the algorithm learns to predict the output from the input data, e.g. regression and classification
● Unsupervised
All data is unlabelled, and the algorithm learns the inherent structure of the input data, e.g. clustering and association
● Semi-Supervised
Some data is labelled but most of it is unlabelled, so a mixture of supervised and unsupervised techniques can be used
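To make the labelled/unlabelled distinction concrete, here is a minimal Python sketch that groups the same one-dimensional inputs in two ways: a nearest-centroid classifier (supervised, uses the labels) and k-means clustering (unsupervised, ignores them). Both techniques, the helper names and the data are illustrative choices, not taken from the original material.

```python
import random
from typing import Dict, List, Tuple


def nearest_centroid_fit(data: List[Tuple[float, str]]) -> Dict[str, float]:
    """Supervised: labels are given, so each class centre is simply the
    mean of the inputs that carry that label."""
    groups: Dict[str, List[float]] = {}
    for x, label in data:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def nearest_centroid_predict(centroids: Dict[str, float], x: float) -> str:
    # Assign the label whose centre is closest to x.
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs: List[float], k: int, iters: int = 20, seed: int = 0) -> List[float]:
    """Unsupervised: no labels, so k-means discovers the groups itself by
    alternating between assigning points and recomputing the centres."""
    rng = random.Random(seed)
    centres = rng.sample(xs, k)
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centres[i]))
            clusters[nearest].append(x)
        # Keep the previous centre if a cluster ends up empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return sorted(centres)


labelled = [(1.0, "small"), (1.2, "small"), (0.8, "small"), (5.0, "large"), (5.5, "large")]
centroids = nearest_centroid_fit(labelled)
print(nearest_centroid_predict(centroids, 4.2))  # large

unlabelled = [x for x, _ in labelled]  # same inputs with the labels thrown away
print([round(c, 3) for c in kmeans_1d(unlabelled, k=2)])  # [1.0, 5.25]
```

Both methods end up with similar centres here, but only the supervised one can attach the names "small" and "large" to them; k-means just reports two groups.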
MACHINE LEARNING (2)
● Features
Feature vectors or attributes
● Output/labelled data
Output values, predictions or labelled data
ML in Practice
DATA ANALYSIS TECHNIQUES (1)

There are six major data analysis techniques:

● Descriptive
Describes (summarizes) a set of data
● Exploratory
An approach to analyzing datasets to find previously unknown relationships
● Inferential
Aims to test theories about the nature of the world in general, using a sample of data
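A minimal Python sketch of the first two techniques, assuming hypothetical advertising-spend and sales figures: descriptive analysis summarizes one variable, while exploratory analysis looks for a relationship between two (here with a hand-written Pearson correlation). Inferential analysis would additionally need a hypothesis test and is not shown.

```python
import statistics
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: advertising spend and sales for five months.
spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [12.0, 24.0, 33.0, 41.0, 55.0]

# Descriptive: summarize a single variable.
print("mean:", statistics.mean(sales))
print("median:", statistics.median(sales))
print("stdev:", round(statistics.stdev(sales), 2))

# Exploratory: look for a relationship that was not assumed in advance.
print("correlation:", round(pearson(spend, sales), 3))
```

A correlation close to 1 only suggests a relationship worth investigating; establishing cause and effect is the job of causal analysis.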
DATA ANALYSIS TECHNIQUES (2)

● Predictive
Analyze current and historical facts to
make predictions about future events
● Causal
To find out what happens to one variable
when you change another.
● Mechanistic
Understand the exact changes in variables
that lead to changes in other variables for
individual objects.
MODELS
● Machine learning models are either parametric or non-parametric
● Parametric Models
Summarize the data with a set of parameters of fixed size, independent of the number of training examples, e.g.
$$Y = MX + C \tag{3}$$
where X is the input variable, Y is the predicted output, M is the slope (weight) and C is the bias (intercept)
Examples: logistic regression and the perceptron
● Non-Parametric Models
Do not make strong assumptions about the form of the mapping function
Free to learn any functional form
Examples: decision trees and support vector machines
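The fixed-size property of parametric models can be seen by counting what each model has to store. This Python sketch assumes k-nearest neighbours as the non-parametric example (a swapped-in technique; the slide names decision trees and SVMs) and fits equation (3) by ordinary least squares on hypothetical data lying on Y = 2X + 1. The class and method names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


class LinearModel:
    """Parametric (equation 3, Y = M*X + C): however many training examples
    are seen, the fitted model is summarized by two numbers, M and C."""

    def fit(self, data: List[Point]) -> "LinearModel":
        n = len(data)
        mean_x = sum(x for x, _ in data) / n
        mean_y = sum(y for _, y in data) / n
        # Ordinary least-squares estimates of the slope M and bias C.
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
        sxx = sum((x - mean_x) ** 2 for x, _ in data)
        self.m = sxy / sxx
        self.c = mean_y - self.m * mean_x
        return self

    def predict(self, x: float) -> float:
        return self.m * x + self.c

    def stored_values(self) -> int:
        return 2  # M and C, independent of the training-set size


class KNNRegressor:
    """Non-parametric: k-nearest neighbours assumes no functional form; it keeps
    the whole training set and averages the targets of the k closest inputs."""

    def __init__(self, k: int = 2) -> None:
        self.k = k

    def fit(self, data: List[Point]) -> "KNNRegressor":
        self.data = list(data)
        return self

    def predict(self, x: float) -> float:
        nearest = sorted(self.data, key=lambda p: abs(p[0] - x))[: self.k]
        return sum(y for _, y in nearest) / len(nearest)

    def stored_values(self) -> int:
        return 2 * len(self.data)  # every (x, y) pair, so it grows with the data


# Hypothetical training data lying exactly on Y = 2X + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
linear = LinearModel().fit(data)
knn = KNNRegressor(k=2).fit(data)
print(linear.stored_values(), knn.stored_values())  # 2 20
print(linear.predict(4.4), knn.predict(4.4))
```

Doubling the training set would leave the linear model at two stored values while doubling what k-NN keeps, which is one reason non-parametric methods tend to need more data and run slower.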
BENEFITS AND LIMITATIONS (PARAMETRIC)
● Benefits
Simpler (easier to understand)
Speed (fast to process)
Less data (require less data for training)
● Limitations
Constrained functional form (limited to a specific functional form)
Limited complexity (methods are more suited to simpler problems)
Poor fit (in practice, the methods are unlikely to match the underlying mapping function)

BENEFITS AND LIMITATIONS (NON-PARAMETRIC)
● Benefits
Flexibility (capable of fitting large datasets)
Power (no assumptions about the mapping function needed)
Performance (can achieve higher performance with complex models)
● Limitations
More data (require more training data)
Slower (take longer to train)
Overfitting (higher risk of overfitting the training data)
LINEAR REGRESSION (PARAMETRIC)
● A 200-year-old method
● Models a linear relationship between the input and predicted variables
● The model has the form
$$Y = B_0 + B_1 x \tag{4}$$
where Y is the predicted value, B_0 and B_1 are coefficients and x is the input variable
● Linear transformation
● Remove noise
● Remove collinearity
● Gaussian distributions
● Rescale inputs
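Equation (4) can be fitted in several ways; this Python sketch assumes batch gradient descent on the mean squared error, which also shows one benefit of rescaling the inputs: after standardizing x, the error surface has equal curvature in both coefficients, so a single learning rate works for both. The learning rate, epoch count, function name and data are hypothetical; an ordinary least-squares solution should give the same coefficients.

```python
import statistics
from typing import List, Tuple


def fit_linear_gd(xs: List[float], ys: List[float], lr: float = 0.1,
                  epochs: int = 1000) -> Tuple[float, float]:
    """Fit Y = B0 + B1*x (equation 4) by batch gradient descent on the mean
    squared error, after standardizing x ("rescale inputs")."""
    mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
    zs = [(x - mu) / sigma for x in xs]  # rescaled inputs: mean 0, std 1
    a0, a1 = 0.0, 0.0  # coefficients for the rescaled input z
    n = len(xs)
    for _ in range(epochs):
        residuals = [(a0 + a1 * z) - y for z, y in zip(zs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual^2).
        grad0 = 2.0 * sum(residuals) / n
        grad1 = 2.0 * sum(r * z for r, z in zip(residuals, zs)) / n
        a0 -= lr * grad0
        a1 -= lr * grad1
    # Map back to the original scale: a0 + a1*(x - mu)/sigma = B0 + B1*x.
    b1 = a1 / sigma
    b0 = a0 - b1 * mu
    return b0, b1


# Hypothetical data, roughly following Y = 3 + 0.5x.
xs = [0.0, 10.0, 20.0, 30.0, 40.0]
ys = [3.1, 7.9, 13.2, 17.8, 23.0]
B0, B1 = fit_linear_gd(xs, ys)
print(round(B0, 3), round(B1, 3))  # 3.06 0.497
```

For this data the least-squares solution is B0 = 3.06 and B1 = 0.497, which the gradient-descent loop converges to.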


CART OR DECISION TREE (NON-PARAMETRIC)
● Known as Classification and Regression Tree
● Introduced by Leo Breiman
● Uses the Gini index to choose splits
● Built with a greedy method
● Pruned to reduce overfitting
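The Gini-based greedy split can be sketched in Python for a single numeric feature with hypothetical labels. `gini` computes 1 minus the sum of squared class proportions, and `best_split` tries every midpoint threshold and keeps the one with the lowest size-weighted impurity. Both names are illustrative; a full CART implementation would recurse on each child node and then prune, and neither step is shown.

```python
from collections import Counter
from typing import List, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity: 1 - sum_k p_k^2 (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(data: List[Tuple[float, str]]) -> Tuple[float, float]:
    """Greedy step of CART on one numeric feature: try every midpoint threshold
    and return (threshold, weighted Gini) for the lowest-impurity split."""
    xs = sorted({x for x, _ in data})
    best_t, best_score = float("nan"), float("inf")
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        # Impurity of the two children, weighted by how many points each receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical data: one feature and a yes/no class label.
data = [(1.0, "no"), (2.0, "no"), (3.0, "yes"), (4.0, "no"), (5.0, "yes"), (6.0, "yes")]
print(best_split(data))  # (2.5, 0.25)
```

The search is greedy because it only looks one split ahead; thresholds 2.5 and 4.5 tie here, and the first one found is kept.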

WORKFLOW OF MACHINE LEARNING


ERRORS (1)

● Bias Error
The simplifying assumptions made by the model to make the target function easier to learn
● Variance Error
The amount by which the estimate of the target function changes when different training data are used
● Irreducible Error
Cannot be reduced regardless of which algorithm is used, e.g. error caused by unknown variables
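Bias and variance can be estimated empirically by refitting a model on many freshly drawn training sets and looking at its predictions at one test point. This Python sketch assumes a hypothetical target y = x² with Gaussian noise (standard deviation 0.2, which plays the role of the irreducible error) and compares a constant "mean" model (strong assumptions) with a 1-nearest-neighbour model (weak assumptions). All names, sizes and seeds are illustrative.

```python
import random
import statistics
from typing import Callable, List, Tuple

Point = Tuple[float, float]
NOISE_SD = 0.2  # noise no model can remove: the irreducible error


def true_f(x: float) -> float:
    return x * x  # hypothetical target function


def sample_training_set(rng: random.Random, n: int = 20) -> List[Point]:
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, NOISE_SD)) for x in xs]


def mean_model(data: List[Point], x0: float) -> float:
    # Strong assumption (high bias): the target is a constant.
    return sum(y for _, y in data) / len(data)


def one_nn_model(data: List[Point], x0: float) -> float:
    # Weak assumption (low bias): copy the target of the closest training input.
    return min(data, key=lambda p: abs(p[0] - x0))[1]


def bias_variance(model: Callable[[List[Point], float], float],
                  x0: float = 0.9, trials: int = 500, seed: int = 1) -> Tuple[float, float]:
    """Refit on `trials` fresh training sets and summarize the predictions at x0."""
    rng = random.Random(seed)
    preds = [model(sample_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (statistics.mean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


for name, model in [("mean model", mean_model), ("1-NN", one_nn_model)]:
    b, v = bias_variance(model)
    print(f"{name}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

The mean model should show a much larger bias² but a smaller variance than 1-NN, illustrating the trade-off between the two error types.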
ERRORS (2)

● Overfitting
The model learns the training data well but predicts poorly on test data
More common with non-parametric algorithms
Remedies: feature selection, cross-validation and holding back a validation dataset
● Underfitting
The model fails to learn from the training data
Remedy: try an alternative algorithm
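The hold-back validation remedy can be sketched in Python with k-nearest-neighbour regression on hypothetical noisy data, where k acts as a flexibility knob: k = 1 memorizes the training set (zero training error, a typical sign of overfitting), while k = 70 predicts the training mean everywhere (underfitting). The data generator, the choice of k-NN and the helper names are illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def make_data(n: int, seed: int = 0) -> List[Point]:
    # Hypothetical noisy data: y = x^2 plus Gaussian noise.
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, x * x + rng.gauss(0.0, 0.2)))
    return points


def knn_predict(train: List[Point], x: float, k: int) -> float:
    # Average the targets of the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train: List[Point], data: List[Point], k: int) -> float:
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


data = make_data(100)
train, valid = data[:70], data[70:]  # hold back 30% as a validation set

for k in (1, 5, 70):
    # k=1 memorizes the training set; k=70 always predicts the training mean.
    print(f"k={k}: train MSE={mse(train, train, k):.3f}, "
          f"validation MSE={mse(train, valid, k):.3f}")
```

A large gap between training and validation error (as for k = 1) points to overfitting, while errors that are similar but both high (as for k = 70) point to underfitting.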
MY ML FLOWCHART
Text cleaning
Split the data (70% training data and 30% testing data)
Apply cross-validation on the training data using multiple algorithms
Vary the parameters to study the effect of bias and variance
Choose the best classifier or regression model
Retrain the model on the 70% training data
Validate on the testing data
Identify underfitting and overfitting
Retrain the model on the whole dataset
Save the model and build an API over it
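The flowchart steps above can be sketched end to end in Python under several assumptions: numeric data instead of text (so the text-cleaning step is skipped), k-nearest-neighbour regression with different values of k standing in for "multiple algorithms" and "variations in parameters", 5-fold cross-validation, and JSON serialization in place of saving a model behind an API. All names and data are hypothetical.

```python
import json
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def knn_predict(train: List[Point], x: float, k: int) -> float:
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train: List[Point], data: List[Point], k: int) -> float:
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def cross_val_mse(data: List[Point], k: int, folds: int = 5) -> float:
    """Mean validation MSE over `folds` contiguous folds of already-shuffled data."""
    size = len(data) // folds
    scores = []
    for f in range(folds):
        valid = data[f * size:(f + 1) * size]
        train = data[:f * size] + data[(f + 1) * size:]
        scores.append(mse(train, valid, k))
    return sum(scores) / folds


# Hypothetical numeric data (so the text-cleaning step is skipped).
rng = random.Random(42)
data: List[Point] = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, x * x + rng.gauss(0.0, 0.2)))
rng.shuffle(data)

# Split the data: 70% training, 30% testing.
cut = len(data) * 7 // 10
train, test = data[:cut], data[cut:]

# Cross-validate several candidate settings on the training data and pick the best.
cv: Dict[int, float] = {k: cross_val_mse(train, k) for k in (1, 3, 5, 9, 15, 35)}
best_k = min(cv, key=lambda k: cv[k])

# "Retrain" on the 70% (for k-NN this just means storing it) and validate on the 30%.
test_mse = mse(train, test, best_k)

# Compare CV and test error to spot under/overfitting, then retrain on all data and save.
saved = json.dumps({"k": best_k, "data": data})
print(f"best k={best_k}, CV MSE={cv[best_k]:.3f}, test MSE={test_mse:.3f}")
```

For k-NN, "retraining on the whole dataset" amounts to storing all 100 points, which is why the saved JSON contains the data itself; a parametric model would save only its fitted parameters.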
EVALUATION
● Classification accuracy
● Logarithmic loss
● Confusion matrix
● Area under curve
● F1 score
● Mean absolute error
● Mean squared error
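The metrics listed above are simple enough to write out directly in Python. This sketch assumes a binary classification problem with labels 0/1 and predicted probabilities (thresholded at 0.5 for the label-based metrics); AUC is computed as the fraction of positive/negative pairs in which the positive gets the higher score, which equals the area under the ROC curve. The example values are hypothetical.

```python
import math
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Binary confusion-matrix counts, returned as (TP, FP, FN, TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def log_loss(y_true: List[int], probs: List[float], eps: float = 1e-15) -> float:
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def auc(y_true: List[int], scores: List[float]) -> float:
    # Fraction of (positive, negative) pairs ranked correctly; ties count one half.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classifier outputs: true labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7]
y_pred = [int(p >= 0.5) for p in probs]

print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 2)
print(round(accuracy(y_true, y_pred), 3), round(f1_score(y_true, y_pred), 3))
print(round(log_loss(y_true, probs), 3), round(auc(y_true, probs), 3))
print(mean_absolute_error([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))
```

The first five functions apply to classification; mean absolute error and mean squared error apply to regression outputs.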


APPLICATION
● Chatbots
● Facial recognition
● Image tagging
● Machine translation
● Sales prediction
● Self-driving cars
● Sentiment analysis
CONCLUSION
● Data is very powerful
● Patterns reveal personality
● ML and DL have high potential
● Understanding the data and the algorithm is the state of the art
● Big data needs ML
● Exploratory data analysis is a must for ML scientists
THANK YOU!
QUESTIONS?
APPENDIX
EVALUATION (1)
● Classification Accuracy

● Log Loss

● Confusion matrix
EVALUATION (2)
● Area under Curve

● F1 Score
EVALUATION (3)
