Intro ML 1 Day
● Application
● Conclusion
ABOUT ME
● Qualification
PhD track (Ireland), M.Sc. (Germany) and
Bachelor of Technology (India)
● Teaching Experience & Industrial Experience
Dublin Business School
Dublin City University And TU Dublin
National College of Ireland And American
College Dublin
Adapt Centre AI Labs (Ireland), EagleBurgmann
(Germany), Siemens AG (Germany) and
Consulting in start-up & TSYS (India)
● Research Interest
Information retrieval, Information seeking
behaviour, Chatbots, Machine learning, Deep
Learning and Conversational Information
retrieval
● Supervision
Exchange Interns (France), ICT and Masters
Dissertation
MY DBS PROFILE
● Subjects
Big Data Visualisation
Research Methods in Computing
Research Methods in FinTech
Machine Learning
Research Methods Analytics
● Seven students represented DBS at the IPRC conference under my supervision.
● Supervision (Master Dissertation & ICT)
50 (Completed) + 5 (In Progress)
● Paper Publication
4 Published
1 Accepted
2 In Progress
Paper Published
List of Paper Published (with DBS Students)
● Kaur, G., Kaushik, A. and Sharma, S., 2019. Cooking Is Creating Emotion: A
Study on Hinglish Sentiments of Youtube Cookery Channels Using
Semi-Supervised Approach. Big Data and Cognitive Computing, 3(3), p.37.
● Das, J., Sharma, S. and Kaushik, A., 2019. Views of Irish Farmers on Smart
Farming Technologies: An Observational Study. AgriEngineering, 1(2),
pp.164-187.
● Nair, S., Kaushik, A. and Dhoot, H., 2019. Conceptual framework of a
skill-based interactive employee engaging system: In the Context of
Upskilling the present IT organization. Applied Computing and Informatics.
● Ajumi, O. and Kaushik, A., 2019. Exchange Rates Prediction via Deep Learning
and Machine Learning: A Literature Survey on Currency Forecasting.
● Sentiment Analysis on Google Play Store Data using Deep Learning. Accepted
in Springer (2019).
Evaluation
• Two assignments (30% each)
– Handed out in weeks 4 and 8
– Due two weeks later
• Main exam (40%)
• Mix of:
– Implementing machine learning algorithms
– Applying them to real datasets
– Exercises
Source Materials
● Material provided by me
● Material provided in the class by the online instructor
● Bishop, Christopher M. Pattern recognition and machine learning.
Springer, 2006.
● Witten, Ian H., et al. Data Mining: Practical machine learning tools and
techniques. Morgan Kaufmann, 2016.
● Zhang, Cha, and Yunqian Ma, eds. Ensemble machine learning:
methods and applications. Springer Science & Business Media, 2012.
● Brownlee, Jason. "Machine learning mastery." URL:
http://machinelearningmastery.com/discover-feature-engineering-howtoengineer-features-and-how-to-getgood-at-it (2014).
A Few Quotes
DATA INCORPORATE AI
● Artificial Intelligence (AI)
Reproducing human intelligence
in machines, especially computer
systems, through learning,
reasoning and self-correction
DATA INCORPORATE AI
Traditional programming: Data + Program → Computer → Output
Machine learning: Data + Output → Computer → Program
Magic?
Sample Applications
ML in a Nutshell
Representation
Evaluation
Optimization
Types of Learning
Inductive Learning
What We May Cover*
● Predictive
Analyze current and historical facts to
make predictions about future events
● Causal
To find out what happens to one variable
when you change another.
● Mechanistic
Understand the exact changes in variables
that lead to changes in other variables for
individual objects.
MODELS
● Machine learning models are either parametric or non-parametric
● Parametric Models
Summarize the data with a set of parameters of fixed
size, independent of the number of training examples
Y = MX + C    (3)
where X is the input variable, Y is the predicted output, M is the weight and C is the bias
Examples: logistic regression and the perceptron
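A minimal sketch of what "fixed-size parameters" means in practice: however many training points are supplied, the fitted model is just the two numbers M and C. The function names `fit_line` and `predict` and the toy data are illustrative assumptions, not course material, and ordinary least squares is only one way to estimate M and C.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope M and bias C for Y = M*X + C."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def predict(m, c, x):
    """Apply the learned parameters to a new input."""
    return m * x + c


# Toy data generated from y = 2x + 1 (hypothetical values)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
print("M =", m, "C =", c)
print("prediction at x = 10:", predict(m, c, 10.0))
```

After fitting, the training data can be discarded: the two parameters are the whole model.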
● Non-Parametric Models
Do not make strong assumptions about the form of the
mapping function
Free to learn any functional form from the training data
Examples: decision trees and support vector machines
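To illustrate a model with no fixed functional form, here is a small sketch using k-nearest neighbours. Note that k-NN is swapped in here because it fits in a few lines; the slide itself names decision trees and support vector machines. The function name `knn_predict` and the toy data are assumptions for illustration.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k training points closest to query."""
    # Sort the training examples by their distance to the query point
    neighbours = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = neighbours[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy data following y = x^2; no equation is ever fitted to it
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]

pred_k3 = knn_predict(train_x, train_y, 2.1, k=3)
pred_k1 = knn_predict(train_x, train_y, 2.1, k=1)
print(pred_k3, pred_k1)
```

Unlike a parametric model, the "model" here is the stored training set itself, so it grows with the number of training examples.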
● Benefits of parametric models
Simpler (easier to understand)
Speed (fast processing)
● Limitations of parametric models
Limited complexity (the methods are better suited to simpler problems)
Poor fit (in practice the methods are unlikely to match the underlying mapping function)
● Benefits of non-parametric models
Flexible (capable of fitting large data sets)
● Bias Error
Simplifying assumptions made by the model to make
the target function easier to learn
● Variance Error
The amount by which the estimate of the target
function would change if different training data
were used
● Irreducible Error
Cannot be reduced regardless of which
algorithm is used, such as error caused
by unknown variables
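For squared-error loss, the three error sources add up. The following is a sketch of the standard decomposition, assuming the data follow y = f(x) + ε with zero-mean noise of variance σ². The symbols f, \hat{f}, ε and σ are notation introduced here and do not appear in the slides.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The expectations are taken over different training sets (and the noise), which is why variance measures sensitivity to the particular training data.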
ERRORS (2)
● Overfitting
The model learns the training data well but
predicts poorly on the testing data
More common with non-parametric algorithms
Remedies: feature selection, cross-validation
and holding back a validation dataset
● Underfitting
Failing to learn from the training data
Remedy is to try alternative algorithms
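A small sketch of how the two failure modes show up when training and testing errors are compared. The two models (a 1-nearest-neighbour "memoriser" and a constant mean predictor) and the toy data are assumptions chosen to make the contrast visible, not methods prescribed by the slides.

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_memoriser(train_x, train_y):
    """1-nearest-neighbour: returns the target of the closest training point."""
    def model(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return model


def make_mean_predictor(train_y):
    """Ignores the input entirely and always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# True relationship is y = x; the training targets carry +1/-1 noise
train_x = [0.0, 2.0, 4.0, 6.0, 8.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0]
test_x = [1.5, 3.5, 5.5, 7.5]
test_y = [1.5, 3.5, 5.5, 7.5]

memoriser = make_memoriser(train_x, train_y)
mean_model = make_mean_predictor(train_y)

overfit_train = mse(memoriser, train_x, train_y)
overfit_test = mse(memoriser, test_x, test_y)
underfit_train = mse(mean_model, train_x, train_y)
underfit_test = mse(mean_model, test_x, test_y)

print("memoriser  train/test MSE:", overfit_train, overfit_test)
print("mean model train/test MSE:", underfit_train, underfit_test)
```

The memoriser reproduces the noisy training targets exactly but does worse on unseen points (overfitting), while the mean predictor has a large error even on the data it was trained on (underfitting).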
MY ML FLOWCHART
Text cleaning
Split the data (70% training data and 30% testing data)
Run cross-validation on the training data using multiple algorithms
Vary the parameters to study the effect on bias and variance
Choose the best classifier or regression model
Retrain the model on the 70% training data
Validate on the testing data
Identify underfitting and overfitting
Retrain the model on the whole data set
Save the model and build the API over it
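The core of this flowchart can be sketched in plain Python as follows. Everything named here (`train_test_split`, `k_fold`, `cross_val_score`, the two candidate models and the synthetic data) is a hypothetical stand-in written for illustration. Text cleaning, parameter variation, model saving and the API layer are left out.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split them into training and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]


def k_fold(rows, k=5):
    """Yield (train, validation) pairs; every row is used for validation once."""
    for i in range(k):
        validation = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation


def fit_mean(rows):
    """Candidate 1: always predict the mean target."""
    mean = sum(y for _, y in rows) / len(rows)
    return lambda x: mean


def fit_line(rows):
    """Candidate 2: least-squares straight line."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    m = sum((x - mx) * (y - my) for x, y in rows) / sxx
    c = my - m * mx
    return lambda x: m * x + c


def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def cross_val_score(fit, rows, k=5):
    """Average validation error of one algorithm over k folds."""
    scores = [mse(fit(train), val) for train, val in k_fold(rows, k)]
    return sum(scores) / len(scores)


# Synthetic data from y = 3x + 2 (hypothetical)
data = [(float(x), 3.0 * x + 2.0) for x in range(20)]

train, test = train_test_split(data)                      # 70% / 30% split
candidates = {"mean": fit_mean, "line": fit_line}
cv = {name: cross_val_score(fit, train) for name, fit in candidates.items()}
best_name = min(cv, key=cv.get)                           # lowest CV error wins
model = candidates[best_name](train)                      # retrain on the 70%
test_error = mse(model, test)                             # check on the held-out 30%
final_model = candidates[best_name](data)                 # retrain on all data

print("chosen:", best_name, "test MSE:", test_error)
```

Keeping the 30% test split untouched until after model selection is what makes the final test error an honest estimate.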
Classification Accuracy
Logarithmic Loss
Confusion Matrix
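As a rough sketch, the three metrics can be computed for a binary classifier as shown here. The function names, the 0.5 decision threshold and the example labels and probabilities are assumptions made for illustration.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def log_loss(y_true, p_pos, eps=1e-15):
    """Average negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for t, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives and negatives for labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


# Hypothetical labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 1, 0, 0]
p_pos = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in p_pos]

acc = accuracy(y_true, y_pred)
loss = log_loss(y_true, p_pos)
cm = confusion_matrix(y_true, y_pred)
print(acc, loss, cm)
```

Accuracy and the confusion matrix only look at the thresholded labels, whereas log loss also penalises confident wrong probabilities.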
● Machine translation
● Sales prediction
● Self-driving cars
● Sentiment analysis
● Data is very powerful
● Patterns in data reveal
personality
● ML and DL have high
potential
● Log Loss
● Confusion Matrix
EVALUATION (2)
● Area under Curve
● F1 Score
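A minimal sketch of both metrics for the binary case. AUC is computed here through its pairwise-ranking interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative) rather than by integrating a ROC curve. The function names and example data are assumptions.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f1, auc)
```

F1 depends on the chosen threshold, while AUC summarises ranking quality across all thresholds.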
EVALUATION (3)