Machine Learning Engineering

Dr. Thyagaraju G. S.
Head, Dept. of CSE
SDMIT Ujire


Agenda
1. What is Machine Learning ?
2. Machine Learning Engineer
3. Skills required to become MLE
4. Lifecycle and Architecture of Machine Learning
5. Machine Learning Algorithms - Types
6. Machine Learning Tools
7. Open Source and Commercial Machine Learning Tools
8. Introduction to Scikit Learn
9. Deep Learning

1. What is Machine Learning ?

• Study of algorithms that
• Improve their performance P
• At some task T
• With experience E
• Well-defined learning task: <P, T, E>

Machine learning (ML), a subset of artificial intelligence (AI), is more than a
technique for analyzing data. It is a system that is fueled by data, with the ability to
learn and improve by using algorithms that provide new insights without being
explicitly programmed to do so.

Figure 1. Learning From Data Without Being Explicitly Programmed

Figure 2. The Basics of Machine Learning Technology

Figure 4. Stages of the Machine Learning Process

2. Machine Learning Engineer
• Machine learning engineers are sophisticated programmers who
develop machines and systems that can learn and apply knowledge
without specific direction.
• Artificial intelligence is the goal of a machine learning engineer.
• An example of a system a machine learning engineer would work on
is a self-driving car.

2 Machine Learning Engineers

• Machine learning engineers develop programs that control robots
and computers.
• The algorithms they create allow a machine to find patterns in its own
programming data, teaching it to understand commands and even
think for itself.
• The artificial intelligences seen in automatic vacuums and self-driving
cars are the 'thought children' of these engineers.
• Job Skills : Computer programming skills, strong mathematical skills,
knowledge of cloud applications and computer languages, excellent
communication skills

3 Job responsibilities of a machine learning
engineer include:
• Researching new technologies and implementing them in machine
learning programs
• Finding the best design and hardware to use when building the robot
or computer
• Developing tangible prototypes to show stakeholders
• Putting the machines through various tests to ensure they function as
planned

3. Skills required for MLE
1. Math Skills
2. Programming Skills
3. Data Engineering Skills
4. Knowledge of Machine Learning Algorithms
5. Knowledge of Machine Learning Frameworks

3.1. Math Skills
• Probability and Statistics: Descriptive Statistics, Bayes' Rule and
Random Variables, Probability Distributions, Sampling, Hypothesis
Testing, Regression and Decision Analysis.

• Linear Algebra : Matrices , Vector Spaces

• Calculus : Basics of Differential and Integral calculus

3.2. Programming Skills
• Coding skills, algorithms, data structures and OOP concepts
• Languages like Python, R, Java and C (mastery of any one or two)

3.3. Data Engineering Skills
• Ability to work with large amounts of data, data preprocessing,
knowledge of SQL and NoSQL.
• ETL (Extract , Transform and Load ) Operations
• Data Analysis and Visualization Skills

3.4. Knowledge of Machine Learning Algorithms
• Shallow and Deep Learning
• Supervised
• Unsupervised
• Semi Supervised
• Reinforcement

3.5. Knowledge of Machine Learning Frameworks
• Familiarity with popular machine learning frameworks such as
• scikit-learn,
• TensorFlow ,
• Azure ,
• Caffe ,
• Theano,
• Spark and
• Torch

4. Machine Learning Life Cycle and Architecture

Architecture

Terms Frequently Used

• Labeled data: Data consisting of a set of training examples, where
each example is a pair consisting of an input and a desired output
value (also called the supervisory signal, label, etc.).
• Classification: The goal is to predict discrete values, e.g. {1,0}, {True,
False}, {spam, not spam}.
• Regression: The goal is to predict continuous values, e.g. home
prices.

5. Types of Machine Learning Algorithms

1. Based on depth of Learning


2. Based on type of learning

5.1 Types of algorithms based on level of learning

1. Shallow Learning
• Algorithms with few layers
• Better for less complex problems and smaller data sets
• E.g., logistic regression and support vector machines
2. Deep Learning
• A technique that uses many layers of neural networks (a model loosely based on
the structure of the human brain)
• Useful when the target function is very complex and the data sets are very large.

5.2 Based on Type of Learning
1. Supervised Learning
• X and Y
• Given an observation X what is the best label for Y
2. Unsupervised Learning
• X
• Given a set of X cluster or summarize them

3. Semi Supervised Learning

4. Reinforcement Learning
• Determine what to do based on Rewards and punishments

5.2.1. Supervised Learning

• Here the training (labeled) data set, containing inputs/predictors X and outputs Y, is fed to
the machine. With the help of an algorithm, the machine analyzes the data set and generates
a suitable model/function that best describes the input data, i.e. it generates a function
f(X) that makes the best estimate of the output Y for a given X.

• The generated model/function can be used to predict the output values for new data
based on the relationships it learned from the previous data sets.
• When Y is discrete (True/False, ...): Classification
• When Y is continuous (real numbers): Regression
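
As a minimal sketch of this idea (not part of the original slides), the example below learns f(X) from labeled pairs with scikit-learn, once for a discrete Y and once for a continuous Y. The toy data, the variable names and the choice of `LogisticRegression` and `LinearRegression` are illustrative assumptions.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Labeled training data: every input X comes with a known output Y (toy values, assumed).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_class = [0, 0, 0, 1, 1, 1]               # discrete Y -> classification
y_reg = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]   # continuous Y -> regression (here Y = 2 * X)

# Learn f(X) from the labeled examples.
classifier = LogisticRegression().fit(X, y_class)
regressor = LinearRegression().fit(X, y_reg)

# Apply the learned functions to inputs that were not in the training set.
predicted_labels = classifier.predict([[0.0], [7.0]])
predicted_value = regressor.predict([[7.0]])[0]
print(predicted_labels, predicted_value)
```

The same `fit` / `predict` pair is used in both cases; only the type of Y decides whether the task is classification or regression.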

Types of Supervised Learning
(Task driven: develop a prediction model based on input and output data)

1. Classification (Discrete)
a) Logistic Regression
b) KNN
c) Decision Trees
d) Support Vector Machines
e) Naive Bayes
f) Discriminant Analysis
g) Random Forest
h) AdaBoost
i) Neural Networks

2. Regression (Continuous)
a) Linear Regression
b) SVR
c) GPR
d) Ensemble Methods

5.2.2: Unsupervised Learning

• This approach is data driven. The computer is trained with unlabeled
input data.
• These algorithms apply techniques to the input data to mine for
rules, detect patterns, and summarize and group the data points,
which helps in deriving meaningful insights and describing the data
better to the users.
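
A minimal sketch of grouping unlabeled data, assuming scikit-learn's `KMeans` as the clustering algorithm. The six toy points and the choice of two clusters are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled input data: only X, no Y (two well-separated toy groups, assumed).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# Ask k-means to group the points into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a number.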

Types of Unsupervised Learning
1. Clustering
• K Means Clustering
• Hierarchical Clustering
• Gaussian Mixture Models
• Genetic Algorithms
• Artificial Neural Networks
2. Dimensionality Reduction
• Tensor Decomposition
• Principal Component Analysis
• Multidimensional statistics
• Random Projection
• Artificial Neural Networks
3. Association Rules

5.2.3. Semi Supervised

• In the previous two types, labels are either present for all the
observations in the dataset or absent for all of them.
• Semi-supervised learning falls in between these two: only some of the
observations are labeled.
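
A minimal sketch of this in-between setting, assuming scikit-learn's `LabelPropagation` (one of several semi-supervised estimators; the slides do not name one). Unlabeled observations are marked with -1, which is scikit-learn's convention; the toy points are illustrative.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Six observations, but only two of them carry a label; -1 marks "unlabeled".
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
y = np.array([0, -1, -1, 1, -1, -1])

# Spread the two known labels to the nearby unlabeled points.
model = LabelPropagation().fit(X, y)
inferred_labels = model.transduction_
print(inferred_labels)
```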

Applications
• One real-world application of semi-supervised learning is webpage
classification. Say you want to classify any given webpage into one of
several categories (like "Educational", "Shopping", "Forum", etc.).
• Protein function prediction using sequence

5.2.4. Reinforcement Learning
• This method aims at using observations gathered from the interaction
with the environment to take actions that would maximize the reward
or minimize the risk.
• Reinforcement learning algorithm (called the agent) continuously
learns from the environment in an iterative fashion.
• In the process, the agent learns from its experiences of the
environment until it explores the full range of possible states.

5.2.4 Reinforcement Learning
In order to produce intelligent programs (also called agents),
reinforcement learning goes through the following steps:
1. Input state is observed by the agent.
2. Decision making function is used to make the agent perform an
action.
3. After the action is performed, the agent receives reward or
reinforcement from the environment.
4. The state-action pair information about the reward is stored.
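
The four steps above can be sketched with tabular Q-learning on a toy problem. Everything here is an illustrative assumption: the five-state corridor environment, the reward of 1.0 at the right end, and the values of `ALPHA`, `GAMMA` and `EPSILON`.

```python
import random

random.seed(0)
N_STATES, ACTIONS = 5, (0, 1)            # actions: 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate
q_table = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Toy environment: reward 1.0 only when the rightmost state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

for _ in range(500):                     # episodes
    state = 0
    while state != N_STATES - 1:
        # Steps 1 and 2: observe the state, then pick an action (epsilon-greedy).
        if random.random() < EPSILON or q_table[state][0] == q_table[state][1]:
            action = random.choice(ACTIONS)
        else:
            action = 0 if q_table[state][0] > q_table[state][1] else 1
        # Step 3: act and receive a reward from the environment.
        next_state, reward = step(state, action)
        # Step 4: store what was learned about this state-action pair.
        target = reward + GAMMA * max(q_table[next_state])
        q_table[state][action] += ALPHA * (target - q_table[state][action])
        state = next_state

# Greedy policy for the non-terminal states after training.
policy = [0 if q[0] > q[1] else 1 for q in q_table[:-1]]
print(policy)
```

After training, the greedy policy moves right from every non-terminal state, because that is the only way to reach the reward.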

Use cases

• Some applications of reinforcement learning algorithms are
computer-played board games (Chess, Go), robotic hands, and
self-driving cars.
• Manufacturing: a robot uses deep reinforcement learning to pick a
device from one box and put it in a container. Whether it
succeeds or fails, it memorizes the object, gains knowledge, and
trains itself to do this job with great speed and precision.

Types of Reinforcement Learning Algorithms

• Q-Learning
• Temporal Difference (TD)
• Deep Adversarial Networks

Table 1: Examples of Types of Machine Learning Algorithms / Problem-Solving Approaches

Type: Supervised
Model/Algorithm or Task: Neural network
Description: Computations are structured in terms of interconnected groups, much like the neurons in a brain. Neural networks are used to model complex relationships between inputs and outputs, to find patterns in data, or to capture a statistical structure among variables with unknown relationships. They may also be used to discover unknown inputs (unsupervised).
Usage examples in business:
• Predicting financial results
• Fraud detection

Type: Supervised
Model/Algorithm or Task: Classification and/or regression
Description: Computations are structured in terms of categorized outputs or observations based on defined classifications. Classification models are used to predict new outputs based on classification rules. Regression models are generally used to predict outputs from training data.
Usage examples in business:
• Spam filtering
• Fraud detection

Type: Supervised
Model/Algorithm or Task: Decision tree
Description: Computations are particular representations of possible solutions to a decision based on certain conditions. Decision trees are great for building classification models because they can decompose datasets into smaller, more manageable subsets.
Usage examples in business:
• Risk assessment
• Threat management systems
• Any optimization problem where an exhaustive search is not feasible

Type: Unsupervised
Model/Algorithm or Task: Cluster analysis
Description: Computations are structured in terms of groups of input data (clusters) based on how similar they are to one another. Cluster analysis is heavily used to solve exploratory challenges where little is known about the data.
Usage examples in business:
• Financial transactions
• Streaming analytics in IoT
• Underwriting in insurance

Type: Unsupervised
Model/Algorithm or Task: Pattern recognition
Description: Computations are used to provide a description or label to input data, such as in classification. Each input is evaluated and matched based on a pattern identified. Pattern recognition can be used for supervised learning as well.
Usage examples in business:
• Spam detection
• Biometrics
• Identity management

Type: Unsupervised
Model/Algorithm or Task: Association rule learning
Description: Computations are rule-based in order to determine the relationship between different types of input or variables and to make predictions.
Usage examples in business:
• Security and intrusion detection
• Bioinformatics
• Manufacturing and assembly

Figure : Programming vs. Learning

6. Machine Learning Tools
• Tools are a big part of machine learning and choosing the right tool
can be as important as working with the best algorithms.
• Platforms versus Libraries
• Graphical User Interfaces versus Command-Line Interface versus
Application Programming Interfaces
• Local versus Remote

6.1 Platforms Versus Libraries :

• A platform provides all you need to run a project, whereas a library
only provides discrete capabilities or parts of what you need to
complete a project.

6.1.1 Machine Learning Platform
• A machine learning platform provides capabilities to complete a
machine learning project from beginning to end.
• These capabilities include data analysis, data preparation, modeling, and
algorithm evaluation and selection.
• Examples :
• WEKA Machine Learning Workbench.
• R Platform.
• Subset of the Python SciPy ecosystem (e.g. Pandas and scikit-learn).

6.1.2 Machine Learning Library
• A machine learning library provides capabilities for completing part of
a machine learning project.
• For example a library may provide a collection of modeling
algorithms.

• Examples :
• scikit-learn in Python.
• JSAT in Java.
• Accord Framework in .NET

7. Open Source and Commercial Tools

Open Source Tools
1. scikit-learn
2. Shogun
3. Accord.NET Framework
4. Spark MLlib
5. H2O
6. Cloudera Oryx
7. GoLearn
8. Weka
9. deeplearn.js
10. ConvNetJS
11. OpenAI
12. TensorFlow
13. Keras
14. Charnn
15. PaddlePaddle
16. CNTK
17. R
18. Monte Carlo ML Library
19. Octave Forge

Commercial Tools
1. Microsoft Azure Machine Learning
2. SAS Enterprise Miner
3. IBM SPSS Modeler
4. RapidMiner
5. Apache Mahout (an open-source Apache project)
6. MATLAB
7. Oracle Data Mining

8. Introduction to Scikit Learn –
A Python Machine Learning Library

Where did it Come From ?
• Scikit-learn was initially developed by David Cournapeau as a Google
Summer of Code project in 2007.
• Later Matthieu Brucher joined the project and started to use it as
part of his thesis work. In 2010 INRIA got involved and the first
public release (v0.1 beta) was published in late January 2010.
• The project now has more than 30 active contributors and has had
paid sponsorship from INRIA, Google, Tinyclues and the Python
Software Foundation.

Scikit Learn : http://scikit-learn.org/stable/index.html

What is scikit-learn?
• Scikit-learn provides a range of supervised and unsupervised learning
algorithms via a consistent interface in Python.
• It is licensed under a permissive simplified BSD license and is
distributed under many Linux distributions, encouraging academic
and commercial use.
• The library is built upon SciPy (Scientific Python), which must be
installed before you can use scikit-learn.

Scikit Stack
• NumPy: Base n-dimensional array package
• SciPy: Fundamental library for scientific computing
• Matplotlib: Comprehensive 2D/3D plotting
• IPython: Enhanced interactive console
• Sympy: Symbolic mathematics
• Pandas: Data structures and analysis

Some popular groups of models provided by
scikit-learn include:
• Clustering: for grouping unlabeled data such as KMeans.
• Cross Validation: for estimating the performance of supervised models on unseen data.
• Datasets: for test datasets and for generating datasets with specific properties for investigating
model behavior.
• Dimensionality Reduction: for reducing the number of attributes in data for summarization,
visualization and feature selection such as Principal component analysis.
• Ensemble methods: for combining the predictions of multiple supervised models.
• Feature extraction: for defining attributes in image and text data.
• Feature selection: for identifying meaningful attributes from which to create supervised models.
• Parameter Tuning: for getting the most out of supervised models.
• Manifold Learning: For summarizing and depicting complex multi-dimensional data.
• Supervised Models: a vast array not limited to generalized linear models, discriminant analysis,
naive Bayes, lazy methods, neural networks, support vector machines and decision trees.

Example : Scikit learn Machine Learning
There are 5 key libraries that you will need to install. Below is a list of
the Python SciPy libraries required for this tutorial:
• scipy
• numpy
• matplotlib
• pandas
• scikit-learn (imported as sklearn)

8.1. Check the versions of libraries
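
A minimal sketch of this check, assuming the five libraries listed for the tutorial are already installed. The `versions` dictionary is just a convenient way to collect the output.

```python
import sys

import matplotlib
import numpy
import pandas
import scipy
import sklearn

# Collect the interpreter version and the version string of each required library.
versions = {
    "Python": sys.version.split()[0],
    "scipy": scipy.__version__,
    "numpy": numpy.__version__,
    "matplotlib": matplotlib.__version__,
    "pandas": pandas.__version__,
    "sklearn": sklearn.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails, that library still needs to be installed before continuing.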

8.2. Load The Data

• We are going to use the iris flowers dataset. This dataset is famous
because it is used as the “hello world” dataset in machine learning
and statistics by pretty much everyone.
• The dataset contains 150 observations of iris flowers. There are four
columns of measurements of the flowers in centimeters. The fifth
column is the species of the flower observed. All observed flowers
belong to one of three species.

8.2.1 Load Libraries
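
A plausible set of imports for the remaining steps, assuming a recent scikit-learn in which the cross-validation helpers live in `sklearn.model_selection`. The exact list is an assumption based on the steps and the six algorithms named later in this deck.

```python
# Data handling and plotting
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Dataset, splitting, cross-validation and metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# The six algorithms evaluated later
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
```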

8.2.2 Load Dataset
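
A minimal sketch of loading the data into a pandas DataFrame. It assumes the copy of the iris data bundled with scikit-learn (`load_iris`) rather than any particular file or URL, and the column names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled iris data: 150 rows, four measurements in centimeters.
iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)

# Fifth column: the species name of each observed flower.
dataset["class"] = iris.target_names[iris.target]
print(dataset.columns.tolist())
```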

8.3. Summarize the Dataset
• In this step we are going to take a look at the data in a few different
ways:
1. Dimensions of the dataset.
2. Peek at the data itself.
3. Statistical summary of all attributes.
4. Breakdown of the data by the class variable

8.3.1 Dimensions of Dataset
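
A minimal sketch, assuming the data is held in a pandas DataFrame built from scikit-learn's bundled iris data (illustrative column names). The `shape` attribute gives the number of rows and columns.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)
dataset["class"] = iris.target_names[iris.target]

# (rows, columns): 150 observations, 4 measurements plus the class column.
dimensions = dataset.shape
print(dimensions)  # (150, 5)
```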

8.3.2 Peek at the Data
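
A minimal sketch, under the same assumptions as before (bundled iris data, illustrative column names). `DataFrame.head` returns the first rows; the choice of 20 rows is arbitrary.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)
dataset["class"] = iris.target_names[iris.target]

# Eyeball the first 20 rows of the data.
first_rows = dataset.head(20)
print(first_rows)
```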

8.3.3 Statistical Summary
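
A minimal sketch using `DataFrame.describe`, which reports the count, mean, standard deviation, minimum, quartiles and maximum of each numeric attribute. The data source and column names are the same assumptions as in the earlier steps.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)
dataset["class"] = iris.target_names[iris.target]

# Summary statistics for the four numeric attributes.
summary = dataset.describe()
print(summary)
```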

8.3.4 Class Distribution
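
A minimal sketch that counts the observations per species with `groupby`. The data source and column names are the same assumptions as in the earlier steps.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)
dataset["class"] = iris.target_names[iris.target]

# Number of rows that belong to each class.
class_counts = dataset.groupby("class").size()
print(class_counts)
```

Each of the three species has the same number of observations, so the classes are balanced.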

8.4. Data Visualization

We now have a basic idea about the data. We need to extend that with
some visualizations.

We are going to look at two types of plots:

1. Univariate plots to better understand each attribute.
2. Multivariate plots to better understand the relationships
between attributes.

8.4.1 Univariate Plots
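
A minimal sketch of one univariate view, box-and-whisker plots, assuming the bundled iris data. The `Agg` backend and the output file name are assumptions made so the script runs without a display; in an interactive session `plt.show()` could be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)

# One box-and-whisker plot per attribute, arranged in a 2 x 2 grid.
dataset.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)
fig = plt.gcf()
fig.savefig("iris_boxplots.png")  # hypothetical output file name
```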

Histograms
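
A minimal sketch of per-attribute histograms with `DataFrame.hist`, under the same assumptions as the box plots (bundled iris data, headless `Agg` backend, hypothetical output file name).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)

# One histogram per attribute to get a feel for each distribution.
dataset.hist()
fig = plt.gcf()
fig.savefig("iris_histograms.png")  # hypothetical output file name
```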

8.4.2 Multivariate Plots
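
A minimal sketch of a multivariate view using `pandas.plotting.scatter_matrix`, which draws a scatter plot for every pair of attributes. The data source, `Agg` backend and output file name are assumptions, as before.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(iris.data, columns=names)

# 4 x 4 grid: pairwise scatter plots, with histograms on the diagonal.
axes = scatter_matrix(dataset)
plt.gcf().savefig("iris_scatter_matrix.png")  # hypothetical output file name
```

Roughly diagonal point clouds in a panel suggest that the two attributes are correlated.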

8.5. Evaluate Some Algorithms
• Here is what we are going to cover in this step:
• Separate out a validation dataset.
• Set-up the test harness to use 10-fold cross validation.
• Build 6 different models to predict species from flower measurements
• Select the best model.

8.5.1 Create a Validation Dataset
• We will split the loaded dataset into two parts: 80% that we will use to
train our models and 20% that we will hold back as a validation
dataset.
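
A minimal sketch of the 80/20 split with `train_test_split`, assuming the bundled iris data. The `random_state` value is an arbitrary assumption that only makes the split repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Hold back 20% of the rows as a validation set; train on the other 80%.
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7
)
print(len(X_train), len(X_validation))  # 120 30
```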

8.5.2 Test Harness
We will use 10-fold cross validation to estimate accuracy.
This will split our dataset into 10 parts, train on 9 and test on 1 and repeat for all
combinations of train-test splits.
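
A minimal sketch of the harness with scikit-learn's `KFold`, applied to the 120-row training portion. The data source, the shuffling and the `random_state` value are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, train_test_split

iris = load_iris()
X_train, X_validation, y_train, y_validation = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=7
)

# 10 folds: each round trains on 9 parts and tests on the remaining part.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
fold_sizes = [(len(train_idx), len(test_idx)) for train_idx, test_idx in kfold.split(X_train)]
print(fold_sizes[0])  # (108, 12)
```

Passing `cv=kfold` and `scoring="accuracy"` to `cross_val_score` then yields one accuracy value per fold.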

8.5.3 Build Models
• Let’s evaluate 6 different algorithms:
• Logistic Regression (LR)
• Linear Discriminant Analysis (LDA)
• K-Nearest Neighbors (KNN).
• Classification and Regression Trees (CART).
• Gaussian Naive Bayes (NB).
• Support Vector Machines (SVM).
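
A minimal sketch that evaluates the six algorithms listed above with 10-fold cross-validation. The hyperparameters (`max_iter=1000`, `gamma="auto"`), the `random_state` values and the use of the bundled iris data are assumptions; the slides do not specify them.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_validation, y_train, y_validation = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=7
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(gamma="auto"),
}

# Estimate each model's accuracy with the same 10-fold split.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")
```

Using the same `kfold` object for every model keeps the comparison fair, since all models see identical train and test folds.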

8.5.4 Select Best Model
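
A minimal sketch of one way to select and check a model: pick the candidate with the highest mean cross-validation accuracy, refit it on the training set, and score it once on the held-back validation set. Which model wins depends on the split and settings, so no particular winner is claimed here; the data source, hyperparameters and `random_state` values are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_validation, y_train, y_validation = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=7
)
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(gamma="auto"),
}

# Rank the candidates by mean 10-fold cross-validation accuracy.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cv_means = {
    name: cross_val_score(model, X_train, y_train, cv=kfold, scoring="accuracy").mean()
    for name, model in models.items()
}
best_name = max(cv_means, key=cv_means.get)

# Refit the chosen model on all training rows and check it on unseen data.
best_model = models[best_name].fit(X_train, y_train)
predictions = best_model.predict(X_validation)
validation_accuracy = accuracy_score(y_validation, predictions)
confusion = confusion_matrix(y_validation, predictions, labels=[0, 1, 2])
print(best_name, validation_accuracy)
print(confusion)
```

The confusion matrix shows, per true species (rows), how many validation flowers were assigned to each predicted species (columns).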

9. Introduction to Deep Learning

• Deep learning is a type of machine learning that is based on
algorithms with extensive connections or layers between inputs and
outputs.
• ML neural nets have inputs (variables), hidden layers (functions that
compute the output) and output (results).
• In a simple example, imagine an ML neural net to detect a dog. This
seemingly simple example may include a tail detector, ear detector,
hair detector and so on.
• These detectors are combined into layers that contribute to detecting
a dog. The more "detectors" you have, the deeper the neural net.
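
A minimal sketch of the input / hidden layers / output structure described above, written as a forward pass in NumPy. The layer sizes, the ReLU and softmax functions and the random (untrained) weights are all illustrative assumptions; a real network would learn its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 4 inputs -> two hidden layers of 8 units -> 3 outputs (assumed sizes).
layer_sizes = [4, 8, 8, 3]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(inputs):
    """Pass the inputs through every hidden layer, then through the output layer."""
    activation = inputs
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = np.maximum(0.0, activation @ w + b)   # hidden layer with ReLU
    logits = activation @ weights[-1] + biases[-1]          # output layer
    exp = np.exp(logits - logits.max())                     # softmax -> probabilities
    return exp / exp.sum()

outputs = forward(np.array([5.1, 3.5, 1.4, 0.2]))
print(outputs)
```

Adding more entries to `layer_sizes` makes the network deeper without changing the rest of the code.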

Deep Learning Tools
• Neural Designer
• Torch
• Apache SINGA
• Microsoft Cognitive Toolkit
• Keras
• Deeplearning4j
• Theano
• MXNet
• H2O.ai
• ConvNetJS
• DeepLearningKit
• Gensim
• Caffe
• ND4J
• DeepLearnToolbox

Thank you
