
Mla Aiml

This document provides information about a machine learning course taught by Dr. Srinivasa L. Chakravarthy at GITAM Institute of Technology including: - The course objectives which are to understand key machine learning paradigms, algorithms, and approaches like supervised learning and neural networks. - The syllabus is divided into 5 modules covering topics like linear algebra, linear and logistic regression, classification, and neural networks. - Examples are provided to illustrate machine learning concepts like learning functions from data, classification, and adapting to changes. - The different types of machine learning systems are categorized by supervision, online vs batch learning, and instance-based vs model-based learning.

Uploaded by

Sampat Kandukuri
Copyright
© © All Rights Reserved

Day & Time: Monday (8AM-9AM)
Thursday (3PM-4PM)
Friday (2PM-3PM)
Monday (2PM-4PM) - LAB

Dr. Srinivasa L. Chakravarthy


Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: [email protected]
Contact No: 9866656855
20 August 2020 Department of CSE, GIT EID 403 and machine learning 1
Course objectives

1. To understand various key paradigms for machine learning approaches
2. To become familiar with the mathematical relationships across various machine
learning algorithms
3. To understand various key approaches in supervised learning
4. To analyse ML algorithms using performance metrics
5. To understand the concept of the neural network

Syllabus
Module I: Number of hours (LTP) 9 0 6
Machine Learning Fundamentals: Use of Machine Learning, Types of machine learning systems,
machine learning challenges, testing and validating, working with real data, obtaining the data,
visualizing the data, data preparation, training and fine tuning the model.

Module II: Number of hours (LTP) 9 0 6
Linear Algebra: Scalars, Vectors, Matrices and Tensors, Multiplying Matrices and Vectors, Identity
and Inverse Matrices, Linear Dependence and Span, Norms, The Trace Operator, The Determinant.

Module III: Number of hours (LTP) 9 0 6
Prediction using Linear Regression, Gradient Descent, Linear Regression with one Variable, Linear
Regression with Multiple Variables, Polynomial Regression, Feature Scaling/Selection,
learning curves, regularized linear models.

Syllabus
Module IV: Number of hours (LTP) 9 0 6
Classification, training a binary classifier, performance measures, multiclass classification, error
analysis, multilabel classification, multioutput classification. Logistic Regression: Classification
using Logistic Regression, Logistic Regression vs. Linear Regression, Logistic Regression with one
Variable and with Multiple Variables.

Module V: Number of hours (LTP) 9 0 6
Neural Networks: Introduction, Model Representation, Gradient Descent vs. Perceptron Training,
Stochastic Gradient Descent, Multilayer Perceptrons, Multiclass Representation, Back Propagation
Algorithm.

Text Book -1

Text Books and Reference books
1. T2: Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, The MIT Press, 2016.
2. T3: Tom M. Mitchell, “Machine Learning”, First Edition, Tata McGraw-Hill Education.
3. R1: Ethem Alpaydin, “Introduction to Machine Learning”, 2nd Edition, The MIT Press, 2009.
4. R2: Christopher M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2007.
5. R3: Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The MIT Press, 2012.

Learning theory - Thorndike

Structure-

Learning With Example

X = {0,1,2,3,4,5,6,7,8,9}
Y = {0,2,4,6,8,10,12,14,16,18}

Learnable (HYPOTHESIS) Function is:

Y = 2X

Learning With Example 2

X = {0,1,2,3,4,5,6,7, 8, 9}

Y = {-1, 1, 3, 5, 7, 9, ?, 13, ?, 17 }

What values could replace the ‘?’ entries?

Learnable (HYPOTHESIS) Function is:

Y = 2X-1
Predicted (interpolated) values are: 11 (for X = 6) and 15 (for X = 8)

Extrapolated values (for X outside the range 0 to 9) can be calculated from the same function
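
The hypothesis in this example can also be recovered from the known (X, Y) pairs by an ordinary least-squares line fit. The following is a minimal sketch in plain Python, not necessarily the method used in the course; the helper names `fit_line` and `predict` are hypothetical, and X = 12 is an arbitrary point chosen to illustrate extrapolation.

```python
# Known training pairs from Example 2 (the two '?' entries are left out).
xs = [0, 1, 2, 3, 4, 5, 7, 9]
ys = [-1, 1, 3, 5, 7, 9, 13, 17]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Apply the learned hypothesis to a new input (hypothetical helper)."""
    return slope * x + intercept


interpolated = [round(predict(x)) for x in (6, 8)]  # inside the observed range
extrapolated = round(predict(12))                   # outside the observed range
print(slope, intercept, interpolated, extrapolated)
```

Because the data lie exactly on a line, the fitted slope and intercept match the hypothesis Y = 2X-1 up to floating-point rounding.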

Learning With Example 3

X = {0,1,2,3,4,5,6,7, 8, 9}

Y = {0, 1, 2, 0, 1, 2, 0, 1, 2, 0}

Which label is the number 97 mapped to?

Learnable (HYPOTHESIS) Function is:

Y = X mod 3
Number of classes: 3, with labels 0, 1, 2

‘97’ is classified under label 1 (97 mod 3 = 1)
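
As a quick check of this example, the sketch below verifies that the candidate hypothesis reproduces every training label before it is used to classify an unseen input. The function name `hypothesis` is a hypothetical helper introduced only for illustration.

```python
# Training data from Example 3.
xs = list(range(10))
ys = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]


def hypothesis(x):
    """Candidate hypothesis: the class label is the remainder of x divided by 3."""
    return x % 3


# The hypothesis should reproduce every training label before we rely on it.
consistent = all(hypothesis(x) == y for x, y in zip(xs, ys))
labels = sorted(set(ys))      # distinct class labels seen in training
label_97 = hypothesis(97)     # classify an input that was never seen
print(consistent, labels, label_97)
```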

Traditional Approach

Machine Learning Approach

Automatically Adapting to the change

ML can help humans learn

Summarized as -
● Problems for which existing solutions require a lot of hand-tuning or long
lists of rules: one Machine Learning algorithm can often simplify code and
perform better.

● Complex problems for which there is no good solution at all using a
traditional approach: the best Machine Learning techniques can find a
solution.

● Fluctuating environments: a Machine Learning system can adapt to new data.

● Getting insights about complex problems and large amounts of data.


Types of ML systems -
There are many types, but they can be categorized using the following criteria:

● Whether or not they are trained with human supervision
● Whether or not they can learn incrementally on the fly
● Whether they work by simply comparing new data points to known data points, or
instead detect patterns in the training data and build a predictive model

Supervised Learning Systems -
A Labelled Training Regression

Supervised Learning Systems -
Some Important Algorithms:

• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks

Un-Supervised Learning Systems -

Un-Supervised Learning Systems -
Some Important Algorithms:
● Clustering
○ K-means
○ DBSCAN
○ Hierarchical Cluster Analysis (HCA)
● Anomaly Detection
○ One-class SVM
○ Isolation Forest
○ Autoencoders
Un-Supervised Learning Systems -
Some Important Algorithms:
● Dimensionality Reduction: the goal is to compress the data without
losing too much information (one way to do it is to merge highly correlated
features into one)
○ Principal Component Analysis (PCA)
○ t-distributed Stochastic Neighbor Embedding (t-SNE)
○ Autoencoders
○ Kernel PCA
○ Locally Linear Embedding (LLE)
Un-Supervised Learning Systems -
Some Important Algorithms:
● Association rule learning: algorithms that find interesting relations between
attributes
○ Apriori
○ Eclat

Un-Supervised Learning Systems -
Anomaly Detection :

Other Learning Systems -
Semi-Supervised Learning:
- works with partially labeled training data: usually a lot of unlabeled data and a little
labeled data
Reinforcement Learning:
- the learning system selects and performs actions, and gets rewards in return
- it learns by itself what the best strategy is
- example: DeepMind’s AlphaGo program

Batch & Online Learning -
● In batch learning, the system is incapable of learning incrementally.

● Another name for batch learning is offline learning.

● It starts by learning from all of the available data offline, and is then
deployed to produce predictions without being fed any new data points.

● To take new data into account, a new version of the system must be trained from
scratch on the full dataset; the old system is then stopped and replaced.

Batch & Online Learning -
● In online learning, the system is trained incrementally by feeding it data instances
sequentially as they arrive, either individually or in small groups called mini-batches.

● Each learning step is fast and cheap, so the system can learn about new data on the fly,
as it arrives.

● Online learning is great for systems that receive data as a continuous flow.
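
The incremental updates described here can be sketched with stochastic gradient descent on a one-feature linear model. This is an illustrative sketch only: the generator `data_stream`, the function `sgd_step`, the learning rate, and the synthetic relation y = 2x - 1 are all assumptions made for the example, not part of the course material.

```python
def sgd_step(w, b, x, y, lr=0.01):
    """One online update of the linear model y ~ w*x + b using a single instance."""
    error = (w * x + b) - y
    w -= lr * error * x   # gradient of 0.5 * error**2 with respect to w
    b -= lr * error       # gradient of 0.5 * error**2 with respect to b
    return w, b


def data_stream(n_passes=1000):
    """Hypothetical stream: yields instances of y = 2x - 1 one at a time."""
    for _ in range(n_passes):
        for x in range(10):
            yield x, 2 * x - 1


w, b = 0.0, 0.0
for x, y in data_stream():
    # Learn from the instance as it arrives; it does not need to be stored.
    w, b = sgd_step(w, b, x, y)

print(w, b)
```

Each step touches only one instance, which is why the per-step cost stays small no matter how much data has already been seen.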

Instance-Based Vs Model-Based -
● One more way to categorize Machine Learning systems is by how they generalize.
● A good system must be able to generalize to examples it has never seen before.
● There are two main approaches to generalization: instance-based learning and model-based learning.

Instance-Based
● the system learns the examples by heart
● it then generalizes to new cases by comparing them to the learned examples,
using a similarity measure
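
A k-nearest-neighbours classifier is one common instance-based learner. The sketch below stores the examples and labels a new point by majority vote among its closest stored examples; the training points, the labels "A" and "B", and the helper names are illustrative assumptions, and Euclidean distance is just one possible similarity measure.

```python
import math

# Memorized training examples: (feature vector, label). Values are illustrative.
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]


def distance(p, q):
    """Euclidean distance, used here as the (dis)similarity measure."""
    return math.dist(p, q)


def predict(point, k=3):
    """Label a new point by majority vote among its k nearest stored examples."""
    neighbours = sorted(training_set, key=lambda ex: distance(ex[0], point))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


print(predict((1.2, 1.4)), predict((5.5, 5.0)))
```

There is no training step beyond storing the examples; all the work happens at prediction time.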

Model-Based
● from a set of examples, build a model of these examples
● then use that model to make predictions

Let’s Go to JUPYTER NOTEBOOK
http://localhost:8888/lab/tree/Desktop/handson-ml2-master/handson-ml2-master/01_the_machine_learning_landscape.ipynb

In summary - ML programs
● A typical Machine Learning project looks like this:
• You studied the data.
• You selected a model.
• You trained it on the training data (i.e., the learning algorithm searched for
the model parameter values that minimize a cost function).
• Finally, you applied the model to make predictions on new cases (this is
called inference), hoping that this model will generalize well.
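
The "train, then infer" steps in this summary can be sketched with batch gradient descent on a mean-squared-error cost. This is a minimal illustration under stated assumptions: the data, the linear model, the learning rate, and the iteration count are chosen for the example, and `cost` and `train` are hypothetical helper names.

```python
# Training data (illustrative): points on y = 2x - 1.
xs = [0, 1, 2, 3, 4, 5, 7, 9]
ys = [-1, 1, 3, 5, 7, 9, 13, 17]


def cost(w, b):
    """Mean squared error of the linear model y = w*x + b on the training set."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(lr=0.02, n_iters=5000):
    """Batch gradient descent: search for the (w, b) that minimize the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()             # training: find the parameter values
prediction = w * 6 + b     # inference: apply the model to a new case
print(w, b, prediction, cost(w, b))
```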

Challenges of Machine Learning
BAD DATA:
● Insufficient Quantity of Training Data
- The Unreasonable Effectiveness of Data
● Nonrepresentative Training Data
● Poor Quality data - full of errors, outliers, and noise due to poor quality
measurements
- Cleaning up the data
● Irrelevant Features - coming up with a good set of features to train on
● Feature engineering process
- Feature selection: selecting the most useful features,
- Feature extraction: combining existing features,
- Creating new features by gathering new data
Challenges of Machine Learning

Challenges of Machine Learning
BAD algorithms:
● Overfitting: the model performs well on the training data, but it does not
generalize well
- this happens when the model is too complex relative to the amount and noisiness
of the training data
● How to Fix:
- Simplify the model by selecting one with fewer parameters (e.g., a
linear model rather than a high-degree polynomial model), by reducing
the number of attributes in the training data, or by constraining the model
- Gather more training data
- Reduce the noise in the training data (e.g., fix data errors and remove
outliers)
Challenges of Machine Learning

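Regularization: constraining a model to make it simpler and reduce the risk of overfitting
- it reduces the model's effective degrees of freedom
- the amount of regularization is controlled by a hyperparameter, which is set before
training and must be tuned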
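
One common form of regularization is an L2 (ridge) penalty. The sketch below uses a single-weight model with no intercept so the minimizer has a closed form; the data, the choice of ridge in particular, and the names `ridge_weight` and `alpha` are assumptions made for illustration.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x with a little noise (illustrative)


def ridge_weight(alpha):
    """Closed-form minimizer of sum((w*x - y)**2) + alpha * w**2 for one weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


# alpha is the regularization hyperparameter: larger values constrain w more.
weights = {alpha: ridge_weight(alpha) for alpha in (0.0, 1.0, 10.0, 100.0)}
for alpha, w in weights.items():
    print(alpha, w)
```

With `alpha = 0` the fit is plain least squares; as `alpha` grows the weight is pulled toward zero, trading training accuracy for a simpler model.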

Challenges of Machine Learning
- The objective here is to understand what underfitting and overfitting are, how they degrade model
performance, and how they can be managed.
- First, some terminology:
• Noise: random errors or distortions in the data that do not reflect the true underlying relationship
and that reduce the performance of the model.
• Bias: error that results from oversimplifying the model. It is usually indicated by high training error and
high test error. An oversimplified model cannot capture the true relationships between input data and
output data, because it considers too few input features or because training was stopped prematurely.
Example: linear regression applied to non-linear data. Thus, high bias leads to underfitting.
• Variance: error that results from the model being overly sensitive to the particular training data, so that it
works well on the training data but fails badly on test data. It is usually indicated by low training error but
high test error; a large gap between the two is the typical symptom. Example: k-Nearest Neighbors (kNN),
Decision Trees. Thus, high variance leads to overfitting.
• Both high bias and high variance lead to high prediction errors. They are shown diagrammatically in the
figures on the next two slides.

Challenges of Machine Learning
-

Challenges of Machine Learning
-

Challenges of Machine Learning
- K - Fold Cross validation
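
In K-fold cross validation the data is divided into K folds; each fold serves once as the validation set while the remaining folds are used for training. The sketch below only generates the index splits and does not shuffle the data first; `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Distribute any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

A model would be trained and scored once per fold, and the K validation scores averaged to estimate generalization performance.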

Challenges of Machine Learning
BAD algorithms:
● Underfitting: occurs when your model is too simple to learn the underlying
structure of the data
- reality is just more complex than the model
- predictions are bound to be inaccurate, even on the training examples

● How to Fix:
- Select a more powerful model, with more parameters
- Feed better features to the learning algorithm (feature engineering)
- Reduce the constraints on the model (e.g., reduce the regularization
hyperparameter)
Takeaways -
● Machine Learning is about making machines get better at some task by learning
from data, instead of having to explicitly code rules.
● There are many different types of ML systems: supervised or not, batch or
online, instance-based or model-based, and so on.
● In an ML project you gather data in a training set and feed it to a learning
algorithm.
- A model-based algorithm tunes some parameters to fit the model to the training set,
so that it can make good predictions
- An instance-based algorithm learns the examples by heart and generalizes using a similarity
measure
● The system will not perform well if your training set is too small, or if the data is
not representative, is noisy, or is polluted with irrelevant features.
- The model needs to be neither too simple (underfit) nor too complex (overfit).
TESTING and VALIDATION
Training set and test set - typically an 80 : 20 split
Training error and generalization error
- Overfitting: training error is low but generalization error is high
- Choice of hyperparameters
- Practical vs real time
- generalization error is high in real production scenarios. why?
Validation - evaluating candidate models on a held-out validation set
Cross validation - validation repeated on multiple chunks (folds) of the training data
Data Mismatch - the validation set and the test set must be as representative as possible of the data expected in production
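
An 80 : 20 split can be sketched by shuffling the data once and slicing it. The helper name `split_train_test`, the seed value, and the toy data are assumptions for illustration; a fixed seed is used so that the same instances end up in the test set on every run.

```python
import random


def split_train_test(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = split_train_test(range(100))
print(len(train_set), len(test_set))
```

The test set is put aside and used only once, at the end, to estimate the generalization error.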

ML - Definition
● Machine Learning is the field of study that gives computers the ability
to learn without being explicitly programmed. (Arthur Samuel, 1959)

● A computer program is said to learn from experience E with respect to
some task T and some performance measure P, if its performance on T,
as measured by P, improves with experience E. (Tom Mitchell, 1997)

ML - Project
● Get the data.
● Discover and visualize the data to gain insights.
● Prepare the data for Machine Learning algorithms.
● Select a model and train it.
● Fine-tune your model.
● Present your solution.
● Launch, monitor, and maintain your system.

ML - Project
● Frame the problem
○ What is the business objective?
○ The end goal is not to build a model but to benefit the organization
○ The end goal determines how we frame the problem, which algorithms we use, which
performance measure we use, and how much effort to put in
○ Ex: A Machine Learning pipeline for real estate investments

ML - Project
● Select Performance Measure
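- a typical performance measure for regression problems is the Root
Mean Square Error (RMSE): the square root of the mean of the squared prediction errors
- another one is the Mean Absolute Error (MAE): the mean of the absolute prediction errors,
which is less sensitive to outliers
- both measure the distance between the vector of predictions and the vector of target
values: RMSE corresponds to the l2 norm and MAE to the l1 norm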
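
Both measures are easy to compute directly. In the sketch below the target and predicted values are made up for illustration, with one deliberately large error to show that RMSE penalizes it more heavily than MAE does.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared difference."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute differences."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]   # illustrative predictions with one large error

print(rmse(y_true, y_pred), mae(y_true, y_pred))
```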

ML - Project
- Download the data (GitHub, Kaggle, etc.)
- Take a quick look at the data structure (df.head(), df.describe(), df.hist())
- Create a test set - use stratified sampling to avoid sampling bias
- Gain insights through visualization - look for correlations

ML - Project
- Prepare the data for ML algorithms
- Write functions instead of doing this manually
- This will allow you to reproduce these transformations easily on any dataset
- You will gradually build a library of transformation functions that you can reuse in
future projects.
- You can use these functions in your live system to transform the new data before
feeding it to your algorithms.
- This will make it possible for you to easily try various transformations and see
which combination of transformations works best.

ML - Project
- Data Cleaning
- three options for handling missing values
- Get rid of the corresponding rows
- Get rid of the whole attribute
- Set the missing values to some value (mean, median, zero, etc.)
- Categorical attributes - Ordinal encoding, One-Hot encoding
- Feature Scaling - Min-Max scaling (Normalization), Standardization
- Transformation Pipelines
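
The preparation steps listed on this slide can be sketched as small reusable functions. This is a plain-Python illustration rather than the library-based pipeline a real project would likely use; the function names, the `None` marker for missing values, and the sample values and category labels are all assumptions.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot(categories):
    """One-hot encode category labels (columns follow sorted label order)."""
    labels = sorted(set(categories))
    return [[1 if c == label else 0 for label in labels] for c in categories]


rooms = impute_median([2, None, 4, 6])      # fill the missing value
scaled = min_max_scale(rooms)               # values now lie in [0, 1]
standardized = standardize(rooms)           # zero mean, unit variance
encoded = one_hot(["inland", "near bay", "inland"])
print(rooms, scaled, standardized, encoded)
```

Chaining such functions in a fixed order is the idea behind a transformation pipeline: the same sequence can be applied to the training set, the test set, and new data.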

ML - Project
- Training Model
- Sample training with cross validation