Week11-AI ML DL

Uploaded by

bcynfn159
INTRODUCTION TO

ARTIFICIAL INTELLIGENCE,
MACHINE LEARNING AND
DEEP LEARNING
Preparatory Year Program -
Introduction to Digital Technologies (PYP 002)
Announcement

Quiz 03
• Week 12 (17 Nov - 21 Nov), after the midterm break
• Chapter 04 from the Lab Manual
• Algorithms and Python Programming
• Conditional Statements in Python

Announcement
Projects list available on Blackboard
• Send your preferred project by Nov 21st, 2024
• Weeks 14-15: project presentations and code demo
• Sample report provided on Blackboard

Project Requirements
Problem Definition: Define the problem addressed in the project.
Data Analysis:
• Plot charts describing the relationships between different features of the data.
• Describe the number of data points available in the dataset for each class.
• Plot charts showing the class balance of the data.
Machine Learning Algorithm: Describe the working of the algorithms assigned to the problem.
Performance Evaluation: Define the performance metrics to be evaluated in the project, for example accuracy, precision, specificity, F-measure, recall/sensitivity, and the confusion matrix.
Results and Discussion: Discuss the results obtained by running the ML algorithms and compare their performance metrics.

Objectives

Introduction to AI
Machine Learning
Deep Learning
Applications
Supervised ML
• Classification
ML Pipeline

Artificial Intelligence

Artificial intelligence refers to methods that let computers make intelligent decisions.
AI performs tasks that normally require human intelligence:
• Solving problems
• Understanding languages
• Visual recognition
• Making decisions / giving recommendations
AI works by using data and algorithms, and it can improve its performance over time.

Artificial Intelligence

Sending emails
Google Assistant / Siri (listen to, understand, and process language)

Applications

Image segmentation and recognition

AI use cases

Finance: Fraud detection
Healthcare: Diagnosis, Treatment
Retail: Product recommendations
Manufacturing: Automating assembly, Defect identification
Security: Identifying threats, Facial recognition

Relationship between AI, ML and DL

The three fields are nested within each other: deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence.
• Artificial Intelligence (AI): the broader concept of "making machines smart."
• Machine Learning (ML): the current application of AI, in which machines learn from data using mathematical and statistical models.
• Deep Learning (DL): using neural networks to solve hard problems with automatic feature selection.
[Figure: nested circles, with Deep Learning inside Machine Learning inside Artificial Intelligence]

Machine Learning

The field of study that gives computers the ability to learn from data without being explicitly programmed.
Traditional approach
• Write rules for the system
• Evaluate performance
• Static approach
Machine Learning
• Learn from data
• Feed data to an ML algorithm for training
• Performance improves as the algorithm sees more data

Machine Learning (Example)

Spam detection (identify whether an incoming email is spam or not)
Traditional approach
• Write rules for the system
• For example, flag an email if it contains specific words such as "free", "lottery", or "you won"

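The rule-based approach can be sketched in a few lines of Python. This is only an illustration: the keyword list reuses the examples from the slide, and the function name is_spam_by_rules is a hypothetical choice.

```python
# Hypothetical keyword list, taken from the examples on the slide.
SPAM_KEYWORDS = ("free", "lottery", "you won")


def is_spam_by_rules(email_text):
    """Return True if the email contains any hand-written spam keyword."""
    text = email_text.lower()  # make the check case-insensitive
    return any(keyword in text for keyword in SPAM_KEYWORDS)


print(is_spam_by_rules("You won a FREE ticket!"))   # True
print(is_spam_by_rules("Meeting moved to 3 pm"))    # False
```

The rules never change unless someone edits the keyword list, which is why the traditional approach is called static.
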
Machine Learning (Example)
Spam detection (identify whether an incoming email is spam or not)
Machine Learning
• Learn from data
• Show spam and non-spam emails to the algorithm (past data)
• The algorithm learns important features from the data (training)
• Once trained, the ML algorithm can make predictions for new emails (future data)

Machine Learning (Example)
Training
• Feed data to an ML algorithm
• The algorithm learns important features from the data
• Features of a spam email? (Count the number of times specific words such as "lottery" or "win" appear)
• Features to recognize images? (Eye size, distance between eyes, nose size)
• Think of features as inputs
• Training creates a trained model
Testing / Prediction
• Once trained, test the model on unseen (new) data
• Is the new email spam or not spam?

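To make "features as inputs" concrete, the sketch below turns an email into word counts. The vocabulary and the function name extract_features are assumptions chosen for illustration; a real project would pick its own words.

```python
import re
from collections import Counter

# Hypothetical vocabulary: the words whose counts become the input features.
VOCABULARY = ["lottery", "win", "free"]


def extract_features(email_text):
    """Turn an email into a list of counts, one per vocabulary word."""
    words = re.findall(r"[a-z]+", email_text.lower())
    counts = Counter(words)
    return [counts[word] for word in VOCABULARY]


features = extract_features("Win the lottery! Free entry, free prize.")
print(features)  # [1, 1, 2]
```

Each email becomes a fixed-length list of numbers, which is the form an ML algorithm expects as input during training and prediction.
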
Deep Learning

A subset of machine learning that uses artificial neural networks.
Artificial Neural Networks
• The human brain consists of interconnected neurons
• An artificial neuron is a mathematical model of a biological neuron
• Neurons are trained to predict an output
• Demo
Deep Learning
• A network is considered deep if it has more than one hidden layer

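A common mathematical model of a single neuron is a weighted sum of the inputs plus a bias, passed through an activation function. The sketch below assumes a sigmoid activation; the slides do not say which activation the demo uses, and the function name neuron is hypothetical.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))


output = neuron([0.0, 0.0], [1.0, 1.0], 0.0)
print(output)  # 0.5
```

Training a network means adjusting the weights and biases of many such neurons so that the outputs match the desired labels.
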
Machine learning algorithms

Machine Learning
• Supervised (data with output labels): Classification, Regression
• Unsupervised (data without output labels): Clustering
• Reinforcement (learn while running): Model Based

Machine learning algorithms
Supervised Learning:
• Supervised learning is used for labeled datasets
• Training data is provided, and it is labeled
• Each data input comes with its desired output (the label), usually the last column in the dataset
• Once training is complete, the label of unseen (new) data is predicted
• Performance is measured by how accurately the labels of the new data are predicted

Machine learning algorithms

Classification
• The computer learns to classify things into categories based on given examples
• The system learns a model/function, called a classifier, that maps an input to a discrete output
• Models: ANN, SVM, CNN
• Applications: image classification, speech recognition, email spam detection, etc.
Classification: predict a class label for an input

Machine learning algorithms

Large Language Models (LLMs)
• Large language models (LLMs) are trained on large amounts of text data and learn the patterns and structures of the language.
• LLMs belong to the domain of natural language processing (NLP); they can generate human-like text and answer questions.
• Models: Transformer series
• Applications: OpenAI's ChatGPT, etc.

Machine Learning Pipeline

Building an accurate and effective machine learning pipeline involves the following stages:

1. Data Collection
2. Preprocessing
3. Feature Extraction
4. Data Distribution
5. ML Algorithm
6. Model Evaluation
7. Model Deployment

Scikit-learn

Open-source machine learning library
Supports supervised and unsupervised machine learning
Contains various preprocessing, machine learning, and model evaluation algorithms that can be used for tasks including classification
Also contains some popular datasets that can be imported

Machine Learning Pipeline

Data Collection:
• The data can be in various formats such as text, tabular, image, etc.
• It is important to have good-quality data
• Use pandas to load the dataset from a file, or use the datasets module of sklearn

# Option 1: load a CSV file with pandas
import pandas as pd
data = pd.read_csv('ortho_dataset.csv')

# Option 2: load a built-in dataset from sklearn
from sklearn import datasets
iris = datasets.load_iris()

Example

Classifying orthopedic patients based on biomechanical features
Class labels
• Abnormal
• Normal
Features
• Six features
Binary classification
• Only two output labels
• Normal or Abnormal
Multiclass classification
• More than two output labels

Machine Learning Pipeline

Preprocessing:
• Data is cleaned and transformed into a form that machine learning algorithms can use.
• Typical tasks include data normalization, dealing with outliers, transforming categorical features, histogram equalization, and data augmentation.
• Example: a label encoder transforms categorical data into numeric data.

from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()  # create the encoder before using it
data['class'] = label_encoder.fit_transform(data['class'])

Preprocessing
• Display data info
• Display the first rows of the data (data head)

Preprocessing
• Class distribution (histogram plot using plt)

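The inspection steps named on these slides can be sketched with pandas as follows. The small DataFrame is a hypothetical stand-in for ortho_dataset.csv: its column names and values are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical stand-in for ortho_dataset.csv (made-up columns and values).
data = pd.DataFrame({
    "feature_1": [63.0, 39.1, 68.8, 44.5, 49.7],
    "feature_2": [22.6, 10.1, 22.2, 9.4, 13.0],
    "class": ["Abnormal", "Abnormal", "Normal", "Normal", "Abnormal"],
})

data.info()          # column names, data types, and non-null counts
print(data.head())   # first five rows of the dataset

# Number of samples per class, which shows the class balance.
class_counts = data["class"].value_counts()
print(class_counts)
```

With matplotlib installed, class_counts.plot(kind="bar") would draw the same counts as a bar chart.
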
Preprocessing
• Convert categorical data to numerical data
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
data['class'] = label_encoder.fit_transform(data['class'])

• Before encoding
• After encoding

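The effect of the encoder can be seen on a few toy labels (assumed values, not rows from the actual dataset). LabelEncoder sorts the class names and assigns integers in that order.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Normal", "Abnormal", "Abnormal", "Normal"]  # toy labels

label_encoder = LabelEncoder()
encoded = label_encoder.fit_transform(labels)

print(label_encoder.classes_.tolist())  # ['Abnormal', 'Normal']
print(encoded.tolist())                 # [1, 0, 0, 1]
```

Here "Abnormal" becomes 0 and "Normal" becomes 1 because the classes are sorted alphabetically before numbering.
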
Machine Learning Pipeline

Feature Extraction:
• Also known as feature engineering
• Extract useful features from the preprocessed data
• Various methods exist, such as the Discrete Wavelet Transform, the Fourier Transform, and morphological opening

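As one illustration of feature extraction, the sketch below uses the Fourier transform to extract the dominant frequency of a signal. The signal is synthetic (a 5 Hz sine wave) and the sampling rate is an assumed value; neither comes from the course dataset.

```python
import numpy as np

sampling_rate = 100                             # samples per second (assumed)
t = np.arange(sampling_rate) / sampling_rate    # one second of time stamps
signal = np.sin(2 * np.pi * 5 * t)              # synthetic 5 Hz sine wave

# Magnitude of each frequency component of the signal.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sampling_rate)

# The extracted feature: the frequency with the largest magnitude.
dominant_frequency = float(freqs[np.argmax(spectrum)])
print(dominant_frequency)  # 5.0
```

A single number such as the dominant frequency can then be used as one input column for the ML algorithm.
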
Machine Learning Pipeline

Data Distribution:
• Separate the inputs and outputs of the dataset
• For supervised classification, the output is usually the last column in the dataset

Data Distribution in Supervised Learning

Data
Inputs (x): x = data.drop('class', axis=1)
Outputs/Label (y): y = data['class']

Machine Learning Pipeline

Data Distribution:
• Next, the data is split randomly into two sets, called training and testing.
• Usually 70% training, 30% testing
• Use the train_test_split function in the model_selection module of the sklearn library

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

Data Distribution in Supervised Learning

Inputs (x): x = data.drop('class', axis=1), split into x_train and x_test
Outputs/Label (y): y = data['class'], split into y_train and y_test

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=4)

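The following sketch shows what the split produces on ten toy samples (assumed data, used only to keep the numbers easy to check). Calling train_test_split twice with the same random_state gives the same split, which makes results reproducible.

```python
from sklearn.model_selection import train_test_split

x = [[i] for i in range(10)]          # ten toy inputs
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]    # toy labels

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=4)

# Same random_state, so the second call reproduces the first split.
x_train2, x_test2, y_train2, y_test2 = train_test_split(
    x, y, test_size=0.3, random_state=4)

print(len(x_train), len(x_test))  # 7 3
print(x_test == x_test2)          # True
```
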
Machine Learning Pipeline

Model Selection / ML Algorithm:
• An ML model is trained using the training data from the previous step
• The model learns patterns from the data and determines a separation boundary that it can use to make predictions for unknown inputs
• The scikit-learn library contains various machine learning algorithms that can be imported

ML Algorithm

Support Vector Machines (SVM)
• A popular algorithm for calculating decision boundaries between classes
from sklearn import svm

Consider the following data

ML Algorithm

Support Vector Machines (SVM)
• Finds a maximum-margin separator, i.e., the boundary that maximizes the distance to the nearest data points of each class
• Support vectors are the data points that the margin pushes up against; they influence the position and orientation of the decision boundary
• The support vectors (on both sides) form the margin

from sklearn import svm
model = svm.SVC(kernel='linear')
model.fit(x_train, y_train)

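A minimal sketch of a linear SVM on four toy points (assumed data, chosen so the two classes are clearly separable) shows the fitted model and its support vectors.

```python
from sklearn import svm

# Toy training data: class 0 near the origin, class 1 further away.
x_train = [[0, 0], [1, 1], [4, 4], [5, 5]]
y_train = [0, 0, 1, 1]

model = svm.SVC(kernel="linear")
model.fit(x_train, y_train)

# The training points that lie on the margin.
print(model.support_vectors_)

predictions = model.predict([[0.5, 0.5], [4.5, 4.5]])
print(predictions.tolist())  # [0, 1]
```

Only the points closest to the other class end up as support vectors; moving the remaining points slightly would not change the boundary.
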
Kernel

The kernel is the method that transforms the input data into a higher-dimensional space so that the data can be separated.
The choice of kernel depends on the input data.

Kernel

The wrong kernel (i.e., method of data separation) may result in less accurate output.

[Figure: the same data separated with a linear kernel (accuracy is low) and with a polynomial kernel (accuracy is low)]

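The effect of the kernel choice can be tried out on synthetic data. The sketch below assumes a dataset of two concentric circles generated with make_circles, which is not the data shown on the slide, and compares three kernels available in svm.SVC.

```python
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Synthetic data: one class forms an inner circle, the other an outer ring.
x, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0)

accuracy = {}
for kernel in ("linear", "poly", "rbf"):
    model = svm.SVC(kernel=kernel)
    model.fit(x_train, y_train)
    accuracy[kernel] = model.score(x_test, y_test)  # fraction correct

print(accuracy)
```

A straight line cannot separate an inner circle from an outer ring, so the linear kernel scores worse here than the RBF kernel.
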
Machine Learning Pipeline

Model Evaluation:
• The performance of the system is evaluated on the test data using various metrics.
• The accuracy of a classification model is the fraction of all predictions that were correct.
• Other metrics include sensitivity, specificity, the confusion matrix, etc.
• Different performance metrics can be imported from the sklearn.metrics module
from sklearn.metrics import accuracy_score
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)

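Putting the stages together, a minimal end-to-end sketch might look like this. It assumes the built-in iris dataset in place of ortho_dataset.csv so that it runs without any external file; the split settings follow the earlier slides.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data collection: built-in dataset used as a stand-in.
iris = datasets.load_iris()
x, y = iris.data, iris.target

# Data distribution: 70% training, 30% testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=4)

# ML algorithm: train a linear SVM on the training set.
model = svm.SVC(kernel="linear")
model.fit(x_train, y_train)

# Model evaluation: accuracy on the unseen test set.
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```
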
Confusion Matrix

Also known as an error matrix
A table layout that allows visualization of the performance of an algorithm
• TP = True Positive: number of positive samples (normal patients) classified correctly
• TN = True Negative: number of negative samples (abnormal patients) classified correctly

Confusion Matrix

• FP = False Positive: number of negative samples (abnormal patients) wrongly classified as positive (normal)
• FN = False Negative: number of positive samples (normal patients) wrongly classified as negative (abnormal)
The matrix can be used to calculate sensitivity (recall), specificity, precision, etc.

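Accuracy Measurements

A confusion matrix shows how often the algorithm's predictions agree with the actual labels.

              Predicted Yes   Predicted No
Actual Yes         36              5
Actual No           6             15

Taking "Yes" as the positive class gives TP = 36, FN = 5, FP = 6, TN = 15.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{36 + 15}{62} \approx 0.82 \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} = \frac{36}{41} \approx 0.88 \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \]
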
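The metrics can be checked with a few lines of Python. This sketch assumes that the rows of the matrix are the actual labels, the columns are the predicted labels, and "Yes" is the positive class.

```python
# Entries of the confusion matrix (assumed layout: rows = actual, columns = predicted).
tp, fn = 36, 5    # actual Yes: predicted Yes, predicted No
fp, tn = 6, 15    # actual No:  predicted Yes, predicted No

accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions that are correct
sensitivity = tp / (tp + fn)                 # recall: fraction of actual positives found
specificity = tn / (tn + fp)                 # fraction of actual negatives found
precision = tp / (tp + fp)                   # fraction of predicted positives that are correct

print(round(accuracy, 3))     # 0.823
print(round(sensitivity, 3))  # 0.878
print(round(specificity, 3))  # 0.714
print(round(precision, 3))    # 0.857
```
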
Accuracy Measurements

In the ideal case, the predicted outputs are the same as the actual ones, so every off-diagonal entry of the confusion matrix is zero.

              Predicted Yes   Predicted No
Actual Yes         41              0
Actual No           0             21

Machine Learning Pipeline

Model Deployment:
• Once the model provides satisfactory results on the training and testing data, it is deployed in the production environment to make predictions on unseen data.

Exercise

Open the Week11_ML_EX.ipynb notebook in Google Colab.
Import the ortho_dataset.csv file into the current folder of Colab.
Execute the training and testing process described in the previous slides, but with the random state of the train-test split set to 10.
Add code for the commented blocks given at the end.
