Lec 2
Lecture 2: AI course
02/22/2024
 Asking the right question
 "Bob spent years planning, executing, and optimizing how to conquer
   a hill. Unfortunately, it turned out to be the wrong hill."
 For example, let's say you want to create a pipeline for loan
   default prediction.
 Your question can be:
 For a given loan, will it default or not?
 When will the loan default?
 How much money will be received for a given loan?
 What will be the profit made on a given loan?
 What will be the profit made on a given loan without using
   disallowed input features?

 Collecting the right data for your pipeline.
 For the loan default example, some of the features that may or may
   not be relevant for making accurate predictions are:

   Feature Name     Description                                          Importance
   Name             Name of applicant                                    None
   Address          Home address                                         Low
   Annual Income    Applicant's income                                   High
   Debts            Money currently borrowed by the applicant            High
   Credit History   How well he/she has been returning borrowed money    Medium
   Job Type         Govt employee, businessman, contractor, etc.         Medium
   Tax Paid         Tax paid in the last financial year                  None/High


 Data transformation tier that processes the raw data; some of the
   transformations that need to be done are (a minimal pandas sketch
   follows this list):
 Data Cleansing
 Filtration
 Aggregation
 Augmentation
 Consolidation
 Storage
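
As a rough illustration only (the column names and values below are invented for the example, not taken from the lecture), a few of these transformations might look like this in pandas:

import pandas as pd

# Hypothetical raw loan records: a duplicate, a missing income, and an implausible debt value.
raw = pd.DataFrame({
    "applicant_id": [1, 2, 2, 3],
    "annual_income": [55000, None, 72000, 48000],
    "debts": [12000, 5000, 5000, -1],
})

# Data cleansing: drop duplicate rows and rows with missing income.
clean = raw.drop_duplicates().dropna(subset=["annual_income"])

# Filtration: keep only plausible (non-negative) debt values.
clean = clean[clean["debts"] >= 0].copy()

# Augmentation: derive a new feature from existing columns.
clean["debt_to_income"] = clean["debts"] / clean["annual_income"]

# Aggregation: summarize per applicant.
per_applicant = clean.groupby("applicant_id")["debt_to_income"].mean()

# Storage: persist the transformed data for the next tier of the pipeline.
clean.to_csv("loans_transformed.csv", index=False)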

 Before we can apply our model to new measurements, we need to
   know whether it actually works—that is, whether we should trust
   its predictions.
 We cannot use the data we used to build the model to evaluate it.
 This is because our model can always simply remember the whole
training set, and will therefore always predict the correct label for
any point in the training set
 To assess the model’s performance, we show it new data (data that
it hasn’t seen before) for which we have labels.
 This is usually done by splitting the labeled data we have collected
into two parts
 Training data
 Testing data
 and sometimes into three:
 Training data
 Validation data
 Testing data
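
As a minimal sketch (the toy data and the 60/20/20 proportions are illustrative assumptions), a three-way split can be obtained by calling scikit-learn's train_test_split twice:

import numpy as np
from sklearn.model_selection import train_test_split

# Toy labeled data, for illustration only: 50 points, 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.array([0, 1] * 25)

# First hold out a test set, then split the remainder into training and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10, i.e. roughly 60/20/20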


 Once we split the data, it is time to run the training and test
   data through a series of candidate models and assess how accurate
   each one is.
 This is an iterative process: various algorithms might be tested
   until you have a model that sufficiently answers your question.

 After we train our model with various algorithms comes selecting
   the model that is optimal for the problem at hand.
 We don't always pick the best-performing model.
 An algorithm that performs well on the training data might not
   perform well in production because it might have overfitted the
   training data.
 At this point in time, model selection is more of an art than a
   science.
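
One common way to compare candidate models without trusting their training-set performance alone is cross-validation; the sketch below is an assumption about how this could be done with scikit-learn, not the lecture's prescribed method:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate by its mean accuracy over 5 cross-validation folds.
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=3),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())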


 Once a model is chosen and finalized, it is ready to be used to
   make predictions.
 It is typically exposed via an API and embedded in decision-making
   frameworks as part of an analytics solution.
 Some questions to consider in the deployment selection (a minimal
   API sketch follows these questions):
 Does the system need to make predictions in real time (if yes, how
   fast: in milliseconds, seconds, minutes, hours)?
 How often do the models need to be updated?
 What amount of volume or traffic is expected?
 What is the size of the datasets?
 Are there regulations, policies, and other constraints that need to
   be followed?
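
Purely as an illustration (the Flask framework, the joblib-serialized model file, and the /predict route are assumptions, not part of the lecture), exposing a trained model via a small HTTP API might look like this:

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumes the trained model was saved earlier with joblib.dump(model, "model.joblib").
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]}.
    payload = request.get_json()
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(port=5000)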

 Performance in the data science context does not mean how fast the
   model is running, but rather how accurate the predictions are.
 Data scientists monitoring machine learning models are primarily
   looking at a single metric: drift.
 Drift happens when the data is no longer a relevant or useful input
   to the model.
 Data can change and lose its predictive value.
 Data scientists and engineers must monitor the models constantly to
   make sure that the model features continue to be like the data
   points used during model training.
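
As one possible (assumed, not prescribed by the lecture) drift check, a two-sample Kolmogorov-Smirnov test can compare a feature's training distribution with recent production data:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Illustrative values for one feature: training distribution vs. recent production data.
train_income = rng.normal(loc=50_000, scale=10_000, size=1_000)
prod_income = rng.normal(loc=65_000, scale=10_000, size=1_000)  # the distribution has shifted

stat, p_value = ks_2samp(train_income, prod_income)
if p_value < 0.01:
    print(f"Possible drift in the income feature (KS statistic={stat:.3f}, p={p_value:.3g})")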


 Yield Estimation and Prediction using AI
 [Figure: numbered stages (1-9) of a yield estimation and prediction pipeline; diagram not reproduced]

 Let's assume that a hobby botanist is interested in distinguishing
   the species of some iris flowers that she has found. She has
   collected some measurements associated with each iris: the length
   and width of the petals and the length and width of the sepals,
   all measured in centimeters.


 She also has the measurements of some irises that have been
   previously identified by an expert botanist as belonging to the
   species setosa, versicolor, or virginica. For these measurements,
   she can be certain of which species each iris belongs to. Let's
   assume that these are the only species our hobby botanist will
   encounter in the wild.
 Problem Definition
 Our goal is to build a machine learning model that can learn from
   the measurements of these irises whose species is known, so that
   we can predict the species for a new iris.


 Data Ingestion
from sklearn.datasets import load_iris
iris_dataset = load_iris()
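
The object returned by load_iris is dictionary-like (a scikit-learn Bunch); as an optional check, its contents can be inspected directly:

# Keys include 'data', 'target', 'feature_names', 'target_names', and 'DESCR'.
print(iris_dataset.keys())
print(iris_dataset["feature_names"])
print(iris_dataset["data"].shape)    # (150, 4): 150 flowers, 4 measurements each
print(iris_dataset["target_names"])  # ['setosa' 'versicolor' 'virginica']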


Data Preparation
The iris data in sklearn is already clean
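
Even so, as an optional sanity check (not required for this dataset), the data can be wrapped in a pandas DataFrame to confirm there are no missing values, continuing from the load_iris snippet above:

import pandas as pd

iris_df = pd.DataFrame(iris_dataset["data"], columns=iris_dataset["feature_names"])

print(iris_df.isna().sum())   # zero missing values in every column
print(iris_df.describe())     # count, mean, std, min, max, and quartiles per feature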


Visualize Data
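
The lecture's figure is not reproduced here; as a sketch of one common way to visualize this data, a pair-wise scatter matrix colored by species can be drawn with pandas and matplotlib (continuing from the snippets above):

import matplotlib.pyplot as plt
import pandas as pd

iris_df = pd.DataFrame(iris_dataset["data"], columns=iris_dataset["feature_names"])

# Scatter plot of every pair of features, colored by the true species label.
pd.plotting.scatter_matrix(iris_df, c=iris_dataset["target"], figsize=(10, 10),
                           marker="o", hist_kwds={"bins": 20}, alpha=0.8)
plt.show()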


Data Segregation
 scikit-learn contains a function that shuffles the
dataset and splits it for you: the train_test_split
function
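
A minimal sketch (the default 75/25 split and random_state=0 are illustrative choices):

from sklearn.model_selection import train_test_split

# Shuffle and split: by default 75% of the rows become training data, 25% test data.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)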


Model Training
 Let's use a k-nearest neighbors classifier.
 To make a prediction for a new data point, the algorithm finds the
   point in the training set that is closest to the new point.
 Then it assigns the label of this training point to the new data
   point.
 In k-nearest neighbors, instead of using only the closest neighbor
   to the new data point, we can consider any fixed number k of
   neighbors in the training set (for example, the closest three or
   five neighbors). Then, we can make a prediction using the majority
   class among these neighbors.


 Import Model
 Train the model
 Test the model (a sketch of these steps follows)
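
A minimal sketch of these three steps (n_neighbors=1 and the example measurement are illustrative choices), continuing from the train/test split above:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Import and instantiate the model: a 1-nearest-neighbor classifier.
knn = KNeighborsClassifier(n_neighbors=1)

# Train the model on the training split.
knn.fit(X_train, y_train)

# Test the model on a single new measurement (sepal/petal length and width, in cm).
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = knn.predict(X_new)
print(iris_dataset["target_names"][prediction])  # e.g. ['setosa']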


 Model Evaluation
 This is where the test set that we created earlier comes
in. This data was not used to build the model, but we
do know what the correct species is for each iris in the
test set.
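
A short sketch of measuring accuracy on the held-out test set (both lines compute the same number), continuing from the snippets above:

import numpy as np

# Predict the species of every iris in the test set and compare with the known labels.
y_pred = knn.predict(X_test)
print("Test set accuracy:", np.mean(y_pred == y_test))
print("Test set accuracy:", knn.score(X_test, y_test))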


 At the core of every AI system lies a fundamental truth: the
   quality and quantity of data it ingests are paramount to its
   effectiveness.
 Data is the lifeblood that fuels AI algorithms, allowing them to
   learn, adapt, and make decisions.


1. Relevance: Data should be relevant to the problem being addressed
by the machine learning model. Irrelevant data can introduce noise
and decrease the model's performance.
2. Quality: High-quality data is accurate, consistent, and
reliable. It should be free from errors, missing values, and
inconsistencies that could adversely affect model training and
performance.
3. Quantity: Sufficient data is needed to train machine learning
models effectively. While the amount of required data varies
depending on the complexity of the problem and the
algorithm used, having a large and diverse dataset generally
improves model performance.
4. Representativeness: Data should be representative of the
real-world scenarios the model will encounter. It should cover
the full range of possible inputs and outcomes to ensure the
model generalizes well to unseen data.


5. Variety: Diverse datasets encompassing different data types,
formats, and sources can provide richer insights and help the model
learn more robust representations. This includes structured data
(e.g., numerical data), unstructured data (e.g., text, images), and
semi-structured data (e.g., JSON).
6. Balance: Imbalanced datasets, where certain classes or categories
are overrepresented or underrepresented, can lead to biased
models. Balancing the dataset ensures that the model learns from
all classes equally, improving its ability to make accurate
predictions.
7. Temporal relevance: In many cases, the temporal aspect of data
is crucial. Time-series data, for example, often exhibits patterns and
trends over time that are essential for predictive modeling.
8. Ethical Considerations: Data should be collected and used
ethically, respecting privacy, consent, and fairness principles.
Biased or discriminatory data can lead to biased and unfair machine
learning models.
9. Accessibility: Data should be accessible and properly documented
to facilitate collaboration, reproducibility, and transparency in
machine learning research and development.


 https://www.kaggle.com/
 https://huggingface.co/
 https://datasetsearch.research.google.com/
 https://www.microsoft.com/en-us/research/project/microsoft-research-open-data/


 Select a problem to solve using AI in any area of your choice.
 Find and download a relevant dataset available online.
 Generate/collect datapoints in addition to the downloaded dataset.
 Write a single-page report on:
 Problem to solve
 Dataset details
 Number of datapoints
 List of features
 Source of dataset
 How you generated the datapoints
 Link to the downloaded dataset
 Email the report to me at [email protected]
