Lec 2
02/22/2024
Filtration
Aggregation
Augmentation
Consolidation
Storage
Data Ingestion
from sklearn.datasets import load_iris
iris_dataset = load_iris()
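As a quick look at what the ingestion step returns, the sketch below prints a few fields of the loaded object. It assumes only that scikit-learn is installed; the fields shown are the ones `load_iris` is documented to provide.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# The returned object is a scikit-learn Bunch, which can be indexed like a dict.
print(iris_dataset.keys())

# Names of the four measured features and the three species labels.
print(iris_dataset["feature_names"])
print(iris_dataset["target_names"])

# One row per flower, one column per feature.
print(iris_dataset["data"].shape)
```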
Data Preparation
The iris data in scikit-learn is already clean, so no cleaning is needed.
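One way to confirm that the data is clean is to count missing values and check the class balance. This is a minimal sketch; the variable names `missing_values` and `class_counts` are illustrative and do not come from the slides.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()
X = iris_dataset["data"]
y = iris_dataset["target"]

# Count NaN entries across all features (illustrative check).
missing_values = int(np.isnan(X).sum())

# Count how many samples belong to each of the three species.
class_counts = np.bincount(y)

print("missing values:", missing_values)
print("samples per class:", class_counts)
```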
Visualize Data
Data Segregation
scikit-learn contains a function that shuffles the
dataset and splits it for you: the train_test_split
function.
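The split can be sketched as follows. The `random_state=0` seed is an assumption added here for reproducibility; the default split, when `test_size` is not given, holds out 25% of the rows for testing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Shuffle and split into training and test sets.
# random_state=0 is an assumed seed so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
```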
Model Training
Let’s use a k-nearest neighbors classifier.
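A minimal training sketch with scikit-learn's `KNeighborsClassifier` is shown here. The choice of `n_neighbors=1` and the `random_state=0` seed are assumptions for illustration; the slides do not state which values were used.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

# n_neighbors=1 is an assumed setting: each prediction copies the label
# of the single closest training point.
knn = KNeighborsClassifier(n_neighbors=1)

# "Training" a k-NN model stores the training data for later lookups.
knn.fit(X_train, y_train)
print(knn)
```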
Import Model
Test the model
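Testing the model on a single new flower might look like the sketch below. The measurements in `X_new` are hypothetical values chosen for illustration, and `n_neighbors=1` and `random_state=0` are assumed settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Hypothetical flower: sepal length, sepal width, petal length, petal width (cm).
# scikit-learn expects a 2D array, so the single sample is wrapped in a list.
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])

prediction = knn.predict(X_new)
predicted_name = iris_dataset["target_names"][prediction][0]
print("Predicted class index:", prediction[0])
print("Predicted species:", predicted_name)
```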
Model Evaluation
This is where the test set that we created earlier comes
in. This data was not used to build the model, but we
do know what the correct species is for each iris in the
test set.
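The evaluation described above can be sketched by comparing predictions on the test set with the known species. As before, `n_neighbors=1` and `random_state=0` are assumed settings, so the exact accuracy may differ from the figure on the original slide.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Predict the species of every flower in the held-out test set.
y_pred = knn.predict(X_test)

# Accuracy is the fraction of test flowers whose species was predicted correctly.
accuracy = np.mean(y_pred == y_test)
print("Test set accuracy: {:.2f}".format(accuracy))
```

The same number is available directly from `knn.score(X_test, y_test)`, which computes mean accuracy for classifiers.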
https://www.kaggle.com/
https://huggingface.co/
https://datasetsearch.research.google.com/
https://www.microsoft.com/en-us/research/project/microsoft-research-open-data/