Lecture Slides - ML - Part 2
[Figure: workflow diagram, step shown: Launch, monitor, maintain]
Problem Definition
• Select SPECIFIC goals/problems!
• Avoid broad problems!
• Several questions to ask:
  • What is the expected benefit?
  • Is it supervised or unsupervised?
  • Is it regression or classification (binary or multi-class)?
Data Transformation
• fit_transform()
  • Learns the parameters needed to transform the data (e.g., mean, standard deviation) and then applies them
  • Must be used to transform the TRAINING data
• transform()
  • Applies the already learned parameters to transform the data
  • Can only be used after fit_transform() (or fit()) has been called
  • Must be used to transform the TEST data
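To make the split between learning and applying parameters concrete, here is a minimal sketch in plain Python. The class SimpleStandardizer is hypothetical (it is not a scikit-learn class) and only mimics the fit_transform()/transform() method names; the prices are illustrative values.

```python
from statistics import fmean, pstdev


class SimpleStandardizer:
    """Toy standardizer (hypothetical, not part of scikit-learn)."""

    def fit(self, values):
        # Learn the parameters from the data that is passed in.
        self.mean_ = fmean(values)
        self.std_ = pstdev(values)  # population standard deviation
        return self

    def transform(self, values):
        # Apply previously learned parameters; fail if nothing was learned yet.
        if not hasattr(self, "mean_"):
            raise RuntimeError("call fit() or fit_transform() first")
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        # Learn the parameters and apply them in one call.
        return self.fit(values).transform(values)


train_prices = [22_000, 22_750, 25_000, 32_000, 26_000]
test_prices = [23_000]

scaler = SimpleStandardizer()
train_scaled = scaler.fit_transform(train_prices)  # learns mean/std, then scales
test_scaled = scaler.transform(test_prices)        # reuses the train mean/std

print(round(scaler.mean_), round(scaler.std_))  # 25550 3537
print([round(z, 2) for z in train_scaled])      # [-1.0, -0.79, -0.16, 1.82, 0.13]
```

Note that the test price is scaled with the mean and standard deviation learned from the training prices; nothing is re-learned from the test data.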
Data Transformation
• What do fit_transform() and transform() do?
Train set (learns): fit_transform(), Price Mean=25,550, Stddev=3,537
Size       Color   Price   |  Size  White  Red  Black  Price
Compact    White   22,000  |  1     1      0    0      -1.00
Compact    Red     22,750  |  1     0      1    0      -0.79
Mid-size   Black   25,000  |  2     0      0    1      -0.16
Full-size  White   32,000  |  3     1      0    0       1.82
Mid-size   Red     26,000  |  2     0      1    0       0.13
Test set: transform()
Size       Color   Price   |  Size  White  Red  Black  Price
Mid-size   Purple  23,000  |  2     0      0    0      -0.72
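The same train/test flow can be sketched with scikit-learn. This is one possible setup, not necessarily the one used to produce the slide: it assumes StandardScaler for the Price column and OneHotEncoder with handle_unknown="ignore" for the Color column, so that a color never seen in training (Purple) becomes an all-zero row. The Size column is left out to keep the sketch short.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Train and test columns from the example (scikit-learn expects 2-D input).
train_colors = np.array([["White"], ["Red"], ["Black"], ["White"], ["Red"]])
train_prices = np.array([[22_000.0], [22_750.0], [25_000.0], [32_000.0], [26_000.0]])
test_colors = np.array([["Purple"]])
test_prices = np.array([[23_000.0]])

scaler = StandardScaler()
encoder = OneHotEncoder(handle_unknown="ignore")  # assumption: ignore unseen colors

# TRAINING data: learn the parameters and transform in one step.
train_price_scaled = scaler.fit_transform(train_prices)
train_color_onehot = encoder.fit_transform(train_colors).toarray()

# TEST data: only apply what was learned from the training data.
test_price_scaled = scaler.transform(test_prices)
test_color_onehot = encoder.transform(test_colors).toarray()

print(scaler.mean_, scaler.scale_)  # learned mean and standard deviation
print(encoder.categories_)          # learned colors, sorted alphabetically
print(test_color_onehot)            # all zeros: Purple was not seen in training
```

The encoder orders its columns alphabetically (Black, Red, White), so the column order differs from the table even though the encoding carries the same information.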
Data Transformation
• Do not use fit_transform() on train and test separately!!!
Train set: fit_transform(), Price Mean=23,250, Stddev=1,275
Size       Color  Price   |  Size  White  Red  Black  Price
Compact    White  22,000  |  1     1      0    0      -0.98
Compact    Red    22,750  |  1     0      1    0      -0.39
Mid-size   Black  25,000  |  2     0      0    1       1.37
Test set: fit_transform() (wrong), Price Mean=46,000, Stddev=2,944
Size       Color  Price   |  Size  White  Red  Blue   Price
Mid-size   Blue   43,000  |  1     0      0    1      -1.02
Full-size  White  45,000  |  2     1      0    0      -0.34
Mid-size   Red    50,000  |  1     0      1    0       1.36
Test set: instead, use only transform() (reuses the train-set parameters)
Size       Color  Price   |  Size  White  Red  Black  Price
Mid-size   Blue   43,000  |  2     0      0    0      15.49
Full-size  White  45,000  |  unk   1      0    0      17.06
Mid-size   Red    50,000  |  2     0      1    0      20.98
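The difference between the two approaches can be reproduced with a few lines of plain Python. This is a sketch under assumptions: learn_params() and apply_params() are hypothetical helpers standing in for the fit and transform steps, and SIZE_ORDER is an assumed natural ordering of the size categories.

```python
from statistics import fmean, pstdev

SIZE_ORDER = ["Compact", "Mid-size", "Full-size"]  # assumed natural ordering


def learn_params(sizes, prices):
    """'fit' step: learn size codes and price mean/std from one data set."""
    seen = [s for s in SIZE_ORDER if s in sizes]
    codes = {size: i + 1 for i, size in enumerate(seen)}
    return codes, fmean(prices), pstdev(prices)


def apply_params(sizes, prices, params):
    """'transform' step: encode and scale with already learned parameters."""
    codes, mean, std = params
    return [(codes.get(s, "unk"), round((p - mean) / std, 2))
            for s, p in zip(sizes, prices)]


train_sizes, train_prices = ["Compact", "Compact", "Mid-size"], [22_000, 22_750, 25_000]
test_sizes, test_prices = ["Mid-size", "Full-size", "Mid-size"], [43_000, 45_000, 50_000]

train_params = learn_params(train_sizes, train_prices)

# Wrong: parameters are re-learned on the test set.
wrong = apply_params(test_sizes, test_prices, learn_params(test_sizes, test_prices))
# Right: the test set is transformed with the train-set parameters.
right = apply_params(test_sizes, test_prices, train_params)

print(wrong)  # [(1, -1.02), (2, -0.34), (1, 1.36)]
print(right)  # [(2, 15.49), ('unk', 17.06), (2, 20.98)]
```

In the wrong version, Mid-size is encoded as 1 instead of 2 and the expensive test cars look average, so a model trained on the train-set encoding would misread them. In the right version, the prices show up as far above the training range and the unseen Full-size category is flagged as unknown.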
Data Transformation (fit_transform)
• When to use fit_transform() and transform(): fit_transform() on the TRAINING data, transform() on the TEST data
Train Models
• Select the right algorithm(s) for the task:
Regression Tasks        Classification Tasks
DecisionTreeRegressor   DecisionTreeClassifier
SVR                     SVC
RandomForestRegressor   RandomForestClassifier
Etc.                    Etc.
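As a rough illustration of trying several algorithms on one regression task, the sketch below fits the three regressors listed above with scikit-learn. The data set is synthetic (generated with make_regression), and the hyperparameters and split size are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: stands in for a real data set).
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "SVR": SVR(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # train on the training split only
    scores[name] = model.score(X_test, y_test)  # R^2 on the held-out test split

for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

The classification counterparts (DecisionTreeClassifier, SVC, RandomForestClassifier) follow the same fit()/score() pattern, with score() reporting accuracy instead of R^2.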
[Figure: workflow diagram, steps shown: Problem definition, Get the data, Discover and visualize the data, Data transformation]