ORANGE DATA MINING Steps
In TrainData, you will have noticed that the Feature Type of most columns is Numeric Feature.
A supervised learning model needs both features and labels, where the labels are the output
the model learns to predict. We therefore need to define an output for our Palmer Penguin model.
We will assign species as our label, since that is what we want to identify.
To do that, we will use Select Columns to change the Feature Type of species from
Categorical Feature to Categorical Label.
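Outside the Orange canvas, the same idea of separating the label from the features can be sketched in a few lines of Python. This is only an illustration: the rows below are made up, and every column name other than species is an assumption rather than something taken from TrainData.

```python
# Hypothetical rows for illustration only; not taken from TrainData.
rows = [
    {"species": "Adelie", "bill_length_mm": 39.0, "flipper_length_mm": 181.0},
    {"species": "Gentoo", "bill_length_mm": 46.0, "flipper_length_mm": 217.0},
    {"species": "Chinstrap", "bill_length_mm": 49.0, "flipper_length_mm": 195.0},
]

LABEL = "species"  # the column we want the model to predict


def split_features_and_label(rows, label):
    """Return (features, labels); each feature dict omits the label column."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


features, labels = split_features_and_label(rows, LABEL)
print(labels)               # the output column
print(sorted(features[0]))  # the remaining input columns
```

The model only ever sees the feature columns as input; the species column is kept aside as the answer it should learn to produce.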
After choosing a target label, we need to split the data.
Connect the Select Columns widget to the Data Sampler widget by dragging the output of
Select Columns to the input of Data Sampler.
After the connection is made, double-click on Data Sampler to open the properties tab.
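To make the idea of sampling concrete, here is a minimal Python sketch of a random split by fixed proportion. It is not how Data Sampler is implemented; the function name sample_split, the 70% proportion, and the seed are assumptions chosen for illustration.

```python
import random


def sample_split(rows, proportion=0.7, seed=42):
    """Shuffle a copy of rows and return (sample, remaining)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    shuffled = list(rows)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    cut = round(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 data rows
train, test = sample_split(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two parts, so the model can be trained on one part and checked on rows it has never seen.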
How do we know if the data is actually split or not?
Let’s inspect how Data Sampler splits the data.
We will be using Data Info.
Connect the Data Sampler widget to the second Data Info widget by dragging the output of
Data Sampler to the input of the second Data Info.
Take note of the connection name. We will change this. Double-click on the connection.
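The kind of check we do with Data Info (how many rows ended up in each part) can be approximated in Python as below. The label lists are hypothetical and the describe helper is an illustrative name, not part of Orange.

```python
from collections import Counter


def describe(name, labels):
    """Print and return the row count and per-class counts of one part."""
    counts = Counter(labels)
    print(f"{name}: {len(labels)} rows, classes = {dict(counts)}")
    return len(labels), counts


# Hypothetical label columns for the two outputs of a split.
sample_labels = ["Adelie"] * 4 + ["Gentoo"] * 3
remaining_labels = ["Adelie"] * 2 + ["Gentoo"] * 1

sample_size, sample_counts = describe("sample", sample_labels)
remaining_size, remaining_counts = describe("remaining", remaining_labels)

total = sample_size + remaining_size
print(f"sample share: {sample_size / total:.0%}")
```

If the two row counts add up to the size of the original data and the share matches the proportion that was set, the split worked as intended.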
What do we do after having split the data?
After creating a model, we need to test it and check its accuracy.
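Accuracy is simply the fraction of test rows for which the predicted species matches the actual species. A minimal sketch, using made-up predictions rather than results from any real model:

```python
def accuracy(actual, predicted):
    """Fraction of positions where the prediction equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels for five test rows.
actual = ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"]
predicted = ["Adelie", "Gentoo", "Gentoo", "Chinstrap", "Gentoo"]

print(accuracy(actual, predicted))  # 4 of 5 correct
```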
Let’s try a couple of other classification algorithms.
Now that we have found which model gives us the best results, we can use that one!
Since the Random Forest algorithm is not working well with one of the species,
let’s use another algorithm.
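A model can score well overall and still struggle with one species, which is why it helps to look at each class separately. The sketch below computes, for each species, the share of its test rows that were predicted correctly (per-class recall). The labels are hypothetical and do not reproduce the Random Forest results from this exercise.

```python
from collections import defaultdict


def per_class_recall(actual, predicted):
    """For each actual class, the fraction of its rows predicted correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a == p:
            hits[a] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical test labels: 9 of 10 rows are correct overall,
# but half of the Chinstrap rows are misclassified.
actual = ["Adelie"] * 4 + ["Gentoo"] * 4 + ["Chinstrap"] * 2
predicted = ["Adelie"] * 4 + ["Gentoo"] * 4 + ["Adelie", "Chinstrap"]

for species, recall in per_class_recall(actual, predicted).items():
    print(f"{species}: {recall:.0%}")
```

When one species scores clearly lower than the others, that is a reason to compare against a different algorithm, even if the overall accuracy looks good.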