ORANGE DATA MINING Steps

The document outlines the steps for building an AI model to predict penguin species using Orange Data Mining. It covers data acquisition, cleaning, defining output labels, splitting the data, and testing various classification algorithms to determine the best model. The process includes connecting different widgets and inspecting data splits to ensure accuracy in predictions.

Uploaded by sureshpetasvis
Copyright © All Rights Reserved

ORANGE DATA MINING

STEPS for building an AI model to predict the penguin species.


After Data Acquisition, what should we do next?
What is the mean value of the culmen_length feature?
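As a rough illustration of what the mean of a column with gaps looks like outside Orange, here is a minimal Python sketch. The values are made up for the example (they are not taken from the Palmer Penguin file), and filling gaps with the mean is only one possible cleaning choice, assumed here for illustration.

```python
from statistics import mean

# Hypothetical culmen_length values; None marks a missing measurement.
culmen_length = [39.1, 39.5, None, 46.5, 50.0, None, 45.4]

def mean_ignoring_missing(values):
    """Average the known values, skipping missing (None) entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to average")
    return mean(known)

def impute_with_mean(values):
    """Replace every missing entry with the mean of the known values."""
    fill = mean_ignoring_missing(values)
    return [fill if v is None else v for v in values]

column_mean = mean_ignoring_missing(culmen_length)
cleaned = impute_with_mean(culmen_length)
print(round(column_mean, 2), cleaned)
```

After this step the column has no missing entries, which is the state the next question assumes.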
Now that the data is clean and without any missing values, what next?

In TrainData, you will have noticed that the Feature Type for most of the columns is
Numeric Feature. In supervised learning, a model needs both the features (the inputs) and the
labels (the outputs). Therefore, we need to define an output for our Palmer Penguin model.
We will assign species as our label, since that is what we want to predict.
To do so, we will change the Feature Type for species from Categorical Feature to Categorical
Label, using Select Columns.
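The idea of marking species as the label can be sketched in plain Python as separating each row into its features and its label. The rows and the column names other than species are hypothetical, and the helper name is our own, not part of Orange.

```python
# Hypothetical rows; only the "species" column name comes from the text above.
rows = [
    {"species": "Adelie", "culmen_length": 39.1, "flipper_length": 181.0},
    {"species": "Chinstrap", "culmen_length": 46.5, "flipper_length": 192.0},
    {"species": "Gentoo", "culmen_length": 46.1, "flipper_length": 211.0},
]

def split_features_and_label(rows, label_column):
    """Return (features, labels): the label column is removed from the features."""
    features, labels = [], []
    for row in rows:
        labels.append(row[label_column])
        features.append({k: v for k, v in row.items() if k != label_column})
    return features, labels

features, labels = split_features_and_label(rows, "species")
print(labels)
print(features[0])
```

Every column except species stays a feature, and species becomes the value the model has to predict.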
After choosing a target label, we need to split the data.
Connect the Select Columns widget to the Data Sampler widget. We can do that by dragging the output
from Select Columns to the input of Data Sampler.
After the connection is made, double-click on Data Sampler to open the properties tab.
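To make the splitting step concrete, here is a minimal sketch of a random fixed-proportion split in Python. The 70% proportion and the seed are assumptions chosen for the example; the settings actually used in the Data Sampler properties tab are not stated here.

```python
import random

def sample_split(rows, proportion=0.7, seed=42):
    """Shuffle the row indices and cut them into a sample and the remaining rows."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    cut = round(len(rows) * proportion)
    sample = [rows[i] for i in indices[:cut]]
    remaining = [rows[i] for i in indices[cut:]]
    return sample, remaining

rows = list(range(10))  # stand-in for ten data rows
train_rows, test_rows = sample_split(rows)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two parts, so the model can be trained on one part and tested on rows it has never seen.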
How do we know if the data is actually split or not?

Let’s inspect how Data Sampler splits the data.
We will be using Data Info.
Connect the Data Sampler widget to a second Data Info widget. We can do that by dragging the
output from Data Sampler to the input of the second Data Info.
Take note of the connection name, because we will change it. Double-click on the connection to edit it.
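The check that Data Info gives us can be sketched in Python as a small summary of each part of the split. The rows are hypothetical, and the variable names data_sample and remaining_data are our own labels for the two parts, chosen for this example.

```python
from collections import Counter

def data_info(rows, label_column="species"):
    """Summarise a table: row count, column count, and rows per label value."""
    return {
        "rows": len(rows),
        "columns": len(rows[0]) if rows else 0,
        "label_counts": Counter(row[label_column] for row in rows),
    }

# Hypothetical result of a split: the sampled rows and the rows left over.
data_sample = [
    {"species": "Adelie", "culmen_length": 39.1},
    {"species": "Gentoo", "culmen_length": 46.1},
    {"species": "Chinstrap", "culmen_length": 46.5},
]
remaining_data = [
    {"species": "Adelie", "culmen_length": 38.7},
]
total_rows = 4  # size of the table before splitting

sample_info = data_info(data_sample)
remaining_info = data_info(remaining_data)
split_is_complete = sample_info["rows"] + remaining_info["rows"] == total_rows
print(sample_info["rows"], remaining_info["rows"], split_is_complete)
```

If the two row counts add up to the size of the original table, the data really was split and no rows were lost.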
What do we do after having split the data?
After creating a model, we need to test the model and check its accuracy.
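Accuracy is simply the share of test rows whose predicted species matches the actual species. A minimal sketch, with made-up actual and predicted labels:

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical test labels and model predictions.
actual = ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"]
predicted = ["Adelie", "Chinstrap", "Gentoo", "Chinstrap", "Gentoo"]

score = accuracy(actual, predicted)
print(score)
```

Here four of the five predictions are correct, so the accuracy is 0.8.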
Let’s try a couple of other classification algorithms.
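Comparing algorithms means training each one on the same training rows and scoring each one on the same test rows. The sketch below uses two deliberately simple stand-ins written in plain Python, a majority baseline and a 1-nearest-neighbour classifier; they are not the algorithms used in this tutorial, and the measurements are hypothetical.

```python
from collections import Counter

# Hypothetical (culmen_length, flipper_length) measurements with species labels.
train = [
    ((39.1, 181.0), "Adelie"),
    ((38.7, 190.0), "Adelie"),
    ((36.7, 193.0), "Adelie"),
    ((46.5, 192.0), "Chinstrap"),
    ((50.0, 196.0), "Chinstrap"),
    ((46.1, 211.0), "Gentoo"),
    ((50.0, 230.0), "Gentoo"),
]
test = [
    ((39.5, 186.0), "Adelie"),
    ((49.5, 195.0), "Chinstrap"),
    ((47.6, 215.0), "Gentoo"),
]

def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda point: most_common

def nearest_neighbour_classifier(train):
    """Predict the label of the closest training point (features are not scaled)."""
    def predict(point):
        def squared_distance(item):
            features, _ = item
            return sum((a - b) ** 2 for a, b in zip(features, point))
        return min(train, key=squared_distance)[1]
    return predict

def accuracy(predict, test):
    """Fraction of test rows whose predicted label matches the actual label."""
    correct = sum(predict(features) == label for features, label in test)
    return correct / len(test)

candidates = {
    "majority baseline": majority_classifier,
    "1-nearest neighbour": nearest_neighbour_classifier,
}
scores = {name: accuracy(build(train), test) for name, build in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Because every candidate sees the same split, the scores are directly comparable and the highest one points to the model to keep.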
Now that we have found which model gives us the best results, we can use that one!
Since the Random Forest algorithm is not working well with one of the species, let’s use another algorithm.
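A model can have a good overall accuracy and still do poorly on one species. One way to spot this is to compute, for each species, the share of its rows that were predicted correctly (the recall). The labels below are made up for the example and do not come from the Random Forest run described here.

```python
from collections import Counter

def recall_per_class(actual, predicted):
    """For each label, the fraction of its rows that were predicted correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical test labels and predictions.
actual = ["Adelie", "Adelie", "Adelie", "Chinstrap",
          "Chinstrap", "Gentoo", "Gentoo", "Gentoo"]
predicted = ["Adelie", "Adelie", "Adelie", "Adelie",
             "Chinstrap", "Gentoo", "Gentoo", "Gentoo"]

recalls = recall_per_class(actual, predicted)
weakest_species = min(recalls, key=recalls.get)
overall_accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(recalls, weakest_species, overall_accuracy)
```

In this example the overall accuracy is 0.875, yet only half of the Chinstrap rows are recognised, which is the kind of result that justifies trying a different algorithm.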
