ML Lab - 2
6. SOURCE CODE:
# Loading the iris dataset
from sklearn import datasets
import pandas as pd
iris = datasets.load_iris()
X = iris.data    # feature matrix, one row per sample
y = iris.target  # integer class label per sample (target_names holds only the 3 class names)
print(X)
# Dataset pre-processing
import numpy as np
from sklearn.model_selection import train_test_split
X = np.array(X)
y = np.array(y)
print(X.shape)
print(y.shape)
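The code above imports train_test_split but does not call it in this listing. A minimal sketch of how the split might look is given below; the 70/30 ratio, the random_state value and the use of stratify are assumptions for illustration, not values taken from the lab.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Assumed settings: 30% held out for testing, fixed seed for repeatability,
# stratified so each class keeps the same proportion in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```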
1. Select the best attribute to split the data based on an attribute selection measure (e.g.,
information gain, Gini index).
2. Split the dataset into subsets, one for each distinct value of the selected attribute.
3. Repeat the process recursively for each child node until a stopping condition is met
(e.g., maximum depth reached or a single class in the node).
4. Assign the majority class (for classification) or mean value (for regression) to the leaf
node.
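The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the lab's implementation: it handles classification with categorical attributes, uses the Gini index as the selection measure, and the function names (gini, weighted_gini, build_tree, predict) and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, labels, attr):
    """Weighted Gini impurity after splitting on each distinct value of attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

def build_tree(rows, labels, attrs, depth=0, max_depth=3):
    """Return a nested dict tree; leaves are plain class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions: pure node, no attributes left, or depth limit.
    if len(set(labels)) == 1 or not attrs or depth >= max_depth:
        return majority  # leaf: assign the majority class
    # Select the attribute with the lowest weighted Gini impurity.
    best = min(attrs, key=lambda a: weighted_gini(rows, labels, a))
    # Split into subsets, one per distinct value of that attribute.
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    # Recurse on each child with the chosen attribute removed.
    remaining = [a for a in attrs if a != best]
    children = {
        value: build_tree(r, l, remaining, depth + 1, max_depth)
        for value, (r, l) in subsets.items()
    }
    return {"attr": best, "children": children, "default": majority}

def predict(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical toy data: each row is a dict of categorical features.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "no"}))
```

In this toy data, splitting on outlook gives two pure subsets (weighted Gini 0) while splitting on windy leaves both subsets mixed (weighted Gini 0.5), so outlook is chosen at the root. For regression, the leaf would hold the mean of the target values instead of the majority class.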
Steps:
For each feature, call le.fit_transform(df[column_name]) to encode the
values.
Replace the original feature values with the encoded ones.
2. Verify the Preprocessed Data:
Print the updated DataFrame (df) to ensure all categorical features have
been successfully converted into numerical form.
Define dependent_var by selecting the play column (target variable) from the
DataFrame df.
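A short sketch of these steps is given below. It assumes that le is a scikit-learn LabelEncoder and uses a small made-up DataFrame; only the names df, le, dependent_var and the play column come from the text, and the other column names and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: only the `play` column name comes from the lab text.
df = pd.DataFrame({
    "outlook": ["sunny", "rainy", "overcast", "sunny"],
    "windy": ["no", "yes", "no", "yes"],
    "play": ["no", "no", "yes", "yes"],
})

le = LabelEncoder()  # assumption: this is what `le` refers to
for column_name in df.columns:
    # fit_transform takes the column's values, so pass df[column_name];
    # the result replaces the original categorical values.
    df[column_name] = le.fit_transform(df[column_name])

# Verify the preprocessed data: every column should now be numeric.
print(df)
print(df.dtypes)

# Target variable: the encoded `play` column.
dependent_var = df["play"]
print(dependent_var)
```

LabelEncoder assigns integer codes in sorted order of the distinct values, so in this toy data overcast, rainy and sunny become 0, 1 and 2. Because a single encoder is refitted on every column, it only remembers the classes of the last column; if the original labels need to be recovered later with inverse_transform, one encoder per column would be needed.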