Lab 3-ML

1. Classification Algorithm: Random Forest Classifier

We'll use the Random Forest Classifier for the classification task. Random Forest is an ensemble learning method that creates multiple decision trees and combines their results to make predictions. It is robust and works well for both binary and multi-class classification problems.

Steps:

• Load the dataset
• Preprocess data (train-test split)
• Train the model
• Evaluate the model

Example Code:

We'll use the Iris dataset, a popular classification dataset, where we predict the species of Iris flowers based on their features.

# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data  # features (sepal length, sepal width, petal length, petal width)
y = iris.target  # target variable (species of the flower)

# Step 2: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Initialize the Random Forest Classifier model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Step 4: Train the model
rf_model.fit(X_train, y_train)

# Step 5: Make predictions on the test set
y_pred = rf_model.predict(X_test)

# Step 6: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Additional evaluation metrics
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Explanation:

1. Loading Data: We load the Iris dataset using load_iris() from sklearn.datasets. The dataset contains 150 samples of iris flowers, and each sample has 4 features (sepal length, sepal width, petal length, and petal width).

2. Data Preprocessing: We split the dataset into training and testing sets using train_test_split() (70% for training, 30% for testing).

3. Model Training: We use the RandomForestClassifier from sklearn.ensemble to train the model. We set n_estimators=100 for the number of trees in the forest.

4. Model Evaluation: We evaluate the model using accuracy_score, which measures how many predictions are correct. Additionally, we print the Classification Report and Confusion Matrix for detailed performance analysis.
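
As an optional, hedged sketch (not part of the original lab code), assuming the rf_model and iris objects from the code above, you can inspect which features the forest relied on most through the classifier's feature_importances_ attribute:

# Optional sketch: inspect feature importances of the trained model (assumes rf_model and iris from above)
import numpy as np

importances = rf_model.feature_importances_  # one importance score per feature
for name, score in zip(iris.feature_names, importances):
    print(f"{name}: {score:.3f}")

# The highest-scoring features contribute most to the ensemble's splits.
print("Most important feature:", iris.feature_names[np.argmax(importances)])
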
2. Regression Algorithm: Random Forest Regressor

Next, let's use the Random Forest Regressor for a regression task. The Random Forest Regressor works similarly to the Random Forest Classifier but for continuous target variables. It builds multiple decision trees and averages their predictions.

Steps:

• Load the dataset
• Preprocess data (train-test split)
• Train the model
• Evaluate the model

Example Code:

We'll use the Boston housing dataset, which contains information about various properties of homes and the corresponding house prices (target variable). Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so this example requires an older scikit-learn release; an alternative using the California housing dataset is sketched after the explanation below.

# Import necessary libraries
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older version
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: Load the Boston housing dataset
boston = load_boston()
X = boston.data  # features (e.g., crime rate, number of rooms, etc.)
y = boston.target  # target variable (house prices)

# Step 2: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Initialize the Random Forest Regressor model
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)

# Step 4: Train the model
rf_regressor.fit(X_train, y_train)

# Step 5: Make predictions on the test set
y_pred = rf_regressor.predict(X_test)

# Step 6: Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")

Explanation:

1. Loading Data: We load the Boston housing dataset using load_boston() from sklearn.datasets. The dataset contains information about 506 homes and 13 features (such as crime rate, average number of rooms, etc.), and the target is the median value of homes in thousands of dollars.

2. Data Preprocessing: We split the dataset into training and testing sets (70% for training, 30% for testing).

3. Model Training: We use the RandomForestRegressor from sklearn.ensemble to train the model. Again, we set n_estimators=100 for the number of trees in the forest.

4. Model Evaluation: We evaluate the model using the Mean Squared Error (MSE), which measures the average squared difference between the actual and predicted values. We also use R-squared (R²), which tells us how well the model explains the variance in the data (a value close to 1 means a good fit).
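
Because load_boston was removed in scikit-learn 1.2, here is a minimal sketch of the same regression workflow on the California housing dataset (fetch_california_housing); the dataset swap is an assumption for newer scikit-learn versions, not part of the original lab:

# Alternative sketch for scikit-learn >= 1.2, where load_boston is no longer available
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# fetch_california_housing provides ~20,000 samples with 8 numeric features;
# the target is the median house value in units of $100,000.
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.3, random_state=42)

rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train, y_train)
y_pred = rf_regressor.predict(X_test)

print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred):.2f}")
print(f"R-squared: {r2_score(y_test, y_pred):.2f}")
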
Build an AutoAI Model in IBM Watson

IBM Watson AutoAI is a tool that allows users to automate the process of building, training, and deploying machine learning models. It provides an easy-to-use platform that performs tasks such as data preprocessing, model selection, and hyperparameter tuning automatically. This makes it a great tool for both novice and experienced data scientists who want to quickly create and deploy machine learning models.

Here's a step-by-step tutorial on how to build an AutoAI model in IBM Watson Studio.

Step 1: Log in to Your IBM Cloud Account

Before using IBM Watson AutoAI, you need an IBM Cloud account and access to Watson Studio. If you don't have an account, create one on IBM Cloud first, then follow these steps:

1. Log in to your IBM Cloud account.
2. Access Watson Studio:
   o Go to IBM Watson Studio.
   o Once logged in, you can access the tools for building AutoAI models.

Step 2: Create a New Project in Watson Studio

Once you have access to IBM Watson Studio, follow these steps to create a new project.

1. Login to Watson Studio:
   o Go to the IBM Watson Studio dashboard and sign in with your IBM Cloud credentials.
2. Create a New Project:
   o Click on "New Project".
   o Select "Create an empty project".
   o Choose the project name, description, and a location for your project. For example, name it "AutoAI_Model_Training".
   o Select the "Machine Learning" option.
   o Click Create.

Step 3: Upload Data to Your Project

Before building an AutoAI model, you need data. You can upload a dataset to Watson Studio for training.

1. Go to the "Assets" section in your project and click on "Add to project".
2. Choose "Data" and upload a CSV file or a dataset from a public repository.
   o For example, you can use the Iris dataset, Titanic dataset, or any other tabular dataset that you want to use for training the model.
   o Once uploaded, the data will appear in the "Data" section of the project.
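
If your data only exists in memory (for example, loaded with scikit-learn), a minimal sketch for exporting it to a CSV file suitable for upload, assuming pandas is available and using a hypothetical file name iris_for_autoai.csv, is:

# Sketch: export the Iris dataset to a CSV file for upload to Watson Studio (assumes pandas)
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)   # load features and target as a DataFrame
df = iris.frame                   # combined feature columns plus a "target" column
df.to_csv("iris_for_autoai.csv", index=False)  # hypothetical file name
print(df.head())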

Step 4: Create an AutoAI Experiment

Now you can create the AutoAI experiment to automatically build a model.

1. Go to the "Assets" tab of your project.
2. Click on "Add to project", then select "AutoAI experiment".
3. In the "Create AutoAI experiment" window:
   o Select the dataset you uploaded earlier.
   o Choose whether it is a classification or regression task. For example, if the target variable is categorical, choose Classification; if it's continuous, choose Regression (a quick way to check the target column is sketched after this list).
   o Provide a name for your experiment (e.g., "AutoAI_Iris_Experiment").
4. Start the experiment:
   o Click Create to start the experiment. The AutoAI system will automatically analyze the dataset and generate an optimal machine learning pipeline based on the problem type (classification or regression).
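
As a hedged sketch for the classification-versus-regression choice above, assuming a CSV with a column named target (such as the hypothetical iris_for_autoai.csv from Step 3), you can inspect the target column before creating the experiment:

# Sketch: decide between classification and regression from the target column
# (assumes a CSV with a column named "target"; adjust the names for your data)
import pandas as pd

df = pd.read_csv("iris_for_autoai.csv")
target = df["target"]

# A non-numeric dtype or a small number of distinct values usually indicates a
# categorical target -> classification; otherwise treat it as regression.
if target.dtype == object or target.nunique() <= 20:
    print("Target looks categorical -> choose Classification")
else:
    print("Target looks continuous -> choose Regression")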

Step 5: Data Preprocessing and Feature Engineering

AutoAI automatically performs data preprocessing tasks such as:

• Missing value imputation: Filling missing values with suitable replacements (mean, median, or mode).
• Feature scaling: Standardizing numerical features for models like SVM, Logistic Regression, etc.
• Feature selection: Identifying the most relevant features for model training.

You don't need to manually perform these steps, as AutoAI handles them during the experiment.
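
AutoAI performs these steps internally; for illustration only, a rough scikit-learn equivalent (a sketch assuming the numeric X_train/X_test/y_train/y_test split from the Iris example, not AutoAI's actual pipeline) could look like this:

# Sketch: the kind of preprocessing AutoAI automates, written out manually in scikit-learn
# (assumes numeric features and the train/test split from the Iris example above)
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

preprocess_and_model = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),         # missing value imputation
    ("scale", StandardScaler()),                          # feature scaling
    ("select", SelectKBest(score_func=f_classif, k=3)),   # feature selection
    ("model", LogisticRegression(max_iter=1000)),
])

preprocess_and_model.fit(X_train, y_train)
print("Held-out accuracy:", preprocess_and_model.score(X_test, y_test))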

Step 6: Model Selection and Hyperparameter Tuning

After preprocessing the data, AutoAI automatically runs several models and selects the best-performing ones based on your data. This includes:

• Model Selection: AutoAI uses several algorithms (e.g., Random Forest, XGBoost, Neural Networks, etc.) to automatically choose the best one for your task.
• Hyperparameter Tuning: AutoAI will perform hyperparameter optimization to fine-tune the model for the best possible performance.

This process can take anywhere from a few minutes to several hours, depending on the size and complexity of the dataset.
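
For context, the kind of search AutoAI automates can be approximated by hand; a minimal sketch using scikit-learn's GridSearchCV on a Random Forest (assuming X_train and y_train from the Iris example, and an illustrative parameter grid) is:

# Sketch: manual hyperparameter tuning, roughly what AutoAI automates
# (assumes X_train, y_train from the Iris example above; the grid is illustrative)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                  # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))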

Step 7: Review and Compare the Models

Once the AutoAI experiment finishes running, you can view a Model Performance Dashboard that compares different models' performance on several metrics (e.g., accuracy, F1 score, ROC-AUC for classification, or MSE, R² for regression).

1. Go to the "Experiments" tab and open your experiment.
2. Review the list of trained models and their evaluation metrics.
   o You can view performance details like accuracy, precision, recall, F1-score (for classification), or R², mean squared error (for regression).
   o IBM Watson AutoAI will also show the top-performing pipeline, which is usually the best model.
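
Outside of AutoAI, a similar side-by-side comparison can be sketched with cross-validation; the following assumes X and y from the Iris example and an illustrative set of candidate models:

# Sketch: comparing several candidate models on the same data, similar in spirit
# to AutoAI's leaderboard (assumes X, y from the Iris example above)
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")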
