ML Lab - 2

Roll No: 160621733015 Experiment No: 02

Name: Dhavanam Sindhu Date: 12/08/2024

6. SOURCE CODE:

# Loading the iris dataset
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()

# Dataset display
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
x = df.head()
y = iris.target_names
print(x)

# Dataset pre-processing
import numpy as np
from sklearn.model_selection import train_test_split

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 1, 0, 1]
X = np.array(X)
y = np.array(y)

print(X.shape)
print(y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)


Decision tree construction proceeds as follows (a minimal sketch of the attribute-selection step is shown after this list):

1. Select the best attribute to split the data based on an attribute selection measure (e.g., information gain, Gini index).
2. Split the dataset into subsets where the selected attribute has distinct values.
3. Repeat the process recursively for each child node until one of the stopping conditions is met, like maximum depth or a single class in the node.
4. Assign the majority class (for classification) or mean value (for regression) to the leaf node.
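As an illustration of step 1, the following sketch computes information gain for a candidate attribute over a pandas DataFrame. The function names are illustrative, not part of the lab's code:

import numpy as np
import pandas as pd

def entropy(labels):
    # Shannon entropy of a label column
    probs = labels.value_counts(normalize=True)
    return -np.sum(probs * np.log2(probs))

def information_gain(df, attribute, target):
    # Entropy of the target before the split
    base = entropy(df[target])
    # Weighted entropy of the subsets produced by splitting on the attribute
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return base - weighted

The attribute with the largest information gain (or smallest weighted impurity, if using the Gini index) is chosen for the split.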

1. Code execution process

STEPS:
1. Opening Colab:
 Go to Google Colab.
 You can either sign in with your Google account or use the "New Notebook" button if you're already signed in.
2. Logging in to Colab:
 If you're not logged in automatically, click the "Sign in" button at the top right corner of the page.
 Enter your Google account credentials to access your Colab environment.


3. Import Required Libraries:


a. Import Libraries:
 Import the necessary libraries for data manipulation, model creation, and
visualization:
 pandas for handling datasets.
 numpy for numerical operations.
 matplotlib.pyplot for plotting.
 LabelEncoder from sklearn.preprocessing to encode categorical
variables.
 DecisionTreeClassifier from sklearn.tree for building the decision
tree model.
 plot_tree to visualize the decision tree.
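Taken together, these imports might look like the following (all names come from the list above):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree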


4. Load and Visualize the Dataset:


1. Read the Dataset:
 Load the dataset using pd.read_csv() by providing the file path (e.g.,
'/content/tennis.csv').
 Display the dataset to verify that it was loaded correctly using df to print
the dataset.
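A minimal load-and-display sketch, using the file path given above:

df = pd.read_csv('/content/tennis.csv')
df  # in Colab, evaluating df displays the table for verification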


5. Preprocess the Data


1. Label Encoding:
 For each categorical column in the dataset (outlook, temp, humidity, wind,
play), use the LabelEncoder() to convert the categorical values into
numerical values.
 Update the DataFrame df by replacing categorical values with their
encoded numerical equivalents for these features.

 Steps:
 For each feature, call le.fit_transform(column_name) to encode the
values.
 Replace the original feature values with the encoded ones.
2. Verify the Preprocessed Data:

 Print the updated DataFrame (df) to ensure all categorical features have
been successfully converted into numerical form.
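A sketch of this encoding loop, assuming the lowercase column names listed above:

le = LabelEncoder()
for col in ['outlook', 'temp', 'humidity', 'wind', 'play']:
    # Replace each categorical column with its numeric encoding
    df[col] = le.fit_transform(df[col])
print(df)

Note that reusing a single encoder means le.classes_ afterwards holds the classes of the last column encoded (play), which is what the visualization step later relies on.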


6. Split Dataset into Features and Target:


Define Independent (Features) and Dependent (Target) Variables:
 Define independent_variable by dropping the target column (play) using df.drop().

 Define dependent_var by selecting the play column (target variable) from the
DataFrame df.
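Using the variable names from the step above, this might be:

independent_variable = df.drop('play', axis=1)
dependent_var = df['play']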

7. Build and Train the Decision Tree Classifier:


1. Initialize the Decision Tree Classifier:
 Create a model object by initializing DecisionTreeClassifier() from sklearn.tree.
2. Fit the Model:
 Train the decision tree model by calling the fit() method, passing the
independent_variable (features) and dependent_var (target) as arguments.
3. Evaluate the Model:
 Check the performance of the trained model on the same dataset using the score()
method.
 The model.score() method will return the accuracy score of the model on the
training data.
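A sketch of these three steps:

model = DecisionTreeClassifier()
model.fit(independent_variable, dependent_var)
# Accuracy on the training data itself
print(model.score(independent_variable, dependent_var))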


8. Create Input Data for Prediction:


1. Prepare Input Data:
 Define the input data for testing or prediction by creating a DataFrame with
the required number of features (replace features list with actual feature
names).
 For example, create a DataFrame input_data containing a single row with the
values [1, 2, 0, 1, 0].
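A sketch of this step; the row values here are hypothetical, and the row must contain exactly one encoded value per training feature:

# Hypothetical encoded test row sized to match the training features
input_data = pd.DataFrame([[1, 2, 0, 1]], columns=independent_variable.columns)
print(model.predict(input_data))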

9. Visualize the Decision Tree:


1. Plot the Decision Tree:
 Create a visual representation of the decision tree using plot_tree() from
sklearn.tree.
 Specify parameters such as:
 filled=True to color the nodes based on class labels.
 feature_names to label the nodes with the feature names from
df.columns.
 class_names to label the output classes using le.classes_.
2. Show the Decision Tree Plot:
 Use plt.show() to display the generated decision tree plot.
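A sketch of the plotting call, using the feature columns rather than all of df.columns so the labels line up with the tree's features:

plt.figure(figsize=(12, 8))
plot_tree(model,
          filled=True,
          feature_names=independent_variable.columns,
          class_names=[str(c) for c in le.classes_])
plt.show()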


10. Posterior Probability:

a. Compute Likelihood for Each Feature:
 For each feature in the test set, calculate the likelihood P(feature | class), i.e., the probability of observing a specific feature value given the class label.
 For example, calculate the likelihood for Outlook = 'Sunny' when PlayTennis = Yes.
 Formula:
P(feature value | class value) = Number of occurrences of the feature value with that class / Total number of occurrences of that class
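A minimal sketch of this count-based likelihood, assuming the original (unencoded) DataFrame:

def likelihood(df, feature, value, target, target_value):
    # P(feature = value | target = target_value), estimated by counting
    subset = df[df[target] == target_value]
    return (subset[feature] == value).sum() / len(subset)

# e.g. likelihood(df, 'outlook', 'Sunny', 'play', 'Yes')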

11. Bayes Theorem and Posterior Probability Calculation

a. Calculate Posterior Probability for Each Test Instance:
 For each test instance, use the prior probabilities and the likelihoods to compute the posterior probability using Bayes' Theorem.
 Bayes' Theorem:
P(class | features) ∝ P(class) × P(features | class)
 For PlayTennis = Yes:
P(Yes | features) ∝ P(Yes) × P(Outlook | Yes) × P(Temperature | Yes) × P(Humidity | Yes) × P(Wind | Yes)
 For PlayTennis = No:
P(No | features) ∝ P(No) × P(Outlook | No) × P(Temperature | No) × P(Humidity | No) × P(Wind | No)
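Building on the likelihood function above, an unnormalized posterior might be computed as:

def posterior(df, instance, target, target_value):
    # Prior P(class) times the product of per-feature likelihoods
    # (the naive independence assumption)
    prob = (df[target] == target_value).mean()
    for feature, value in instance.items():
        prob *= likelihood(df, feature, value, target, target_value)
    return prob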

12. Classification Decision:


a. Make Prediction for Each Test Instance:
 Compare the computed posterior probabilities for both classes (Yes and No).
 Assign the class with the higher posterior probability as the predicted class.
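The decision rule then reduces to a comparison of the two unnormalized posteriors (the test instance shown here is hypothetical):

# Hypothetical test instance as a feature -> value mapping
instance = {'outlook': 'Sunny', 'temp': 'Cool', 'humidity': 'High', 'wind': 'Strong'}
p_yes = posterior(df, instance, 'play', 'Yes')
p_no = posterior(df, instance, 'play', 'No')
prediction = 'Yes' if p_yes > p_no else 'No'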


13. Evaluate Model Performance:


a. Evaluate the Model:
 Compute accuracy by comparing the predicted values with the actual values from
the test set.
 Use a confusion matrix to understand how well the model classifies each class.
 Accuracy Formula:
Accuracy = Number of correct predictions / Total number of predictions

b. Comparison with Built-in Naive Bayes:


 For validation, train a GaussianNB model from scikit-learn and compare the
predictions, accuracy, and confusion matrix with the manual implementation.
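A sketch of this validation step, assuming the encoded features have already been split into X_train, X_test, y_train, y_test with train_test_split:

from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))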


6. Preprocess Images with OpenCV (Advanced)

 To use OpenCV for custom preprocessing, we can convert images to grayscale, resize them, or apply filters:
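A minimal sketch, assuming a hypothetical image path and output size:

import cv2

img = cv2.imread('/content/sample.jpg')         # hypothetical path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # convert to grayscale
resized = cv2.resize(gray, (128, 128))          # resize to a fixed shape
blurred = cv2.GaussianBlur(resized, (5, 5), 0)  # apply a smoothing filter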

7. Save Pre-processed Data

 Now we can save the preprocessed data to be reused later.
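For example, the processed array can be written out with NumPy (the filename is hypothetical):

import numpy as np

np.save('/content/preprocessed.npy', blurred)   # reload later with np.load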
