
Name: Shreyas Satish Jagadale

Roll No. 322030


Batch B2
PRN 22110649

Assignment 3
AIM: Write a program to do the following: A dataset collected in a cosmetics shop, showing details of customers and whether or not they responded to a special offer to buy a new lipstick, is given in the table below. Use this dataset to build a decision tree, with Buys as the target variable, to help predict future lipstick purchases. Find the root node of the decision tree. According to the decision tree built from this training data set, what is the decision for the test data: [Age < 21, Income = Low, Gender = Female, Marital Status = Married]?

OBJECTIVE: Describe the Data Science Process and explore the interaction of its components. Apply a specific supervised machine learning algorithm to a particular problem.

ALGORITHM:
Here is a step-by-step procedure for building a decision tree using ID3 (Iterative Dichotomiser 3), a popular algorithm for constructing decision trees in machine learning:

ID3 Algorithm for Decision Tree Construction:

Input:
- Data: The training dataset containing features and corresponding target labels.
- Attributes: The set of attributes/features available for classification.
- Target Variable: The variable we want to predict (e.g., "Buys" in the given cosmetics shop dataset).

Output:
- Decision Tree: A tree structure representing a sequence of decisions that can be followed to make
predictions.

Algorithm Steps:

1. If all examples in the dataset belong to the same class, return a leaf node with that class label.

2. If the list of attributes is empty (no more features to split on), return a leaf node with the most frequent class label in the dataset.

3. Calculate the entropy (or Gini index) of the dataset based on the target variable.

4. For each attribute in the attribute list, calculate the information gain (or Gini gain) for splitting on that attribute.

5. Select the attribute with the highest information gain (or lowest weighted Gini index) as the splitting attribute and create a decision tree node with it as the root. For each value of the selected attribute:
   - Split the dataset based on that attribute value.
   - Recursively apply the ID3 algorithm to the resulting subset of the data.
   - Attach the resulting subtree to the corresponding branch of the node.

6. Return the constructed decision tree (see the sketch after this list).
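
As a minimal sketch of how these steps map to code, here is a recursive ID3 implementation in Python. It assumes the training data is a list of dicts with purely categorical values; the names entropy, information_gain, and id3 are illustrative, not taken from the assignment's own code.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the class-label distribution over the given rows (step 3)."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the weighted entropy of the subsets (step 4)."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Step 1: all examples share one class -> leaf with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 2: no attributes left -> leaf with the most frequent label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 3-5: split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        branch = id3(subset, [a for a in attributes if a != best], target)
        tree[best][value] = branch  # attach subtree to this branch (step 5)
    # Step 6: return the constructed (sub)tree.
    return tree
```

For example, id3(rows, ["Age", "Income", "Gender", "Marital_Status"], "Buys") would return a nested dict whose top-level key is the root attribute asked for in the AIM.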

Additional Notes:

- Entropy Calculation:
  - Entropy measures the impurity or disorder of a dataset. For a binary classification problem, entropy is calculated as \( H = -p_+ \log_2(p_+) - p_- \log_2(p_-) \), where \(p_+\) and \(p_-\) are the probabilities of the positive and negative classes, respectively.
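
To make the formula concrete, here is a small worked example with hypothetical class counts (9 positive and 5 negative examples out of 14; the numbers are illustrative, not taken from the assignment's table):

```python
import math

# Hypothetical counts: 9 positive and 5 negative examples out of 14.
p_pos, p_neg = 9 / 14, 5 / 14
H = -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)
print(round(H, 3))  # 0.94 -> a fairly impure, close-to-even split
```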

- Information Gain:
  - Information gain measures the effectiveness of an attribute in classifying the dataset. It is calculated as the difference between the entropy of the original dataset and the weighted sum of the entropies of the subsets produced by splitting on the attribute.
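
Continuing the hypothetical numbers above, here is a worked information-gain computation for an attribute that splits the 14 rows into three subsets; again the counts are illustrative:

```python
import math

def entropy(pos, neg):
    """Binary entropy of a (pos, neg) count pair; 0*log2(0) is taken as 0."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

parent = entropy(9, 5)              # ~0.940 before the split
subsets = [(2, 3), (4, 0), (3, 2)]  # (pos, neg) counts after the split
weighted = sum((p + n) / 14 * entropy(p, n) for p, n in subsets)
print(round(parent - weighted, 3))  # ~0.247 information gain
```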

- Stopping Criteria:
  - Tree construction stops when either all data points in a branch belong to the same class or there are no more attributes to split on.

- Tree Pruning (Optional):
  - After the tree is constructed, pruning techniques can be applied to avoid overfitting and improve the tree's generalization ability on unseen data.

Code:
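Below is a minimal sketch of one possible implementation using pandas and scikit-learn, assuming a hand-typed version of the dataset; the DataFrame rows shown are illustrative stand-ins, not the assignment's actual table, so substitute the real rows before drawing conclusions. criterion="entropy" makes the splits information-gain based, as in ID3.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative stand-in rows with the columns described in the AIM; replace
# them with the actual table from the assignment.
data = pd.DataFrame({
    "Age":            ["<21", "<21", "21-35", ">35", ">35", ">35", "21-35",
                       "<21", "<21", ">35", "<21", "21-35", "21-35", ">35"],
    "Income":         ["High", "High", "High", "Medium", "Low", "Low", "Low",
                       "Medium", "Low", "Medium", "Medium", "Medium", "High",
                       "Medium"],
    "Gender":         ["Male", "Male", "Male", "Male", "Female", "Female",
                       "Female", "Male", "Female", "Female", "Female", "Male",
                       "Female", "Male"],
    "Marital_Status": ["Single", "Married", "Single", "Single", "Single",
                       "Married", "Married", "Single", "Married", "Single",
                       "Married", "Married", "Single", "Married"],
    "Buys":           ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
                       "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

features = ["Age", "Income", "Gender", "Marital_Status"]

# Label-encode every categorical column, keeping the fitted encoders so the
# test tuple can be transformed consistently with the training data.
encoders = {col: LabelEncoder().fit(data[col]) for col in data.columns}
X = pd.DataFrame({col: encoders[col].transform(data[col]) for col in features})
y = encoders["Buys"].transform(data["Buys"])

# criterion="entropy" chooses splits by information gain, as in ID3.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# The attribute tested at the top of the printed tree is the root node.
print(export_text(clf, feature_names=features))

# Test tuple from the AIM: [Age < 21, Income = Low, Gender = Female,
# Marital Status = Married].
test = pd.DataFrame({
    "Age": encoders["Age"].transform(["<21"]),
    "Income": encoders["Income"].transform(["Low"]),
    "Gender": encoders["Gender"].transform(["Female"]),
    "Marital_Status": encoders["Marital_Status"].transform(["Married"]),
})
print("Decision:", encoders["Buys"].inverse_transform(clf.predict(test))[0])
```

Note that label encoding imposes an artificial order on the categories; for a faithful ID3 on categorical data, the hand-rolled id3 function sketched in the ALGORITHM section (or one-hot encoding) avoids that caveat.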
Output:

CONCLUSION:
The provided Python code demonstrates the construction of a decision tree classifier using a cosmetics shop dataset. The dataset includes information about customers' age, income, gender, marital status, and whether they purchased a lipstick ("Buys" column). The root node of the tree is the attribute with the highest information gain on the training data, and tracing the test tuple [Age < 21, Income = Low, Gender = Female, Marital Status = Married] down the branches of the tree yields the predicted Buys decision.
