DSML Practical
Assignment 3
AIM: A dataset collected in a cosmetics shop, showing details of customers and whether or not
they responded to a special offer to buy a new lipstick, is given in the table below. Write a
program that uses this dataset to build a decision tree, with Buys as the target variable, to help
guide future lipstick sales. Find the root node of the decision tree. According to the decision
tree built from this training data, what is the decision for the test data [Age < 21, Income = Low,
Gender = Female, Marital Status = Married]?
OBJECTIVE: Describe the data science process and explore how its components interact. Apply a
specific supervised machine learning algorithm to a particular problem.
ALGORITHM:
Below is a step-by-step procedure for building a decision tree using ID3 (Iterative
Dichotomiser 3), a popular algorithm for constructing decision trees in machine learning:
Input:
- Data: The training dataset containing features and corresponding target labels.
- Attributes: The set of attributes/features available for classification.
- Target Variable: The variable we want to predict (e.g., "Buys" in the given cosmetics shop dataset).
Output:
- Decision Tree: A tree structure representing a sequence of decisions that can be followed to make
predictions.
Algorithm Steps:
1. If all examples in the dataset belong to the same class, return a leaf node with that
class label.
2. If the attribute set is empty, return a leaf node with the majority class label.
3. Calculate the entropy (or Gini index) of the dataset based on the target variable.
4. For each remaining attribute, calculate the information gain obtained by splitting the
dataset on that attribute.
5. Select the attribute with the highest information gain as the decision node (the root
node on the first call).
6. For each value of the chosen attribute, create a branch and recursively apply steps 1-6
to the corresponding subset of the data, with the chosen attribute removed from the
attribute set.
7. Return the resulting tree.
Additional Notes:
- Entropy Calculation:
- Entropy measures the impurity or disorder of a dataset. For a binary classification
problem, entropy is calculated as \( -p_+ \log_2(p_+) - p_- \log_2(p_-) \), where \(p_+\) and
\(p_-\) are the probabilities of the positive and negative classes, respectively.
- Information Gain:
- Information gain measures the effectiveness of an attribute in classifying the
dataset. It is calculated as the difference between the entropy of the original dataset
and the weighted sum of the entropies of the subsets produced by splitting on the
attribute. A worked example follows this list.
- Stopping Criteria:
- Tree construction stops when either all data points in a branch belong to the same
class or there are no more attributes to split on.
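As a quick worked example, suppose a training set \(S\) of 14 examples contains 9 positive and
5 negative labels (illustrative numbers, not taken from the assignment's table). Its entropy is
\( H(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940 \).
If an attribute \(A\) partitions \(S\) into subsets \(S_v\), one per attribute value \(v\), the
information gain is \( \text{Gain}(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v) \), and ID3
places the attribute with the largest gain at the current node.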
Code:
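The original listing is not reproduced here, so the following is a minimal from-scratch ID3
sketch in Python. The 14 training rows are illustrative placeholders, not the assignment's
actual table, and should be replaced with the real values; the attribute names and the test
sample follow the problem statement.

import math
from collections import Counter

# Illustrative training data: (Age, Income, Gender, Marital Status, Buys).
# Placeholder rows standing in for the table given in the assignment.
data = [
    ("<21",   "High",   "Male",   "Single",  "No"),
    ("<21",   "High",   "Male",   "Married", "No"),
    ("21-35", "High",   "Male",   "Single",  "Yes"),
    (">35",   "Medium", "Male",   "Single",  "Yes"),
    (">35",   "Low",    "Female", "Single",  "Yes"),
    (">35",   "Low",    "Female", "Married", "No"),
    ("21-35", "Low",    "Female", "Married", "Yes"),
    ("<21",   "Medium", "Male",   "Single",  "No"),
    ("<21",   "Low",    "Female", "Married", "Yes"),
    (">35",   "Medium", "Female", "Single",  "Yes"),
    ("<21",   "Medium", "Female", "Married", "Yes"),
    ("21-35", "Medium", "Male",   "Married", "Yes"),
    ("21-35", "High",   "Female", "Single",  "Yes"),
    (">35",   "Medium", "Male",   "Married", "No"),
]
attributes = ["Age", "Income", "Gender", "Marital Status"]

def entropy(rows):
    """Shannon entropy of the Buys labels in the given rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr_idx):
    """Entropy reduction from splitting rows on the attribute at attr_idx."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr_idx] for row in rows):
        subset = [row for row in rows if row[attr_idx] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attr_indices):
    """Recursively build the tree as nested dicts; leaves are class labels."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:          # step 1: pure node -> leaf
        return labels[0]
    if not attr_indices:               # step 2: no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Steps 3-5: pick the attribute with the highest information gain.
    best = max(attr_indices, key=lambda i: info_gain(rows, i))
    tree = {attributes[best]: {}}
    remaining = [i for i in attr_indices if i != best]
    # Step 6: branch on each observed value and recurse on the subset.
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        tree[attributes[best]][value] = id3(subset, remaining)
    return tree

def predict(tree, sample):
    """Walk the nested-dict tree using the sample's attribute values.
    (Sketch only: attribute values unseen during training raise KeyError.)"""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][sample[attr]]
    return tree

tree = id3(data, list(range(len(attributes))))
print("Decision tree:", tree)
print("Root node:", next(iter(tree)))

test = {"Age": "<21", "Income": "Low", "Gender": "Female",
        "Marital Status": "Married"}
print("Decision for test data:", predict(tree, test))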
Output:
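With the illustrative placeholder data above, the script prints the nested-dict tree, reports
Age as the root node, and outputs the decision Yes for the test data [Age < 21, Income = Low,
Gender = Female, Marital Status = Married]; the actual output depends on the dataset used.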
CONCLUSION:
The Python code demonstrates the construction of a decision tree classifier for the cosmetics
shop dataset. The dataset records each customer's age, income, gender, and marital status, along
with whether they purchased a lipstick (the "Buys" column); the resulting tree identifies the
root attribute and yields a decision for the given test data.