
ML ASSIGNMENT-01

PART-A

Question-01: Define the Random Forest algorithm.

ANS: The Random Forest algorithm is an ensemble learning method primarily used for classification
and regression tasks. It operates by constructing multiple decision trees during training and outputs the
mode of the classes (classification) or the mean prediction (regression) of the individual trees.
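A minimal sketch of this idea using scikit-learn's `RandomForestClassifier` (assuming scikit-learn is available; the dataset below is synthetic, generated only for illustration):

```python
# Illustrative Random Forest sketch: many decision trees are trained,
# and the majority vote of the trees is the final classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (invented for this example).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of 100 trees; each tree votes on the class.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

For regression tasks, `RandomForestRegressor` works the same way but averages the trees' numeric predictions instead of taking a majority vote.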

Question-02: Define Logistic Regression.

ANS: Logistic Regression is a statistical method used for binary classification problems, where
the goal is to predict one of two possible outcomes (e.g., yes/no, true/false, 0/1). It estimates the
probability that a given input belongs to a particular class, based on a set of input features.
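A minimal sketch of binary logistic regression with scikit-learn (the single feature and pass/fail labels below are invented purely for illustration):

```python
# Logistic regression on a toy binary problem:
# one feature (e.g. hours studied) and a 0/1 pass label.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns the estimated probability of each class;
# column 1 is the probability of the positive class (label 1).
prob_pass = model.predict_proba([[6]])[0, 1]
```

Note that the model outputs a probability between 0 and 1; the class prediction is obtained by thresholding that probability (0.5 by default).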

Question-03: Define Support Vector Machine.

ANS: A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and
regression tasks. It is particularly effective for binary classification problems. The key idea behind SVM is
to find the optimal decision boundary, known as a hyperplane, that best separates the data into different classes.
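A hedged sketch of a linear SVM separating two toy clusters (the points are invented; `SVC` with a linear kernel finds the maximum-margin hyperplane between them):

```python
# Linear SVM on two clearly separated 2-D clusters.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1],      # class 0 cluster
              [5, 5], [6, 5], [5, 6]])     # class 1 cluster
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# Points near each cluster fall on that cluster's side of the hyperplane.
pred = clf.predict([[0.5, 0.5], [5.5, 5.5]])
```

The fitted `clf.support_vectors_` attribute exposes the training points closest to the hyperplane, which are the only points that determine the boundary.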

PART-B

Question-01: Decision Tree with a numerical example.

ANS: A Decision Tree is a supervised learning algorithm used for both classification and
regression tasks. It splits the dataset into subsets based on the most significant feature to predict
the target variable, using a tree-like structure.

Key Concepts:

1. Root Node: Represents the entire dataset and splits into subsets.
2. Decision Nodes: Intermediate nodes where the data gets further split.
3. Leaf Nodes: The final nodes representing the output (class label in classification or value
in regression).
4. Splitting Criteria: Measures like Gini Index, Information Gain, or Variance Reduction
(for regression) are used to split the data at each node.
Example: Classification using a Decision Tree

Let's use a simple dataset of seven people (Person 1 to Person 7) to predict whether a person
will buy a product based on Age and Income; the per-person counts appear in the steps below.

Step-by-Step Construction of the Decision Tree

1. Choose the Best Splitting Attribute: To determine the best split, we use Information
Gain or the Gini Index. Let's assume we use the Gini Index here.

The Gini Index for a node is calculated as:

Gini = 1 − Σ (p_i)²

where p_i is the proportion of data points belonging to class i.

2. Calculate Gini for Each Split: Let's consider splitting the data based on Age and
Income. We start with Age and find the best threshold for splitting.
o Age ≤ 35:
 Group 1 (Age ≤ 35): {Person 1, Person 2, Person 3}
 Group 2 (Age > 35): {Person 4, Person 5, Person 6, Person 7}
 For Group 1: 2 people don't buy (No), 1 person buys (Yes).
 Gini(Group 1) = 1 − ((2/3)² + (1/3)²) = 1 − 5/9 ≈ 0.444
 For Group 2: 1 person doesn't buy (No), 3 people buy (Yes).
 Gini(Group 2) = 1 − ((1/4)² + (3/4)²) = 1 − 10/16 = 0.375
 Weighted Gini for this split:
 (3/7) × 0.444 + (4/7) × 0.375 ≈ 0.405
o Similarly, we calculate the Gini for the other candidate splits (such as Income)
and choose the split that minimizes the weighted Gini Index.
3. Split the Data: Assume that splitting by Age ≤ 35 gives the best Gini index. We now
create two branches:
o For Age ≤ 35, we check further conditions like Income.
o For Age > 35, we check the next best feature.
4. Repeat Until Stopping Criteria: The process repeats for each subset until either:
o All data points in a node belong to a single class.
o The maximum depth of the tree is reached.
o There are no further significant splits.
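The Gini calculations in step 2 can be sketched in a few lines of Python (a minimal helper written for this example, not part of any library):

```python
# Gini index for a node: 1 - sum of squared class proportions.
def gini(counts):
    """counts: list of per-class sample counts at a node."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Group 1 (Age <= 35): 2 "No", 1 "Yes"
g1 = gini([2, 1])                         # 1 - (4/9 + 1/9) = 4/9
# Group 2 (Age > 35): 1 "No", 3 "Yes"
g2 = gini([1, 3])                         # 1 - (1/16 + 9/16) = 0.375
# Weighted Gini for the Age <= 35 split (groups of 3 and 4 out of 7)
weighted = (3 / 7) * g1 + (4 / 7) * g2    # = 17/42, about 0.405
```

The split whose weighted Gini is lowest is chosen at each node, which is exactly the comparison made between the Age and Income candidates above.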

Final Decision Tree:


        [Age <= 35]
        /         \
 No (majority)   [Income]
                 /      \
          Low (Yes)   High (Yes)

This is a simplified example where:

 If Age ≤ 35, the prediction is No (the majority class in that group).
 If Age > 35, we look at Income:
o If Income is Low, the prediction is Yes.
o If Income is High, the prediction is Yes.

Advantages:

 Easy to understand and interpret.
 Handles both categorical and numerical data.
 Can model non-linear relationships.
Disadvantages:

 Prone to overfitting, especially with deep trees.
 Unstable: a small change in the data can significantly change the tree structure.

Question-02: Linear Regression with an example.

ANS: Linear Regression is a supervised learning algorithm used for predicting a continuous
target variable based on one or more independent (input) variables. It assumes a linear
relationship between the input variables (features) and the output variable (target). The goal is to
find the line (in the case of one feature) or the hyperplane (in the case of multiple features) that
best fits the data.
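A minimal sketch of fitting such a line with scikit-learn (the data points are invented and lie exactly on the line y = 2x + 1):

```python
# Simple linear regression: fit y = m*x + b to toy data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 5, 7, 9, 11])       # exactly y = 2x + 1

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]               # fitted m, here 2.0
intercept = model.intercept_         # fitted b, here 1.0
prediction = model.predict([[6]])[0] # 2*6 + 1 = 13.0
```

Because the toy points are perfectly linear, the fitted slope and intercept recover the generating line exactly; with real, noisy data the fit minimizes the sum of squared residuals instead.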

Key Concepts:
