

DECISION TREE CLASSIFIER

A Decision Tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is repeatedly split according to the value of a chosen feature.

# A Decision Tree consists of:
# Nodes: test the value of a certain attribute.
# Edges/Branches: correspond to the outcome of a test and connect to the next node or leaf.
# Leaf nodes: terminal nodes that predict the outcome (they represent class labels or class distributions).

In a classification tree the decision variable is categorical, and the target variable can have two or more classes, whereas logistic regression is limited to two categories. Decision nodes are used to classify the examples, and decision trees can be used for both classification and regression problems.

Splitting is based on the principle of entropy, a measure of randomness, i.e. how homogeneous or heterogeneous the data is. If the data at a node is already homogeneous, no split can improve it, so the information gain is 0; the more heterogeneous the data, the more a good split can reduce entropy, so the information gain can be maximal.
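As a minimal sketch (not from the notebook), entropy and information gain can be computed like this; the toy arrays are hypothetical:

import numpy as np

def entropy(y):
    """Shannon entropy in bits; 0 for a homogeneous array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(entropy(np.array([1, 1, 1, 1])))  # 0.0 -> homogeneous
print(entropy(np.array([0, 1, 0, 1])))  # 1.0 -> maximally heterogeneous (binary)

# Information gain of a split = parent entropy - weighted child entropy.
parent = np.array([0, 0, 1, 1])
left, right = np.array([0, 0]), np.array([1, 1])
gain = entropy(parent) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print(gain)  # 1.0 -> a perfect split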

[Figure: example decision tree for the fit/unfit prediction discussed below]

# To understand the concept of a Decision Tree, consider the example above.
# Say you want to predict whether a person is fit or unfit, given information
# such as eating habits, physical activity, etc.
# The decision nodes are questions like 'What's the age?', 'Does he exercise?',
# 'Does he eat a lot of pizzas?', and the leaves represent outcomes like 'fit' or 'unfit'.

There are two main types of Decision Trees: classification trees and regression trees.

# 1. Classification trees (Yes/No types):
# What we've seen above is an example of a classification tree,
# where the outcome was a variable like 'fit' or 'unfit'.
# Here the decision variable is categorical/discrete.
# Such a tree is built through a process known as binary recursive partitioning:
# an iterative process that splits the data into partitions
# and then splits each partition further on each of the branches.
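One step of that partitioning can be sketched as follows: pick the threshold on a numeric feature that maximizes information gain, then recurse on each side. This is an illustration on hypothetical arrays, not how sklearn implements it internally:

import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(x, y):
    """Best threshold on a single numeric feature by information gain."""
    best_gain, best_t = 0.0, None
    for t in np.unique(x)[:-1]:  # candidate thresholds
        left, right = y[x <= t], y[x > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = entropy(y) - child
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

x = np.array([22, 25, 30, 35, 40, 50])  # hypothetical ages
y = np.array([0, 0, 0, 1, 1, 1])        # hypothetical unfit(0)/fit(1) labels
print(best_split(x, y))                 # (30, 1.0): splitting at age <= 30 is perfect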


# 2. Regression trees (continuous data types):
# Decision trees where the target variable can take continuous values (typically real numbers),
# e.g. the price of a house, or a patient's length of stay in a hospital.

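As a small sketch of a regression tree (the toy house-size/price data here is hypothetical, not from this notebook):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[600], [800], [1000], [1200], [1500]])  # e.g. house size (sq ft)
y = np.array([150, 190, 240, 280, 350])               # e.g. price (in $1000s)

reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, y)
print(reg.predict([[1100]]))  # predicts a continuous value, not a class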

import pandas as pd
cr = pd.read_csv(r"CreditRisk.csv")
cr.head()

  Loan_ID   Gender Married  Dependents     Education Self_Employed  ApplicantIncome  ...
0 LP001002    Male      No         0.0      Graduate            No             5849
1 LP001003    Male     Yes         1.0      Graduate            No             4583
2 LP001005    Male     Yes         0.0      Graduate           Yes             3000
3 LP001006    Male     Yes         0.0  Not Graduate            No             2583
4 LP001008    Male      No         0.0      Graduate            No             6000

cr.isnull().sum()  # null values per column

Loan_ID 0
Gender 24
Married 3
Dependents 25
Education 0
Self_Employed 55
ApplicantIncome 0
CoapplicantIncome 0
LoanAmount 27
Loan_Amount_Term 20
Credit_History 79
Property_Area 0
Loan_Status 0
dtype: int64

cr.Gender = cr.Gender.fillna('Male')
cr.Self_Employed = cr.Self_Employed.fillna('Yes')
cr.Credit_History = cr.Credit_History.fillna(0)
cr.Dependents = cr.Dependents.fillna(0)
cr.LoanAmount = cr.LoanAmount.fillna(cr.LoanAmount.mean())
cr.Loan_Amount_Term = cr.Loan_Amount_Term.fillna(cr.Loan_Amount_Term.mean())
cr.Married = cr.Married.fillna("Yes")
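A small alternative sketch: rather than hard-coding fill values like 'Male' or 'Yes', the categorical columns could be filled with their modes (for this file the modes may differ from the constants chosen above, e.g. Credit_History was deliberately filled with 0):

# derive fill values from the data instead of hard-coding them
for col in ["Gender", "Married", "Self_Employed", "Dependents"]:
    cr[col] = cr[col].fillna(cr[col].mode()[0])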

cr.isnull().sum()

Loan_ID 0
Gender 0
Married 0

Dependents 0
Education 0
Self_Employed 0
ApplicantIncome 0
CoapplicantIncome 0
LoanAmount 0
Loan_Amount_Term 0
Credit_History 0
Property_Area 0
Loan_Status 0
dtype: int64

cr.Gender.replace({"Male": 1, "Female": 0}, inplace=True)
cr.Married.replace({"No": 0, "Yes": 1}, inplace=True)
cr.Education.replace({"Graduate": 1, "Not Graduate": 0}, inplace=True)
cr.Self_Employed.replace({"No": 0, "Yes": 1}, inplace=True)
cr.Property_Area.replace({"Semiurban": 1, "Urban": 2, "Rural": 3}, inplace=True)
cr.Loan_Status.replace({"Y": 1, "N": 0}, inplace=True)
#cr.Married.replace({"No": 0, "Yes": 1}, inplace=True)
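One design note as a sketch: Property_Area is nominal, so the 1/2/3 codes above impose an artificial order the tree may exploit. One-hot encoding is a common alternative (cr_ohe is a hypothetical name):

cr_ohe = pd.get_dummies(cr, columns=["Property_Area"], drop_first=True)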

cr_x = cr.iloc[:, 1:12]
cr_y = cr.iloc[:, -1]
import sklearn
from sklearn.model_selection import train_test_split
# the test_size argument is cut off in the source; 0.2 is assumed here for illustration
cr_x_train, cr_x_test, cr_y_train, cr_y_test = train_test_split(cr_x, cr_y, test_size=0.2)

import sklearn

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(cr_x_train, cr_y_train)

DecisionTreeClassifier()
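To see the learned splits (the questions at each decision node), a quick sketch using sklearn's tree utilities; output is not shown in the source:

from sklearn.tree import export_text, plot_tree
import matplotlib.pyplot as plt

print(export_text(dtree, feature_names=list(cr_x_train.columns)))
plot_tree(dtree, max_depth=2, feature_names=list(cr_x_train.columns), filled=True)
plt.show()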

pred_dt = dtree.predict(cr_x_test)

from sklearn.metrics import confusion_matrix
# Note: sklearn's convention is confusion_matrix(y_true, y_pred); swapping the arguments
# transposes the matrix, but the diagonal (and hence the accuracy) is unchanged.
tab1 = confusion_matrix(pred_dt, cr_y_test)
tab1

array([[ 29,  29],
       [ 28, 111]])

tab1.diagonal().sum() / tab1.sum() * 100  # accuracy in percent

71.06598984771574
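The same number can be obtained with accuracy_score, which the notebook itself uses later for the random forest:

from sklearn.metrics import accuracy_score
accuracy_score(cr_y_test, pred_dt)  # ~0.7107, matching the percentage above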

cr_x_train.head()


     Gender  Married  Dependents  Education  Self_Employed  ApplicantIncome  ...
813       1        1         0.0          1              1             1900
33        1        1         0.0          1              0             3500
161       1        1         0.0          1              0             7933
567       1        1         4.0          1              0             3400
475       1        1         2.0          1              1            16525

dtree.feature_importances_

array([0.02379164, 0.03859053, 0.02540569, 0.02247314, 0.        ,
       0.28371806, 0.09496931, 0.17680406, 0.03918504, 0.26114729,
       0.03391524])

feature_score = pd.DataFrame({"Importance": dtree.feature_importances_,
                              "Variable_Name": cr_x_train.columns})

feature_score

Importance Variable_Name

0 0.023792 Gender

1 0.038591 Married

2 0.025406 Dependents

3 0.022473 Education

4 0.000000 Self_Employed

5 0.283718 ApplicantIncome

6 0.094969 CoapplicantIncome

7 0.176804 LoanAmount

8 0.039185 Loan_Amount_Term

9 0.261147 Credit_History

10 0.033915 Property_Area

feature_score.sort_values(['Importance'], ascending=False)


Importance Variable_Name

5 0.283718 ApplicantIncome

9 0.261147 Credit_History

7 0.176804 LoanAmount

6 0.094969 CoapplicantIncome

8 0.039185 Loan_Amount_Term

1 0.038591 Married

10 0.033915 Property_Area

2 0.025406 Dependents

0 0.023792 Gender

3 0.022473 Education

4 0.000000 Self_Employed

Random Forest Model

# Random Forest
# It uses a number of decision trees.
# Ensemble technique (N samples are drawn, and a decision tree is built on each sample).
# Each tree makes a prediction, and at the end votes are taken.
#--------------------------#
# For example, if you have 1000 records and you are going to build 100 trees,
# your 100 samples are created randomly.
# A few samples may have 50 records and 3 columns;
# other samples may have a different combination of records.
# Finally each tree decides individually; the final decision is taken by the votes.
# Records can also be duplicated, randomly (sampling with replacement).
# The maximum vote decides class 1 or class 0.

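The bootstrap-and-vote idea can be sketched in a few lines. This is a simplified illustration (bagged_predict is a hypothetical helper, and it assumes binary 0/1 labels); sklearn's RandomForestClassifier does this internally and additionally samples random feature subsets at each split:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_trees=10, seed=0):
    """Majority vote over trees fit on bootstrap samples (binary 0/1 labels)."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample WITH replacement: records can repeat
        tree = DecisionTreeClassifier().fit(X_train.iloc[idx], y_train.iloc[idx])
        votes.append(tree.predict(X_test))
    # the class with the maximum vote across trees wins
    return (np.mean(votes, axis=0) >= 0.5).astype(int)

# e.g. bagged_predict(cr_x_train, cr_y_train, cr_x_test)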

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=100)
# n_estimators: 100 trees are built; this is called a hyperparameter.
# If you keep increasing the number of trees, after some point the result becomes
# stable and adding more makes no difference; beyond that, extra trees mainly add
# computation (unlike a single deep tree, a forest rarely overfits from more trees alone).
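A quick way to see that stabilization, as a sketch (random_state is added here only for repeatability; it is not in the source):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

for n in [10, 50, 100, 200]:
    m = RandomForestClassifier(n_estimators=n, random_state=0).fit(cr_x_train, cr_y_train)
    print(n, accuracy_score(cr_y_test, m.predict(cr_x_test)))  # flattens out as n grows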

rfc.fit(cr_x_train, cr_y_train)

RandomForestClassifier()


pred_rf = rfc.predict(cr_x_test)
pred_rf

array([1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,
1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1])

from sklearn.metrics import confusion_matrix
tab1 = confusion_matrix(pred_rf, cr_y_test)
tab1

array([[ 30,  19],
       [ 27, 121]])

tab1.diagonal().sum() / tab1.sum() * 100

76.6497461928934

rfc.feature_importances_  # check the feature importances in the RF as well

array([0.02153618, 0.02438256, 0.04693238, 0.02281451, 0.02088224,
       0.21140693, 0.12155668, 0.19767519, 0.05311922, 0.23494182,
       0.04475229])

from sklearn.metrics import accuracy_score
accuracy_score(cr_y_test, pred_rf)

0.766497461928934
