

DECISION TREE CLASSIFIER

A Decision Tree is a simple representation for classifying examples. It is a supervised machine learning method in which the data is repeatedly split according to the value of a chosen feature.

# A Decision Tree consists of:
# Nodes: test the value of a certain attribute.
# Edges/Branches: correspond to the outcome of a test and connect to the next node or leaf.
# Leaf nodes: terminal nodes that predict the outcome (they represent class labels or class distributions).

In a classification tree the decision variable is categorical, and the target variable can have two or more classes, whereas logistic regression is limited to two categories. Decision nodes are used to classify the examples, and decision trees can be used for both classification and regression problems.

Splitting is based on the principle of entropy, a measure of randomness, i.e. how homogeneous or heterogeneous the data is. If the data at a node is already homogeneous, no split can improve it, so the information gain is 0; the more heterogeneous the data, the more a good split can reduce entropy, so the information gain can be maximal.
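As a minimal sketch (not from the notebook), entropy and information gain can be computed like this; the toy arrays are hypothetical:

import numpy as np

def entropy(y):
    """Shannon entropy in bits; 0 for a homogeneous array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(entropy(np.array([1, 1, 1, 1])))  # 0.0 -> homogeneous
print(entropy(np.array([0, 1, 0, 1])))  # 1.0 -> maximally heterogeneous (binary)

# Information gain of a split = parent entropy - weighted child entropy.
parent = np.array([0, 0, 1, 1])
left, right = np.array([0, 0]), np.array([1, 1])
gain = entropy(parent) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print(gain)  # 1.0 -> a perfect split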

[Figure: example decision tree for the fit/unfit prediction discussed below]

# To understand the concept of a Decision Tree, consider the example above.
# Say you want to predict whether a person is fit or unfit, given information
# such as eating habits, physical activity, etc.
# The decision nodes are questions like 'What's the age?', 'Does he exercise?',
# 'Does he eat a lot of pizzas?', and the leaves represent outcomes like 'fit' or 'unfit'.

There are two main types of Decision Trees: classification trees and regression trees.

# 1. Classification trees (Yes/No types):
# What we've seen above is an example of a classification tree,
# where the outcome was a variable like 'fit' or 'unfit'.
# Here the decision variable is categorical/discrete.
# Such a tree is built through a process known as binary recursive partitioning:
# an iterative process that splits the data into partitions
# and then splits each partition further on each of the branches.
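One step of that partitioning can be sketched as follows: pick the threshold on a numeric feature that maximizes information gain, then recurse on each side. This is an illustration on hypothetical arrays, not how sklearn implements it internally:

import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(x, y):
    """Best threshold on a single numeric feature by information gain."""
    best_gain, best_t = 0.0, None
    for t in np.unique(x)[:-1]:  # candidate thresholds
        left, right = y[x <= t], y[x > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = entropy(y) - child
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

x = np.array([22, 25, 30, 35, 40, 50])  # hypothetical ages
y = np.array([0, 0, 0, 1, 1, 1])        # hypothetical unfit(0)/fit(1) labels
print(best_split(x, y))                 # (30, 1.0): splitting at age <= 30 is perfect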


# 2. Regression trees (continuous data types):
# Decision trees where the target variable can take continuous values (typically real numbers),
# e.g. the price of a house, or a patient's length of stay in a hospital.

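As a small sketch of a regression tree (the toy house-size/price data here is hypothetical, not from this notebook):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[600], [800], [1000], [1200], [1500]])  # e.g. house size (sq ft)
y = np.array([150, 190, 240, 280, 350])               # e.g. price (in $1000s)

reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, y)
print(reg.predict([[1100]]))  # predicts a continuous value, not a class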

import pandas as pd
cr = pd.read_csv(r"CreditRisk.csv")
cr.head()

  Loan_ID   Gender Married  Dependents     Education Self_Employed  ApplicantIncome  ...
0 LP001002    Male      No         0.0      Graduate            No             5849
1 LP001003    Male     Yes         1.0      Graduate            No             4583
2 LP001005    Male     Yes         0.0      Graduate           Yes             3000
3 LP001006    Male     Yes         0.0  Not Graduate            No             2583
4 LP001008    Male      No         0.0      Graduate            No             6000

cr.isnull().sum()  # null values per column

Loan_ID 0
Gender 24
Married 3
Dependents 25
Education 0
Self_Employed 55
ApplicantIncome 0
CoapplicantIncome 0
LoanAmount 27
Loan_Amount_Term 20
Credit_History 79
Property_Area 0
Loan_Status 0
dtype: int64

cr.Gender = cr.Gender.fillna('Male')
cr.Self_Employed = cr.Self_Employed.fillna('Yes')
cr.Credit_History = cr.Credit_History.fillna(0)
cr.Dependents = cr.Dependents.fillna(0)
cr.LoanAmount = cr.LoanAmount.fillna(cr.LoanAmount.mean())
cr.Loan_Amount_Term = cr.Loan_Amount_Term.fillna(cr.Loan_Amount_Term.mean())
cr.Married = cr.Married.fillna("Yes")
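A small alternative sketch: rather than hard-coding fill values like 'Male' or 'Yes', the categorical columns could be filled with their modes (for this file the modes may differ from the constants chosen above, e.g. Credit_History was deliberately filled with 0):

# derive fill values from the data instead of hard-coding them
for col in ["Gender", "Married", "Self_Employed", "Dependents"]:
    cr[col] = cr[col].fillna(cr[col].mode()[0])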

cr.isnull().sum()

Loan_ID 0
Gender 0
Married 0

Dependents 0
Education 0
Self_Employed 0
ApplicantIncome 0
CoapplicantIncome 0
LoanAmount 0
Loan_Amount_Term 0
Credit_History 0
Property_Area 0
Loan_Status 0
dtype: int64

cr.Gender.replace({"Male": 1, "Female": 0}, inplace=True)
cr.Married.replace({"No": 0, "Yes": 1}, inplace=True)
cr.Education.replace({"Graduate": 1, "Not Graduate": 0}, inplace=True)
cr.Self_Employed.replace({"No": 0, "Yes": 1}, inplace=True)
cr.Property_Area.replace({"Semiurban": 1, "Urban": 2, "Rural": 3}, inplace=True)
cr.Loan_Status.replace({"Y": 1, "N": 0}, inplace=True)
#cr.Married.replace({"No": 0, "Yes": 1}, inplace=True)
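One design note as a sketch: Property_Area is nominal, so the 1/2/3 codes above impose an artificial order the tree may exploit. One-hot encoding is a common alternative (cr_ohe is a hypothetical name):

cr_ohe = pd.get_dummies(cr, columns=["Property_Area"], drop_first=True)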

cr_x = cr.iloc[:, 1:12]
cr_y = cr.iloc[:, -1]
import sklearn
from sklearn.model_selection import train_test_split
# the test_size argument is cut off in the source; 0.2 is assumed here for illustration
cr_x_train, cr_x_test, cr_y_train, cr_y_test = train_test_split(cr_x, cr_y, test_size=0.2)

import sklearn

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(cr_x_train, cr_y_train)

DecisionTreeClassifier()
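To see the learned splits (the questions at each decision node), a quick sketch using sklearn's tree utilities; output is not shown in the source:

from sklearn.tree import export_text, plot_tree
import matplotlib.pyplot as plt

print(export_text(dtree, feature_names=list(cr_x_train.columns)))
plot_tree(dtree, max_depth=2, feature_names=list(cr_x_train.columns), filled=True)
plt.show()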

pred_dt = dtree.predict(cr_x_test)

from sklearn.metrics import confusion_matrix
# Note: sklearn's convention is confusion_matrix(y_true, y_pred); swapping the arguments
# transposes the matrix, but the diagonal (and hence the accuracy) is unchanged.
tab1 = confusion_matrix(pred_dt, cr_y_test)
tab1

array([[ 29,  29],
       [ 28, 111]])

tab1.diagonal().sum() / tab1.sum() * 100  # accuracy in percent

71.06598984771574
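The same number can be obtained with accuracy_score, which the notebook itself uses later for the random forest:

from sklearn.metrics import accuracy_score
accuracy_score(cr_y_test, pred_dt)  # ~0.7107, matching the percentage above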

cr_x_train.head()


     Gender  Married  Dependents  Education  Self_Employed  ApplicantIncome  ...
813       1        1         0.0          1              1             1900
33        1        1         0.0          1              0             3500
161       1        1         0.0          1              0             7933
567       1        1         4.0          1              0             3400
475       1        1         2.0          1              1            16525

dtree.feature_importances_

array([0.02379164, 0.03859053, 0.02540569, 0.02247314, 0.        ,
       0.28371806, 0.09496931, 0.17680406, 0.03918504, 0.26114729,
       0.03391524])

feature_score = pd.DataFrame({"Importance": dtree.feature_importances_,
                              "Variable_Name": cr_x_train.columns})

feature_score

Importance Variable_Name

0 0.023792 Gender

1 0.038591 Married

2 0.025406 Dependents

3 0.022473 Education

4 0.000000 Self_Employed

5 0.283718 ApplicantIncome

6 0.094969 CoapplicantIncome

7 0.176804 LoanAmount

8 0.039185 Loan_Amount_Term

9 0.261147 Credit_History

10 0.033915 Property_Area

feature_score.sort_values(['Importance'], ascending=False)


Importance Variable_Name

5 0.283718 ApplicantIncome

9 0.261147 Credit_History

7 0.176804 LoanAmount

6 0.094969 CoapplicantIncome

8 0.039185 Loan_Amount_Term

1 0.038591 Married

10 0.033915 Property_Area

2 0.025406 Dependents

0 0.023792 Gender

3 0.022473 Education

4 0.000000 Self_Employed

Random Forest Model

# Random Forest
# It uses a number of decision trees.
# Ensemble technique (N samples are drawn, and a decision tree is built on each sample).
# Each tree makes a prediction, and at the end votes are taken.
#--------------------------#
# For example, if you have 1000 records and you are going to build 100 trees,
# your 100 samples are created randomly.
# A few samples may have 50 records and 3 columns;
# other samples may have a different combination of records.
# Finally each tree decides individually; the final decision is taken by the votes.
# Records can also be duplicated, randomly (sampling with replacement).
# The maximum vote decides class 1 or class 0.

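The bootstrap-and-vote idea can be sketched in a few lines. This is a simplified illustration (bagged_predict is a hypothetical helper, and it assumes binary 0/1 labels); sklearn's RandomForestClassifier does this internally and additionally samples random feature subsets at each split:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_trees=10, seed=0):
    """Majority vote over trees fit on bootstrap samples (binary 0/1 labels)."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample WITH replacement: records can repeat
        tree = DecisionTreeClassifier().fit(X_train.iloc[idx], y_train.iloc[idx])
        votes.append(tree.predict(X_test))
    # the class with the maximum vote across trees wins
    return (np.mean(votes, axis=0) >= 0.5).astype(int)

# e.g. bagged_predict(cr_x_train, cr_y_train, cr_x_test)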

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=100)
# n_estimators: 100 trees are built; this is called a hyperparameter.
# If you keep increasing the number of trees, after some point the result becomes
# stable and adding more makes no difference; beyond that, extra trees mainly add
# computation (unlike a single deep tree, a forest rarely overfits from more trees alone).
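A quick way to see that stabilization, as a sketch (random_state is added here only for repeatability; it is not in the source):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

for n in [10, 50, 100, 200]:
    m = RandomForestClassifier(n_estimators=n, random_state=0).fit(cr_x_train, cr_y_train)
    print(n, accuracy_score(cr_y_test, m.predict(cr_x_test)))  # flattens out as n grows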

rfc.fit(cr_x_train, cr_y_train)

RandomForestClassifier()


pred_rf = rfc.predict(cr_x_test)
pred_rf

array([1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1,
0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,
1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1])

from sklearn.metrics import confusion_matrix
tab1 = confusion_matrix(pred_rf, cr_y_test)
tab1

array([[ 30,  19],
       [ 27, 121]])

tab1.diagonal().sum() / tab1.sum() * 100

76.6497461928934

rfc.feature_importances_  # check the feature importances in the RF as well

array([0.02153618, 0.02438256, 0.04693238, 0.02281451, 0.02088224,
       0.21140693, 0.12155668, 0.19767519, 0.05311922, 0.23494182,
       0.04475229])

from sklearn.metrics import accuracy_score
accuracy_score(cr_y_test, pred_rf)

0.766497461928934
