23MCA1030 - Ensemble - Classifiers - .Ipynb - Colaboratory

This notebook applies ensemble models (random forest, gradient boosting, and a voting regressor) to predict startup funding amounts on an Indian startup funding dataset. It label-encodes the categorical variables, splits the data into train and test sets, and builds the input dataframes for the models.


3/29/24, 1:06 PM 23MCA1030_Ensemble_classifiers*.ipynb - Colaboratory

Machine Learning Lab (PMCA507P)

Reg No: 23MCA1030

Name : Vinayak Kumar Singh

Exercise 9: Ensemble classifiers

Colab url : https://colab.research.google.com/drive/1pstRfqqvQFQfpBSjQ3MLnVt3xihmJ_pP?usp=sharing

Dataset url : https://www.kaggle.com/datasets/sudalairajkumar/indian-startup-funding/code?datasetId=1902&searchQuery=ens

Indian Startup Funding Dataset

Import necessary libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from pylab import rcParams
rcParams['figure.figsize'] = 25, 10
from datetime import datetime
from pandas.plotting import scatter_matrix
from sklearn.preprocessing import normalize
!pip install scikit-learn
from sklearn.model_selection import GridSearchCV

Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.2.2)

Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.25.2)
Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.11.4)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (3.4.0)

df = pd.read_csv("/content/startup_funding.csv")
df = df.drop(['Sr No','Remarks','SubVertical'],axis = 1)
df = df.dropna()
df = df.reset_index(drop=True)

count = df['Industry Vertical'].value_counts()

count.head(10)

Consumer Internet 582

Technology 309
eCommerce 126
Finance 53
Healthcare 43
ECommerce 37
E-Commerce 24
Logistics 23
Education 18
Food & Beverage 15
Name: Industry Vertical, dtype: int64
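
The counts above list `eCommerce`, `ECommerce`, and `E-Commerce` as separate categories, although they presumably name the same vertical. A minimal sketch of one way to merge such variants before encoding, assuming a hypothetical helper `normalize_vertical` and a small made-up sample rather than rows from the real dataset:

```python
import pandas as pd


def normalize_vertical(name: str) -> str:
    """Hypothetical helper: lowercase, drop hyphens, collapse whitespace."""
    return " ".join(name.replace("-", "").lower().split())


# Made-up sample for illustration only; not rows from startup_funding.csv.
sample = pd.Series(
    ["eCommerce", "ECommerce", "E-Commerce", "Finance", "finance ", "Consumer Internet"]
)

normalized = sample.map(normalize_vertical)
counts = normalized.value_counts()
print(counts)
```

Whether merging is appropriate depends on the data; some near-identical labels may be intentionally distinct.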

count = df['InvestmentnType'].value_counts()
plt.figure(figsize=(10,4))
sns.barplot(x = count.index, y = count.values, alpha=0.8)
plt.xticks(rotation='vertical')
plt.xlabel('Investment Type', fontsize=12)
plt.ylabel('Number of fundings made', fontsize=12)
plt.title("Type of Investment made", fontsize=16)
plt.show()

count = df['City Location'].value_counts()
plt.figure(figsize=(25,10))
sns.barplot(x = count.index, y = count.values, alpha=0.8)
plt.xticks(rotation='vertical')
plt.xlabel('Investment Location', fontsize=25)
plt.ylabel('Number of fundings made', fontsize=25)
plt.title("Number of fundings by city location", fontsize=30)
plt.show()


Random Forest

from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df.head()

   Date dd/mm/yyyy  Startup Name                  Industry Vertical  City Location  Investors Name             InvestmentnType       Amount in USD
0  09/01/2020       BYJU’S                        E-Tech             Bengaluru      Tiger Global Management    Private Equity Round  200000000.0
1  13/01/2020       Shuttl                        Transportation     Gurgaon        Susquehanna Growth Equity  Series C              8048394.0
2  09/01/2020       Mamaearth                     E-commerce         Bengaluru      Sequoia Capital India      Series B              18358860.0
3  02/01/2020       https://www.wealthbucket.in/  FinTech            New Delhi      Vinod Khatumal             Pre-series A          3000000.0

df = df[~df['Amount in USD'].isnull()]

train,test= train_test_split(df,test_size=0.2,random_state =10)

train_x = train.drop(['Amount in USD'],axis = 1)
train_y = train['Amount in USD']
test_x = test.drop(['Amount in USD'],axis = 1)
test_y = test['Amount in USD']
print(train_y)


3 3000000.0
18 1500000.0
13 2000000.0
2 18358860.0
14 50000000.0
8 70000000.0
17 486000.0
16 150000000.0
237 4200000.0
12 30000000.0
11 12000000.0
1 8048394.0
0 200000000.0
15 231000000.0
4 1800000.0
9 50000000.0
Name: Amount in USD, dtype: float64

from sklearn.preprocessing import LabelEncoder

le1, le2, le3, le4, le5, le6 = LabelEncoder(), LabelEncoder(), LabelEncoder(), LabelEncoder(), LabelEncoder(), LabelEncoder()
le1.fit(df['InvestmentnType'])
le2.fit(df['Investors Name'])
le3.fit(df['Industry Vertical'].values)
le4.fit(df['Startup Name'].values)
le5.fit(df['City Location'].values)
le6.fit(df['Date dd/mm/yyyy'].values)

LabelEncoder()
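
The encoders above are fitted on the full `df`, which avoids errors on labels that occur only in the test split but also lets test-set categories influence preprocessing. A hedged alternative sketch, assuming scikit-learn's `OrdinalEncoder` with `handle_unknown="use_encoded_value"` (available since scikit-learn 0.24) and small made-up frames in place of the real splits:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up stand-ins for train_x / test_x; only two columns for brevity.
train = pd.DataFrame({
    "City Location": ["Bengaluru", "Gurgaon", "Bengaluru"],
    "InvestmentnType": ["Series B", "Series C", "Private Equity Round"],
})
test = pd.DataFrame({
    "City Location": ["New Delhi", "Gurgaon"],
    "InvestmentnType": ["Series C", "Pre-series A"],
})

# Fit on the training split only; categories unseen at fit time map to -1.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_enc = enc.fit_transform(train)
test_enc = enc.transform(test)

print(train_enc)
print(test_enc)
```

The sentinel `-1` is an arbitrary choice for this sketch; tree-based models can usually tolerate it, while linear models may not treat it sensibly.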

# Encode each categorical column with its fitted LabelEncoder.
# Note: 'month' holds the label-encoded full date string, not a calendar month.
train_df = pd.DataFrame(
    {
        'InvestmentType': le1.transform(train_x['InvestmentnType']),
        'InvestorsName': le2.transform(train_x['Investors Name']),
        'IndustryVertical': le3.transform(train_x['Industry Vertical']),
        'StartupName': le4.transform(train_x['Startup Name']),
        'CityLocation': le5.transform(train_x['City Location']),
        'month': le6.transform(train_x['Date dd/mm/yyyy'])
    })
test_df = pd.DataFrame(
    {
        'InvestmentType': le1.transform(test_x['InvestmentnType']),
        'InvestorsName': le2.transform(test_x['Investors Name']),
        'IndustryVertical': le3.transform(test_x['Industry Vertical']),
        'StartupName': le4.transform(test_x['Startup Name']),
        'CityLocation': le5.transform(test_x['City Location']),
        'month': le6.transform(test_x['Date dd/mm/yyyy'])
    })

test_df.head()

   InvestmentType  InvestorsName  IndustryVertical  StartupName  CityLocation  month
0               5             10                14            3             7      6
1              10              9                14           13             3      9
2               5              1                12           11             1      7
3               2              0                11           17             3      4

clf = GradientBoostingRegressor(learning_rate=0.1, max_depth=11, min_samples_split=100,
                                min_samples_leaf=20, n_estimators=40,
                                max_features=3, random_state=43)
clf.fit(train_df,train_y)


GradientBoostingRegressor(max_depth=11, max_features=3, min_samples_leaf=20,
                          min_samples_split=100, n_estimators=40,
                          random_state=43)

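from sklearn.ensemble import RandomForestClassifier

# Note: a classifier treats each distinct funding amount as its own class label;
# RandomForestRegressor is the usual choice for a continuous target like this.
rf_clf = RandomForestClassifier()

rf_clf.fit(train_df, train_y)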

RandomForestClassifier()
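
The `evaluate` helper used in the following cells is not defined in the visible part of this printout. A minimal sketch of what it might look like, assuming it prints mean squared error, R2, and mean absolute error for both splits in the format seen in the outputs (the body, the returned dictionary, and the demo data are assumptions, not the notebook's code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(model, train_x, train_y, test_x, test_y):
    """Assumed behaviour: print and return MSE, R2, and MAE per split."""
    results = {}
    for label, x, y in (("Train", train_x, train_y), ("Test", test_x, test_y)):
        pred = model.predict(x)
        results[label] = {
            "mse": mean_squared_error(y, pred),
            "r2": r2_score(y, pred),
            "mae": mean_absolute_error(y, pred),
        }
        print(f"{label} Results:")
        print(f"Mean Squared Error: {results[label]['mse']}")
        print(f"R2 Score: {results[label]['r2']}")
        print(f"Mean Absolute Error: {results[label]['mae']}")
        print()
    return results


# Tiny synthetic check: a noiseless line y = 2x + 1 is fitted exactly.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
demo_model = LinearRegression().fit(X[:4], y[:4])
demo_results = evaluate(demo_model, X[:4], y[:4], X[4:], y[4:])
```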

evaluate(rf_clf, train_df, train_y, test_df, test_y)

Train Results:
Mean Squared Error: 0.0
R2 Score: 1.0
Mean Absolute Error: 0.0

Test Results:
Mean Squared Error: 2960370221494809.0
R2 Score: 0.18112954819222815
Mean Absolute Error: 39287901.5


Voting Regressor ensemble

from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

lr = LinearRegression()
rf = RandomForestRegressor()
gb = GradientBoostingRegressor()

voting_reg = VotingRegressor(estimators=[('lr', lr), ('rf', rf), ('gb', gb)])

voting_reg.fit(train_df, train_y)

VotingRegressor(estimators=[('lr', LinearRegression()),
                            ('rf', RandomForestRegressor()),
                            ('gb', GradientBoostingRegressor())])

evaluate(voting_reg, train_df, train_y, test_df, test_y)

Train Results:
Mean Squared Error: 620569682990373.2
R2 Score: 0.8817093389093895
Mean Absolute Error: 17939311.084098116

Test Results:
Mean Squared Error: 6814934148937292.0
R2 Score: -0.885084563093143
Mean Absolute Error: 57982655.935594216
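
`VotingRegressor` fits a clone of each named estimator and, when no `weights` are given, returns the plain average of their predictions. A minimal sketch on synthetic data (the data and the two-member ensemble are assumptions for illustration, not the notebook's setup) that checks this by hand:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 2))
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, size=40)

voting = VotingRegressor(estimators=[
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=20, random_state=0)),
])
voting.fit(X, y)

# The fitted clones are stored in estimators_; average their predictions manually.
manual = np.mean([est.predict(X) for est in voting.estimators_], axis=0)
ensemble = voting.predict(X)

print(np.allclose(manual, ensemble))
```

Because the output is an unweighted mean, a single poorly generalizing member can pull the ensemble's test score down; the `weights` parameter of `VotingRegressor` is one way to adjust each member's influence.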


Model Comparison

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rf_reg = RandomForestRegressor()
gb_reg = GradientBoostingRegressor(learning_rate=0.1, max_depth=11, min_samples_split=100, max_features=3, random_state=43)
lr_reg = LinearRegression()
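
Note that `gb_reg` sets `min_samples_split=100`, while the `train_y` printed earlier lists only 16 rows. If the training set really is that small, no tree node can be split, so each boosting stage is a single leaf and the model predicts a constant; that would be consistent with the train R2 of 0.0 reported for gradient boosting in the comparison output. A small sketch of this effect on synthetic data (the data and sizes are assumptions chosen to mirror that situation):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

# 16 synthetic rows, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(16, 3))
y = rng.uniform(1e6, 1e8, size=16)

# min_samples_split exceeds the number of rows, so no node can ever split.
gb = GradientBoostingRegressor(min_samples_split=100, random_state=43)
gb.fit(X, y)

pred = gb.predict(X)
train_r2 = r2_score(y, pred)

print(pred[:3], y.mean(), train_r2)
```

Lowering `min_samples_split` (and `min_samples_leaf`) below the training set size lets the trees actually partition the data.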

Instance of the Voting Regressor:

from sklearn.ensemble import VotingRegressor

voting_reg = VotingRegressor(estimators=[('rf', rf_reg), ('gb', gb_reg), ('lr', lr_reg)])

rf_reg.fit(train_df, train_y)
gb_reg.fit(train_df, train_y)
lr_reg.fit(train_df, train_y)
voting_reg.fit(train_df, train_y)

VotingRegressor(estimators=[('rf', RandomForestRegressor()),
                            ('gb', GradientBoostingRegressor(max_depth=11, max_features=3,
                                                             min_samples_split=100,
                                                             random_state=43)),
                            ('lr', LinearRegression())])

Evaluate the performance of each model on the test data using the evaluate function

print("Random Forest Regressor:")
evaluate(rf_reg, train_df, train_y, test_df, test_y)
print("\nGradient Boosting Regressor:")
evaluate(gb_reg, train_df, train_y, test_df, test_y)
print("\nLinear Regression:")
evaluate(lr_reg, train_df, train_y, test_df, test_y)
print("\nVoting Regressor:")
evaluate(voting_reg, train_df, train_y, test_df, test_y)

Random Forest Regressor:

Train Results:
Mean Squared Error: 589633423818955.0
R2 Score: 0.8876062923851433
Mean Absolute Error: 15931191.34875

Test Results:
Mean Squared Error: 3789714440126361.0
R2 Score: -0.04827604104250782
Mean Absolute Error: 50434138.644999996

Gradient Boosting Regressor:

Train Results:
Mean Squared Error: 5246142656308409.0
R2 Score: 0.0
Mean Absolute Error: 55362710.8125

Test Results:
Mean Squared Error: 3648533255409018.0
R2 Score: -0.009223796942487317
Mean Absolute Error: 54762289.1875

Linear Regression:
Train Results:
Mean Squared Error: 2668272215043784.0
R2 Score: 0.49138397679002765
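
The per-model printouts above can be hard to compare at a glance. A sketch that collects test-set metrics into one table, assuming synthetic data and default model settings rather than the notebook's `train_df` and `test_df` (the column names `test_mse`, `test_r2`, and `test_mae` are made up for this example):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 1.0, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=10)

models = {
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=43),
    "Linear Regression": LinearRegression(),
}

rows = []
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "model": name,
        "test_mse": mean_squared_error(y_te, pred),
        "test_r2": r2_score(y_te, pred),
        "test_mae": mean_absolute_error(y_te, pred),
    })

# One row per model, best test R2 first.
comparison = (
    pd.DataFrame(rows).set_index("model").sort_values("test_r2", ascending=False)
)
print(comparison)
```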
