23MCA1030 - Ensemble - Classifiers - .Ipynb - Colaboratory
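import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("/content/startup_funding.csv")
# Drop columns that are not used as features, then drop rows with any missing value.
df = df.drop(['Sr No', 'Remarks', 'SubVertical'], axis=1)
df = df.dropna()
df = df.reset_index(drop=True)
# 'InvestmentnType' (sic) is the column name used throughout this notebook.
count = df['InvestmentnType'].value_counts()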
plt.figure(figsize=(10,4))
sns.barplot(x = count.index, y = count.values, alpha=0.8)
plt.xticks(rotation='vertical')
plt.xlabel('Investment Type', fontsize=12)
plt.ylabel('Number of fundings made', fontsize=12)
plt.title("Type of Investment made", fontsize=16)
plt.show()
https://colab.research.google.com/drive/1pstRfqqvQFQfpBSjQ3MLnVt3xihmJ_pP?authuser=2#scrollTo=59S_UT-sIMgM&printMode=true 2/11
3/29/24, 1:06 PM 23MCA1030_Ensemble_classifiers*.ipynb - Colaboratory
count = df['City Location'].value_counts()
plt.figure(figsize=(25,10))
sns.barplot(x = count.index, y = count.values, alpha=0.8)
plt.xticks(rotation='vertical')
plt.xlabel('Investment Location', fontsize=25)
plt.ylabel('Number of fundings made', fontsize=25)
plt.title("Number of fundings by city", fontsize=30)
plt.show()
df.head()
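  Date dd/mm/yyyy  Startup Name  Industry Vertical  City Location             Investors Name  InvestmentnType  Amount in USD
1      13/01/2020        Shuttl     Transportation        Gurgaon  Susquehanna Growth Equity         Series C      8048394.0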
df = df[~df['Amount in USD'].isnull()]
3 3000000.0
18 1500000.0
13 2000000.0
2 18358860.0
14 50000000.0
8 70000000.0
17 486000.0
16 150000000.0
237 4200000.0
12 30000000.0
11 12000000.0
1 8048394.0
0 200000000.0
15 231000000.0
4 1800000.0
9 50000000.0
Name: Amount in USD, dtype: float64
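The cell that creates `train_x`, `test_x`, `train_y` and `test_y` does not appear in this export, and the series above looks like the training target. A minimal sketch of such a split, assuming scikit-learn's `train_test_split`; the toy frame, `test_size=0.2` and `random_state=42` are illustrative assumptions, not values taken from the notebook:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the cleaned frame (hypothetical values, not the real data).
df = pd.DataFrame({
    "City Location": ["Gurgaon", "Bengaluru", "Mumbai", "Pune", "Delhi"] * 4,
    "InvestmentnType": ["Series C", "Seed", "Series A", "Series B"] * 5,
    "Amount in USD": [float(1_000_000 * (i + 1)) for i in range(20)],
})

# Features are every column except the target; the target is the funding amount.
X = df.drop(columns=["Amount in USD"])
y = df["Amount in USD"]

# test_size and random_state are assumptions chosen for this sketch.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(train_x.shape, test_x.shape)
```

With 20 rows and `test_size=0.2`, the split leaves 16 training rows and 4 test rows.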
LabelEncoder()
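The fitting of the encoders `le1` to `le6` is not shown in this export. One way it could be done, sketched here under the assumption that each encoder is fitted on the full column (train and test rows together) so that `transform` never meets an unseen label; an encoder fitted on the training rows alone raises `ValueError` for labels that occur only in the test rows. The toy frame and the `encoders` dictionary are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for two of the categorical columns (hypothetical values).
df = pd.DataFrame({
    "City Location": ["Gurgaon", "Mumbai", "Gurgaon", "Pune", "Bengaluru"],
    "InvestmentnType": ["Series C", "Seed", "Seed", "Series A", "Series C"],
})
train_x, test_x = df.iloc[:3], df.iloc[3:]

# One encoder per column, fitted on all rows so every label is known.
encoders = {}
for col in df.columns:
    le = LabelEncoder()
    le.fit(df[col])
    encoders[col] = le

# LabelEncoder assigns integer codes in sorted order of the labels.
train_codes = encoders["City Location"].transform(train_x["City Location"])
test_codes = encoders["City Location"].transform(test_x["City Location"])
print(list(encoders["City Location"].classes_))
print(list(train_codes), list(test_codes))
```

Keeping the encoders in a dictionary keyed by column name is a design choice for the sketch; six separate variables, as in the notebook, behave the same way.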
# Encode each categorical column as integer codes with its fitted LabelEncoder.
train_df = pd.DataFrame(
    {
        'InvestmentType': le1.transform(train_x['InvestmentnType']),
        'InvestorsName': le2.transform(train_x['Investors Name']),
        'IndustryVertical': le3.transform(train_x['Industry Vertical']),
        'StartupName': le4.transform(train_x['Startup Name']),
        'CityLocation': le5.transform(train_x['City Location']),
        # Named 'month', but this encodes the full date string.
        'month': le6.transform(train_x['Date dd/mm/yyyy'])
    })
test_df = pd.DataFrame(
    {
        'InvestmentType': le1.transform(test_x['InvestmentnType']),
        'InvestorsName': le2.transform(test_x['Investors Name']),
        'IndustryVertical': le3.transform(test_x['Industry Vertical']),
        'StartupName': le4.transform(test_x['Startup Name']),
        'CityLocation': le5.transform(test_x['City Location']),
        'month': le6.transform(test_x['Date dd/mm/yyyy'])
    })
test_df.head()
   InvestmentType  InvestorsName  IndustryVertical  StartupName  CityLocation  month
0               5             10                14            3             7      6
1              10              9                14           13             3      9
2               5              1                12           11             1      7
3               2              0                11           17             3      4
GradientBoostingRegressor(max_depth=11, max_features=3, min_samples_leaf=20,
                          min_samples_split=100, n_estimators=40,
                          random_state=43)
rf_clf.fit(train_df, train_y)
RandomForestClassifier()
Train Results:
Mean Squared Error: 0.0
R2 Score: 1.0
Mean Absolute Error: 0.0
Test Results:
Mean Squared Error: 2960370221494809.0
R2 Score: 0.18112954819222815
Mean Absolute Error: 39287901.5
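from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
rf = RandomForestRegressor()
gb = GradientBoostingRegressor()
# Reconstructed: names and order follow the fitted model's display (lr, rf, gb).
voting_reg = VotingRegressor(estimators=[('lr', lr), ('rf', rf), ('gb', gb)])
voting_reg.fit(train_df, train_y)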
VotingRegressor(estimators=[('lr', LinearRegression()),
                            ('rf', RandomForestRegressor()),
                            ('gb', GradientBoostingRegressor())])
Train Results:
Mean Squared Error: 620569682990373.2
R2 Score: 0.8817093389093895
Mean Absolute Error: 17939311.084098116
Test Results:
Mean Squared Error: 6814934148937292.0
R2 Score: -0.885084563093143
Mean Absolute Error: 57982655.935594216
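When no weights are given, a `VotingRegressor` predicts the plain average of its base estimators' predictions, so a single poorly generalising member can pull the ensemble's test R2 down, as in the results above. The sketch below checks that averaging behaviour on synthetic data; the data, the two base models and their settings are assumptions made for illustration and are not the notebook's models:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.randn(40)

voting = VotingRegressor(estimators=[
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=10, random_state=0)),
])
voting.fit(X, y)

# estimators_ holds the fitted base models in the order they were passed in.
individual = np.column_stack([est.predict(X) for est in voting.estimators_])
manual_average = individual.mean(axis=1)

# The ensemble prediction matches the unweighted mean of its members.
print(np.allclose(voting.predict(X), manual_average))
```

Passing `weights=[...]` to `VotingRegressor` would replace the plain mean with a weighted one, which is one way to reduce the influence of a weak member.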
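rf_reg = RandomForestRegressor()
gb_reg = GradientBoostingRegressor(learning_rate=0.1, max_depth=11, min_samples_split=100, max_features=3, random_state=43)
lr_reg = LinearRegression()
# Reconstructed: names and order follow the fitted model's display (rf, gb, lr).
voting_reg = VotingRegressor(estimators=[('rf', rf_reg), ('gb', gb_reg), ('lr', lr_reg)])
rf_reg.fit(train_df, train_y)
gb_reg.fit(train_df, train_y)
lr_reg.fit(train_df, train_y)
voting_reg.fit(train_df, train_y)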
VotingRegressor(estimators=[('rf', RandomForestRegressor()),
                            ('gb', GradientBoostingRegressor(max_depth=11, max_features=3, min_samples_split=100, random_state=43)),
                            ('lr', LinearRegression())])
Evaluate the performance of each model on the test data using the evaluate function
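The definition of `evaluate` does not appear in this export. A minimal sketch that would print output in the same shape as the results shown in this notebook, assuming the argument order `(model, train_x, train_y, test_x, test_y)` used in the calls and scikit-learn's standard metric functions; the returned dictionary and the toy data are additions for illustration:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(model, train_x, train_y, test_x, test_y):
    """Print MSE, R2 and MAE of a fitted model on the train and test sets."""
    results = {}
    for name, X, y in (("Train", train_x, train_y), ("Test", test_x, test_y)):
        pred = model.predict(X)
        results[name] = {
            "mse": mean_squared_error(y, pred),
            "r2": r2_score(y, pred),
            "mae": mean_absolute_error(y, pred),
        }
        print(f"{name} Results:")
        print("Mean Squared Error:", results[name]["mse"])
        print("R2 Score:", results[name]["r2"])
        print("Mean Absolute Error:", results[name]["mae"])
    # Returning the numbers as well is an assumption; the original may only print.
    return results


# Toy usage on perfectly linear data (hypothetical values).
toy_train_x, toy_train_y = [[0], [1], [2], [3]], [0, 2, 4, 6]
toy_test_x, toy_test_y = [[4], [5]], [8, 10]
toy_model = LinearRegression().fit(toy_train_x, toy_train_y)
toy_results = evaluate(toy_model, toy_train_x, toy_train_y, toy_test_x, toy_test_y)
```

An R2 score below zero on the test set means the model predicts worse than a constant equal to the mean of the test targets.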
print("Random Forest Regressor:")
evaluate(rf_reg, train_df, train_y, test_df, test_y)
print("\nGradient Boosting Regressor:")
evaluate(gb_reg, train_df, train_y, test_df, test_y)
print("\nLinear Regression:")
evaluate(lr_reg, train_df, train_y, test_df, test_y)
print("\nVoting Regressor:")
evaluate(voting_reg, train_df, train_y, test_df, test_y)
Test Results:
Mean Squared Error: 3789714440126361.0
R2 Score: -0.04827604104250782
Mean Absolute Error: 50434138.644999996
Test Results:
Mean Squared Error: 3648533255409018.0
R2 Score: -0.009223796942487317
Mean Absolute Error: 54762289.1875
Linear Regression:
Train Results:
Mean Squared Error: 2668272215043784.0
R2 Score: 0.49138397679002765