
Experiments Results: Machine Learning Prediction of Companies' Business Success

Machine Learning Prediction of Companies’ Business Success

Chenchen Pan, Yuan Gao, Yuzi Luo

INTRODUCTION

Thousands of companies emerge around the world each year. Some succeed, by being acquired or going public (IPO), while others vanish. What makes the difference and leads to these different outcomes? In this project, we build a binary classification model to predict the success of companies. Previous work on a similar dataset compared only Logistic Regression and Random Forest models. We also explored a K-Nearest Neighbours (KNN) classifier, used the F1 score as the metric to compare the models, and found that KNN performs better on this task.

DATASET & METHODS

Dataset
The dataset is extracted from the Crunchbase Data Export, which contains information on 60K+ companies, updated to December 2015.

Logistic Regression
Logistic regression is a widely used algorithm for modelling a binary dependent variable with many independent variables.

Random Forest
Random Forest is an ensemble learning method for classification that constructs a multitude of decision trees at training time and outputs the class that is the mode of the classes predicted by the individual trees.

K Nearest Neighbours
We classify an object by a majority vote of its K nearest neighbours.

EXPERIMENTS

Data Preprocessing
● Extracted and merged the companies’ information from several original files.
● Labelled all the data with 1 or 0 based on the companies’ status: 1 = Acquired or IPO; 0 = otherwise.
● Edited, filtered, and selected meaningful features (shown with sample values):
  ○ category_list: Audio|Mobile|Music
  ○ funding_total_usd: 440000
  ○ country_code: AUS
  ○ funding_rounds: 3
  ○ Num_of_investor: 3
  ○ funding_duration: 425
  ○ first_funding_at_UTC: 15461
  ○ last_funding_at_UTC: 15886
  ○ label: 0
● Used an up-sampling method to balance the training set.
● Normalized numerical features.
● Encoded text features using a bag-of-words model.

[Table: number of training, evaluation, and test examples for the original and up-sampled datasets.]

Model Selection
We present three metrics:
● Accuracy: the proportion of predictions that are correct.
● F1 Score: F1 = 2 × Precision × Recall / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = TP / (TP + FN).
● AUC Score: area under the ROC curve, which is an aggregate measure of performance across all possible classification thresholds.
● TPR = TP / (TP + FN), FPR = FP / (FP + TN)

RESULTS

Random Forests has the best accuracy, while KNN has the highest F1 score and the highest AUC score.

For the confusion matrix, the TPR and FPR are 69.80% and 26.96% for Logistic Regression, 84.04% and 39.16% for Random Forests, and 74.12% and 26.81% for KNN, respectively. Random Forests achieves the highest TPR, though it also has the highest FPR.

We selected the KNN model to run on the test set, with accuracy = 73.70% and F1 score = 44.45%.

FUTURE WORK

● Include more features of the companies, such as business descriptions.
● Try more complex models, such as neural networks and pre-trained word embeddings.
● Try kernel methods, which move the data to a higher-dimensional space.
● Explore new questions, such as predicting the total funding size for a company (a regression problem).

REFERENCE

● Wei CP, Jiang YS, Yang CS. Patent Analysis for Supporting Merger and Acquisition (M&A) Prediction: A Data Mining Approach[M]. Berlin: Springer, 2009: 187-200.
● Bento FRSR. Predicting Start-up Success with Machine Learning[D]. Lisboa: NOVA Information Management School, 2018. 9-83.
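
The labelling, up-sampling, and normalization steps listed under Data Preprocessing might look roughly like the sketch below. The poster does not say which status strings, resampling scheme, or scaling were used, so the status values `acquired` and `ipo`, the random duplication of minority-class rows, the min-max scaling, and all function names are assumptions made for illustration.

```python
import random

# Assumed status strings; the poster only states 1 = Acquired or IPO, 0 = otherwise.
SUCCESS_STATUSES = {"acquired", "ipo"}


def label_company(status):
    """Return 1 for an acquired or IPO company, 0 otherwise."""
    return 1 if status.strip().lower() in SUCCESS_STATUSES else 0


def upsample(rows, labels, seed=0):
    """Balance the two classes by randomly duplicating minority-class rows.

    This is one common up-sampling scheme, assumed here for illustration.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(by_class[0]), len(by_class[1]))
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = []
        if members:
            extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def min_max_normalize(values):
    """Scale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy usage with made-up statuses (not taken from the Crunchbase data).
statuses = ["operating", "acquired", "closed", "ipo", "operating", "operating"]
labels = [label_company(s) for s in statuses]
rows, balanced_labels = upsample(list(range(len(labels))), labels)
```

Up-sampling would be applied to the training split only, as the poster states, so that the evaluation and test sets keep the original class balance.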
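
To make the Model Selection metrics concrete, here is a minimal sketch that computes accuracy, F1, TPR, and FPR from binary labels and predictions. The function name and the toy labels are hypothetical, and the F1 formula used is the standard harmonic mean of precision and recall. It is not the authors' evaluation code.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, F1, TPR, and FPR for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + tpr
    f1 = 2 * precision * tpr / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "tpr": tpr, "fpr": fpr}


# Toy labels (made up): TP = 2, FN = 1, FP = 1, TN = 4.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced data, accuracy can stay high while F1 is low, which is consistent with the poster comparing models on F1 rather than on accuracy alone.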
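
The majority-vote rule described under K Nearest Neighbours can be sketched in a few lines. The poster does not specify the distance measure or the value of K, so Euclidean distance and `k=3` are assumptions, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    Euclidean distance is assumed; the poster does not name a metric.
    """
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-cluster data (made up, not from the Crunchbase dataset).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_cluster = knn_predict(points, labels, (5.5, 5.5))
```

Because the vote depends on raw distances, features on larger scales dominate unless the numerical features are normalized first, as in the preprocessing step.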
