R Project
1. The case
Crowdfunding platforms like IndieGoGo and Kickstarter have become increasingly popular,
enabling individuals, businesses, and non-profits to secure funding for their projects. The success of
these campaigns, often raising millions, highlights the unique opportunities these platforms offer for
entrepreneurs and innovators. Beyond financial gains, crowdfunding allows creators to test ideas,
receive instant feedback, and build a community or user base.
In this project, we explore the effects of quality and engagement on the amount of funds raised.
Formally, we want to support or reject the following hypotheses:
2. Hypotheses
3. Assumptions
In this project, we will use OLS regression models, which make the following assumptions:
• Linearity: the relationship between the independent variables and the dependent variable is
assumed to be linear
• Normality of errors: The errors should be normally distributed
• Homoscedasticity: the variance of the errors is constant across all the levels of the
independent variables
Stockholm School of Economics. Sveavägen 65. Box 6501. SE-113 83 Stockholm. Sweden. Phone +46 8 736 90 00. www.hhs.se
• No or little multicollinearity: the independent variables in the regression model should not
be highly correlated with each other
• No outliers
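The assumptions listed above can be checked numerically once a model has been fitted. The following is a minimal sketch on simulated stand-in data, not the project's own diagnostics: the data frame `sim`, its coefficients, and the 4/n cut-off for Cook's distance are assumptions made for illustration, and only base R is used.

```r
# Simulated stand-in data (assumption); the project itself uses data/campaigns.Rds
set.seed(1)
n <- 500
sim <- data.frame(
  campaign_quality = sample(0:9, n, replace = TRUE),
  pitch_size = round(runif(n, 50, 2000))
)
sim$collected_funds <- 500 + 130 * sim$campaign_quality +
  0.7 * sim$pitch_size + rnorm(n, sd = 800)

fit <- lm(collected_funds ~ campaign_quality + pitch_size, data = sim)

# Normality of errors: Shapiro-Wilk test on the residuals (requires n <= 5000)
shapiro_p <- shapiro.test(residuals(fit))$p.value

# Homoscedasticity: studentized Breusch-Pagan test computed by hand.
# Regress the squared residuals on the predictors; n * R^2 is approximately
# chi-squared with df equal to the number of predictors.
aux <- lm(residuals(fit)^2 ~ campaign_quality + pitch_size, data = sim)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Outliers: Cook's distance, flagged with the common 4/n rule of thumb
cooks <- cooks.distance(fit)
influential <- which(cooks > 4 / n)

c(shapiro_p = shapiro_p, bp_p = bp_p, n_influential = length(influential))
```

Small p-values in the two tests would speak against normality and constant variance, respectively. For a visual check of linearity and normality, `plot(fit, which = 1:2)` draws the residuals-versus-fitted and Q-Q plots.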
4. Variables
• When considering which control variables to include, we must assess the quality of the
variables and the logical reason for their inclusion:
– Images and video: These two make up the campaign quality variable and are
therefore excluded due to multicollinearity concerns.
– Business venture and social: These may contribute to higher fundraising by
signaling some form of seriousness. However, when both are included, their
significance decreases, so we exclude business venture due to the risk of
multicollinearity.
– Creators: the number of creators in the team could impact the overall quality and
management of the campaign. More creators might contribute to diverse skills and
perspectives, influencing campaign outcomes. However, not adequate
– The length of the pitch and the presence of filler words can affect how backers
perceive the campaign. Including these controls helps address the potential impact
of pitch characteristics on engagement and funds raised.
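The multicollinearity concern behind excluding images and video can be made concrete with variance inflation factors (VIF). The sketch below is an illustration only: the data are simulated, and the way `campaign_quality` is built from `images` and `video` is a hypothetical composite, since the report does not state the actual formula. The helper `vif_all` is likewise introduced here and is not part of any package.

```r
# Simulated data (assumption): quality is a hypothetical composite of images and video
set.seed(2)
n <- 500
images <- rpois(n, 3)
video <- rbinom(n, 1, 0.5)
sim <- data.frame(
  images = images,
  video = video,
  campaign_quality = images + 2 * video + rnorm(n, sd = 0.3),
  pitch_size = round(runif(n, 50, 2000))
)

# VIF of each predictor: regress it on all other predictors, then 1 / (1 - R^2)
vif_all <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_all(sim)
round(vifs, 1)
```

In this simulation the composite and its components get very large VIFs, while the unrelated `pitch_size` stays close to 1. A common rule of thumb treats values above 5 or 10 as a warning sign.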
5. Analyses
# Loading the data
df <- rio::import("./data/campaigns.Rds")
vtable::st(
  # The dataset to describe
  df,
  # Statistics to display
  summ = c("mean(x)", "sd(x)", "notNA(x)", "min(x)", "max(x)")
)
Summary Statistics
Variable           Mean    S.D.    N      Min   Max
Collected Funds    2572    3079    3000   500   55082
Campaign Quality   5.8     2.69    3000   0     9
Comments Count     15.7    24.5    3000   0     745
Updates Count      3.77    7.56    3000   0     148
Pitch Size         583     382     3000   6     5014
Social Venture     0.288   0.453   3000   0     1
Creators           2.1     2.03    3000   1     43
Looking at the summary statistics, the funds collected, campaign quality, number of comments,
number of updates, and pitch size vary considerably across the 3000 crowdfunding projects in the
dataset. For several variables the maximum lies far above the mean (for example, collected funds
have a mean of 2572 and a maximum of 55082), so the data may contain outliers. Social is a
dichotomous variable with a mean of 0.288, which suggests that most projects (about 71%) are not
social ventures.
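A quick way to follow up on the suspicion of outliers is to compare the mean with the median and to count observations beyond Tukey's upper fence (third quartile plus 1.5 times the interquartile range). The sketch below uses a simulated right-skewed variable named `funds` as a stand-in for collected funds; the distribution and its parameters are assumptions, not estimates from the dataset.

```r
# Simulated right-skewed stand-in for collected funds (assumption)
set.seed(3)
funds <- round(500 + rlnorm(3000, meanlog = 7, sdlog = 1))

# Tukey's rule: values above Q3 + 1.5 * IQR are flagged as potential outliers
q <- unname(quantile(funds, c(0.25, 0.75)))
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
n_flagged <- sum(funds > upper_fence)

c(mean = mean(funds), median = median(funds),
  upper_fence = upper_fence, flagged = n_flagged)
```

A mean well above the median, together with many flagged values, indicates a long right tail, which is the pattern the minimum, mean, and maximum in the summary table hint at.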
library(ggplot2)
6. Hypothesis 1
There is a significant relationship between crowdfunding campaign quality and the total funds
collected, through the level of engagement of the backers in the campaign.
In our analysis, we have structured the variables according to their roles in the hypothesized
relationship. Collected funds is the dependent variable, since it is the outcome being influenced.
Campaign quality is the independent variable, because our hypothesis concerns its impact on the
collected funds. As the hypothesis states that this influence runs through the level of engagement,
we have chosen engagement as the mediating variable. To
measure the engagement level, we selected the comment count variable, since it includes the
summary(model_campaign_quality)
Call:
lm(formula = collected_funds ~ campaign_quality + social + pitch_size +
    creators, data = df)
Residuals:
   Min     1Q Median     3Q    Max
 -5913  -1508   -824    467  51945
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      719.5823   163.4679   4.402 1.11e-05 ***
campaign_quality 131.1866    22.2175   5.905 3.93e-09 ***
social           653.2001   127.2642   5.133 3.04e-07 ***
pitch_size         0.7572     0.1479   5.120 3.25e-07 ***
creators         219.9207    27.4981   7.998 1.79e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = comments_count ~ campaign_quality + social + pitch_size +
    creators, data = df)
Residuals:
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.573736   1.318590   3.469  0.00053 ***
campaign_quality 0.825710   0.179214   4.607 4.25e-06 ***
social           4.340921   1.026558   4.229 2.42e-05 ***
pitch_size       0.003475   0.001193   2.913  0.00361 **
creators         1.441106   0.221809   6.497 9.56e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = collected_funds ~ campaign_quality + comments_count +
    social + pitch_size + creators, data = df)
Residuals:
     Min       1Q   Median       3Q      Max
-26106.6  -1106.2   -523.6    430.7  30670.5
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      362.4976   127.2558   2.849  0.00442 **
campaign_quality  66.7211    17.3222   3.852  0.00012 ***
comments_count    78.0729     1.7599  44.361  < 2e-16 ***
social           314.2919    99.1684   3.169  0.00154 **
pitch_size         0.4859     0.1151   4.222 2.49e-05 ***
creators         107.4095    21.5138   4.993 6.30e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Simulations: 500
plot(model_bootstrap)
# Open the project folder: you will find the results in results_side_by_side.html
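The three regressions above follow the usual mediation steps: the total effect of quality on funds, the effect of quality on the mediator, and the direct effect of quality on funds when the mediator is included. The report does not show how model_bootstrap was created, so the sketch below is not that code. It is a base-R illustration on simulated data of how an indirect effect (path a times path b) and a percentile bootstrap interval with 500 resamples could be computed. The data frame `sim`, its coefficients, and the helper `indirect_effect` are assumptions. The last lines redo the decomposition with the coefficients printed in the outputs above.

```r
# Simulated stand-in data (assumption); variable names mirror the report
set.seed(4)
n <- 1000
sim <- data.frame(campaign_quality = sample(0:9, n, replace = TRUE))
sim$comments_count <- 5 + 0.8 * sim$campaign_quality + rnorm(n, sd = 5)
sim$collected_funds <- 400 + 60 * sim$campaign_quality +
  75 * sim$comments_count + rnorm(n, sd = 500)

# Indirect effect = a (quality -> comments) * b (comments -> funds, given quality)
indirect_effect <- function(d) {
  a <- coef(lm(comments_count ~ campaign_quality, data = d))[["campaign_quality"]]
  b <- coef(lm(collected_funds ~ campaign_quality + comments_count,
               data = d))[["comments_count"]]
  a * b
}

estimate <- indirect_effect(sim)

# Percentile bootstrap: resample rows with replacement and re-estimate a * b
boot <- replicate(500, indirect_effect(sim[sample(n, replace = TRUE), ]))
ci <- quantile(boot, c(0.025, 0.975))

# Decomposition using the coefficients reported in the outputs above
a_hat <- 0.825710          # quality -> comments
b_hat <- 78.0729           # comments -> funds, controlling for quality
direct_hat <- 66.7211      # quality -> funds, controlling for comments
indirect_hat <- a_hat * b_hat
total_hat <- direct_hat + indirect_hat
round(c(indirect = indirect_hat, direct = direct_hat, total = total_hat), 2)
```

With the reported coefficients, the indirect effect is about 64.47 and the direct effect 66.72, which together reproduce the total effect of about 131.19 from the first model. Roughly half of the association between quality and funds therefore runs through comments. A bootstrap interval for the indirect effect that excludes zero would support mediation.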
As a comparison of our two tables shows, the pitch size and social control variables have a
significant relationship with collected funds. However, including them increases the explanatory
power of the mediation model only slightly, from 42% to 42.5%. This suggests that they add little
to the model and may contribute to overfitting, so they should not be included.
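Whether a set of control variables earns its place can be judged by comparing nested models on adjusted R-squared and with an F-test. The sketch below shows one way to do this in base R. The data are simulated and the model names `small` and `full` are introduced here for illustration, so the numbers will not match the 42% and 42.5% reported above.

```r
# Simulated stand-in data (assumption); variable names mirror the report
set.seed(5)
n <- 1000
sim <- data.frame(
  campaign_quality = sample(0:9, n, replace = TRUE),
  social = rbinom(n, 1, 0.3),
  pitch_size = round(runif(n, 50, 2000))
)
sim$comments_count <- rpois(n, 5 + sim$campaign_quality)
sim$collected_funds <- 400 + 60 * sim$campaign_quality +
  75 * sim$comments_count + 300 * sim$social +
  0.5 * sim$pitch_size + rnorm(n, sd = 1500)

# Model without and with the two control variables
small <- lm(collected_funds ~ campaign_quality + comments_count, data = sim)
full <- update(small, . ~ . + social + pitch_size)

# Adjusted R-squared penalizes extra predictors, unlike plain R-squared
adj_r2 <- c(small = summary(small)$adj.r.squared,
            full = summary(full)$adj.r.squared)

# F-test of the nested models: do the controls jointly improve the fit?
f_test <- anova(small, full)
p_controls <- f_test[["Pr(>F)"]][2]

round(adj_r2, 3)
p_controls
```

A significant F-test combined with a negligible gain in adjusted R-squared is the situation described above: the controls are statistically detectable but add little explanatory power.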
7. Hypothesis 2
There is a significant relationship between crowdfunding campaign quality and the level of
engagement of the backers in the campaign.
summary(model_h2)
Call:
lm(formula = comments_count ~ campaign_quality + social + pitch_size +
    creators, data = df)
Residuals:
   Min     1Q Median     3Q    Max
-32.99 -10.22  -4.94   3.46 724.92
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.573736   1.318590   3.469  0.00053 ***
campaign_quality 0.825710   0.179214   4.607 4.25e-06 ***
social           4.340921   1.026558   4.229 2.42e-05 ***
pitch_size       0.003475   0.001193   2.913  0.00361 **
creators         1.441106   0.221809   6.497 9.56e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The output from the model shows a positive and significant relationship between campaign quality
and comments count (p-value < 0.05), so we reject the null hypothesis. We therefore support
hypothesis 2: holding the control variables constant, a one-unit increase in campaign quality is
associated with an increase of about 0.83 in the comments count, our measure of backer engagement.
The explanatory power of the model is low, given an adjusted R-squared of 0.019.
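The quantities used in this interpretation (the slope, its p-value, a confidence interval, and the adjusted R-squared) can be pulled directly from a fitted lm object instead of being read off the printed summary. The sketch below is a minimal example on simulated data; `sim` and `fit` are stand-ins and not the report's model_h2.

```r
# Simulated stand-in data (assumption)
set.seed(6)
n <- 1000
sim <- data.frame(campaign_quality = sample(0:9, n, replace = TRUE))
sim$comments_count <- rpois(n, 5 + 0.8 * sim$campaign_quality)

fit <- lm(comments_count ~ campaign_quality, data = sim)

# Row of the coefficient table for the predictor of interest
slope <- coef(summary(fit))["campaign_quality", ]

# 95% confidence interval for the slope
ci <- confint(fit, "campaign_quality", level = 0.95)

# Adjusted R-squared of the model
adj_r2 <- summary(fit)$adj.r.squared

slope[["Estimate"]]
slope[["Pr(>|t|)"]]
ci
adj_r2
```

Reporting the confidence interval next to the point estimate shows how precisely the slope is estimated, which a p-value alone does not convey.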
8. Hypothesis 3
The relationship between crowdfunding campaign quality and the level of engagement of the backers
in the campaign is moderated by the amount of effort the campaigners put into interacting with the
backers.
summary(model_h3)
Call:
lm(formula = comments_count ~ campaign_quality * updates_count +
    social + pitch_size, data = df)
Residuals:
   Min     1Q Median     3Q    Max
-87.81  -9.44  -4.46   3.74 709.23
Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                     4.890297   1.334751   3.664 0.000253 ***
campaign_quality                0.933453   0.188972   4.940 8.26e-07 ***
updates_count                   2.476165   0.259440   9.544  < 2e-16 ***
social                          3.606693   1.004682   3.590 0.000336 ***
pitch_size                      0.002021   0.001173   1.722 0.085138 .
campaign_quality:updates_count -0.224840   0.031805  -7.069 1.93e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We did not standardize the variables used in this model (comments_count, campaign_quality, and
updates_count) because they are discrete counts rather than continuous variables. It is, for
example, not possible for the comments count to take on a value of 1.2.
The output from the model shows a negative and significant interaction between campaign quality
and updates count (p-value < 0.05), so we reject the null hypothesis. We therefore support
hypothesis 3: the relationship between crowdfunding campaign quality and the level of engagement
of the backers is moderated by the amount of effort the campaigners put into interacting with the
backers. The negative sign means that the positive effect of quality on comments becomes weaker as
the number of updates increases. The explanatory power of the model is low, given an adjusted
R-squared of 0.075.
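To see what the interaction term implies, the slope of campaign quality can be evaluated at different values of the moderator: in a model with an interaction, that slope equals the quality coefficient plus the interaction coefficient times updates_count. The sketch below does this arithmetic with the two coefficients from the output above. The chosen values of updates_count are illustrative, with 3.77 being the sample mean from the summary statistics.

```r
# Coefficients copied from the model_h3 output above
b_quality <- 0.933453       # campaign_quality
b_interaction <- -0.224840  # campaign_quality:updates_count

# Illustrative moderator values; 3.77 is the mean updates count in the data
updates <- c(0, 1, 3.77, 10)

# Simple slope of quality at each level of updates_count
quality_slope <- b_quality + b_interaction * updates
data.frame(updates_count = updates, quality_slope = round(quality_slope, 3))

# Number of updates at which the estimated slope of quality reaches zero
crossover <- -b_quality / b_interaction
round(crossover, 2)
```

Under these estimates, the slope of quality is about 0.93 for a campaign with no updates, close to zero at the mean number of updates, and negative beyond roughly four updates.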
9. Summary
We reject the null hypothesis for all three hypotheses, so our models show statistically significant
support for each of them. However, none of the models has large explanatory power: using 0.6 as a
benchmark, no model has an adjusted R-squared that exceeds it.