
BC2406 S01 G02 Final Report

This report analyzes a group project that investigates whether an app's pricing model impacts its sales revenue. The group used regression analysis and text mining on app data from March 2013 to build three models targeting key predictors. The models found that adopting a free pricing model, especially for lifestyle, travel, and books apps, is most effective for boosting sales. Descriptions using frequent terms may mitigate negative effects of paid apps. Adopting freemium models is generally more effective than paid models for increasing app sales and revenue.

BC2406 Analytics I: Visual & Predictive Techniques

Semester 1, AY 2016/17

Group Project Report

Members: ANG ZHAN XIAN U1410930L

LIM HONG QUAN, LEROY U1421068C

MICHAEL UTAMA U1510178C

ONG JEFFREY U1421715C

TANG MIAO YI VALERIE U1410625C

Seminar Number: 01

Group Number: 02

Instructor: Dr. Lee Gun Woong

Submission Date: 7 November 2016


1. EXECUTIVE SUMMARY
Consumers rely heavily on mobile applications: 79% of smartphone owners use at least one app every day (Perez, 2014b). A common goal of app developers is to create successful and popular apps. However, to be successful, an app needs more than a large base of users. It must also generate a sustainable stream of revenue for its developers.

Several factors that affect total app revenue are unique to the mobile app market. Unlike physical products, apps can be listed on the app store as free or paid downloads. Additionally, physical products often differentiate themselves through strong branding, whereas many apps in the market share similar functions and lack significant brand names. Hence, app developers need to find other sales strategies to improve their app revenues.

Therefore, we defined the business problem as investigating whether an App’s Pricing Model has any significant impact on its Sales Revenue.

This business problem is further decomposed into 3 more specific tasks. This results in 3 models, each targeting different key predictors and therefore a different aspect of the overall analysis.

Two data mining techniques, regression and text mining, were used to answer the business problem. Using data collected on 15 March 2013, variables were identified for regression analysis, and app descriptions were text-mined to identify keywords.

Based on the 3 models built from the subtasks, this report identifies several strategies to increase sales. First, developers can launch an app as a “Free” version, assuming that a Freemium model is incorporated. In addition, apps in some categories were shown to have better sales than others. Moreover, using frequently used terms in an app’s product description may cushion the negative effect on sales of launching the app as a Paid version. Nevertheless, in this study, the Freemium model is found to be more effective than the Paid model in increasing sales. This is especially true for apps in the Lifestyle, Travel and Books categories.

Some suggestions for new app developers follow directly from our model outcomes. To boost an app’s sales, new entrants should first try the Freemium model before considering the Paid model. This applies especially if the app is launched in the Lifestyle, Travel or Books categories. The strategy is also applicable to existing apps, so existing app developers can consider adopting the Freemium model. However, developers who still wish to adopt the Paid model need to devise strategic product descriptions and keywords. These can mitigate any negative effect of the app’s price on its sales.

2. BUSINESS UNDERSTANDING
2.1 Background
Cumulative app downloads from Apple’s AppStore grew exponentially from July 2008 to September 2016 (Appendix A). The AppStore today contains 2,685,676 mobile applications (Steel Media Ltd, 2016), with cumulative downloads of up to 140 billion (Statista, 2016).

2.2 Opportunities & Challenges


For users, the growing number of available mobile apps is an opportunity. The challenge is that many similar or duplicative apps increase users’ cognitive load when they search for a suitable app. For sellers, the higher number of downloads signals an expanded customer network. This lets them sell apps that cater to customers’ heterogeneous preferences (an opportunity). However, sellers face rising competition, because as many as 60,000 mobile apps are launched in the AppStore every month (Perez, 2014a).

2.3 Business Problem


Problem: Does the App Pricing Model have any impact on App Sales Revenue?

To answer the business problem, we first decomposed it into subtasks, each supported by data-mining techniques. The subtasks are as follows:

● Task 1: Analyse how Pricing Model, Frequently Used Description Terms and App Categories relate to App Revenue.
○ Text-Mining
○ Regression Analysis
● Task 2: Analyse the complementary effects of Pricing Model and Frequently Used Description Terms on App Revenue.
○ Text-Mining
○ Regression Analysis with Interaction Variables
● Task 3: Analyse the complementary effects of Pricing Model and App Categories on App Revenue.
○ Text-Mining
○ Regression Analysis with Interaction Variables

2.4 Hypothesis
We hypothesize that the revenue generated from an app will be affected by its pricing model
(Paid or Free), the category it belongs to and its product description.

2.5 Control Variables


● Number of Screenshots (Screenshot)
● Average score rating (StarsAllVersions)
● Size of App (Log_Size)
● Total number of reviews (Log_RatingsAllVersions)

2.6 Key Predictors
Variable: Paid
(a binary variable created from the variable Price, where Paid = 0 for Price = 0 and Paid = 1 for Price > 0)
Rationale: We predict that the pricing model an app adopts will affect the revenue it generates. Consumers may be price sensitive. They may be unwilling to spend money on an app based only on the available product information, description and user reviews, and might only pay after trying the app for themselves. Therefore, we expect that setting a price for the initial download will affect the app’s revenue.

Variable: Category
Rationale: We expect that an app’s category will affect the revenue it generates. Consumers may be willing to pay a premium for a Business app they use for important matters, because they place more value on its functions and quality. However, consumers may see no need to pay for a Games app, which is only for entertainment. This study examines apps from the Games, Business, Education, Lifestyle, Entertainment, Travel, Books, Health & Fitness, Food & Drink and Utilities categories.

Variable: Description
Rationale: We predict that an app’s description in the App Store will affect the revenue it generates. For instance, keywords such as “best” or “free” might give the app a higher chance of appearing in search results when consumers look for apps. The description also gives consumers an idea of what the app is like. Therefore, a meaningful description should increase the number of app downloads and thus revenue.

2.7 Empirical Approach

Text Mining
Commonly used words are extracted for each category from the descriptions in the files of the US_24 Category_Detailed folder. From these words, the report then hypothesizes a relationship between the keywords used in descriptions and sales in the respective categories. Developers should use these keywords more often if there is statistical evidence that they have a significant impact on an app’s sales revenue.

Linear Regression
The App Gross Rankings are transformed into an indicator of sales revenue, which serves as the dependent variable. Our report then uses the App Pricing Model, Categories and Descriptions as independent variables in a regression model. The model is evaluated for its explanatory and predictive power.

Linear Regression with Interaction Variables

Looking at the variables individually may be too general, because interactions between variables also have to be taken into consideration. For example, different categories might need different pricing models, and an app that already adopts a “Paid Model” may need other ways to improve its sales. This part of the data mining therefore examines whether two interactions have any impact on App Sales: Pricing Model with Categories, and Pricing Model with Frequently Used Description Terms.

3. DATA PREPARATION

3.1 Description of Data’s Characteristics


Data Source:
● Log from the purchase history of the U.S. Apple App Store
Collection Date:
● 15 March 2013
● Apple’s system time and the location where the data was recorded are in different time zones. Records dated later than 15 March 2013 were therefore converted to 15 March 2013.
Sample Size:
● 300 apps from each category
● 10 app categories: Games, Business, Education, Lifestyle, Entertainment, Travel, Books, Health & Fitness, Food & Drink and Utilities
● A total of 3,000 apps
Structure of Data:
● Record
● Data that consists of a collection of records, each consisting of a fixed set of attributes

3.2.1 Merging Datasets


We merged the following data files together: “GrossGames.csv”, “GrossBusiness.csv”,
“GrossEducation.csv”, “GrossLifeStyle.csv”, “GrossEntertainment.csv”, “GrossUtilities.csv”,
“GrossTravel.csv”, “GrossBooks.csv”, “Gross Health&Fitness.csv” and “Gross
Food&Drink.csv”.
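A minimal sketch of this merge step, using only Python’s standard library. It assumes each file has a header row with the same columns. It also tags every row with a `Category` label taken from the file it came from. The report does not say whether the files already carry such a column, so `merge_category_tables` and the `Category` field are illustrative names.

```python
import csv
from typing import Iterable


def merge_category_tables(tables: dict[str, Iterable[str]]) -> list[dict[str, str]]:
    """Stack per-category CSV tables into one list of rows.

    `tables` maps a category label to any iterable of CSV lines (an open
    file or a list of strings). Each output row gets a 'Category' field so
    that category dummies can still be built after the merge.
    """
    merged: list[dict[str, str]] = []
    for category, lines in tables.items():
        for row in csv.DictReader(lines):
            row["Category"] = category  # remember which file the row came from
            merged.append(row)
    return merged
```

With the real files, each value would be an open file, for example `{"Games": open("GrossGames.csv", newline="")}`, using the file names listed above.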

3.2.2 Data Cleaning

Missing Values
We identified a small number of records with missing (NA) values and removed them from our data.

Outliers
In general, outliers are observations that lie more than 3 standard deviations from the mean. We analysed the relevant variables and found outliers in Price, Screenshot, Size, StarsAllVersions, RatingsAllVersions, StarsCurrentVersion and RatingsCurrentVersion. Based on our domain knowledge of the app market, we judged these values to be “extreme”. Hence, the outliers were removed, because they could have a disproportionate influence on our model.
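The two cleaning rules above can be sketched as follows, assuming records are dictionaries of numeric fields with `None` marking a missing value. The cut-off uses the sample mean and standard deviation, computed once on the NA-free data. The report does not say which standard deviation it used or whether it repeated the screen, so both are assumptions. The manual domain-knowledge review described above is not captured here.

```python
import statistics
from typing import Optional

Row = dict[str, Optional[float]]


def drop_missing(rows: list[Row], columns: list[str]) -> list[Row]:
    """Keep only records that have a value (not None) in every listed column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def drop_outliers(rows: list[Row], columns: list[str], k: float = 3.0) -> list[Row]:
    """Drop records lying more than k sample standard deviations from a column's mean.

    The bounds are computed once from the input rows; the screen is not repeated.
    """
    bounds = {}
    for c in columns:
        values = [r[c] for r in rows]
        mean, sd = statistics.mean(values), statistics.stdev(values)
        bounds[c] = (mean - k * sd, mean + k * sd)
    return [r for r in rows
            if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)]
```

Missing values are removed first, then outliers, for example `drop_outliers(drop_missing(rows, cols), cols)`.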

3.2.3 Data Pre-Processing

Normalization
The summary statistics show that a few variables (Size, RatingsAllVersions) have large scales, with means lower than their standard deviations. Hence, we normalized these variables with a log-transformation to prevent them from dominating and skewing the results.
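A sketch of the screening heuristic and the transformation, assuming the natural logarithm. RatingsAllVersions can plausibly be zero, so the sketch uses log(1 + x) (`math.log1p`) to keep zeros defined. The report does not say how zeros were handled, so this offset is an assumption.

```python
import math
import statistics


def is_widely_spread(values: list[float]) -> bool:
    """The heuristic described above: flag a variable whose mean is below its standard deviation."""
    return statistics.mean(values) < statistics.stdev(values)


def log_transform(values: list[float]) -> list[float]:
    """Natural log of (1 + x); the +1 offset is an assumption so that zero counts stay defined."""
    return [math.log1p(v) for v in values]
```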

Variable Transformation
Sales is created by log-transforming the app’s top-grossing rank. This analysis assumes that the lower the rank number, the higher the sales revenue. It also assumes that an app ranked 1 in one category has the same sales revenue as an app ranked 1 in another category.

Dummy variables for the app categories were created to facilitate the regression analysis.
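The indicator variables can be sketched as below. `paid_indicator` follows the definition of Paid in Section 2.6. `category_dummies` creates one 0/1 column for every category except the baseline, Utilities, matching the “Baseline” rows in the regression output. The short labels Health and Food follow that output; both function names are hypothetical.

```python
CATEGORIES = ["Games", "Business", "Education", "Lifestyle", "Entertainment",
              "Travel", "Books", "Health", "Food", "Utilities"]
BASELINE = "Utilities"  # omitted category, absorbed into the intercept


def paid_indicator(price: float) -> int:
    """Paid = 1 if the app has a positive price, else 0."""
    return 1 if price > 0 else 0


def category_dummies(category: str) -> dict[str, int]:
    """One 0/1 dummy per non-baseline category (nine in total)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return {c: int(c == category) for c in CATEGORIES if c != BASELINE}
```

A Utilities app gets all-zero dummies, so its effect is carried by the intercept.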

terms_score counts how many of the 20 most frequent description terms appear in a particular app’s description. An app’s score increases by 1 for each of those terms that its description contains. The 20 most frequent description terms were determined through text mining.

Before terms_score was calculated, the app descriptions were converted into a corpus and then into a document-term matrix (DTM). The descriptions were first converted to lower case. Parsing then removed HTML tags, stopwords, numbers, whitespace, punctuation and App Store-related terms (see Appendix B). It also removed frequently appearing but less important terms (e.g., device-related terms) and less informative terms. Meaningful numbers were converted to text for the purpose of this text mining. Finally, stemming reduced the terms to their root form. From this result, we obtained the 20 most frequently used terms in app descriptions for the calculation of terms_score (see Appendix C).
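The scoring procedure can be sketched in standard-library Python as below. This is a simplified stand-in, not the report’s actual pipeline, and it makes several assumptions:
● The stopword list is a tiny illustrative sample; the report’s removal lists are in Appendix B.
● Numbers are simply dropped rather than selectively converted to text.
● `crude_stem` is a rough suffix stripper standing in for a proper stemmer such as Porter’s.
● “Most frequent” is taken to mean the highest total count across all descriptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; not the report's full list).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for", "in", "on",
             "with", "is", "it", "your", "you"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(description: str) -> list[str]:
    """Lower-case, strip HTML tags, keep alphabetic tokens, drop stopwords, then stem."""
    text = re.sub(r"<[^>]+>", " ", description.lower())
    tokens = re.findall(r"[a-z]+", text)  # drops numbers and punctuation
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def top_terms(descriptions: list[str], n: int = 20) -> list[str]:
    """The n stemmed terms with the highest total count across all descriptions."""
    counts = Counter()
    for d in descriptions:
        counts.update(preprocess(d))
    return [term for term, _ in counts.most_common(n)]


def terms_score(description: str, frequent: list[str]) -> int:
    """Number of distinct frequent terms present in one description (0 to len(frequent))."""
    return len(set(preprocess(description)) & set(frequent))
```

Because the score counts distinct terms, repeating a frequent term does not raise it. For example, "Track games while tracking" scores 2 against ["game", "track"].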
3.3 Summary Statistics of Variables
The summary statistics of the variables that will be used in the analysis are as follows (after
cleaning and pre-processing):

*NOTE: The category dummies have been omitted.

3.4 Visualize the Associations Among the Key Variables

Using the correlations, we can identify the associations among the key variables. This helps us spot any highly correlated variables. If there were any, one variable of each such pair would have to be removed to avoid multicollinearity, which inflates the standard errors of the estimates. In our case, the variables do not have very high correlations with each other, so we can proceed with our analysis.

4. MODELING & EVALUATION


For the regression analysis, 3 different models were used to address our business problem and to make meaningful evaluations. The variables included in each model are:

Variables                       Model #1    Model #2    Model #3
Paid, Category, terms_score        ✓           ✓           ✓
(Paid * terms_score)                           ✓
(Paid * Categories)                                        ✓
Control Variables                  ✓           ✓           ✓

4.1.1 Model 1: Linear Regression with Descriptors

Model 1 examines how the independent variables affect the dependent variable (Sales). The variables used to predict Sales are:
● Paid
● the category dummies Games, Business, Education, Lifestyle, Entertainment, Travel, Books, Health and Food, with Utilities as the baseline category
● terms_score
● the control variables

The intercept 𝛽₀ has no meaningful interpretation here. It corresponds to an app with every predictor equal to zero (for example, an average star rating of zero), which does not occur in practice. Lastly, the regression model includes an error term.
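Written out, the Model 1 specification implied by the variable list above can be sketched as follows. The notation is our own:
● γ denotes the category coefficients and δ the coefficients on the four control variables from Section 2.5.
● C is the set of nine non-baseline categories (Games, Business, Education, Lifestyle, Entertainment, Travel, Books, Health, Food). Utilities is omitted as the baseline.

```latex
\begin{aligned}
\text{Sales}_i = {} & \beta_0 + \beta_1\,\text{Paid}_i + \sum_{c \in C} \gamma_c\,\text{Category}_{c,i} + \beta_2\,\text{terms\_score}_i \\
& + \delta_1\,\text{Screenshot}_i + \delta_2\,\text{StarsAllVersions}_i + \delta_3\,\text{Log\_Size}_i + \delta_4\,\text{Log\_RatingsAllVersions}_i + \varepsilon_i
\end{aligned}
```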

4.1.2 Model 2: Linear Regression with Associated Descriptors (Paid interact Score)
Model 2 aims to improve on the R² of Model 1 and to identify any relationship between the pricing model and app descriptions. To do so, it includes an interaction variable between Paid and terms_score. Interaction variables show how different factors may interact with each other to affect sales. Model 2 therefore examines how an app’s textual product description (terms_score) interacts with Paid, in case terms_score is found to have a positive and significant impact on an app’s sales. The proposed regression model adds the interaction term Paid × terms_score to Model 1.
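A sketch of the Model 2 specification follows. The notation is our own:
● γ denotes the category coefficients and δ the coefficients on the four control variables.
● C is the set of nine non-baseline categories, with Utilities omitted as the baseline.

```latex
\begin{aligned}
\text{Sales}_i = {} & \beta_0 + \beta_1\,\text{Paid}_i + \sum_{c \in C} \gamma_c\,\text{Category}_{c,i} + \beta_2\,\text{terms\_score}_i + \beta_3\,(\text{Paid}_i \times \text{terms\_score}_i) \\
& + \delta_1\,\text{Screenshot}_i + \delta_2\,\text{StarsAllVersions}_i + \delta_3\,\text{Log\_Size}_i + \delta_4\,\text{Log\_RatingsAllVersions}_i + \varepsilon_i
\end{aligned}
```

Under this specification, the effect of Paid on Sales is β₁ + β₃ · terms_score, so a positive β₃ can offset a negative β₁ as terms_score grows.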

4.1.3 Model 3: Linear Regression with Associated Descriptors (Paid interact Categories)
Model 2 provides useful insights into the effect of a mobile app’s product description on its sales. We would also like to investigate whether the interactions between Categories and Paid have any significant impact on an app’s sales. This gives developers a better guide to whether a Paid Model or a Freemium Model will significantly boost app sales in their respective app categories.

In Model 3, nine additional interaction variables were added to Model 1. These variables are derived by interacting (i.e., multiplying) Paid with each of the 9 non-baseline app category dummies (binary); Utilities remains the baseline.
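To make the construction concrete, the sketch below builds one row of a Model 3-style design matrix: intercept, Paid, category dummies, other regressors, then Paid multiplied by each dummy. It fits ordinary least squares by solving the normal equations with Gaussian elimination. The sketch uses only the standard library for illustration. It is not the software the report used, and a QR- or SVD-based solver is numerically preferable for real data. The helper names are hypothetical.

```python
def design_row(paid: int, dummies: list[int], others: list[float]) -> list[float]:
    """Intercept, Paid, category dummies, other regressors, then Paid x each dummy."""
    return [1.0, float(paid), *map(float, dummies), *others,
            *(float(paid * d) for d in dummies)]


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: check for collinear columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)
```

If dummies for all ten categories were included alongside the intercept, the columns would be collinear and `solve_linear` would typically raise an error. This is why one category is kept as the baseline.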

4.2 Outcome
The outcomes of the models are as follows:

Independent Variable              Model 1         Model 2         Model 3
Paid                              -0.149886***    -0.32917***      0.046595
Games                             -0.789270***    -0.80275***     -0.642421***
Business                           0.181168*       0.17675*        0.224316
Education                          0.011862        0.01407         0.064614
Lifestyle                          0.141442 .      0.13215 .       0.397966**
Entertainment                     -0.142515 .     -0.15074 .       0.008207
Travel                             0.299039***     0.29931***      0.731457***
Books                              0.159797*       0.14640 .       0.421679**
Health                            -0.065456       -0.07078         0.080127
Food                               0.382101***     0.38315***      0.407327*
Utilities                          Baseline
terms_score                        0.008248       -0.01176         0.009597
interaction_Tscore_Paid            NA              0.02890*        NA
interaction_Paid_Games             NA              NA             -0.179368
interaction_Paid_Business          NA              NA             -0.050805
interaction_Paid_Education         NA              NA             -0.045050
interaction_Paid_Lifestyle         NA              NA             -0.358106*
interaction_Paid_Entertainment     NA              NA             -0.187687
interaction_Paid_Travel            NA              NA             -0.521252**
interaction_Paid_Books             NA              NA             -0.359470*
interaction_Paid_Health            NA              NA             -0.185283
interaction_Paid_Food              NA              NA             -0.036620
interaction_Paid_Utilities         Baseline
Screenshot                        -0.009078       -0.01135        -0.013846
StarsAllVersions                   0.032531        0.02968         0.032207
Log_Size                           0.035417*       0.03618*        0.038878*
Log_RatingsAllVersions             0.182130***     0.18296***      0.183436***

Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1. NA: variable not included in the model.

4.3 Interpretation of Outcomes: Estimated Coefficients
Only the key variables unique to each model, and only statistically significant ones, are interpreted here. terms_score is included for reference although it is not significant. For the full output, please refer to Appendices D, E and F.

Model 1
● Games: Apps in the Games category performed worse than apps in the Utilities category, with app revenue 78.93% lower (0.1% significance level).
● Business: Apps in the Business category improved app revenue by 18.12% compared to apps in the Utilities category (5% significance level).
● Travel: Apps in the Travel category improved app revenue by 29.99% compared to apps in the Utilities category (0.1% significance level).
● Books: Apps in the Books category improved app revenue by 15.98% compared to apps in the Utilities category (5% significance level).
● Food: Apps in the Food category improved app revenue by 38.21% compared to apps in the Utilities category (0.1% significance level).
● Paid: Apps that were paid to download had app revenue 14.99% lower than freemium apps (0.1% significance level).
● terms_score: A one-unit increase in terms_score (one more frequent term used) is associated with a 0.82% increase in app revenue. This estimate is not statistically significant.
Model 2
● interaction_Tscore_Paid: Paid apps have a complementary relationship with their product descriptions. For a Paid app, each additional most-frequent term (as identified by our analysis) in its product description is associated with a further 2.89% increase in revenue (5% significance level).
Model 3
● interaction_Paid_Lifestyle: The Paid model interacts negatively with the Lifestyle category. A Paid Lifestyle app may see a further 35.81% decrease in revenue relative to the effect of Paid in the baseline Utilities category (5% significance level).
● interaction_Paid_Travel: The Paid model interacts negatively with the Travel category. A Paid Travel app may see a further 52.13% decrease in revenue relative to the effect of Paid in the baseline Utilities category (1% significance level).
● interaction_Paid_Books: The Paid model interacts negatively with the Books category. A Paid Books app may see a further 35.95% decrease in revenue relative to the effect of Paid in the baseline Utilities category (5% significance level).
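The percentages above use the common 100·β reading of a coefficient when the dependent variable is logged. That reading is only accurate for small β. For a 0/1 dummy, the exact implied change is 100·(e^β − 1)%. Taking Sales, as the report does, to be a natural-log measure that rises with revenue, the sketch below compares both readings for a few Model 1 coefficients.

```python
import math


def approx_pct(beta: float) -> float:
    """The 100*beta approximation used in the interpretations above."""
    return 100.0 * beta


def exact_pct(beta: float) -> float:
    """Exact % change implied by switching a 0/1 dummy from 0 to 1 when the
    dependent variable is a natural log: 100 * (e^beta - 1)."""
    return 100.0 * math.expm1(beta)


# A few Model 1 coefficients from Section 4.2.
for name, beta in [("Games", -0.789270), ("Food", 0.382101), ("Paid", -0.149886)]:
    print(f"{name:6s} approx {approx_pct(beta):7.2f}%   exact {exact_pct(beta):7.2f}%")
```

For Games, the exact reading is about -54.6% rather than -78.9%. For Paid, the two readings are close (about -13.9% versus -15.0%).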

Explanations
terms_score was found to be insignificant in Model 1. However, further analysis in Model 2 found a complementary relationship between Paid and terms_score (5% significance level). So while app descriptions do not generally affect app revenue, they become more important when the app adopts a Paid model. This is likely because, for Paid apps, the description is one of the few sources of information about the app. Users therefore take into account what the description promises when deciding whether to buy. In the case of a Free app, by contrast, users can download the app first and experience it for themselves.
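Under Model 2, the estimated effect of Paid depends on terms_score: it equals the Paid coefficient plus the interaction coefficient times terms_score. Plugging in the Model 2 estimates from Section 4.2 gives the sketch below. These are point estimates only. The coefficients’ covariance matrix is not reported, so it cannot be said whether the combined effect at a given terms_score is statistically significant.

```python
# Model 2 point estimates from Section 4.2.
PAID = -0.32917
TERMS_SCORE = -0.01176
TSCORE_X_PAID = 0.02890


def paid_effect(terms_score: float) -> float:
    """Estimated effect of Paid on Sales for an app with the given terms_score."""
    return PAID + TSCORE_X_PAID * terms_score


def terms_effect(paid: int) -> float:
    """Estimated effect of one more frequent term, for free (paid=0) or paid (paid=1) apps."""
    return TERMS_SCORE + TSCORE_X_PAID * paid


# terms_score at which the estimated Paid effect crosses zero
break_even = -PAID / TSCORE_X_PAID

for score in (0, 5, 10, 15, 20):
    print(f"terms_score={score:2d}  Paid effect={paid_effect(score):+.4f}")
print(f"break-even terms_score ~ {break_even:.2f}")
```

With these estimates, the Paid penalty reaches zero at a terms_score of about 11.4, inside the score’s 0 to 20 range. For Paid apps, each additional frequent term is associated with a net change of about +0.017 in Sales.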
Analysing the interactions between an app’s pricing model and its category, we found that revenue generally decreases when an app in a given category follows a Paid model. This is consistent with our findings from Model 1.
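In Model 3, the Paid coefficient is the effect of Paid for apps in the baseline Utilities category, and each interaction shifts that effect for one category. Adding the two gives a category-specific point estimate. The sketch below uses the Model 3 estimates from Section 4.2 and does not attempt any significance test of the sums.

```python
# Model 3 point estimates from Section 4.2 (Utilities is the baseline, so its interaction is 0).
PAID_MAIN = 0.046595
PAID_INTERACTIONS = {
    "Games": -0.179368, "Business": -0.050805, "Education": -0.045050,
    "Lifestyle": -0.358106, "Entertainment": -0.187687, "Travel": -0.521252,
    "Books": -0.359470, "Health": -0.185283, "Food": -0.036620, "Utilities": 0.0,
}


def paid_effect_by_category() -> dict[str, float]:
    """Total estimated effect of Paid in each category: main effect + interaction."""
    return {cat: PAID_MAIN + b for cat, b in PAID_INTERACTIONS.items()}


for cat, effect in sorted(paid_effect_by_category().items(), key=lambda kv: kv[1]):
    print(f"{cat:13s} {effect:+.6f}")
```

Travel (about -0.47), Books and Lifestyle (about -0.31 each) come out most negative. Education, Food and Utilities come out slightly positive, but neither their interaction terms nor the Paid main effect in Model 3 is statistically significant.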
4.5 Model Evaluation (Diagnostic Tests for Models)
Our regression analysis yields the following diagnostic results.

Null Hypothesis & F-Test
The null hypothesis of the F-test is that all predictor variables in our regression analysis jointly have zero effect on a mobile app’s Sales, i.e., all slope coefficients are zero. Our analysis shows a p-value of < 2.2e-16. We therefore reject the null hypothesis at the 0.1% significance level and conclude that the predictors are jointly significant: at least one of them has a non-zero effect on Sales.
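For reference, the overall F-statistic behind this test can be computed from R², the number of observations n and the number of predictors k. The report does not state n after cleaning, so no values are plugged in here; the function is a generic sketch.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-statistic for H0: all k slope coefficients are zero.

    r2: model R-squared; n: number of observations; k: number of predictors
    (excluding the intercept). Under H0 it follows an F(k, n - k - 1) distribution.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))
```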

Evaluation of Model 1 Fit


(The estimated output of Model #1 is shown in Appendix D.)
The adjusted R² value is 0.1653. This means the selected predictors in our regression analysis explain approximately 17% of the variation in an app’s Sales. The other 83% remains unexplained by this model (discussed further in Section 5). This 83% could be explained by predictor variables not found in our dataset (Walz, 2015), such as:
● the number of downloads/installations
● the number of uninstallations (i.e., retention and churn rate)
● the keyword density of the app’s landing page
● app usage statistics (how engaged an app’s users are and how frequently they launch the app)

Evaluation of Model 2 Fit


(The estimated output of Model 2 is shown in Appendix E.)
The adjusted R² value is 0.1668. This means that the predictor variables in Model 2 explain only approximately 17% of the variation in an app’s Sales. The additional interaction variable between product descriptions and Paid (interaction_Tscore_Paid) therefore did not substantially improve the model’s explanatory power.

Evaluation of Model 3 Fit


(The estimated output of Model #3 is shown in Appendix F.)
The adjusted R² value is 0.1673. This means that the predictor variables in Model 3 explain only approximately 17% of the variation in an app’s Sales. The additional interaction variables between Paid and the respective product categories therefore did not substantially improve the model’s explanatory power.

Comparison of Explanatory Powers for All 3 Models

Values               Model 1      Model 2      Model 3
R² Value             0.1699       0.1717       0.1746
Adjusted R² Value    0.1653       0.1668       0.1673
p-Value              < 2.2e-16    < 2.2e-16    < 2.2e-16

As seen above, the models’ explanatory power (adjusted R²) improves slightly, though not drastically, from Model 1 (0.1653) to Model 2 (0.1668) and Model 3 (0.1673). Model 2 and Model 3 explain about 16.68% and 16.73% of the variation in the dependent variable (Sales), respectively. In every model, the F-test p-value indicates that the null hypothesis should be rejected, i.e., at least one predictor has an effect on Sales.
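Adjusted R² penalises R² for the number of predictors k. Counting the regressors listed in Section 4.2:
● Model 1 has 15: Paid, nine category dummies, terms_score and four controls.
● Model 2 has 16.
● Model 3 has 24.
This is why Model 3’s raw R² gain over Model 2 (0.0029) shrinks to 0.0005 after adjustment. A generic sketch of the formula:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Number of predictors per model, counted from the output table in Section 4.2.
PREDICTORS = {"Model 1": 15, "Model 2": 16, "Model 3": 24}
```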

Comparison of Errors

Residuals       Model 1     Model 2     Model 3
Min             -1.8474     -1.8368     -1.7226
1st Quartile    -0.5913     -0.5977     -0.5832
Median          -0.1949     -0.1971     -0.2035
3rd Quartile     0.3913      0.3780      0.3750
Max              4.4267      4.4471      4.3862
MSE              0.785748    0.784053    0.781318

The maximum residual across the 3 models is about 4.4. Because Sales is on a log scale, a residual of 4.4471 (Model 2) means that the model under-predicted at least one observation by a factor of nearly e^4.4471 ≈ 85 on the original scale. Half of the residuals fall between the 1st and 3rd quartiles. For half of the observations, the observed value therefore lies between about e^(-0.5977) ≈ 0.55 and e^0.3780 ≈ 1.46 times the predicted value (Model 2). Overall, Model 3 has the lowest MSE (0.781318), slightly lower than Models 1 and 2.
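The residuals are differences of logged values, so exponentiating a residual gives the ratio of the observed to the fitted value on the original scale. A small sketch for the Model 2 residual quantiles in the table above:

```python
import math

# Model 2 residual quantiles from the table above (Sales is on a log scale).
RESIDUALS_MODEL2 = {"Min": -1.8368, "1st Quartile": -0.5977, "Median": -0.1971,
                    "3rd Quartile": 0.3780, "Max": 4.4471}


def multiplicative_factor(log_residual: float) -> float:
    """Ratio of observed to fitted value on the original (unlogged) scale."""
    return math.exp(log_residual)


for label, r in RESIDUALS_MODEL2.items():
    print(f"{label:13s} {r:+.4f} -> x{multiplicative_factor(r):.2f}")
```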

Comparison of Residual Plots

To further analyse the goodness of fit, we plotted the residuals for all 3 models. The residual plots in Appendix G suggest that there may be heteroscedasticity (Statwing, n.d.): the residuals get larger as the prediction moves from small to large. While this is not inherently a problem, it indicates that the model can be improved. The plots also suggest that the residuals are not balanced around zero along the Y-axis. This again indicates room for improvement and is a likely cause of the low R² across the 3 models.

One solution would be to normalise the variables. However, we had already normalised the relevant variables identified in 3.2.3, so missing variables are the likely cause of these patterns. This is in line with our earlier conclusion that variables not included in the model explain the remaining 83% of the variation in app Sales.

5. DEPLOYMENT: INSIGHTS AND IMPLICATIONS

5.1 Addressing the Business Problem (and Subtasks)


After applying the data-mining techniques, our findings from the 3 models are summarised below.

Model    Key Predictors                                                      Better Sales
1        Adopt the Freemium Model                                            Yes
1        More frequently used description terms                              Yes
1        Provision of apps in certain categories                             Games, Business, Travel, Books, Food
2        Adopt the Paid Model with more frequently used description terms    Yes
3        Adopt the Freemium Model for apps in a certain category             All, with Lifestyle, Travel & Books being most significant

Model    Interpretation
1        First, apps that adopt the Freemium Model are more likely to top the grossing chart. Second, apps with more frequently used description terms in their descriptions are more likely to top the grossing chart. Lastly, we identified popular categories in the top grossing chart: Games, Business, Travel, Books and Food.
2        Developers who persist with the Paid Model could mitigate its negative effects by using more frequently used description terms, because as the number of such terms increases, it tends to cancel out the negative effects of the Paid Model.
3        Whichever category an app belongs to, it should adopt the Freemium model to top the grossing chart, especially in the Lifestyle, Travel and Books categories.
Overall  We therefore conclude that our hypothesis, “Revenue generated from an app will be affected by its Pricing Model, Category and Product Description”, is supported.

With reference to Appendix D, the following variables in Model 1 are highly significant (p < 0.001): Log_RatingsAllVersions, Paid, Games, Travel and Food. Business, Books and Log_Size are significant at the 5% level. In contrast, Screenshot, StarsAllVersions, Education, Lifestyle, Entertainment, Health and terms_score are not significantly associated with an app’s Sales at the 5% level. These variables therefore do not help predict Sales.

With reference to Appendix E, the predictor variables in Model 2 explain only approximately 17% of the variation in an app’s Sales. The additional interaction variable between product descriptions and Paid (interaction_Tscore_Paid) therefore did not noticeably improve the model’s explanatory power. However, Paid apps have a complementary relationship with their descriptions. For a Paid app, each additional most-frequent term in its product description is associated with a 2.89% increase in sales revenue, at the 5% significance level. In other words, a higher app price in the Paid category has a negative effect on sales. To mitigate it, developers need to write strategically useful and impactful product descriptions that prevent a drop in sales or even boost sales.

With reference to Appendix F, Model 3 still leaves about 83% of the variation in Sales unexplained, despite its 9 additional variables. All of the newly introduced interaction variables have negative coefficients, implying that an app’s category interacts negatively with the Paid model. We therefore infer that launching an app as a Paid version will not improve its sales, regardless of its category. In fact, these interactions have a negative impact on sales. Launching an app (regardless of its category) as a Paid version will most likely result in lower sales than launching it as a Free version.

Moreover, only 3 of these new variables are statistically significant. We can conclude that, for apps launched as a Paid version, the Travel, Lifestyle and Books categories have a more significant negative impact on sales than the other categories. For example, a Paid app in the Lifestyle category may see its revenue decrease by a further 35.8% relative to the baseline Utilities category. Therefore, we recommend launching apps in these categories in the AppStore as Free versions.

5.2 Suggested Recommendations & Managerial Implications

First, new entrants may use our findings as guidelines to improve initial sales performance. Assume that these new entrants can develop apps for any category. Our regression analysis then suggests that they develop a Free version of an app in the Games or Travel category that captures as many review ratings as possible, so as to significantly boost the app’s sales.

For example, app developers can increase the number of review ratings (as validated and encouraged by Model 1) with an app review plugin such as Appirater. The plugin prompts users to review the app after they have used it a certain number of times or after a set time period. If the user taps the “Rate” button, they are taken directly to the AppStore, where they can write their reviews (Kissmetrics, 2016). Alternatively, app developers may incentivize users to review the app. For example, a game app could reward users with a certain amount of EXP, rewards or points in exchange for their reviews.
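A hypothetical sketch of the prompting rule described above: prompt once, after either a launch-count or an elapsed-time threshold is met. The thresholds and names are illustrative assumptions; this is not Appirater’s actual API or logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPolicy:
    """Hypothetical thresholds (not Appirater's actual configuration)."""
    min_launches: int = 10
    min_days: float = 7.0


def should_prompt_for_review(launches: int, days_since_install: float,
                             already_prompted: bool,
                             policy: PromptPolicy = PromptPolicy()) -> bool:
    """Prompt once, after either the launch-count or the elapsed-time threshold is met."""
    if already_prompted:
        return False
    return launches >= policy.min_launches or days_since_install >= policy.min_days
```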

Furthermore, developers of apps in the Travel, Lifestyle and Books categories are encouraged to launch them as Free versions instead of Paid versions, because the latter may significantly decrease app sales. Some developers may find sales success with a Free version and wish to diversify their portfolios by launching a Paid version as the next strategic step. They then need to devise strategic product descriptions and keyword presentation to mitigate any negative effect of the price on the app’s sales. This advice follows from the complementary relationship between an app’s product description and its Paid version. It is also corroborated by the statistically significant effect on sales of the interaction between an app’s Paid version and its product descriptions.

A creative way for app developers to combine the Free and Paid versions is to temporarily
drop the price of a mobile app to Free for a limited period that coincides with a particular
season, for example from mid-to-end December to take advantage of the Christmas season
(Rajput, 2016). App developers may use websites that track app price reductions, such as
148Apps and AppShopper, to analyse and determine the optimal period to keep the app Free
before restoring its original Paid price. Further research has shown that mobile apps that
adopt this method continue to attract high download frequency even after they revert to their
Paid version. In this way, app developers may also indirectly mitigate the negative effect that
a high app price has on sales, and sustain high sales revenue over the longer term.
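As a rough illustration of the temporary price drop described above, the sketch below returns an app's price on a given date, dropping it to Free inside a promotional window. The window (15 to 31 December) and the USD 2.99 list price are hypothetical values chosen to match the "mid-to-end December" example; in practice the window would be tuned using price-tracking data such as that from 148Apps or AppShopper.

```python
from datetime import date

LIST_PRICE = 2.99       # hypothetical Paid price in USD
PROMO_START = (12, 15)  # (month, day): assumed start of the Free window
PROMO_END = (12, 31)    # (month, day): assumed end of the Free window


def price_on(day: date) -> float:
    """Return the app's price on `day`: Free inside the promo window, else the list price."""
    start = date(day.year, *PROMO_START)
    end = date(day.year, *PROMO_END)
    return 0.0 if start <= day <= end else LIST_PRICE


for d in (date(2016, 12, 10), date(2016, 12, 20), date(2017, 1, 2)):
    print(d, price_on(d))
```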

5.3 Limitations of Our Research & Analysis
Our analysis above has focused largely on determining which app-specific attributes exert a
significant impact on a mobile app’s sales performance. However, we have not considered the
possibility that a developer diversifies its product portfolio by selling apps across different
categories. Prior research has found that such diversification is a key determinant of a mobile
app’s probability of surviving in the AppStore’s Top Charts, which in turn contributes
significantly to its sales performance (Lee, 2015).

Another limitation of our research is that we focused only on the revenue generated by an app
through paid downloads and in-app purchases, and ignored other possible sources of revenue.
For example, many free apps in the market generate revenue through advertisements. Our
study did not take advertising revenue into account, as that information does not enter the
calculation of the apps’ gross rankings in the Apple AppStore. Hence we have not investigated
these alternative revenue strategies that developers can use to create a successful app.

Next, the analysis and findings of our research are based on a mobile app’s ranking
information, although there are several alternative methods of estimating an app’s sales
revenue. Additionally, top-performing apps that appear in the top charts may help users make
purchase decisions faster and more easily, because these apps are promoted and shown to users
first when they search for the apps they are looking for. Unfortunately, the limited data
available prevented us from analysing users’ “potential preferential attachment mechanisms”
(Lee, 2015). A longer monitoring period would therefore be necessary to evaluate whether the
results change over a longer time frame.

Lastly, this dataset is limited to Apple’s AppStore in the U.S. Future studies should also
analyse mobile apps’ sales performance on other distribution platforms, such as the Google
Play Store, because an app’s sales performance may vary across platforms due to factors such
as: the different types and numbers of categories available; the different customer profiles
each platform caters to (for example, Google Play caters more to less affluent customers in
Less Developed Countries such as Indonesia and Brazil, whereas the Apple AppStore caters more
to affluent customers in More Developed Countries such as Singapore and the U.S.); different
App Store Optimization (ASO) requirements; and more (Lee, 2015).

6. APPENDICES

Appendix A: Exponentially Increasing Market Share of Apple’s AppStore

Appendix B: Text Mining Parsing

HTML Tag: u2019, u’, u”, u2605, u2606, u201c, u201d, u2011, u2013, u2014, u2022, u2122,
u2026, u2028, u2729, u20ac, amp, xae, xa0, xa3, don, won, ing, ‘ll, www, com

Less Informative Terms: apple, iphone, touch, ipod, ipad, 3gs, 3rd, 2nd, 4th, app, store, game,
play, mobile, free, new, world, and, for, the, to, in, when, then, he, she, than, can, get, one,
also, just, need

Other Terms: Numbers, Stopwords, Punctuations, Symbols, Whitespace, Other non-alphanumeric
characters
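The parsing step summarised in this appendix can be sketched as below. This is a minimal illustration, not the group's actual pipeline: the tokenisation regex, the excerpts of the term lists, and the small sample of standard stopwords are assumptions, and stemming (which presumably produced truncated terms such as "devic" and "creat" in Appendix C) is omitted.

```python
import re
from collections import Counter

# Excerpts from the Appendix B lists (not the full lists).
HTML_TAGS = {"u2019", "u2605", "u2606", "u201c", "u201d", "amp", "xa0", "www", "com"}
LESS_INFORMATIVE = {"apple", "iphone", "ipad", "app", "store", "game", "free",
                    "new", "and", "for", "the", "to", "in", "when", "then"}
# Assumed sample of standard English stopwords ("Stopwords" under Other Terms).
STOPWORDS = {"a", "by", "it", "of", "on", "with", "your"}
DROP = HTML_TAGS | LESS_INFORMATIVE | STOPWORDS


def parse_description(text: str) -> list[str]:
    """Lower-case the text, split it into alphanumeric tokens (dropping punctuation,
    symbols and whitespace), remove the listed terms, then remove any token that
    contains a digit (numbers and leftover codes such as 3gs or 4th)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in DROP and t.isalpha()]


def term_frequencies(descriptions: list[str]) -> Counter:
    """Count term occurrences across all parsed descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(parse_description(text))
    return counts


descriptions = [
    "Track your photo notes and share with friends! u2605 Free on iPhone",
    "Create a photo list, then share it by email.",
]
print(term_frequencies(descriptions).most_common(3))
```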

Appendix C: 20 Most Frequent Terms

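| Term | Frequency |
|---|---|
| support | 2186 |
| help | 1637 |
| make | 1624 |
| email | 1603 |
| best | 1577 |
| read | 1525 |
| friend | 1517 |
| devic | 1484 |
| video | 1456 |
| find | 1447 |
| like | 1406 |
| note | 1380 |
| work | 1368 |
| list | 1363 |
| photo | 1352 |
| book | 1352 |
| user | 1348 |
| track | 1331 |
| creat | 1323 |
| file | 1321 |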
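The regression tables describe a one-unit increase in terms_score as "one more term used". Assuming terms_score counts how many of the 20 terms above appear in an app's parsed (stemmed) description (the exact definition is not restated in this section, and counting distinct terms rather than total occurrences is a further assumption), it could be computed as follows; the example token list is hypothetical.

```python
# The 20 most frequent (stemmed) terms from Appendix C.
TOP_TERMS = {
    "support", "help", "make", "email", "best",
    "read", "friend", "devic", "video", "find",
    "like", "note", "work", "list", "photo",
    "book", "user", "track", "creat", "file",
}


def terms_score(parsed_tokens: list[str]) -> int:
    """Count how many distinct top-20 terms occur in a parsed, stemmed description."""
    return len(TOP_TERMS.intersection(parsed_tokens))


# Hypothetical stemmed description: "track", "photo" and "note" are top terms.
print(terms_score(["track", "photo", "note", "share", "photo"]))  # 3
```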

Appendix D: Estimation Output of Model 1

Appendix D: Estimation Output of Model 1 (Continued)

| Variable | Coefficient | Interpretation | Association |
|---|---|---|---|
| Screenshot | -0.009078 | A one-unit increase in Screenshot (one screenshot) decreases app revenue by 0.91% | Negative, Not Significant |
| StarsAllVersions | 0.032531 | A one-unit increase in StarsAllVersions (one star) increases app revenue by 3.25% | Positive, Not Significant |
| Log_Size | 0.035417 | A one-percent increase in app size increases app revenue by 0.0354% | Positive, Significant |
| Log_RatingsAllVersions | 0.182130 | A one-percent increase in the number of ratings (RatingsAllVersions) increases app revenue by 0.1821% | Positive, Significant |
| Paid | -0.149886 | Apps that were paid to download decreased app revenue by 14.99% compared to freemium apps | Negative, Significant |
| Games | -0.789270 | Apps in the Games category performed worse than apps in the Utilities category, decreasing app revenue by 78.93% | Negative, Significant |
| Business | 0.181168 | Apps in the Business category improved app revenue by 18.12% compared to apps in the Utilities category | Positive, Significant |
| Education | 0.011862 | Apps in the Education category improved app revenue by 1.19% compared to apps in the Utilities category | Positive, Not Significant |
| Lifestyle | 0.141442 | Apps in the Lifestyle category improved app revenue by 14.14% compared to apps in the Utilities category | Positive, Not Significant |
| Entertainment | -0.142515 | Apps in the Entertainment category performed worse than apps in the Utilities category, decreasing app revenue by 14.25% | Negative, Not Significant |
| Travel | 0.299039 | Apps in the Travel category improved app revenue by 29.90% compared to apps in the Utilities category | Positive, Significant |
| Books | 0.159797 | Apps in the Books category improved app revenue by 15.98% compared to apps in the Utilities category | Positive, Significant |
| Health | -0.065456 | Apps in the Health category performed worse than apps in the Utilities category, decreasing app revenue by 6.55% | Negative, Not Significant |
| Food | 0.382101 | Apps in the Food category improved app revenue by 38.21% compared to apps in the Utilities category | Positive, Significant |
| Utilities | Baseline | | |
| terms_score | 0.008248 | A one-unit increase in terms_score (one more term used) increases app revenue by 0.82% | Positive, Not Significant |
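The percentages in the Interpretation column equal 100 × coefficient, a common approximation. Assuming the dependent variable is log-transformed (which the elasticity reading of the Log_ variables implies), the exact percentage difference for a 0/1 dummy or a one-unit change is 100 × (e^β - 1). The approximation is close for small coefficients but drifts for large ones. The sketch below compares the two for a few Model 1 coefficients taken from the table above.

```python
import math

# Selected Model 1 coefficients from the table above.
COEFFICIENTS = {
    "Paid": -0.149886,
    "Games": -0.789270,
    "Travel": 0.299039,
    "Food": 0.382101,
}


def approx_pct(beta: float) -> float:
    """Approximate % change in revenue, as used in the tables: 100 * beta."""
    return 100 * beta


def exact_pct(beta: float) -> float:
    """Exact % change for a dummy or one-unit change in a log-linear model."""
    return 100 * (math.exp(beta) - 1)


for name, beta in COEFFICIENTS.items():
    print(f"{name:8s} approx {approx_pct(beta):7.2f}%   exact {exact_pct(beta):7.2f}%")
```

For example, under this assumption the Games coefficient corresponds to roughly 54.6% lower revenue than Utilities, rather than the 78.93% given by the approximation; the sign and significance of each coefficient are unaffected.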

Fitted Model

Appendix E: Estimation Output of Model 2

Appendix E: Estimation Output of Model 2 (Continued)

| Variable | Coefficient | Interpretation | Association |
|---|---|---|---|
| Screenshot | -0.011352 | A one-unit increase in Screenshot (one screenshot) decreases app revenue by 1.14% | Negative, Not Significant |
| StarsAllVersions | 0.029683 | A one-unit increase in StarsAllVersions (one star) increases app revenue by 2.97% | Positive, Not Significant |
| Log_Size | 0.036180 | A one-percent increase in app size increases app revenue by 0.0362% | Positive, Significant |
| Log_RatingsAllVersions | 0.182962 | A one-percent increase in the number of ratings (RatingsAllVersions) increases app revenue by 0.1830% | Positive, Significant |
| Paid | -0.32917 | Apps that were paid to download decreased app revenue by 32.92% compared to freemium apps | Negative, Significant |
| Games | -0.802752 | Apps in the Games category performed worse than apps in the Utilities category, decreasing app revenue by 80.28% | Negative, Significant |
| Business | 0.176746 | Apps in the Business category improved app revenue by 17.67% compared to apps in the Utilities category | Positive, Significant |
| Education | 0.014068 | Apps in the Education category improved app revenue by 1.41% compared to apps in the Utilities category | Positive, Not Significant |
| Lifestyle | 0.132153 | Apps in the Lifestyle category improved app revenue by 13.22% compared to apps in the Utilities category | Positive, Not Significant |
| Entertainment | -0.150740 | Apps in the Entertainment category performed worse than apps in the Utilities category, decreasing app revenue by 15.07% | Negative, Not Significant |
| Travel | 0.299310 | Apps in the Travel category improved app revenue by 29.93% compared to apps in the Utilities category | Positive, Significant |
| Books | 0.146401 | Apps in the Books category improved app revenue by 14.64% compared to apps in the Utilities category | Positive, Not Significant |
| Health | -0.070777 | Apps in the Health category performed worse than apps in the Utilities category, decreasing app revenue by 7.08% | Negative, Not Significant |
| Food | 0.383148 | Apps in the Food category improved app revenue by 38.31% compared to apps in the Utilities category | Positive, Significant |
| Utilities | Baseline | | |
| terms_score | -0.01176 | A one-unit increase in terms_score (one more term used) decreases app revenue by 1.18% | Negative, Not Significant |
| interaction_Tscore_Paid | 0.028896 | Paid apps have a complementary relationship with their product descriptions: for a Paid app, each additional frequent term identified by our analysis that appears in its product description may significantly increase the app’s revenue by 2.89% | Positive, Significant |
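Because Model 2 includes the interaction between terms_score and Paid, the effect of being Paid is not a single number: on the log scale it is the Paid coefficient plus the interaction coefficient times terms_score. The sketch below computes this from the two Model 2 coefficients above, together with the terms_score at which the two offset each other. It reuses the 100 × coefficient approximation from the tables and says nothing about whether such a terms_score is attainable for a given app.

```python
# Model 2 coefficients (log scale), from the table above.
B_PAID = -0.32917
B_TSCORE_X_PAID = 0.028896


def paid_effect(terms_score: float) -> float:
    """Log-scale revenue difference of a Paid vs. freemium app with the same terms_score."""
    return B_PAID + B_TSCORE_X_PAID * terms_score


# terms_score at which the interaction fully offsets the Paid penalty.
breakeven = -B_PAID / B_TSCORE_X_PAID

for score in (0, 5, 10):
    print(score, round(100 * paid_effect(score), 2), "%")
print("break-even terms_score:", round(breakeven, 2))
```

Under these assumptions the Paid penalty shrinks from about 32.9% at terms_score 0 to about 4.0% at terms_score 10, and is fully offset at a terms_score of roughly 11.4.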

Appendix E: Estimation Output of Model 2 (Continued)

Fitted Model

Appendix F: Estimation Output of Model 3

Appendix F: Estimation Output of Model 3 (Continued)

| Variable | Coefficient | Interpretation | Association |
|---|---|---|---|
| Screenshot | -0.013846 | A one-unit increase in Screenshot (one screenshot) decreases app revenue by 1.38% | Negative, Not Significant |
| StarsAllVersions | 0.032207 | A one-unit increase in StarsAllVersions (one star) increases app revenue by 3.22% | Positive, Not Significant |
| Log_Size | 0.038878 | A one-percent increase in app size increases app revenue by 0.0389% | Positive, Significant |
| Log_RatingsAllVersions | 0.183436 | A one-percent increase in the number of ratings (RatingsAllVersions) increases app revenue by 0.1834% | Positive, Significant |
| Paid | 0.046595 | Apps that were paid to download increased app revenue by 4.66% compared to freemium apps | Positive, Significant |
| Games | -0.642421 | Apps in the Games category performed worse than apps in the Utilities category, decreasing app revenue by 64.24% | Negative, Significant |
| Business | 0.224316 | Apps in the Business category improved app revenue by 22.43% compared to apps in the Utilities category | Positive, Significant |
| Education | 0.064614 | Apps in the Education category improved app revenue by 6.46% compared to apps in the Utilities category | Positive, Not Significant |
| Lifestyle | 0.397966 | Apps in the Lifestyle category improved app revenue by 39.80% compared to apps in the Utilities category | Positive, Not Significant |
| Entertainment | 0.008207 | Apps in the Entertainment category improved app revenue by 0.82% compared to apps in the Utilities category | Positive, Not Significant |
| Travel | 0.731457 | Apps in the Travel category improved app revenue by 73.15% compared to apps in the Utilities category | Positive, Significant |
| Books | 0.421679 | Apps in the Books category improved app revenue by 42.17% compared to apps in the Utilities category | Positive, Not Significant |
| Health | 0.080127 | Apps in the Health category improved app revenue by 8.01% compared to apps in the Utilities category | Positive, Not Significant |
| Food | 0.407327 | Apps in the Food category improved app revenue by 40.73% compared to apps in the Utilities category | Positive, Significant |
| Utilities | Baseline | | |
| terms_score | 0.009597 | A one-unit increase in terms_score (one more term used) increases app revenue by 0.96% | Positive, Not Significant |
| interaction_Paid_Games | -0.179368 | Paid apps have a supplementary relationship with the Games category. A Paid version of an app in the Games category does not exert any significant impact on the app’s revenue | Negative, Not Significant |
| interaction_Paid_Business | -0.050805 | Paid apps have a supplementary relationship with the Business category. A Paid version of an app in the Business category does not exert any significant impact on the app’s revenue | Negative, Not Significant |
| interaction_Paid_Education | -0.045050 | Paid apps have a supplementary relationship with the Education category. A Paid version of an app in the Education category does not exert any significant impact on the app’s revenue | Negative, Not Significant |
| interaction_Paid_Lifestyle | -0.358106 | Paid apps have a supplementary relationship with the Lifestyle category. A Paid version of an app in the Lifestyle category may significantly decrease the app’s revenue by 35.8% | Negative, Significant |
| interaction_Paid_Entertainment | -0.187687 | Paid apps have a supplementary relationship with the Entertainment category. A Paid version of an app in the Entertainment category does not exert any significant impact on the app’s revenue | Negative, Not Significant |
| interaction_Paid_Travel | -0.521252 | Paid apps have a supplementary relationship with the Travel category. A Paid version of an app in the Travel category may significantly decrease the app’s revenue by 52.1% | Negative, Significant |
| interaction_Paid_Books | -0.359470 | Paid apps have a supplementary relationship with the Books category. A Paid version of an app in the Books category may significantly decrease the app’s revenue by 35.9% | Negative, Significant |
| interaction_Paid_Health | -0.185283 | Paid apps have a supplementary relationship with the Health category. A Paid version of an app in the Health category does not exert any significant impact on the app’s revenue | Negative, Not Significant |
| interaction_Paid_Food | -0.036620 | Paid apps have a supplementary relationship with the Food category. A Paid version of an app in the Food category does not exert any significant impact on the app’s revenue | Negative, Not Significant |
| interaction_Paid_Utilities | Baseline | | |
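Similarly, in Model 3 the effect of being Paid within a category is the Paid main effect plus that category's interaction coefficient (Utilities, the baseline, has no interaction term). The sketch below combines the Model 3 coefficients above on the log scale, using the same 100 × coefficient approximation as the tables. It ignores standard errors, so it does not by itself say which combined effects are significant.

```python
# Model 3 coefficients (log scale), from the table above.
B_PAID = 0.046595
PAID_INTERACTIONS = {
    "Utilities": 0.0,  # baseline category: no interaction term
    "Games": -0.179368,
    "Business": -0.050805,
    "Education": -0.045050,
    "Lifestyle": -0.358106,
    "Entertainment": -0.187687,
    "Travel": -0.521252,
    "Books": -0.359470,
    "Health": -0.185283,
    "Food": -0.036620,
}


def paid_effect_in(category: str) -> float:
    """Log-scale revenue difference of a Paid vs. freemium app within `category`."""
    return B_PAID + PAID_INTERACTIONS[category]


# List categories from the largest Paid penalty to the smallest.
for cat in sorted(PAID_INTERACTIONS, key=paid_effect_in):
    print(f"{cat:13s} {100 * paid_effect_in(cat):7.2f}%")
```

Under these assumptions Travel shows the largest combined Paid penalty (about 47.5%), followed by Books and Lifestyle (about 31% each), the three categories singled out in Section 5.2.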

Fitted Model

Appendix G: Residual Plots

Model 1

Model 2

Model 3

7. REFERENCES

Kissmetrics. (2016). 5 Clever Ways to Increase Mobile App Reviews. Kissmetrics Blog: A
Blog About Analytics, Marketing And Testing. Retrieved November 6, 2016, from
https://blog.kissmetrics.com/increase-mobile-app-reviews/

Lee, G. W. (2015). Understanding the Determinants of Success in Mobile Apps Markets
(Doctoral dissertation, Arizona State University). Retrieved October 27, 2016, from
https://repository.asu.edu/attachments/150636/content/Lee_asu_0010E_14861.pdf

Perez, S. (2014a). The App Store, Six Years Later. TechCrunch. Retrieved November 6, 2016,
from https://techcrunch.com/2014/07/10/the-app-store-six-years-later/

Perez, S. (2014b). Majority Of Digital Media Consumption Now Takes Place In Mobile Apps.
TechCrunch. Retrieved October 27, 2016, from
https://techcrunch.com/2014/08/21/majority-of-digital-media-consumption-now-takes-place-in-mobile-apps/

Rajput, M. (2016, June 3). Ways to Determine The Best Pricing Model For Your App.
Entrepreneur India. Retrieved November 7, 2016, from
https://www.entrepreneur.com/article/276897

Statista. (2016). Most popular Apple App Store categories in September 2016, by share of
available apps. Retrieved October 27, 2016, from
https://www.statista.com/statistics/270291/popular-categories-in-the-app-store/

Statwing. (n.d.). Interpreting residual plots to improve your regression. Retrieved November 8,
2016, from http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/

Steel Media Ltd. (2016). Count of Active Applications in the App Store. Retrieved October 31,
2016, from http://www.pocketgamer.biz/metrics/app-store/app-count/

Walz, A. (2015, May 27). Deconstructing the App Store Rankings Formula with a Little
Mad Science. Moz, Inc. Retrieved October 30, 2016, from
https://moz.com/blog/app-store-rankings-formula-deconstructed-in-5-mad-science-experiments
