Machine Learning Based Recommendation System For Android Apps
Abstract— Due to the extensive and ever-growing collection of Android apps, it has become quite difficult for users to find apps that they are truly interested in. When looking for an Android app, the user has a particular goal in mind and wishes to find an app that truly meets their needs and expectations. App recommendation systems play a vital role in this regard by recommending apps to users based on their preferences and requirements. The goal of this project was to develop a machine learning based app recommendation system to recommend Android apps to users. We collected the screenshots and metadata of a total of 12000 Android apps, divided across ten different categories. A CNN model was fine-tuned on top of pre-trained Vgg16 ImageNet weights. The model's prediction was followed by feature extraction, which was further used as a basis for linear regression to finally recommend an app of interest to the users.

Keywords—recommendation system, machine learning, linear regression, content-based filtering, feature extraction, Vgg16

I. INTRODUCTION

App development is a booming field in the modern era. From leisure to education, learning to navigation, and health and fitness to tracking, you name it, and there will be an Android application to serve the purpose. With such a wide range of Android apps available, and the fast pace at which new apps are being added, it has become a challenge for consumers to pick the apps they wish to install and use on their devices [1].

Recommendation systems are an emerging area of research concerned with using user data and intent to come up with relevant recommendations for the users. The aim is to produce recommendations that align well with the tastes and preferences of the users. Two common approaches in this regard are Content-Based Filtering and Collaborative Filtering [2]. Content-based filtering makes use of the user's preferences paired with the content description, whereas collaborative filtering recommends content based on user activity, i.e., the preferences of other users with similar behavior. Another approach, Hybrid Content Filtering, is also used for this purpose. As indicated by the name itself, Hybrid Content Filtering combines content-based and collaborative filtering techniques to overcome the sparsity and cold start issues associated with those two approaches [3].

Although considerable work has been done in the domain of recommendation systems, the use of machine learning algorithms to refine the results is still a fairly new idea. The core of machine learning is to build autonomous models that can learn from the data provided and automate processes. When such models are deployed for content recommendation, similar results are expected.

Machine learning based app recommendation is an area with ample room for research and development. Recommendation systems are known to improve both the process and the quality of decision making [4].

This project is based on the use of a machine learning algorithm for app recommendation. Screenshots of different apps were collected and presented to the users. Based on the user preferences and the screenshots of the apps, features are extracted and then fed to the machine learning model. The model learns from these extracted features and uses what it has learned to recommend apps to the users in the future. The recommendation is based on the extent of similarity between the apps liked by a user and the candidate apps to be recommended.

II. OBJECTIVE

The objectives of this research were to:

• Deploy a machine learning technique, i.e., a Convolutional Neural Network (CNN), for app recommendation.
• Analyze the performance of the CNN in the context of app recommendation.
• Improve the performance of the recommendation system to enhance its reliability.

The paper discusses the methodology of the project in detail. After the methodology, the results and findings of the study are presented, and lastly, future work and further areas of research and improvement are discussed.

III. METHODOLOGY

Fig 1 shows the basic methodology adopted in this paper. The indicated steps are described below.
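The pipeline summarized in the abstract and introduction (screenshot features taken from a CNN, a linear regression fitted against the user's feedback and scored with the root mean square error used later in this paper, and recommendation by similarity to liked apps) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature vectors are hypothetical two-dimensional stand-ins for CNN middle-layer activations, the Yes/No feedback is encoded as 1.0/0.0, and the names `APP_FEATURES`, `USER_FEEDBACK`, `fit_linear_regression` and `recommend` are illustrative.

```python
import math

# Hypothetical stand-ins for features extracted from a middle layer of the CNN.
# Real CNN feature vectors would have hundreds or thousands of dimensions.
APP_FEATURES = {
    "app_a": [0.9, 0.2],
    "app_b": [0.7, 0.1],
    "app_c": [0.2, 0.8],
    "app_d": [0.1, 0.6],
    "app_e": [0.8, 0.3],  # not yet reviewed by the user
    "app_f": [0.3, 0.9],  # not yet reviewed by the user
}

# Hypothetical user feedback: 1.0 = "Yes" (liked), 0.0 = "No".
USER_FEEDBACK = {"app_a": 1.0, "app_b": 1.0, "app_c": 0.0, "app_d": 0.0}


def solve(a, b):
    """Solve the square system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(xs, ys, ridge=0.0):
    """Least-squares fit of y ~ w0 + w.x via the normal equations (X^T X) w = X^T y.

    A small positive `ridge` (an assumption, not from the paper) keeps the
    system solvable when features outnumber the reviewed apps.
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend a bias column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # the bias term is left unpenalized
        xtx[i][i] += ridge
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, x):
    """Predicted preference score for one feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def rmse(w, xs, ys):
    """Root mean square error between predictions and the feedback targets."""
    errors = [(predict(w, x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))


def recommend(features, feedback, k=2):
    """Rank unseen apps by Euclidean distance to the closest liked app."""
    liked = [features[a] for a, v in feedback.items() if v == 1.0]
    unseen = [a for a in features if a not in feedback]
    dist = {a: min(math.dist(features[a], p) for p in liked) for a in unseen}
    return sorted(unseen, key=dist.get)[:k]


if __name__ == "__main__":
    reviewed = list(USER_FEEDBACK)
    xs = [APP_FEATURES[a] for a in reviewed]
    ys = [USER_FEEDBACK[a] for a in reviewed]
    w = fit_linear_regression(xs, ys)
    print("training RMSE:", round(rmse(w, xs, ys), 3))
    print("recommended:", recommend(APP_FEATURES, USER_FEEDBACK))
```

With this toy data, `app_e` lies closest to the liked apps and is ranked first. Ranking by distance keeps the recommendation step independent of the regression; ranking by `predict` scores would be an equally plausible reading of the method.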
Authorized licensed use limited to: Nanjing University. Downloaded on September 03,2021 at 11:51:40 UTC from IEEE Xplore. Restrictions apply.
A. Data Collection

Fig 2 displays a sample of the web interface. We were able to gather user reviews from nearly 50 users, and the total number of app reviews exceeded 5300.
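Before any model can use them, the reviews collected through the web interface have to be grouped per user and turned into numeric targets. The following is a minimal sketch, assuming each review is stored as a hypothetical `(user_id, app_id, answer)` record holding the Yes/No answers mentioned in the conclusion; the record layout, the sample data and `build_feedback` are illustrative and not taken from the paper.

```python
from collections import defaultdict

# Hypothetical review records as they might be exported from the web interface:
# (user_id, app_id, answer), where answer is the user's "Yes"/"No" preference.
REVIEWS = [
    ("u1", "app_a", "Yes"),
    ("u1", "app_c", "No"),
    ("u2", "app_a", "No"),
    ("u2", "app_b", "Yes"),
    ("u2", "app_b", "Yes"),  # repeated submission; the last answer wins
]


def build_feedback(reviews):
    """Group reviews by user and map Yes/No answers to 1.0/0.0 targets."""
    feedback = defaultdict(dict)
    for user, app, answer in reviews:
        normalized = answer.strip().lower()
        if normalized not in ("yes", "no"):
            raise ValueError(f"unexpected answer {answer!r} for {user}/{app}")
        feedback[user][app] = 1.0 if normalized == "yes" else 0.0
    return dict(feedback)


if __name__ == "__main__":
    fb = build_feedback(REVIEWS)
    # prints "2 users, 4 distinct reviews"
    print(len(fb), "users,", sum(len(v) for v in fb.values()), "distinct reviews")
```

Keeping one dictionary per user makes it straightforward to fit a separate preference model for each user, which matches the per-user recommendations described in this paper.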
model architecture consisted of three convolutional layers and two dense layers, followed by a SoftMax classifier. At the end of the first round of training, evaluation revealed that we were only able to achieve an accuracy of 10%, which is no better than random guessing among ten categories. Therefore, we proceeded towards fine tuning of the model.

E. Fine Tuning

The purpose of fine tuning was to enhance the performance of the previously trained model. To do so, we made use of the Vgg16 model weights. Vgg16 was an ideal choice since it was originally released as a Caffe model and was ranked first in the localization track and second in the classification track of the 2014 ImageNet competition (ILSVRC 2014) [6].

Vgg16 pre-trained ImageNet weights were loaded after initializing the convolutional base of the model. All the layers preceding the last convolutional block were frozen, and the previously trained model was added on top. This practice boosted the accuracy of the model, and the new accuracy came out to be 48.5%.

The accuracy of 48.5% may not seem high, but considering the scope and the purpose of this specific project, it is quite decent. One reason the accuracy is not higher is the diversity of the dataset: the apps have been divided into a total of ten different categories. Another point that has a considerable impact on the accuracy of the model is that the categories in the dataset overlap. For instance, a math game can be classified both as a game app and as an education app. Therefore, it is not possible for us to draw a strict line between the categories of the apps. This leads to overlapping features and causes a lower accuracy.

However, it is important to emphasize that, since the goal of the project is to develop a recommendation system, relying merely on the accuracy of the model as an evaluation metric would not be a wise approach. The true measure of the model is the extent to which the recommended apps are relevant to the preferences of the users [7]. The accuracy is just a measure of the quality of the feature extraction being done via the CNN.

F. Feature Extraction

After reaching the accuracy benchmark of 48.5%, the model's accuracy plateaued and did not increase beyond this point despite several retraining attempts. The reason for this relatively low accuracy was that our dataset was quite diverse. If the same exercise were repeated for apps belonging to a single category, deploying solely the CNN architecture paired with the Vgg16 weights would likely have led to a much higher accuracy score. The CNN architecture relies on multiple stages of feature extraction to enhance the quality of the features extracted [10]. But since the goal was to make the app recommendations of the model as reliable as possible, we used feature extraction to be able to deploy linear regression.

The trained model, along with its weights, was loaded, and features were extracted to construct the first feature vector. The features were extracted from a middle layer of the CNN model. The second feature vector was constructed on the basis of user feedback [8]. Using these two feature vectors, one extracted from the trained CNN and the other based on the user feedback, linear regression was implemented. The purpose of using linear regression as the prediction method was to reduce the root mean square error between the input and the output variable as much as possible.

At the end of this approach, we were able to minimize the Root Mean Square Error (RMSE) and limit it to a range of 0.15 to 0.3. This was a considerable achievement, since other similar projects developing recommendation systems based on history data and on the Netflix dataset were only able to limit the RMSE to 0.35-0.49 and 0.85-0.95, respectively [5] [9], although these figures were obtained on different datasets and rating scales and are therefore not directly comparable.

G. User Recommendation

Using the two feature vectors, the linear regression graph was plotted. The model recommended apps to the users based on the distance on the graph between the points of the liked apps and the candidate apps similar to the user's preferences.

In the end, the users were shown apps based on the apps they liked. The recommended apps are close to the users' initial preferences. Fig 3 shows the user interface of the app recommendation system.

Figure 3 User Interface of the App Recommendation System

IV. RESULTS

The results of this research project can be broadly classified into three divisions, since there were three different methodological phases of the project.

The first phase comprised the implementation of the CNN, which on its own led to an accuracy of only 10%. When the CNN model architecture was combined with the pre-trained Vgg16 ImageNet weights, performance improved and the accuracy increased to 48.5%.

Lastly, features were extracted from the trained model and from the user preferences based on the screenshots. These two feature vectors were used to implement linear regression. The performance of linear regression was measured via the Root Mean Square Error, which fell in the range of 0.15-0.3.

Since the Root Mean Square Error is considerably low, it indicates that the regression predictions closely match the user preferences, which suggests that the recommendations of the model are reliable.

V. CONCLUSION

We built an app recommendation system using screenshots of different apps and user preferences, based on a CNN with pre-trained weights, and paired it with linear regression to further enhance the relevance of the recommended apps. Accuracy was used as the evaluation metric for the CNN model, while the goal of linear regression was to minimize the root mean square error as much as possible to
further enhance the reliability and the quality of the apps that are being recommended to the users.

There is a lot of room for improvement in the project. These improvements would lead to much better results and also enhance the reliability of the app recommendations. Firstly, the user preference scale can be converted into a numeric scale ranging from 1 to 5 rather than simply using Yes and No options.

Secondly, the app screenshots can be divided into subcategories as well as categories. So far, the apps were divided into ten categories only. Dividing the categories into further subcategories would narrow the problem. For instance, the game category can be further divided into action games, puzzle games, arcade games, educational games, etc. This would strengthen the feature extraction and the user preference mapping and improve the model's learning.

So far, the project relies only on content-based filtering for content recommendation. The same model can be trained on the basis of collaborative and hybrid filtering as well. These approaches would likely lead to better, more fine-tuned results.

Focusing on feature extraction and using affective parameters for feature extraction from the app screenshots is another approach that can be tried to further boost the accuracy of the model and also enhance the reliability of the recommendations [10].

Lastly, we would also like to extend this project in the future by using different machine learning models and frameworks. A comparative study of different models and approaches would also help us pick the best one.

ACKNOWLEDGMENT

We would like to extend a heartfelt thank you to our teachers and respective departments. Without their guidance and aid, this project would never have been possible. Special mention goes to our mentors Dr. Omer Usman, Dr. Ammara Tariq and Umama Munir.

We would also like to express gratitude to our families and well-wishers. Hafiz Talha Iqbal, Rao Bilal Khalid and Syeda Saamin Burhan deserve special mention for their constant support and for being an amazing source of motivation and inspiration, always. A heartfelt thank you and expression of gratitude is extended to Mr. Irfan Saeed Sheikh, Mr. Hamza Irfan Sheikh and Salman Khan. Without their assistance this presentation would never have been possible.

REFERENCES

[1] P. Singh et al., "Recommender systems: an overview, research trends, and future directions," Int. J. Business and Systems Research, vol. 15, pp. 14-52, 2021.
[2] F. O. Isinkaye et al., "Recommendation systems: Principles, methods and evaluation," Egyptian Informatics Journal, vol. 16, no. 3, pp. 261-273, 2015.
[3] B. Pathak et al., "Empirical analysis of the impact of recommender systems on sales," J. Manage. Inform. Syst., vol. 27, no. 2, pp. 159-188, 2010.
[4] N. Sharma et al., "An Analysis Of Convolutional Neural Networks For Image Classification," Procedia Computer Science, vol. 132, pp. 377-384, 2018.
[5] Y. Koren, R. Bell and C. Volinsky, "Matrix factorization techniques for recommender systems," IEEE Computer, vol. 42, no. 8, pp. 30-37, 2009.
[6] V. W. Zheng et al., "Towards mobile intelligence: Learning from GPS history data for collaborative recommendation," Artificial Intelligence, vols. 184-185, pp. 17-37, 2012.
[7] D. Wang et al., "A content-based recommender system for computer science publications," Knowledge-Based Systems, pp. 1-9, 2018.
[8] S. Zhang et al., "Deep Learning Based Recommender System: A Survey and New Perspectives," ACM Computing Surveys, pp. 1-38, 2019.
[9] A. Khan et al., "A Survey of the Recent Architectures of Deep Convolutional Neural Networks," Artificial Intelligence Review, vol. 53, pp. 5455-5516, 2020.
[10] V. W. Zheng et al., "Towards mobile intelligence: Learning from GPS history data for collaborative recommendation," Artificial Intelligence, vols. 184-185, pp. 17-37, 2012.