Collaborative Filtering Recommendation Algorithm

Olgierd Unold
Wroclaw University of Science and Technology
Abstract—The aim of this work was to develop and compare recommendation systems which use the item-based collaborative filtering algorithm, based on Hadoop and Spark. Data for the research were gathered from a real social portal whose users can express their preferences regarding the applications on offer. The Hadoop version was implemented with the use of the Mahout library, which is an element of the Hadoop ecosystem. The authors' original solution was implemented with the use of the Apache Spark platform and the Scala programming language. The applied similarity measure was the Tanimoto coefficient, which provides the most precise results for the available data. The initial assumptions were confirmed, as the solution based on the Apache Spark platform turned out to be more efficient.

I. INTRODUCTION

Recommendation systems are among the most recognizable and most widely used machine learning techniques [1]. They have an undeniable impact on the personalization of content on the Internet, so prevalent nowadays. There is great interest in them, especially in e-commerce, as their implementation raises sales by an estimated 8 to 10 percent. During the last few years, as a huge amount of data has come to be used for generating recommendations, those systems have been equipped with solutions characteristic of Big Data problems.

The task of recommendation systems is to present users with information about the items in which they might be interested. The systems differ from one another in the way they analyze the available data, which, in turn, makes it possible to evaluate the degree to which users like particular products. The techniques used in personalized recommendation systems can be classified into three basic groups: content-based, those based on collaborative filtering (abbreviated to CF), and hybrid systems [2].

This work focuses on systems which use collaborative filtering, because of their universality. Note that in a CF algorithm it is expensive to compute the similarity of users (or items), as the algorithm is required to search the entire database to find the potential neighbors of a target user (item). This requires computation that increases linearly with the growth of both the number of users and the number of items (a CF algorithm has a worst-case complexity of O(UI), where U is the number of users and I is the number of items). Therefore, many algorithms are either slowed down or require additional resources such as computation power or memory. This is the so-called scalability problem, which is one of the key challenges for CF [3].

Recently published works proved that it is possible to parallelize collaborative filtering algorithms with Hadoop technology [4], [5], which is dedicated to solving complex and distributable problems. Admittedly, however, this approach, based on the MapReduce paradigm, does not offer favorable scalability and computation-cost efficiency as the data size increases.

This paper presents a new solution to item-based CF based on the Apache Spark platform, a new engine for large-scale data processing. We develop and compare item-based collaborative filtering algorithms using two cluster computing frameworks: Hadoop's disk-based MapReduce paradigm and Spark's in-memory RDD paradigm. The implementation of the CF algorithm based on the Hadoop platform was done using the Mahout library [6], [7]. The parallelized item-based CF algorithm for the Apache Spark platform was implemented in the Scala programming language.

II. THE MAPREDUCE PARADIGM

MapReduce is a paradigm for distributed programming, used for processing large amounts of data with the use of groups of connected computer units, i.e. a computer cluster. Each computer unit is called a node. The paradigm was introduced by Google in 2004 [8]. Its main asset is that it takes away from the programmer most problems of parallel programming, such as node-to-node communication, management of cluster resources, and resistance to node failures. The MapReduce model is inspired by the map and reduce functions commonly used in functional programming.

A programmer's task is to provide the implementation of the following two procedures (see Figure 1 for the MapReduce workflow):

Fig. 1. The MapReduce workflow.

• map() takes a fragment of the input data expressed as a set of pairs (key, value) and, on the basis of particular records, produces zero or more intermediate pairs. The MapReduce library groups all intermediate values related to the same intermediate key K. Next, the values are transferred to the reduce() function.

Map(k1, w1) → list(k2, w2)

• reduce() takes the intermediate key K and the set of values for that key. Next, the values are joined. Each call to the reduce() function usually returns one value, but it can also return zero or more values.

Reduce(k2, list(w2)) → list(w3)
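As an illustration of this two-procedure flow (ours, not part of the original paper), the following Scala sketch emulates MapReduce on local collections with the classic word-count example; mapper and reducer are hypothetical names standing in for the map() and reduce() procedures described above, and a real job would of course run on a cluster through a framework such as Hadoop.

// A minimal, single-machine emulation of the MapReduce flow
// using plain Scala collections (illustrative only).
object WordCountSketch {
  // map(): one input record -> zero or more intermediate (key, value) pairs
  def mapper(line: String): Seq[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(word => (word, 1))

  // reduce(): an intermediate key and all its values -> output value
  def reducer(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  def main(args: Array[String]): Unit = {
    val input = Seq("to be or not to be", "to see or not to see")
    val intermediate = input.flatMap(mapper)          // map phase
    val grouped = intermediate.groupBy(_._1)          // shuffle: group by key K
    val output = grouped.map { case (word, pairs) =>  // reduce phase
      reducer(word, pairs.map(_._2))
    }
    output.foreach(println)                           // e.g. (to,4), (be,2), ...
  }
}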
RDD is a partitioned collection of data, which means that particular elements of a collection can be divided among cluster nodes and processed in parallel. Additionally, we can distinguish the following characteristics of RDDs (illustrated by the sketch after this list):

• They can only be created by operating on other sets or while reading data from a file system, because RDD sets cannot be modified.

• They are processed lazily, and each set stores information about the operations made on the input data, which makes it possible to recover a lost part of an RDD in case of a node failure.

• Intermediate results of operations can be stored in cache memory, so computations for iterative algorithms can be made much faster.
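The following short Scala sketch (our illustration, not code from the paper; the input path is hypothetical) shows the three properties in code: RDDs arise from stable storage or from other RDDs, transformations are lazy and recorded as lineage, and cache() keeps intermediate results in memory for reuse.

import org.apache.spark.{SparkConf, SparkContext}

object RddPropertiesSketch {
  def main(args: Array[String]): Unit = {
    // Local master used only so the sketch runs standalone.
    val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // RDDs are created from storage or from other RDDs; they are
    // immutable, so each transformation yields a new RDD.
    val lines   = sc.textFile("data/preferences.txt") // hypothetical path
    val lengths = lines.map(_.length)                 // lazy: nothing runs yet

    // Each RDD remembers its lineage (textFile -> map), which lets
    // Spark recompute lost partitions after a node failure.

    // Caching keeps the intermediate result in memory, speeding up
    // iterative computations that reuse it.
    lengths.cache()

    // Only an action such as count() or reduce() triggers execution.
    println(lengths.count())
    println(lengths.reduce(_ + _))

    sc.stop()
  }
}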
V. EXPERIMENTAL STUDIES

The aim of the work was to implement and compare systems which generate recommendations with the use of the distributed computing platforms Hadoop and Spark. Because of the limited time of access to the computing cluster, the implemented systems were limited to computing the similarity matrices of items.
A. A description of the data

The data for the work were taken from an Internet portal in which registered users can use the available entertainment applications, such as games, quizzes, or tests. The portal users can like a given application. The preferences used for providing recommendations for users are expressed as Boolean values. The set of data consists of 9,644,727 preferences (likes) expressed by 1,148,320 users with respect to 730 applications.

The data are stored in the form of a text file. Particular preferences are separated by the newline sign (\n). A preference consists of a user identifier and an application identifier, separated by the tab sign (\t). User and application identifiers are unique integers. The size of the data is 106.5 MB.
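For illustration only (the identifiers below are invented, not drawn from the portal's dataset), two consecutive preferences in such a file would look as follows, with a tab separating the user identifier from the application identifier:

1048	42
1051	42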
B. The implementation

First, the input file is transformed into an RDD of (user, item) pairs, which is then joined with itself (by user) to obtain the pairs of items liked by the same user:

// file is an RDD[String] obtained earlier, e.g. by reading the input text file
val ratings = file.map(line => {
  val fields = line.split("\t")
  (fields(0).toInt, fields(1).toInt)   // (user, item)
})
val ratings2 = ratings
val ratingPairs = ratings.join(ratings2)
  .filter { case (user, (item1, item2)) => item1 < item2 }  // keep each pair once

Then, each item is assigned the number of expressed preferences. For optimization reasons, this set is distributed among all nodes via a variable of type Broadcast.

val numRatersPerItem = ratings
  .groupBy { case (user, item) => item }
  .map { case (item, ratingsPerItem) => (item, ratingsPerItem.size) }
val numRatersPerItemBC =
  sc.broadcast(Map(numRatersPerItem.collect(): _*))

Each pair of co-occurring items is assigned the list of the proper users.

val usersForRatingPair = ratingPairs
  .map { case (user, (item1, item2)) => ((item1, item2), user) }
  .groupByKey()

Using the received RDDs, it is possible to calculate the similarity between items.

val similaritiesInput = usersForRatingPair.map { case (pair, users) =>
  (pair, (users.size,
          numRatersPerItemBC.value(pair._1),   // raters of item1
          numRatersPerItemBC.value(pair._2)))  // raters of item2
}
val similarities = similaritiesInput.map {
  case (pair, (size, numRaters1, numRaters2)) =>
    (pair, jaccardSimilarity(size, numRaters1, numRaters2))
}
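The helper jaccardSimilarity is not defined in the extracted listing. For Boolean "like" data the Tanimoto coefficient mentioned in the abstract coincides with the Jaccard index: the number of users who liked both items divided by the number who liked at least one of them. A minimal sketch consistent with that measure — our reconstruction under this assumption, not the authors' exact code — is:

// Tanimoto (Jaccard) coefficient for Boolean preferences:
// |A ∩ B| / (|A| + |B| - |A ∩ B|)
def jaccardSimilarity(usersInCommon: Int,
                      totalUsers1: Int,
                      totalUsers2: Int): Double = {
  val union = totalUsers1 + totalUsers2 - usersInCommon
  if (union == 0) 0.0 else usersInCommon.toDouble / union
}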
C. The results

Figure 2 shows the results of the research for a system which generates an item similarity matrix and is implemented in the Apache Spark technology, compared to the ItemSimilarityJob program from the Mahout library.

As the Spark platform had very limited access to the cluster resources and there was no possibility of modifying the cluster configuration, the experiments were not conducted with the full amount of possessed data. The most numerous dataset processed in its entirety was a set consisting of nearly 10 million preferences. The averaged time of performing the computations in the ItemSimilarityJob program was 3 minutes 19 seconds, and in the Spark implementation 1 minute 45 seconds, so the answer time was shortened almost by half in the case of the Spark platform. The lower the amount of input data, the greater the difference was.

Fig. 2. A breakdown of the dependence of the performance time of computations on the amount of input data for ItemSimilarityJob and the Spark implementation.
D. Conclusions

It ought to be noted that, despite the considerable improvement of efficiency when the Spark platform was used, the results do not represent the whole picture. The direct reason for this is that the amount of input data for which the research was done was significantly limited: the obtained characteristic of the Hadoop implementation does not show a linear dependence of time on the amount of data, and the increase of time between 700,000 and 10,000,000 preferences was only 20 seconds. Therefore, we can conclude that in the case of running a program consisting of five MapReduce tasks which processed a small amount of data, most of the time was consumed by actions such as starting particular MapReduce jobs, synchronizing them, node-to-node communication, and read and write operations, and not directly by the computations. In order to obtain fully credible results, the measurements would have to be repeated on a greater scale.
VI. CONCLUSIONS AND FUTURE WORK

Because of the limited time of access to the computer cluster used for the realization of the work, the item-based CF algorithm based on the Apache Spark technology was restricted to an item similarity matrix, which can be used for generating recommendations on the fly, separately for each user. The Mahout library provides an analogous program named ItemSimilarityJob, which was used for the comparative analysis.

Another problem encountered during the realization of the project was the limit of random access memory on the cluster for the Spark platform. Therefore, the measurements of the performance time of the computations dependent on the amount of input data were made for almost 10 million out of the 145 million preferences present in the available data.

The conducted analyses of the systems showed that the system based on the Spark platform was more efficient, as had been presumed. Although our implementation of Spark is still a prototype, early experience with the system is encouraging. We show that Spark can outperform Hadoop by up to 10x for a smaller number of examined preferences (ca. 2 million). For full credibility, the research should be conducted with the use of a greater amount of data (and available cluster memory).

In the course of this work it was noticed that the Apache Spark platform has features that justify the growing interest in it in the world of Big Data. The in-memory RDD paradigm allows for more efficient work allocation between nodes, reducing costly read/write operations on intermediate results stored on hard drives, which directly translates into increased job productivity. Furthermore, the implementation of algorithms on Spark is much more developer-friendly, because there is no requirement to express jobs as successive mapping and reduction functions. Instead, the developer uses RDD transformations and actions, which are largely based on Scala's collections API. In our opinion, the functional programming style seems more natural for parallel computing. Last but not least, Spark provides an interactive shell (REPL), which allows for quick job prototyping and encourages experimentation.

Further work on this topic should include completing the implementation of the collaborative filtering algorithm for the Spark platform and repeating the analyses with the use of a greater amount of data. Another possibility is to compare the obtained results with those of implementations of other algorithms used in recommendation systems.

REFERENCES

[1] C. Sammut and G. I. Webb, Encyclopedia of Machine Learning. Springer, 2011.
[2] (2014, November) IBM: What is big data? Bringing big data to the enterprise. [Online]. Available: http://www.ibm.com/big-data/us/en/
[3] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the 10th International Conference on World Wide Web. ACM, 2001, pp. 285–295.
[4] Z.-D. Zhao and M.-S. Shang, "User-based collaborative-filtering recommendation algorithms on Hadoop," in Knowledge Discovery and Data Mining, 2010. WKDD'10. Third International Conference on. IEEE, 2010, pp. 478–481.
[5] J. Jiang, J. Lu, G. Zhang, and G. Long, "Scaling-up item-based collaborative filtering recommendation algorithm based on Hadoop," in Services (SERVICES), 2011 IEEE World Congress on. IEEE, 2011, pp. 490–497.
[6] S. Owen, R. Anil, T. Dunning, and E. Friedman, Mahout in Action. Manning Publications Co., Greenwich, CT, USA, 2011.
[7] (2014, November) Apache Mahout documentation. [Online]. Available: http://mahout.apache.org/
[8] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[9] (2014, November) Hadoop 1.1.2 documentation. [Online]. Available: http://hadoop.apache.org/docs/stable/
[10] T. White, Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.
[11] P. Resnick and H. R. Varian, "Recommender systems," Communications of the ACM, vol. 40, no. 3, pp. 56–58, 1997.
[12] H. Tan and H. Ye, "A collaborative filtering recommendation algorithm based on item classification," in Circuits, Communications and Systems, 2009. PACCS'09. Pacific-Asia Conference on. IEEE, 2009, pp. 694–697.
[13] S. Gong, H. Ye, and H. Tan, "Combining memory-based and model-based collaborative filtering in recommender system," in Circuits, Communications and Systems, 2009. PACCS'09. Pacific-Asia Conference on. IEEE, 2009, pp. 690–693.
[14] R. V. Tatiya and A. S. Vaidya, "A survey of recommendation algorithms," IOSR Journal of Computer Engineering, vol. 16, no. 6, pp. 16–19, 2014.
[15] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 2009, vol. 344.
[16] T. Kajdanowicz, W. Indyk, and P. Kazienko, "MapReduce approach to relational influence propagation in complex networks," Pattern Analysis and Applications, pp. 1–8, 2012.
[17] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing," in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012, pp. 2–2. [Online]. Available: http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[18] (2014, November) Spark documentation. [Online]. Available: http://spark.apache.org/documentation.html
[19] (2014, November) Cloudera official web site. [Online]. Available: http://www.cloudera.com/content/cloudera/en/home.html
[20] (2014, November) Scala programming language. [Online]. Available: http://www.scala-lang.org