61.a Big Data Analysis Method Based On Modified

This document discusses a big data analysis method based on a modified collaborative filtering recommendation algorithm. It first provides background on big data and recommendation systems. It then discusses related work in recommendation systems and collaborative filtering algorithms. Specifically, it notes that traditional collaborative filtering algorithms using Pearson correlation can have errors, so this study modifies the similarity value calculation using normal recovery similarity measure to address this issue. The document proposes implementing this modified collaborative filtering algorithm in a Hadoop cloud environment to improve efficiency for large-scale data. It measures execution times with different numbers of nodes and compares to a single machine to analyze speedup and efficiency.

Open Phys.

2019; 17:966–974

Research Article

Nan Yin*

A Big Data Analysis Method Based on Modified Collaborative Filtering Recommendation Algorithms
https://doi.org/10.1515/phys-2019-0102
Received Oct 18, 2019; accepted Nov 20, 2019
Abstract: With the rapid development of e-commerce, collaborative filtering recommendation systems have been widely used on various network platforms. Using a recommendation system to accurately predict customers’ preferences for goods can solve the problem of information overload faced by users and increase users’ reliance on the network platform. Because recommendation systems based on collaborative filtering can recommend goods that are abstract or difficult to describe in words, research on collaborative filtering has attracted more and more attention. According to past research, if the Pearson correlation coefficient is used in a collaborative filtering algorithm, errors will occur under special circumstances. In this study, the normal recovery similarity measure is used to modify the similarity value and thereby correct the error value of the collaborative filtering recommendation algorithm. Based on this, a big data analysis method based on a modified collaborative filtering recommendation algorithm is proposed. The method was implemented in a cloud Hadoop environment, and its execution time was measured with 2, 5 and 8 nodes. The execution time was then compared with that of a single machine, and the speedup ratio and efficiency were analyzed. The experimental results show that the execution time increases with the number of neighbors. With 5 and 8 nodes, the execution time is greatly reduced, which improves the efficiency of the collaborative filtering algorithm and helps it cope with massive data in the future.

Keywords: collaborative filtering; big data; cloud environment

PACS: 89.70.Eg, 83.85.Ns, 89.75.-k

*Corresponding Author: Nan Yin: Business School, Nanjing Xiaozhuang University, Nanjing 211171, China; Email: [email protected]

Open Access. © 2019 N. Yin, published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 License

1 Introduction

With the progress of information technology, the term big data, also called large data, has come to refer to a very large amount of information. When the data are so large and complex that a database system cannot store, calculate, process, and analyze them into interpretable information within a reasonable time, they are called big data. These massive data contain useful information, such as unknown correlations, hidden patterns and potential market trends, which may hold unprecedented knowledge and applications waiting to be discovered [1]. However, because of the huge amount of data and the rapid flow of data, traditional technology is often unable to process and analyze them efficiently, prompting researchers to keep developing new generations of data storage equipment and technology in the hope of extracting valuable information from large data. Many companies are committed to meeting the needs of consumers. To satisfy those needs, the researcher must first understand what users need. Recommending what consumers need or like is the most important step in satisfying them. The researcher can make recommendations based on the habits and preferences of consumers. Quantitative data can be used as the basis for analysis, and big data analysis has become closely linked to daily life.

Due to the explosive growth of digital information and the increasing number of visitors using the network, information overload has become a potential challenge. People want to get interesting information on the network in real time, which is also the main reason for the increasing demand for recommendation systems. A recommendation system can filter out important and useful information according to users’ preferences and interests. Therefore, a recommendation system can solve the problem of information overload. In addition, a recommendation system can predict products that may be of interest to a particular user, based on other users who have preferences similar to that user’s. That is to say, content-based filtering and collaborative filtering are common methods in recommendation systems. For users,
a recommendation system can greatly shorten the time spent browsing large amounts of information and help them quickly select suitable products. For service providers, introducing a recommendation system can help customers find products of interest in real time, so that more consumers will be willing to buy products on the service platform and become loyal customers.

With the rapid development of the Internet, there are many open resources on the network, and the proportion of new information keeps increasing. Without a way to compare and analyze the filtered information, it is difficult for users to distinguish and filter the appropriate information, which is another commonly discussed aspect of Internet information overload. People search for resources on the network with the help of search engines, whereas a recommendation system actively provides the information that users need [2].

In order to meet the needs of different users in big data, recommendation algorithms have been developed, and the collaborative filtering recommendation algorithm is one of them. Current collaborative recommendation algorithms are designed for personal computers. To cope with the trend of massive data, the system must know the user’s current interests and meet the user’s needs in time, so the speed of data processing is the decisive factor. The execution speed of a PC cannot meet the real-time requirement, so combining the cloud with the collaborative filtering algorithm is worth implementing.

According to past research, if the Pearson correlation coefficient is used in a collaborative filtering algorithm, errors will occur in special cases. In this study, the Normal Recovery Similarity Measure is used to modify the similarity value to correct the error value of the collaborative filtering recommendation algorithm, and it serves as the basis of the collaborative filtering algorithm.

There are two main purposes of this study. The first is to measure the running time with 2, 5 and 8 nodes in the cloud Hadoop environment, compare it with the running time of a single computer, and then analyze the acceleration and efficiency. The second is to analyze the prediction results obtained with three algorithms: the Jaccard similarity coefficient, Pearson similarity and the Normal recovery similarity measure.

2 Discussions on Related Literature

2.1 Recommendation System

A recommendation system provides consumers with suggestions about which goods to buy. These suggestions are based on many decisions, such as: What products do consumers buy? Which movies did the consumer see? What articles do consumers read online? Due to the explosive growth of digital information and the increasing number of visitors using the network, information overload has become a potential challenge. People want to get interesting information on the network in real time, which is also the main reason for the increasing demand for recommendation systems. A recommendation system can filter out important and useful information according to users’ preferences and interests. Therefore, a recommendation system can solve the problem of information overload. In addition, a recommendation system can predict products that may be of interest to a particular user, based on other users who have preferences similar to that user’s [3].

A recommendation system can greatly shorten the time users spend browsing large amounts of information and help them quickly select suitable products. For service providers, introducing a recommendation system can help customers find products of interest in real time, so that more consumers will be willing to buy products on the service platform and become loyal customers. A recommendation system operates as follows: it first collects the user’s information, including preferences and purchased products; the system then learns and builds models independently; finally, it predicts products that users may be interested in and recommends them, while collecting the user’s selections and returning to the first stage for repeated execution [4].

In order to reduce the additional cost of searching for information, a recommendation system can recommend information, services or products that users may need according to their preferences, interests, behaviors or needs [5]. A recommendation system helps users filter information. Its core task is not only to filter information effectively, but also to find out users’ preferences and give users the information they are interested in [6]. With the support of a recommender system, the flood of information and the complexity of online search can be reduced [7], and the convenience of searching and filtering network data can be improved.

According to the method used, common recommendation systems are divided into three types: collaborative
filtering, content filtering and knowledge-based recommendation. Content-based filtering represents the user’s preferences by the characteristics of items, summarizes those preferences from the user’s click-through records or viewing counts, and recommends items that match them [8]. Collaborative filtering collects users’ evaluations of items to estimate each user’s preference model, and predicts the score a user might give an item from the scores of the same user group. Finally, a knowledge-based recommender can explain the relationship between needs and recommended textbooks, and recommend specific textbooks to suitable users. In the process, learners contribute their own preference models, so that the recommendation system can interact with them [9].

2.2 Collaborative filtering algorithm for the recommendation system

Collaborative filtering recommends items to a user by referring to the past preferences of other users with similar interests. The similarity between two users is calculated from each user’s past scores on items. Collaborative filtering can be divided into user-based filtering and item-based filtering. Collaborative filtering aims at identifying other users who have preferences similar to those of the target user, while Schafer et al. argue that people-to-people correlation recommendation refers to the relevance of users’ purchases on e-commerce websites [10].

O’Donovan & Smyth [11] pointed out that collaborative filtering recommendation, also known as social filtering recommendation, mainly uses the experience or suggestions of users with similar attributes or interests as the basis for providing personalized information. By recording and comparing users’ product or service preference data, users are divided into several communities with a high degree of internal relevance, which serve as the reference for collaborative recommendation. Herlocker, Konstan, & Riedl [12] also mentioned that a collaborative filtering system predicts the user’s preference for a certain transaction or piece of information by connecting a group of people who share common interests with the user. Herlocker et al. [13] pointed out that the operating principle of collaborative filtering is to automate the word-of-mouth process, and that the suggestions made by the system are based on the preferences of other users with similar preferences.

Assume that there is a user set of N users $u_i$, $1 \le i \le N$, and an item set of M items $p_j$, $1 \le j \le M$. A user $u_i$ expresses his or her opinion of an item $p_j$ as a score $r_{ij}$, which is a positive integer. Usually, the higher the score, the more positive the user’s feedback. If a user $u_i$ has not scored an item $p_j$, then $r_{ij} = 0$. The information is stored and expressed in the form of a matrix R:

$$
R = \begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1M} \\
r_{21} & r_{22} & \cdots & r_{2M} \\
r_{31} & r_{32} & \cdots & r_{3M} \\
\vdots & \vdots & & \vdots \\
r_{N1} & r_{N2} & \cdots & r_{NM}
\end{pmatrix}
\tag{1}
$$

The main purpose of collaborative filtering is to generate an ordered list of product recommendations for each user based on the information in the user-item score matrix. For this purpose, each collaborative filtering recommendation system has an algorithm that predicts the score of each user for each item. The predicted ratings are used to generate the list of recommendations.

Traditional collaborative filtering recommendation finds similar items or users by comparing the similarity between users or between items. The most basic way is to sum and average the scores that similar users gave to an item and use the result as the predicted score. Although this is reasonable and effective in theory, in actual recommendation system data the severe sparsity makes the similarity comparison almost impossible to complete, and the large number of users and items leads to a very time-consuming computing process.

2.3 User-based Collaborative Filtering Algorithms

The user-based collaborative filtering algorithm is suitable when the number of items is much larger than the number of users and the users change little; the item-based collaborative filtering algorithm is suitable when the number of users is much larger than the number of items and the items change little. Because the number of items in this experiment is large and fixed, the user-based collaborative filtering algorithm is adopted in this paper. User-based collaborative filtering, first proposed by Schafer et al. [14], refers to recommendation based on the similarity of preferences between users, for example recommending products that a consumer might like based on the relevance of goods purchased by other consumers on e-commerce websites. The algorithm uses the whole user and item database to predict a user’s score for an item. The most commonly used technique is the Nearest Neighbor Method, which identifies the users who rated the same items and gave them similar scores, i.e.
the users’ neighbors. The algorithm then predicts the user’s scores for unrated items from the scores given by these neighbors and uses the Top-N recommendation method to recommend the first N items of interest.

The basic idea of the user-based collaborative filtering algorithm is as follows: if user A likes item a, user B likes items a, b and c, and user C likes items a and c, then user A is similar to users B and C because they all like a; since users who like a also like c, item c is recommended to user A. The algorithm uses the nearest-neighbor algorithm to find a user’s neighbor set. The users in the set have preferences similar to those of the user. The algorithm makes predictions for the user according to the neighbors’ preferences.

The mathematical model of the collaborative filtering recommendation algorithm can be expressed as follows. For each user, the optimization goal is:

$$
J^{(j)} = \min_{\theta^{(j)}} \frac{1}{2m^{(j)}} \sum_{i:r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} x^{(i)} - y^{(i,j)} \right)^{2} + \frac{\lambda}{2m^{(j)}} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^{2}
\tag{2}
$$

Here $\theta^{(j)}$ denotes the preference features of user j, $x^{(i)}$ denotes the features of movie i, $y^{(i,j)}$ denotes the rating of user j for movie i, $i : r(i,j) = 1$ denotes that user j has rated movie i (not a missing value), $m^{(j)}$ denotes the number of movies rated by user j, and n is the number of features. Since both terms contain the same factor $1/m^{(j)}$, which does not change the minimizer, the above formula can also be written as follows:

$$
J^{(j)} = \min_{\theta^{(j)}} \frac{1}{2} \sum_{i:r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} x^{(i)} - y^{(i,j)} \right)^{2} + \frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^{2}
\tag{3}
$$

Then, gradient descent is used to update $\theta^{(j)}$, where $\theta^{(j)}$ is the preference feature of user j:

$$
J^{(j)} = \min_{\theta^{(j)}} \frac{1}{2} \sum_{i:r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} x^{(i)} - y^{(i,j)} \right)^{2} + \frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^{2}
\tag{4}
$$

If the users’ preferences $\theta$ are known, then the same procedure can learn the movies’ features x. For each movie, the optimization function is

$$
J^{(i)} = \min_{x^{(i)}} \frac{1}{2} \sum_{j:r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} x^{(i)} - y^{(i,j)} \right)^{2} + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^{2}
\tag{5}
$$

Then, gradient descent is used to update $x^{(i)}$:

$$
x^{(i)} = x^{(i)} - \alpha \left[ \sum_{j:r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} x^{(i)} - y^{(i,j)} \right) \theta^{(j)} + \lambda x^{(i)} \right]
\tag{6}
$$

where $\alpha$ is the learning rate and $\lambda$ is the regularization coefficient. The resulting $x^{(i)}$ is the feature of movie i.

2.4 Collaborative Filtering Program

The first step is similarity calculation: calculating the similarity between users or between items is the key step of collaborative filtering. Common methods in collaborative filtering include cosine similarity, adjusted cosine similarity and the Pearson correlation coefficient.

The second step is neighbor selection: the accuracy of prediction changes whenever different users join the neighborhood. Therefore, the researcher should choose the neighbors of the active user carefully; the traditional approach is the Top-N algorithm, which uses the N nearest neighbors for prediction. In addition, people in different countries or regions are more likely to have different preferences. Therefore, when selecting neighbors for active users, it is necessary to consider the location of users. With the development of mobile networks, location information can be obtained from the mobile client or IP address and sent to the server for further analysis. Usually, users can be divided into multiple partitions according to their location. Users in the same partition have priority in neighbor selection.

The third step is prediction: the scores are predicted from the neighbors’ similarities and scores.

The fourth step is item ranking: once the predictions are obtained, the recommendation system needs to rank all items according to their predicted scores. In order to improve the diversity of suggestions, items with higher predictions and lower popularity should rank higher.

The fifth step is selecting the first n items: after sorting all the options, the first n items are provided to the user, where n is a parameter that must be set before the recommendation task.

2.5 Computation of Similarity

As for the calculation of similarity, the existing basic methods are based on vectors. In essence, the distance between two vectors is calculated. The closer the distance, the greater the similarity. In the two-dimensional user-item preference matrix of the recommendation scenario, the researcher
can use a user’s preference for all items as a vector to calculate the similarity between users, or use all users’ preference for one item as a vector to calculate the similarity between items.

2.5.1 Pearson correlation coefficient

The Pearson correlation coefficient conveys two things. One is size or strength: the greater the absolute value, the stronger the correlation between the two variables; the smaller the absolute value, the weaker the correlation. The other is direction, given by the sign. When the coefficient is positive, the two variables change in the same direction (when one becomes larger the other becomes larger, and when one becomes smaller the other becomes smaller), which is called positive correlation. When the coefficient is negative, they change in opposite directions (when one becomes larger the other becomes smaller), which is called negative correlation. If it is zero, when one becomes smaller the other may become larger, become smaller or stay unchanged, which is zero correlation.

The Pearson correlation coefficient is generally used to measure the closeness of the relationship between two interval-scale variables, and its value lies in [−1, +1]. Here $s_x$ and $s_y$ are the sample standard deviations of x and y:

$$
P(x, y) = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{(n-1)\, s_x s_y}
= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\; \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}
\tag{7}
$$

2.5.2 Jaccard similarity coefficient

The Jaccard similarity coefficient is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
\tag{8}
$$

If A and B are both empty, we define J(A, B) = 1.

$$
0 \le J(A, B) \le 1
\tag{9}
$$

The MinHash min-wise independent permutations locality sensitive hashing scheme may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function [15].

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$
d_J(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}
\tag{10}
$$

3 Improvement of Collaborative Filtering Algorithms by Normal Restoration Similarity Measure

There are many different similarity algorithms in collaborative filtering. The core concept of the Jaccard similarity coefficient can be seen from the following formula:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
\tag{11}
$$

That is, the number of items scored by both user A and user B is divided by the number of items scored by user A or user B; the value falls between 0 and 1.

The Pearson correlation coefficient is the best-known similarity algorithm, and its value falls between −1 and 1. If user-based collaborative filtering is used, the formula is as follows:

$$
\mathrm{Sim}(u, v) = \frac{\sum_{i \in I} \left( r_{u,i} - \bar{r}_u \right) \left( r_{v,i} - \bar{r}_v \right)}{\sqrt{\sum_{i \in I} \left( r_{u,i} - \bar{r}_u \right)^2}\; \sqrt{\sum_{i \in I} \left( r_{v,i} - \bar{r}_v \right)^2}}
\tag{12}
$$

I is the set of items scored by both user u and user v. $r_{u,i}$ represents user u’s score for item i, $r_{v,i}$ represents user v’s score for item i, and $\bar{r}_u$ and $\bar{r}_v$ represent the average of all of user u’s scores and the average of all of user v’s scores. If the collaborative filtering is item-based, the formula is as follows:

$$
\mathrm{Sim}(i, j) = \frac{\sum_{u \in U} \left( r_{u,i} - \bar{r}_i \right) \left( r_{u,j} - \bar{r}_j \right)}{\sqrt{\sum_{u \in U} \left( r_{u,i} - \bar{r}_i \right)^2}\; \sqrt{\sum_{u \in U} \left( r_{u,j} - \bar{r}_j \right)^2}}
\tag{13}
$$

U is the set of users who rated both item i and item j. $r_{u,i}$ represents user u’s rating of item i, and $r_{u,j}$ represents user u’s rating of item j. $\bar{r}_i$ and $\bar{r}_j$ represent the average of all ratings of item i and the average of all ratings of item j.

These two collaborative filtering algorithms use the same form of prediction formula. The user-based collaborative filtering formula is:

$$
\mathrm{Score}_{u,i} = \frac{\sum_{v} \mathrm{Rating}_{v,i} \cdot \mathrm{sim}(u, v)}{\sum_{v} \mathrm{sim}(u, v)}
\tag{14}
$$
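
The similarity and prediction formulas of this section can be illustrated with a short sketch. It is written in Python purely for illustration (the study itself used R), and the data layout (a dictionary mapping each user to an item-to-score dictionary), the function names and the toy ratings are all assumptions rather than the author's implementation. The sketch computes the Jaccard coefficient of Eq. (11), the Pearson similarity of Eq. (12) and the user-based prediction of Eq. (14). As a further assumption, neighbors with non-positive similarity are skipped so that the denominator of Eq. (14) stays positive.

```python
from math import sqrt


def jaccard_sim(ratings, u, v):
    """Eq. (11): items rated by both users over items rated by either."""
    a, b = set(ratings[u]), set(ratings[v])
    if not a and not b:
        return 1.0  # convention stated in Section 2.5.2 for two empty sets
    return len(a & b) / len(a | b)


def pearson_sim(ratings, u, v):
    """Eq. (12): Pearson similarity summed over the co-rated items I."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0  # assumption: no co-rated items means no similarity
    # The means are taken over each user's full set of scores.
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    mean_v = sum(ratings[v].values()) / len(ratings[v])
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: an undefined correlation is treated as 0
    return num / (den_u * den_v)


def predict_score(ratings, u, item, sim=pearson_sim):
    """Eq. (14): similarity-weighted average of the neighbors' ratings."""
    num = den = 0.0
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue
        s = sim(ratings, u, v)
        if s <= 0:
            continue  # assumption: ignore non-positive similarities
        num += ratings[v][item] * s
        den += s
    return num / den if den > 0 else None


# Hypothetical toy ratings on a 1-5 scale.
toy = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 5},
    "u3": {"a": 1, "b": 5, "d": 2},
}
print(jaccard_sim(toy, "u1", "u2"))    # 3 shared items out of 4 -> 0.75
print(pearson_sim(toy, "u1", "u2"))    # positive: the two users agree
print(predict_score(toy, "u1", "d"))   # prediction for an item u1 has not rated
```

Passing `sim=jaccard_sim` to `predict_score` swaps the similarity measure without changing the prediction step, which mirrors how the study compares several similarity measures under one prediction formula.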

The formula means the following: for each user v who has scored an item that user u has not scored, the score is multiplied by the similarity between u and v, and the sum of these products is divided by the sum of the similarities between u and v. The item-based collaborative filtering formula is:

$$
\mathrm{Score}_{u,i} = \frac{\sum_{j} \mathrm{Rating}_{u,j} \cdot \mathrm{sim}(i, j)}{\sum_{j} \mathrm{sim}(i, j)}
\tag{15}
$$

However, in some cases, errors may occur in the calculation of the Pearson correlation coefficient. The following result can be obtained when calculating the similarities between user $u_1$ and user $u_2$, and between user $u_2$ and user $u_3$:

$$
\mathrm{Sim}(u_1, u_2) > \mathrm{Sim}(u_3, u_2)
\tag{16}
$$

But in fact, the similarity between users $u_2$ and $u_3$ should be relatively high, because the scores of user $u_1$ range from 1 to 5, while the scores of users $u_2$ and $u_3$ range from 2 to 4. This study proposes an improved approach: using the normal recovery similarity measure.

$$
\begin{aligned}
\mathrm{Sim}(u, v) &= 1 - \frac{\mathrm{dist}(u, v)}{\mathrm{dist}_{\max}} \\
&= 1 - \frac{\sqrt{\sum_{i \in I} \left( nr_{u,i} - nr_{v,i} \right)^2}}{\sqrt{\sum_{k=1}^{|I|} (1 - 0)^2}} \\
&= 1 - \frac{\sqrt{\sum_{i \in I} \left( \dfrac{r_{u,i} - r_{u\min}}{r_{u\max} - r_{u\min}} - \dfrac{r_{v,i} - r_{v\min}}{r_{v\max} - r_{v\min}} \right)^2}}{\sqrt{\sum_{k=1}^{|I|} 1}}
\end{aligned}
\tag{17}
$$

The formula is simplified as follows:

$$
\mathrm{Sim}(u, v) = 1 - \frac{\sqrt{\sum_{i \in I} \left( \dfrac{r_{u,i} - r_{u\min}}{r_{u\max} - r_{u\min}} - \dfrac{r_{v,i} - r_{v\min}}{r_{v\max} - r_{v\min}} \right)^2}}{\sqrt{|I|}}
\tag{18}
$$

The similarity between users $u_1$ and $u_2$ is less than that between users $u_2$ and $u_3$, and the similarity between users $u_5$ and $u_6$ is 0. The prediction score is given by the normal recovery similarity prediction formula:

$$
\hat{r}_{u,i} = r_{u\min} + \left( r_{u\max} - r_{u\min} \right) \frac{\sum_{u' \in U} \mathrm{sim}(u, u') \cdot nr_{u',i}}{\sum_{u' \in U} \mathrm{sim}(u, u')}
\tag{19}
$$

$r_{u\min}$ is the lowest score given by user u, $r_{u\max}$ is the highest score given by user u, $\mathrm{sim}(u, u')$ is the similarity between user u and user u′, $nr_{u',i}$ is the normalized score of user u′ for item i as in Eq. (17), and the sums run over the neighbors u′ of user u. In this paper, the normal recovery similarity measure is used as the basis of the collaborative filtering algorithm.

4 Experimental environment and methods

The programming language used in this study is the R data analysis language. One server and four hosts were selected as the hardware cloud environment, and tests were run on 2, 5 and 8 nodes respectively. The data used in this study are from the IMDB film scoring website (http://www.imdb.com). A total of 224836 score records were used [16, 17]. Users with fewer than 20 scored items were deleted from the data of this experiment, as were users who gave the same score to all items. Because the accuracy of the collaborative filtering algorithm increases with the value of k, the neighborhood size k is tested from 1 to 10 in the experiment [18].

As a user-based collaborative filtering algorithm, the experiment is divided into four parts: (1) calculating the maximum and minimum scores of all users; (2) calculating the similarity of all users; (3) calculating the prediction scores; (4) in another experiment, using the same data to recommend the item with the highest prediction score, with 3 neighbors. In this study, three different algorithms are used for prediction, namely, the Jaccard similarity coefficient, Pearson similarity and the Normal recovery similarity measure.

5 Research results and analysis

The experiment first measures the execution time on a single personal computer. As can be seen from Figure 1, where the abscissa k is the number of neighbors, the running time increases greatly as k increases; according to the formula, the time grows exponentially with k.

Figure 1: The running time of a personal computer (the abscissa K is the number of neighbors)
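
The four-part experimental structure described in Section 4 can be sketched as follows, using the normal recovery similarity of Eq. (18) and the prediction of Eq. (19). This is a minimal illustration in Python (the study itself used R); the dictionary layout, the function names and the toy ratings are assumptions, and every user is assumed to have at least two distinct scores so that the normalization never divides by zero.

```python
from math import sqrt


def score_range(user_scores):
    """Part (1): lowest and highest score given by one user."""
    values = user_scores.values()
    return min(values), max(values)


def normalized(ratings, u, item):
    """Normalized score nr_{u,item} in [0, 1], as inside Eq. (17)."""
    lo, hi = score_range(ratings[u])
    return (ratings[u][item] - lo) / (hi - lo)


def nr_sim(ratings, u, v):
    """Part (2): Eq. (18), one minus the scaled Euclidean distance."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0  # assumption: no co-rated items means similarity 0
    dist = sqrt(sum((normalized(ratings, u, i) - normalized(ratings, v, i)) ** 2
                    for i in common))
    return 1.0 - dist / sqrt(len(common))


def nr_predict(ratings, u, item, k=3):
    """Part (3): Eq. (19) over the k most similar users who rated `item`."""
    candidates = [(nr_sim(ratings, u, v), v) for v in ratings
                  if v != u and item in ratings[v]]
    neighbors = sorted(candidates, reverse=True)[:k]
    den = sum(s for s, _ in neighbors)
    if den == 0:
        return None  # no usable neighbor for this item
    num = sum(s * normalized(ratings, v, item) for s, v in neighbors)
    lo, hi = score_range(ratings[u])
    return lo + (hi - lo) * num / den


def recommend(ratings, u, k=3):
    """Part (4): the unrated item with the highest predicted score."""
    items = {i for scores in ratings.values() for i in scores} - set(ratings[u])
    preds = {i: nr_predict(ratings, u, i, k) for i in items}
    preds = {i: p for i, p in preds.items() if p is not None}
    return max(preds, key=preds.get) if preds else None


# Hypothetical toy ratings: u1 uses the range 1-5, u2 and u3 the range 2-4.
toy = {
    "u1": {"a": 1, "b": 3, "c": 5},
    "u2": {"a": 2, "b": 3, "c": 4, "d": 4},
    "u3": {"a": 4, "b": 3, "c": 2, "d": 2, "e": 4},
}
print(nr_sim(toy, "u1", "u2"))  # same pattern after normalization -> 1.0
print(recommend(toy, "u1"))
```

In the toy data, users u1 and u2 rate on different ranges but follow the same pattern, so their normalized scores coincide and the similarity is 1, which is the behavior that motivates normalizing each user's scores by that user's own range.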

Table 1: Efficiency comparison of personal computers with 2, 5 and 8 nodes

K    PC      2 nodes   Acceleration   5 nodes   Acceleration   8 nodes   Acceleration
                       Ratio                    Ratio                    Ratio
1    721     1146      0.624          529       1.452          345       2.445
2    2245    3046      0.654          1391      1.539          879       2.489
3    4456    6234      0.691          3156      1.482          1846      2.546
4    7397    11862     0.663          5256      1.383          2875      2.583
5    10695   15672     0.647          7145      1.584          4489      2.498
6    12341   22478     0.586          8763      1.389          5446      2.437
7    14619   22478     0.642          12189     1.498          6450      2.510
8    12147   32458     0.545          12487     1.587          7215      2.674
9    22462   36542     0.629          15655     1.445          8889      2.348
10   25246   35425     0.542          14586     1.478          11241     2.457
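
The acceleration ratio columns of Table 1 follow the definition given later in this section, speedup = T_a / T_b, where T_a is the PC running time and T_b is the Hadoop running time. The sketch below shows that calculation in Python for illustration. The paper mentions efficiency without defining it, so the per-node efficiency used here (speedup divided by the number of nodes) is an assumed, conventional definition, and the timings are hypothetical rather than taken from the table.

```python
def speedup(t_pc, t_hadoop):
    """Acceleration ratio T_a / T_b: PC running time over Hadoop running time."""
    return t_pc / t_hadoop


def efficiency(t_pc, t_hadoop, nodes):
    """Speedup per node (assumed definition, not given in the paper)."""
    return speedup(t_pc, t_hadoop) / nodes


# Hypothetical timings, both in the same (unspecified) time unit.
t_pc, t_hadoop, nodes = 1000.0, 400.0, 5
print(speedup(t_pc, t_hadoop))            # 2.5
print(efficiency(t_pc, t_hadoop, nodes))  # 0.5
```

A ratio below 1 means that the Hadoop run took longer than the single PC, and a ratio above 1 means that it was faster.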

Because Hadoop’s hardware environment consists of three hosts, it corresponds to two, five and eight nodes [19]. Table 1 compares the performance of the PC with that of Hadoop on two nodes as the k value is varied (k = 1-10). With two nodes, the job happens to be executed by one host. Compared with execution on a personal computer, it spends additional time on transmission and configuration, so the execution efficiency is not good.

In Table 1, the performance of Hadoop with 5 nodes and 8 nodes is significantly improved compared with that of the PC as the k value is varied. In the case of five nodes, the acceleration ratio is greater than 1, at about 1.5, which means that execution on five nodes is about 0.5 times faster than on a single computer [20]. In the case of 8 nodes, the acceleration ratio rises to more than 2, about 2.5, and the maximum acceleration ratio in Table 1 is 2.67, reached when the number of neighbors k equals 8.

Figure 2 shows a comparison of the running time curves of the PC and of Hadoop with 2, 5 and 8 computing nodes.

Figure 2: Comparisons of running time between PC and Hadoop with 2, 5 and 8 nodes

From Figure 2, it can be seen that when the number of hardware resources and nodes in the cloud environment is too low (Curve 1), the algorithm is not suitable for cloud execution. However, when the number of hardware resources and nodes in the cloud environment is increased (Curves 3 and 4), the calculation of the collaborative filtering algorithm is effectively accelerated.

The formula used for calculating the acceleration ratio is speedup = T_a / T_b, where T_a represents the running time of a personal computer and T_b represents the Hadoop running time.

In another experiment, three different algorithms are used to calculate the predictions. The experimental results are completely consistent. It can be speculated that there are two reasons for this result. The first is the data set. Because the data source used in this experiment is the scores of a website, it depends on the raters’ interests, so the matrix is sparse [21]. Users may only want to evaluate their favorite items, resulting in positively correlated similarities, so the similarity calculations give similar results. The second reason is that this research only recommends the highest-scoring item, so other possible items may be ignored.

Table 2 shows part of the recommendation results of the three algorithms, for the first 10 users out of 100. The contents of the table are movie numbers. Each row represents a different user. From the table, it can be seen that the recommendation results of each user are the same under the three algorithms.

Table 2: Top 10 Recommended Results of the Three Algorithms

Users   Jaccard similarity   Normal Restoration    Pearson
        coefficient          Similarity Measure    similarity
1       925468               925468                925468
2       24589                24589                 24589
3       252465               252465                252465
4       52245774             52245774              52245774
5       52547                52547                 52547
6       38625                38625                 38625
7       3545562              3545562               3545562
8       2542588              2542588               2542588
9       75225                75225                 75225
10      855265               855265                855265

6 Research conclusions

With e-commerce transactions becoming increasingly frequent, more and more sellers choose to sell goods online, which also brings a huge number of goods. In the past, the collaborative filtering recommendation system treated each item as a feature in its calculation, but with today's data this is unrealistic. The massive numbers of users and commodities also bring about the problem of extremely sparse data, so that the recommendation system runs too slowly or even fails to work.

With the advent of the cloud era, data is growing very fast. In a massive data environment, execution speed is the key when researchers need to find solutions to problems. In this paper, a collaborative filtering algorithm modified by the normal recovery similarity measure is adopted, and a speedup of up to 2.67 times is achieved in the simulated cloud environment. As the amount of actual data increases, computation on personal computers will take more time, and their capacity to store data will be limited. Using MapReduce on the Hadoop distributed platform to distribute computation and data to different hosts can save a lot of time and reduce the data burden. The Hadoop Distributed File System (HDFS) guarantees the correctness of the data, and after modification with the normal recovery similarity measure, the prediction accuracy of the algorithm is improved.

The experimental results show that the execution time increases with the number of neighbors. When the number of nodes is 5 and 8, the execution time is greatly reduced, which improves the efficiency of the collaborative filtering algorithm and makes it able to cope with massive data in the future.

However, there are some shortcomings in this study. For example, when the collaborative filtering algorithm is faced with a sparse matrix distribution, prediction becomes difficult. In follow-up studies, researchers can look for other recommendation algorithms and directions for improvement, such as constructing a multi-agent model that combines a neural network with the collaborative filtering algorithm. Nowadays, with the increasing amount of data, using the R language to analyze massive data runs into successive obstacles, such as excessively long analysis time and insufficient memory. Using the Hadoop Distributed File System (HDFS) and MapReduce in the open-source Apache Hadoop software can improve computing efficiency and storage space management and increase capacity.

References

[1] D'Angeac G.D., Big data: the management revolution, Harvard Business Review, 2012, 90(10), 60-68.
[2] Xu H.L., Wu X., Li X.D., Yan B.P., Comparison study of internet recommendation system, Journal of Software, 2009, 20(2), 350-362.
[3] Feng L., Guo W., Yu D., Gao Q., Gao K., Xue Z., et al., Classification of different therapeutic responses of major depressive disorder with multivariate pattern analysis method based on structural MR scans, PLoS ONE, 2012, 7(7), 1-11.
[4] Karabadji N.E.I., Beldjoudi S., Seridi H., Aridhi S., Dhifli W., Improving memory-based user collaborative filtering with evolutionary multi-objective optimization, Expert Systems with Applications, 2018, 98, 153-165.
[5] Rashid A.M., Albert I., Cosley D., Lam S.K., McNee S.M., Konstan J.A., Riedl J., Getting to know you: learning new user preferences in recommender systems, Proceedings of the 7th International Conference on Intelligent User Interfaces (January 13-16, 2002, San Francisco, CA, USA), ACM New York, 2002, 127-134.
[6] Liang T.P., Lai H.J., Ku Y.C., Personalized content recommendation and user satisfaction: Theoretical synthesis and empirical findings, Journal of Management Information Systems, 2006, 23(3), 45-70.
[7] Xiao B., Benbasat I., E-commerce product recommendation agents: Use, characteristics, and impact, MIS Quarterly, 2007, 31(1), 137-209.
[8] De Meo P., Quattrone G., Terracina G., Ursino D., An XML-based multiagent system for supporting online recruitment services, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2007, 37(4), 464-480.
[9] Yoshii K., Goto M., Komatani K., Ogata T., Okuno H.G., An efficient hybrid music recommender system using an incrementally trainable probabilistic generative model, IEEE T Audio Speech, 2008, 16(2), 435-447.
[10] Li J., Kai Z., Yang X., Peng W., Jie W., Mitra K., et al., Category preferred canopy-k-means based collaborative filtering algorithm, Future Generation Computer Systems, 2018, 93, 1046-1054.
[11] O'Donovan J., Smyth B., Trust in recommender systems, Proceedings of the 10th International Conference on Intelligent User Interfaces (January 09-12, 2005, San Diego, CA, USA), ACM New York, 2005, 167-174.
[12] Herlocker J.L., Konstan J.A., Riedl J., Explaining collaborative filtering recommendations, Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (December 02-06, 2000, Philadelphia, PA, USA), ACM New York, 2000, 241-250.
[13] Herlocker J.L., Konstan J.A., Borchers A., Riedl J., An algorithmic framework for performing collaborative filtering, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (August 15-19, 1999, Berkeley, CA, USA), ACM New York, 1999, 230-237.
[14] Good N., Schafer J.B., Konstan J.A., Borchers A., Sarwar B., Herlocker J., et al., Combining collaborative filtering with personal agents for better recommendations, 1999, 439-446.
[15] Edith C., Min-Hash Sketches, Springer New York, 2016.
[16] Cantero A., Crespo F., Ferrer S., The triaxiality role in the spin-orbit dynamics of a rigid body, Applied Mathematics & Nonlinear Sciences, 2018, 3, 187-208.
[17] Gao W., Wang W., A tight neighborhood union condition on fractional-critical deleted graphs, Colloquium Mathematicum, 2017, 149, 291-298.
[18] Gao W., Wang W., New isolated toughness condition for fractional-critical graph, Colloquium Mathematicum, 2017, 147, 55-65.
[19] Khalique C.M., Mhlanga I.E., Travelling waves and conservation laws of a dimensional coupling system with Korteweg-de Vries equation, Applied Mathematics & Nonlinear Sciences, 2018, 3, 241-254.
[20] Naeem M., Siddiqui M.K., Guirao J.L.G., Gao W., New and modified eccentric indices of octagonal grid $O_n^m$, Applied Mathematics & Nonlinear Sciences, 2018, 3, 209-228.
[21] Pandey P.K., A new computational algorithm for the solution of second order initial value problems in ordinary differential equations, Applied Mathematics & Nonlinear Sciences, 2018, 3, 167-174.
