2014 Seventh International Symposium on Computational Intelligence and Design

A Research of Job Recommendation System Based on Collaborative Filtering

Yingya Zhang, Cheng Yang, Zhixiang Niu


Information Engineering School
Communication University of China
Beijing, China
[email protected], [email protected], [email protected]

Abstract—Faced with the enormous amount of recruiting information on the Internet, a job seeker often spends hours finding useful postings. To reduce this laborious work, we design and implement a recommendation system for online job hunting. In this paper, we compare the user-based and item-based collaborative filtering algorithms and choose the better-performing one. We also take background information, including students' resumes and the details of recruiting information, into consideration, and bring the weights of co-apply users (the users who applied for the candidate jobs) and the weights of the student's used-liked jobs (the jobs the student applied for in the past) into the recommendation algorithm. Finally, the proposed model is verified through experiments on actual data. The recommended results achieve higher precision and recall, and they are more relevant to users' previous preferences.

Keywords-recommendation system; item-based collaborative filtering; content-based filtering; Vector Space Model (VSM); Mahout

I. INTRODUCTION

The increasing use of the Internet has heightened the need for online job hunting. In 2013, the number of people who searched for jobs on www.ganji.com was almost a billion. According to Jobvite's 2014 report, 68% of online job seekers are college graduates or postgraduates. The key problem is that most job-hunting websites simply display recruitment information to their visitors. Students have to search through all of the information to find the jobs they want to apply for, a procedure that is tedious and inefficient. In addition, many e-commerce websites, the most common application of recommendation algorithms, use collaborative filtering without considering the user's resume and the item's properties, which in this case means students' resumes and the details of recruiting information. We therefore propose an improved algorithm based on item-based collaborative filtering. The aim of the present paper is to give an effective recommendation method for online job hunting. We hope to offer students a personalized service that helps them find ideal jobs quickly and conveniently. Even so, the sparsity of user profiles can be an obstacle; further studies on filling the users' preference matrix with implicit behaviors will be reported in our next study.

The remainder of this paper is divided into four sections. Section 2 introduces related work on recommendation systems. Section 3 describes the design and algorithms of our recommendation system for college students' job hunting. Section 4 shows the results and evaluation of the experiments and tests. Section 5 concludes all aspects of the implementation.

II. RELATED WORK

A. Recommendation Algorithms

1) Content-based filtering (CBF): In content-based methods [1], features of items are abstracted and compared with a profile of the user's preferences. In other words, this algorithm tries to recommend items that are similar to those the user liked in the past. It is widely applied in information retrieval (IR). However, it performs badly in multimedia fields such as music or movie recommendation, because it is often hard to extract item attributes and to obtain the user's preferences.

2) Collaborative Filtering (CF): CF is a popular recommendation algorithm that bases its predictions and recommendations on the ratings or behavior of other users in the system [2]. There are two basic types: user-based CF and item-based CF.

• User-based CF: find other users whose past rating behavior is similar to that of the current user, and use their ratings on other items to predict what the current user will like.

• Item-based CF: rather than using similarities between users' rating behavior to predict preferences, item-based CF uses similarities between the rating patterns of items [3]. Since finding similar items is easier than finding similar users, and the attributes of items are more stable than users' preferences, item-based methods are well suited to off-line computing [4].

Collaborative filtering approaches often suffer from three problems: cold start, scalability, and sparsity [3].

B. Methods of Similarity Calculation

1) Cosine Similarity: Cosine similarity uses the cosine of the angle between two N-dimensional vectors to indicate the degree of similarity between them. It is widely used in information retrieval (IR).

    sim(a, b) = \frac{a \cdot b}{|a| \, |b|}                                (1)
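As a concrete illustration of formula (1), the following Java sketch (added here for clarity; it is not part of the paper's original implementation) computes the cosine similarity of two dense vectors.

import java.lang.IllegalArgumentException;

// Illustrative sketch of formula (1).
public final class CosineSimilarity {

    // Returns (a . b) / (|a| * |b|); 0.0 when either vector has zero norm.
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}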
2) Tanimoto Coefficient: The Tanimoto coefficient [5], also known as the Jaccard index, measures the similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets.

    Jaccard(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}                           (2)

3) Log Likelihood: Similar to the Tanimoto coefficient, the log likelihood method calculates similarity based on the common preferences two users share. Given the total number of items and the number of items each user has rated, the final result reflects how improbable it is that the two users share such common preferences by chance [6].

4) The City Block Distance: The city block distance [7] is the sum of the lengths of the projections of the line segment between the points onto the coordinate axes.

    D(x, y) = \sum_{i=0}^{n} |x_i - y_i|                                    (3)

The similarity calculation using the city block distance is:

    sim(x, y) = \frac{1}{1 + D(x, y)}                                       (4)

III. RECOMMENDATION SYSTEM OF STUDENT JOB HUNTING (SJH)

A. Procedure of SJH Recommendation

There are four steps in our system, as Fig. 1 shows:

Figure 1. Procedure of SJH recommendation

• Data Preprocessing: in this step, we clean the raw data to filter out useless data, including inactive users and expired recruiting information.

• Construct the item-based CF recommender: in this step, the recommender processes the input data (apply records) and gives out the original predicted preference grades using the item-based collaborative filtering algorithm.

• Rescorer: re-compute the grades of the candidate items.

• Sort all grades and push the top N jobs to users.

B. Item-based CF Deals with Boolean Data

The traditional item-based CF proceeds as follows. First, for each job the current user applied for in the past (we regard user-applied jobs as user-liked jobs), find the other users who applied for this job (we regard these users as co-apply users), and then find the other jobs these co-apply users also applied for, excluding the current user-liked jobs; use these jobs as the candidate set. The procedure is presented below:

    for each jobi useri applied {
        for each co-apply userj who applied jobi {
            find out jobs that userj applied;
            add these jobs to candidate set;
        }
        delete jobi from candidate set;
    }

Second, for every job Itemj in the candidate set {Item1 ... Itemp}, compute the predicted preference grade according to formula (5). At last, sort all the grades and choose the top N jobs as the result set. The procedure is presented below:

    for each itemj in candidate set {
        compute pref(Ui, Itemj);
    }
    sort these pref(Ui, Itemj);

    pref(U_i, Item_j) = \frac{\sum_{l=1}^{k} sim(Item_l, Item_j) \cdot p(U_i, Item_l)}{\sum_{l=1}^{k} sim(Item_l, Item_j)}    (5)

pref(Ui, Itemj) stands for the predicted degree to which Ui will rate Itemj, sim(Iteml, Itemj) stands for the similarity between Iteml and Itemj, and p(Ui, Iteml) is user Ui's preference on Iteml. Iteml comes from the set {Item1 ... Itemk} of jobs that Ui applied for in the past.

In our case, users express their preference on items through apply behaviors. Under this condition, p(Ui, Iteml) is boolean, taking values in {0, 1}. So we use the original grading formula (6):

    pref_o(U_i, Item_j) = \sum_{l=1}^{k} sim(Item_l, Item_j) \cdot p(U_i, Item_l)    (6)

If Ui applied for Iteml, p(Ui, Iteml) equals 1; otherwise, p(Ui, Iteml) equals 0.
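To make the boolean grading concrete, the following Java sketch (an illustration added here, not the paper's actual implementation) builds the candidate set from the co-apply users and grades each candidate with formula (6), using the Tanimoto coefficient of formula (2) over applicant sets as the item-item similarity. The index maps jobsByUser and usersByJob are assumed to have been built from the apply records.

import java.util.*;

// Sketch of the candidate-generation and grading steps for boolean apply data.
public final class BooleanItemBasedScorer {

    // Tanimoto / Jaccard coefficient of two applicant sets, formula (2).
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    // Returns candidate jobId -> grade, where the grade is formula (6): the sum of the
    // similarities between the candidate and every job the user already applied for
    // (p(Ui, Iteml) is 1 for each applied job, so each product reduces to the similarity).
    public static Map<Long, Double> scoreCandidates(long userId,
                                                    Map<Long, Set<Long>> jobsByUser,
                                                    Map<Long, Set<Long>> usersByJob) {
        Set<Long> likedJobs = jobsByUser.getOrDefault(userId, Collections.emptySet());

        // Step 1: candidate set = jobs applied for by co-apply users, minus the liked jobs.
        Set<Long> candidates = new HashSet<>();
        for (long likedJob : likedJobs) {
            for (long coApplyUser : usersByJob.getOrDefault(likedJob, Collections.emptySet())) {
                candidates.addAll(jobsByUser.getOrDefault(coApplyUser, Collections.emptySet()));
            }
        }
        candidates.removeAll(likedJobs);

        // Step 2: grade each candidate with formula (6).
        Map<Long, Double> grades = new HashMap<>();
        for (long candidate : candidates) {
            double grade = 0.0;
            for (long likedJob : likedJobs) {
                grade += tanimoto(usersByJob.getOrDefault(candidate, Collections.emptySet()),
                                  usersByJob.getOrDefault(likedJob, Collections.emptySet()));
            }
            grades.put(candidate, grade);
        }
        return grades;
    }
}

Sorting the returned grades in descending order and keeping the top N entries reproduces the result-set step described above.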
C. Weight in Specific Fields

So far, our system recommends jobs to users simply according to the apply records. To improve the quality of the recommendation, we bring users' resumes and the details of recruiting information into the algorithm.
1) Weight of used-liked jobs: Itemm comes from the set of jobs {Item1 ... Itemy} that Ui applied for in the past, i.e., the user-used-liked items. We compare the candidate job Itemj with these jobs, so that we can increase the weight (7) of candidates that are similar to the user-used-liked ones.

    w_h(Item_j) = \alpha + \frac{\sum_{m=1}^{y} sim(Item_m, Item_j)}{y}     (7)

Itemm is a job that the user applied for in the past, \frac{1}{y}\sum_{m=1}^{y} sim(Item_m, Item_j) stands for the average similarity between the candidate job Itemj and the user-used-liked jobs, and α equals 1.

2) Weight of co-apply users: We compare the users who applied for the candidate job Itemj (namely, the co-apply users) with the current user Ui, and then increase the weight (8) of candidates that have a higher co-apply user similarity.

    w_c(U_i) = \beta + \frac{\sum_{n=1}^{z} sim(U_n, U_i)}{z}               (8)

Un is one of the co-apply users in {U1 ... Uz}, and \frac{1}{z}\sum_{n=1}^{z} sim(U_n, U_i) is the average co-apply user similarity of a candidate job.

Then, the rescoring preference grading formula is summarized as (9):

    pref(U_i, Item_j) = pref_o(U_i, Item_j) \cdot w_h(Item_j) \cdot w_c(U_i)    (9)
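The weights (7)-(9) fit naturally into a rescoring step. The sketch below is our illustration, assuming a Mahout Taste-style IDRescorer and a hypothetical FieldSimilarityService that supplies the averages in (7) and (8); it is not the authors' code.

import org.apache.mahout.cf.taste.recommender.IDRescorer;

// Illustrative rescorer applying formulas (7)-(9) on top of the original item-based grade.
// FieldSimilarityService is a hypothetical helper; its averages would typically be computed
// with the field-based similarity described in Section III.D below. alpha is the paper's alpha = 1.
public class WeightedJobRescorer implements IDRescorer {

    private final long currentUserId;
    private final FieldSimilarityService fields;
    private final double alpha = 1.0;
    private final double beta;

    public WeightedJobRescorer(long currentUserId, FieldSimilarityService fields, double beta) {
        this.currentUserId = currentUserId;
        this.fields = fields;
        this.beta = beta;
    }

    @Override
    public double rescore(long jobId, double originalScore) {
        // (7) weight of used-liked jobs
        double wh = alpha + fields.averageSimilarityToLikedJobs(currentUserId, jobId);
        // (8) weight of co-apply users
        double wc = beta + fields.averageCoApplySimilarity(currentUserId, jobId);
        // (9) rescored preference grade
        return originalScore * wh * wc;
    }

    @Override
    public boolean isFiltered(long jobId) {
        return false;   // keep every candidate; only the grade changes
    }

    // Hypothetical interface supplying the averages used in (7) and (8).
    public interface FieldSimilarityService {
        double averageSimilarityToLikedJobs(long userId, long jobId);
        double averageCoApplySimilarity(long userId, long jobId);
    }
}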
D. Similarity Method Dealing with Text

In the student job hunting system, student resume information and job descriptions are stored as text in the database. To compare the similarity between two pieces of information, we represent each piece of information as a space vector and use the cosine similarity calculation.

For example, a job description is expressed as a vector of the form (job name, location, job type, field, category name), represented by J = (j_1, j_2, j_3, j_4, j_5); a student resume is expressed as a vector of the form (college, major, degree, home place, gender), represented by S = (s_1, s_2, s_3, s_4, s_5).

The similarity between two jobs or two students can be calculated by formulas (10) and (11):

    sim(J_1, J_2) = \cos(\theta_j)                                          (10)

    sim(S_1, S_2) = \cos(\theta_u)                                          (11)
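As an illustration (not the authors' code): if each categorical field is one-hot encoded, every record vector contains exactly one 1 per field, so the dot product of two records equals the number of matching fields and each norm is the square root of the field count; the cosine in (10)/(11) then reduces to the fraction of matching fields.

import java.util.Objects;

// Illustrative sketch of the vector-space comparison in formulas (10)/(11) under a
// one-hot encoding of categorical fields. Field names follow the data set of Section IV.A,
// e.g. job = {jobName, location, jobType, field, categoryName}
//      resume = {college, major, degree, homePlace, gender}.
public final class RecordSimilarity {

    public static double cosine(String[] recordA, String[] recordB) {
        if (recordA.length != recordB.length) {
            throw new IllegalArgumentException("Records must have the same fields");
        }
        int matches = 0;
        for (int i = 0; i < recordA.length; i++) {
            if (Objects.equals(recordA[i], recordB[i])) {
                matches++;
            }
        }
        return (double) matches / recordA.length;
    }
}

For instance, two job records that agree on four of their five fields would score 0.8 under this encoding.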
IV. EXPERIMENTS

In this section, the user-based and item-based CF algorithms are first tested on our data set. Then item-based CF, the better-performing one, is applied in the Student Job Hunting recommendation system. At last, we evaluate the performance of the improved recommender that uses the used-liked job and co-apply user weights on top of the item-based algorithm. The implementation of our experiments is based on Apache Mahout.

A. Data Set

The data set we used was collected from a job hunting website during 2012 and 2013, and includes 2,503 jobs, 7,610 student resumes, and 9,924 job apply records from 6,892 students. The structure of our database is as follows:

1) Job apply records: APPLY ID, STUDENT ID, JOB ID, APPLY DATE
2) Details of jobs: JOB ID, JOB NAME, LOCATION, TYPE, FIELD, CATEGORY NAME
3) Student resumes: STUDENT ID, COLLEGE, MAJOR, DEGREE, HOME PLACE, GENDER

B. Indicators

1) Precision: Precision is the fraction of recommended items that are relevant to the user, as (12) shows [8]:

    P = \frac{|hits|}{|recset|}                                             (12)

2) Recall: Recall is the fraction of the items relevant to the user that are successfully recommended, as (13) shows [8]:

    R = \frac{|hits|}{|testset|}                                            (13)

3) F1-Measure: The F1 score is the harmonic mean of precision and recall:

    F1 = \frac{2 \cdot P \cdot R}{P + R}                                    (14)
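A minimal sketch (added for illustration, not part of the paper's implementation) of how the three indicators are computed from the counts of hits, recommended items, and held-out test items:

// Illustrative helper for formulas (12)-(14); hits is the number of recommended items
// that also appear in the user's held-out apply records.
public final class IRMetrics {

    public static double precision(int hits, int recommendedCount) {
        return recommendedCount == 0 ? 0.0 : (double) hits / recommendedCount;
    }

    public static double recall(int hits, int testSetSize) {
        return testSetSize == 0 ? 0.0 : (double) hits / testSetSize;
    }

    public static double f1(double precision, double recall) {
        return (precision + recall) == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }
}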
C. Contrast of User-based and Item-based Algorithms

We evaluate user-based CF and item-based CF on our data set respectively. We use the Log likelihood, City Block, and Tanimoto similarity methods to recommend three jobs for each user. In the user-based algorithm, the neighborhood size is ten. The precision and recall are evaluated, as Table I shows.

TABLE I. PRECISION AND RECALL OF DIFFERENT RECOMMENDERS

Recommender (r_num=3)    Similarity        Precision    Recall
User-Based CF (n=10)     Log likelihood    62.82%       53.85%
                         City Block        83.33%       56.41%
                         Tanimoto          65.38%       53.85%
Item-Based CF            Log likelihood    58.33%       58.33%
                         City Block        0.00%        0.00%
                         Tanimoto          41.67%       41.67%
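The recommenders compared in Table I can be assembled with Mahout's Taste API roughly as follows. This is a sketch under the assumption of a Mahout 0.x-style API, a boolean-preference data model built from the apply records, and a hypothetical file name; it is not the authors' exact code.

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class RecommenderSetup {
    public static void main(String[] args) throws Exception {
        // Apply records as "studentId,jobId" lines; boolean preferences (no rating value).
        DataModel model = new FileDataModel(new File("apply_records.csv"));

        // Item-based CF with log-likelihood similarity between jobs.
        LogLikelihoodSimilarity itemSimilarity = new LogLikelihoodSimilarity(model);
        Recommender itemBased = new GenericBooleanPrefItemBasedRecommender(model, itemSimilarity);

        // User-based CF with log-likelihood similarity and a neighborhood of 10 students.
        LogLikelihoodSimilarity userSimilarity = new LogLikelihoodSimilarity(model);
        NearestNUserNeighborhood neighborhood = new NearestNUserNeighborhood(10, userSimilarity, model);
        Recommender userBased = new GenericBooleanPrefUserBasedRecommender(model, neighborhood, userSimilarity);

        // Top-3 recommendations for one (hypothetical) student id.
        System.out.println(itemBased.recommend(1L, 3));
        System.out.println(userBased.recommend(1L, 3));
    }
}

Swapping LogLikelihoodSimilarity for the Tanimoto or City Block similarity classes in the same package gives the other configurations compared in Table I.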
Figure 2. Precision and recall of different recommenders

Fig. 2 shows that all three user-based algorithms, and the item-based algorithm using Log likelihood similarity, reached higher precision and recall than the other two algorithms. Under this circumstance, we continue to evaluate these four methods with some other variables, for example the neighborhood size.

D. Evaluation of User-based CF

1) Contrast of different similarities: Since a user's preference on jobs takes values in {0, 1} in the job apply records, and Log likelihood, City Block, and Tanimoto are three similarity calculation methods suitable for boolean data, in this experiment we recommended three items to test precision and recall with these three methods. We chose different neighborhood sizes to reduce their influence.
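A sketch of how such a precision/recall evaluation can be run (our illustration, assuming Mahout's Taste IR-statistics evaluator; the neighborhood size n and the top-N value are the parameters varied in Tables II and III below):

import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.CityBlockSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

public class UserBasedEvaluation {

    // Evaluates precision and recall at the top 3 recommendations for a given neighborhood size n.
    static IRStatistics evaluateAt(DataModel model, final int n) throws Exception {
        RecommenderBuilder builder = dataModel -> {
            CityBlockSimilarity similarity = new CityBlockSimilarity(dataModel);
            NearestNUserNeighborhood neighborhood = new NearestNUserNeighborhood(n, similarity, dataModel);
            return new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
        };
        GenericRecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // getPrecision() / getRecall() on the returned statistics correspond to one cell
        // of Tables II and III.
        return evaluator.evaluate(builder, null, model, null, 3,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    }
}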
TABLE II. PRECISION OF USER-BASED CF WITH DIFFERENT SIMILARITY METHODS

Precision    Log likelihood    City Block    Tanimoto
n=5          75.00%            94.44%        77.78%
n=10         62.82%            83.33%        65.38%
n=20         56.41%            71.79%        56.41%
n=40         56.41%            71.79%        56.41%

TABLE III. RECALL OF USER-BASED CF WITH DIFFERENT SIMILARITY METHODS

Recall       Log likelihood    City Block    Tanimoto
n=5          53.85%            56.41%        53.85%
n=10         53.85%            56.41%        53.85%
n=20         56.41%            58.97%        56.41%
n=40         56.41%            64.10%        56.41%

Figure 3. Precision of user-based CF with different similarity methods

Figure 4. Recall of user-based CF with different similarity methods

As Fig. 3 and Fig. 4 show, the results indicate that City Block similarity achieves higher precision and recall across the different neighborhood sizes.

It should be noted that the GenericRecommenderIRStatsEvaluator offered by Mahout only considers users for whom items can be recommended, while automatically ignoring users who cannot receive recommendations, so the evaluation results look very good. In the next subsection we examine the recommendation results manually.

2) Similarity between used-liked jobs and recommended results: We chose ten students at random and recommended two jobs for each one. The algorithm used is user-based CF. The City Block similarity method is applied when computing similarities between users to form the neighborhood, and n stands for the neighborhood size. The results are displayed in Table IV. The value before the '/' is the average similarity between the jobs recommended by the original recommender and the used-liked jobs; the value after the '/' is the average similarity between the jobs recommended by the improved recommender and the used-liked jobs. The improved recommender adds the weights of co-apply users and used-liked jobs.

TABLE IV. SIMILARITY BETWEEN USED-LIKED JOBS AND RECOMMENDED RESULTS FROM USER-BASED CF

No.  Student ID                          n=200      n=100      n=50       n=30       n=20       n=5
1    4e585f8488ca47e9993891612a4cf45d    0.51/0.51  0.48/0.48  0.39/0.39  0.39/0.39  NaN/NaN    NaN/NaN
2    8391a03fb0c8488ea5f6c7379d658e72    0.37/0.37  0.37/0.37  0.34/0.34  0.34/0.34  NaN/NaN    NaN/NaN
3    0104646004f2434b951ae3300d0f4396    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
4    c4dc0c6c3a2b4facb3358b463186f170    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
5    8554271e58594c4aaa248e2495de24c9    0.65/0.65  NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
6    d07a838d619f41f39a11c06a8311174f    0.41/0.41  0.41/0.41  0.55/0.55  0.55/0.55  0.55/0.55  NaN/NaN
7    6ed1faf8eee9443c8b47b42ee092b4e0    0.42/0.42  NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
8    0c912ce344004f9da634d52eb2131d49    0.65/0.65  NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
9    d3ad698574874d0ebb4ff448d3d82dd4    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
10   7bf24dedbf0c422e9eb03f26cfe10123    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN    NaN/NaN
NaN (Not a Number) means that there is no valid recommendation result for the current user. Because the apply records are very sparse, the user-based recommender cannot offer a valid recommendation for every user. It can thus be seen that, in the Student Job Hunting system, the user-based algorithm is unsatisfactory.

E. Evaluation of Item-based CF

According to the results of Section IV.C, the item-based algorithm performed well when using the Log likelihood similarity method. So we decided to use item-based CF and selected the Log likelihood method to compute the candidate items' similarities in the Student Job Hunting recommendation system.

1) The performance of the improved recommender: We evaluate the capability of the original item-based recommender and of the improved recommender, which takes the co-apply users' weight and the used-liked jobs' weight into account, when recommending two or three jobs for each student. r_num stands for the number of recommended items. The results are recorded in Table V and Table VI.

TABLE V. PERFORMANCE OF IMPROVED RECOMMENDER IN SJH SYSTEM (R_NUM=3)

r_num=3                  Precision    Recall    Reach     F1-measure
original recommender     50.62%       47.13%    93.10%    48.81%
improved recommender     51.85%       48.28%    93.10%    50.00%

Figure 5. Performance of improved recommender in SJH system (r_num=3)

TABLE VI. PERFORMANCE OF IMPROVED RECOMMENDER IN SJH SYSTEM (R_NUM=2)

r_num=2                  Precision    Recall    Reach     F1-measure
original recommender     40.91%       37.50%    91.67%    39.13%
improved recommender     59.09%       54.17%    91.67%    56.52%

Figure 6. Performance of improved recommender in SJH system (r_num=2)

As Fig. 5 shows, when recommending three jobs for each student, the improved recommender gives a small improvement in precision, recall, and F1 score. When the number of recommended items is two, as Fig. 6 shows, all three indicators increase significantly, while the reach rate remains unchanged. Because of the sparseness of our apply-records data set, the recommender can only offer three or four results for some students, so to evaluate the overall performance we set the number of recommended items to three and to two. If user U has three recommended jobs (job 1, job 2, job 3), the improved recommender merely re-ranks the three jobs produced by the traditional recommender, so it has no influence on precision, recall, and F1 score at r_num=3. However, when evaluating the top two recommended jobs, the improved recommender changes the order of these three jobs, so the top two differ from the original ones. The increased scores suggest that the improved recommender works well, in that the jobs given precedence (the 1st and 2nd recommended jobs) are better than the later one (the 3rd recommended job).
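The re-ranking described above amounts to requesting the same top-N list with and without a rescorer; a minimal sketch, assuming a Mahout Taste Recommender and an IDRescorer such as the weight-based one sketched in Section III.C:

import java.util.List;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public final class ImprovedVersusOriginal {

    // Prints the original and rescored top-N lists so that they can be compared
    // in the way Tables V and VI do.
    public static void compare(Recommender recommender, IDRescorer rescorer,
                               long studentId, int topN) throws Exception {
        List<RecommendedItem> original = recommender.recommend(studentId, topN);
        List<RecommendedItem> improved = recommender.recommend(studentId, topN, rescorer);
        System.out.println("original: " + original);
        System.out.println("improved: " + improved);
    }
}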

2) Similarity between the user's preference and the recommendation results from the improved recommender: We chose ten students at random and recommended five jobs for each one. The algorithm used is the improved item-based CF. The Log likelihood similarity method is applied when computing the similarities between candidate items and the current item. We then compare the recommended jobs with the jobs the user applied for. The average similarities between them are displayed in Table VII:

TABLE VII. SIMILARITY BETWEEN USER'S PREFERENCE AND RECOMMENDATION RESULTS IN SJH SYSTEM

No.  Student ID                          Original item-based CF    Improved item-based CF
1    9dd90881b2e040d69d5917e4a0a8acde    0.36                      0.57
2    f6b073ecbdd84f18a98ed20bb33faf6f    0.30                      0.48
3    f71ec48f99124875bf24dac64118d6d9    0.23                      0.55
4    dd7f431ba21545d8aab4123327416100    0.26                      0.60
5    aa7d690df4e841b79284c4f196ee2767    0.25                      0.35
6    f7a02b9efff7489bb1b57a2d01357a01    0.37                      0.37
7    0036e442801349d68d3c49e15e71d016    0.40                      0.80
8    001bab4efaa54988acb08fc56405f8db    0.33                      0.40
9    35d36ba8617244829d8ab9531503329b    0.24                      0.36
10   4f63a6ffe4e04258ba9e7a2de87c2401    0.36                      0.66
Figure 7. Similarity between user's preference and recommendation results in SJH system

Fig. 7 shows that, with the improved item-based algorithm, the recommended results have a higher similarity to the jobs the user has applied for. That means they are more relevant to the user's previous preferences.

V. CONCLUSION

On the basis of Section IV, we evaluated user-based CF and item-based CF with different similarity calculation methods. Finally, we selected the item-based CF algorithm for its better performance considering various factors. In addition, we also take the co-apply users and the user's past apply records into account. The test results indicate that the improved recommender with a rescorer is better than a traditional item-based one. Our student job hunting recommender achieved higher precision, recall, and F1 scores. Furthermore, the recommended jobs are more relevant to students' preferences.

To further optimize the recommendation system and alleviate the sparsity of user profiles, some methods of filling the users' preference matrix can be utilized, for example taking advantage of students' implicit behaviors in the process of job hunting, which needs further research.

ACKNOWLEDGMENTS

The work on this paper was supported by the National Key Technology R&D Program of China (2012BAH17F01-01, 2012AA011702-02, 2012BAH37F03, 2012BAH02B03), the National Culture S&T Promotion Program (WHB1002), and the SARFT Scientific and Technological Project (2012-20, 2011-34, 2012-27). Supported by the Program for New Century Excellent Talents in University.

REFERENCES

[1] Pazzani M J, Billsus D. Content-based recommendation systems. In: The Adaptive Web. Springer Berlin Heidelberg, 2007: 325-341.
[2] Adomavicius G, Tuzhilin A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6): 734-749.
[3] Schafer J B, Frankowski D, Herlocker J, et al. Collaborative filtering recommender systems. In: The Adaptive Web. Springer Berlin Heidelberg, 2007: 291-324.
[4] Sarwar B, Karypis G, Konstan J, et al. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. ACM, 2001: 285-295.
[5] Jannach D, Zanker M, Felfernig A, et al. Recommender Systems: An Introduction. Cambridge University Press, 2010.
[6] Anil R, Dunning T, Friedman E. Mahout in Action. Manning, 2011.
[7] Dunning T. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 1993, 19(1): 61-74.
[8] Liang Xiang. Recommendation System in Practice. 2012.