A Research of Job Recommendation System Based On Collaborative Filtering
Abstract-Faced with the enormous amount of recruiting information on the Internet, a job seeker often spends hours finding the useful postings. To reduce this laborious work, we design and implement a recommendation system for online job hunting. In this paper, we compare user-based and item-based collaborative filtering algorithms and choose the better-performing one. We also take background information into consideration, including students' resumes and details of recruiting information, and bring the weights of co-apply users (the users who had applied for the candidate jobs) and the weights of a student's used-liked jobs into the recommendation algorithm. Finally, the proposed model is verified through experiments on actual data. The recommended results achieve higher precision and recall scores, and they are more relevant to users' preferences than before.

Keywords-recommendation system; item-based collaborative filtering; content-based filtering; Vector Space Model (VSM); Mahout

I. INTRODUCTION

The increasing usage of the Internet has heightened the need for online job hunting. In 2013, the number of people who searched for jobs on www.ganji.com was almost a billion. According to Jobvite's 2014 report, 68% of online job seekers are college graduates or postgraduates. The key problem is that most job-hunting websites simply display recruitment information to their viewers. Students have to search through all the information to find the jobs they want to apply for. The whole procedure is tedious and inefficient. In addition, many e-commerce websites, the most common application of recommendation algorithms, use collaborative filtering without considering the user's resume and the item's properties (in this case, students' resumes and details of recruiting information). We therefore propose an improved algorithm based on item-based collaborative filtering. The aim of this paper is to give an effective recommendation method for online job hunting. We hope to offer students a personalized service that helps them find ideal jobs quickly and conveniently. Even so, the sparsity of user profiles can be an obstacle; filling the users' preference matrix with implicit behaviors is left to our next study.

The remainder of this paper is divided into four sections. Section 2 introduces the related work on recommendation systems. Section 3 describes the design and algorithm of our recommendation system for college students' job hunting. Section 4 shows the results and evaluation of experiments and tests. Section 5 concludes all aspects of the implementation.

II. RELATED WORK

A. Recommendation Algorithms

1) Content-based filtering (CBF):
In content-based methods [1], features of items are abstracted and compared with a profile of the user's preferences. In other words, this algorithm tries to recommend items that are similar to those that a user liked in the past. It is widely applied in information retrieval (IR). However, it performs badly in multimedia fields such as music or movie recommendation, because it is sometimes hard to extract item attributes and obtain the user's preferences.

2) Collaborative Filtering (CF):
CF is a popular recommendation algorithm that bases its predictions and recommendations on the ratings or behavior of other users in the system [2]. There are two basic types: user-based CF and item-based CF.

• User-based CF: find other users whose past rating behavior is similar to that of the current user and use their ratings on other items to predict what the current user will like.

• Item-based CF: rather than using similarities between users' rating behavior to predict preferences, item-based CF uses similarities between the rating patterns of items [3]. Since finding similar items is easier than finding similar users, and attributes of items are more stable than users' preferences, item-based methods are suitable for off-line computing [4].

Collaborative filtering approaches often suffer from three problems: cold start, scalability and sparsity [3].

B. Methods of Similarity Calculation

1) Cosine Similarity
Cosine similarity uses the cosine of the angle between two N-dimensional vectors to indicate the degree of similarity between them. It is widely used in information retrieval (IR).

$$\mathrm{sim}(a, b) = \frac{a \cdot b}{|a| \cdot |b|} \tag{1}$$

2) Tanimoto Coefficient
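As a minimal Python sketch of the two similarity measures named in this subsection: the cosine function follows (1), while the Tanimoto function uses the standard set form |A ∩ B| / |A ∪ B|, which is an assumption here rather than a formula taken from the paper. The student identifiers and applicant sets are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors, as in (1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def tanimoto_coefficient(set_a, set_b):
    """Standard set form of the Tanimoto coefficient: |A & B| / |A | B|."""
    set_a, set_b = set(set_a), set(set_b)
    union_size = len(set_a | set_b)
    if union_size == 0:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(set_a & set_b) / union_size


# Hypothetical data: the students who applied for each of two jobs.
job_1_applicants = {"s1", "s2", "s3"}
job_2_applicants = {"s2", "s3", "s4"}

# Binary application vectors over the same (hypothetical) student list.
students = ["s1", "s2", "s3", "s4"]
vec_1 = [1 if s in job_1_applicants else 0 for s in students]
vec_2 = [1 if s in job_2_applicants else 0 for s in students]

print("cosine  :", cosine_similarity(vec_1, vec_2))
print("tanimoto:", tanimoto_coefficient(job_1_applicants, job_2_applicants))
```

On binary application data the two measures rank pairs similarly, but Tanimoto ignores vector magnitude and only counts shared versus total applicants.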
increase the weights (7) of candidates that are similar to the user-used-liked ones.

$$w_h(Item_j) = \alpha + \frac{\sum_{m=1}^{y} \mathrm{sim}(Item_m, Item_j)}{y} \tag{7}$$

Item_m is a job that the user applied for in the past. The term $\frac{1}{y}\sum_{m=1}^{y} \mathrm{sim}(Item_m, Item_j)$ stands for the average similarity between the candidate job Item_j and the user-used-liked jobs. α equals 1.

2) Weight of co-apply users
We compare the users who had applied for the candidate job Item_j (namely the co-apply users) to the current user U_i, then increase the weights (8) of candidates whose co-apply users are more similar to the current user.

$$w_c(U_i) = \beta + \frac{\sum_{n=1}^{z} \mathrm{sim}(U_n, U_i)}{z} \tag{8}$$

U_n is one of the co-apply users in {U_1 … U_z}, and $\frac{1}{z}\sum_{n=1}^{z} \mathrm{sim}(U_n, U_i)$ is the average co-apply user similarity of a candidate job.

We then summarize the rescored preference formula as (9):

$$\mathrm{pref}(U_i, Item_j) = \mathrm{pref}_o(U_i, Item_j) \cdot w_h(Item_j) \cdot w_c(U_i) \tag{9}$$

D. Similarity Method Dealing with Text

In the student job hunting system, student resume information and job descriptions are stored as text in the database. To compare the similarity between two pieces of information, we represent each piece of information as a space vector and calculate the cosine similarity.

For example, a job description is expressed as a vector like this: (job name, location, job type, field, category name). It is represented by $J = (j_1, j_2, j_3, j_4, j_5)$. A student resume is expressed as a vector like this: (college, major, degree, home place, gender). It is represented by $S = (s_1, s_2, s_3, s_4, s_5, s_6)$.

The similarity between two jobs or two students can be calculated by formulas (10) and (11):

$$\mathrm{sim}(J_1, J_2) = \cos(\theta_j) \tag{10}$$

$$\mathrm{sim}(S_1, S_2) = \cos(\theta_u) \tag{11}$$

IV. EXPERIMENTS

In this section, user-based and item-based CF algorithms are each tested on our data set. Then item-based CF, the better-performing one, is applied to the Student Job Hunting recommendation system. Finally, we evaluate the performance of the improved recommender, which adds used-liked job weights and co-apply user weights to the item-based algorithm. The implementation of our experiments is based on Apache Mahout.

A. Data Set

The data set we used was collected from a job hunting website during 2012 and 2013. It includes 2,503 jobs, 7,610 student resumes, and 9,924 job apply records from 6,892 students. The structure of our database is as follows:

1) Job apply records:
APPLY ID | STUDENT ID | JOB ID | APPLY DATE

2) Details of jobs:
JOB ID | JOB NAME | LOCATION | TYPE | FIELD | CATEGORY NAME

3) Student resumes:
STUDENT ID | COLLEGE | MAJOR | DEGREE | HOME PLACE | GENDER

B. Indicators

1) Precision
Precision is the fraction of recommended items that are relevant to the user, as (12) shows [8]:

$$P = \frac{|hits|}{|recset|} \tag{12}$$

2) Recall
Recall is the fraction of the items relevant to the user that are successfully recommended, as (13) shows [8]:

$$R = \frac{|hits|}{|testset|} \tag{13}$$

3) F1-Measure
The F1 score is the harmonic mean of precision and recall.

$$F1 = \frac{2 \cdot P \cdot R}{P + R} \tag{14}$$

C. Contrast of User-based and Item-based Algorithm

We evaluate user-based CF and item-based CF on our data set separately. We use the Log likelihood, City Block and Tanimoto similarity methods to recommend three jobs for each user. In the user-based algorithm, the neighborhood size is ten. Precision and recall are evaluated.

TABLE I. PRECISION AND RECALL OF DIFFERENT RECOMMENDERS

Recommender (r_num=3) | Similarity | Precision | Recall
User-Based CF (n=10) | Log likelihood | 62.82% | 53.85%
User-Based CF (n=10) | City Block | 83.33% | 56.41%
User-Based CF (n=10) | Tanimoto | 65.38% | 53.85%
Item-Based CF | Log likelihood | 58.33% | 58.33%
Item-Based CF | City Block | 0.00% | 0.00%
Item-Based CF | Tanimoto | 41.67% | 41.67%
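The indicators in (12) to (14) can be checked by hand on a small case. The Python sketch below is illustrative only: the function name `precision_recall_f1` and the job identifiers are hypothetical, and it is not the Mahout evaluator used in the experiments.

```python
def precision_recall_f1(recset, testset):
    """Compute P (12), R (13) and F1 (14) for one user.

    recset  -- jobs recommended to the user
    testset -- jobs the user actually applied for in the held-out data
    """
    recset, testset = set(recset), set(testset)
    hits = len(recset & testset)  # recommended jobs that are also relevant

    precision = hits / len(recset) if recset else 0.0
    recall = hits / len(testset) if testset else 0.0
    if precision + recall == 0:
        f1 = 0.0  # avoid division by zero when there are no hits
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three recommended jobs, two held-out applications.
recommended_jobs = ["job_12", "job_47", "job_03"]
applied_jobs = ["job_47", "job_88"]

p, r, f1 = precision_recall_f1(recommended_jobs, applied_jobs)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

Here one of three recommendations is a hit, so P = 1/3, R = 1/2, and F1 = 0.4.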
Figure. 4. Recall of user-based CF with different similarity methods
NaN (Not a Number) means there is no valid recommendation result for the current user. Because the apply records are very sparse, the user-based recommender cannot offer a valid recommendation service for every user. It can thus be seen that the user-based algorithm is unsatisfactory in the Student Job Hunting system.

E. Evaluation of Item-based CF

According to the result of section IV.C, the item-based algorithm performed well when using the Log likelihood similarity method. So we decided to use item-based CF and selected the Log likelihood method to compute candidate items' similarities in the Student Job Hunting recommendation system.

1) The performance of improved recommender: We evaluate the capability of the original item-based recommender and of the improved recommender, which takes the co-apply users' weight and the used-liked jobs' weight into account, when recommending two or three jobs for each student. The r_num stands for the number of recommended items. The results are recorded in Table V and Table VI.

As Fig. 5 shows, when recommending three jobs for each student, the improved recommender showed a slight improvement in precision, recall and F1 score. When the number of recommended items was reduced to two, as Fig. 6 shows, all three indicator scores increased significantly, and the Reach rate remained as before. Because of the sparseness of our apply records dataset, the recommender can only offer 3 or 4 recommended results for some students. To evaluate the recommender's overall performance, we set the number of recommended items to three and to two. If user U has three recommended jobs (job 1, job 2, job 3), then when recommending jobs for U, the improved recommender just re-ranks the three jobs recommended by the traditional recommender, so that it has no influence on the precision, recall and F1 score. However, when evaluating the top 2 recommended jobs, the improved recommender changes the order of these three jobs, so that the top 2 jobs are different from the former ones. The increased scores suggest that the improved recommender works well, because the jobs that take precedence (the 1st and 2nd recommended jobs) are better than the latter one (the 3rd recommended job).
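The re-ranking effect described above can be illustrated with a small Python sketch. It applies the rescoring product from (9) to three candidate jobs and compares precision at three and at two recommendations. All scores, weights and job names are hypothetical values chosen for illustration, not data from the experiments, and the helper names are assumptions of this sketch.

```python
def rescore(pref_o, w_h, w_c):
    """Rescored preference per job: pref_o * w_h * w_c, following (9)."""
    return {job: pref_o[job] * w_h[job] * w_c[job] for job in pref_o}


def top_n(scores, n):
    """Return the n highest-scoring jobs, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [job for job, _ in ranked[:n]]


def precision(recommended, relevant):
    """Fraction of recommended jobs that are relevant."""
    if not recommended:
        return 0.0
    return len(set(recommended) & set(relevant)) / len(recommended)


# Hypothetical original item-based preference scores for one student.
pref_o = {"job1": 0.9, "job2": 0.8, "job3": 0.7}
# Hypothetical used-liked job weights and co-apply user weights per candidate.
w_h = {"job1": 1.1, "job2": 1.2, "job3": 1.6}
w_c = {"job1": 1.0, "job2": 1.3, "job3": 1.2}
# Hypothetical held-out jobs the student actually applied for.
relevant = {"job2", "job3"}

rescored = rescore(pref_o, w_h, w_c)

for n in (3, 2):
    before = precision(top_n(pref_o, n), relevant)
    after = precision(top_n(rescored, n), relevant)
    print(f"r_num={n}: precision before={before:.2f}, after={after:.2f}")
```

With three recommendations the set of jobs is unchanged, so precision stays the same; with two, the rescoring promotes job3 over job1 and precision rises in this toy case.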
advantage of students' implicit behaviors in the process of job hunting, which needs further research.
ACKNOWLEDGMENTS
The work on this paper was supported by the National Key Technology R&D Program of China (2012BAH17F01-01, 2012AA011702-02, 2012BAH37F03, 2012BAH02B03), the National Culture S&T Promotion Program (WHB1002) and the SARFT Scientific and Technological Project (2012-20, 2011-34, 2012-27). It was also supported by the Program for New Century Excellent Talents in University.

Figure. 7. Similarity between user's preference and recommendation results in SJH system