Parikh IdentifyTagsFromMillionsOfTextQuestion PDF
Figure 3: Logistic loss function, mean P = 0.64, R = 0.42
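Figure 3 reports results for the logistic loss, while the conclusion finds that the hinge loss works better in practice. The two loss functions can be sketched as follows; this is a minimal illustration, not Vowpal Wabbit's implementation:

```python
import math

# y is the true label in {-1, +1}; s is the raw classifier score.

def logistic_loss(y, s):
    # log(1 + exp(-y*s)): smooth, and assigns a small penalty even to
    # correctly classified examples
    return math.log1p(math.exp(-y * s))

def hinge_loss(y, s):
    # max(0, 1 - y*s): exactly zero once an example is outside the margin
    return max(0.0, 1.0 - y * s)

print(hinge_loss(+1, 2.0))              # 0.0: correct and outside the margin
print(logistic_loss(+1, 2.0) > 0.0)     # True: logistic loss never reaches zero
```

The hinge loss producing exact zeros for confidently correct examples is one reason it can behave differently from the logistic loss on sparse text features.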
B. Some results from the tag suggestion

Original Tags                                          | Suggested Tags
php, image-processing, file-upload, upload, mime-types | image, file, php
firefox                                                | firefox, windows
r, matlab, machine-learning                            | ubuntu, apache, networking
c#, url, encoding                                      | c#, string, json
php, api, file-get-contents                            | php, api, file
core-plot                                              | ios, iphone
c#, asp.net, windows-phone-7                           | windows, asp.net, c#
.net, javascript, code-generation                      | javascript, c#, linq
visual-studio, makefile, gnu                           | visual-studio, file
html, semantic, line-breaks                            | html

Table 3: Original and suggested tags for questions from the test set

Note that in the above results, the tags are predicted from the Top500 tags classifier set. We can see that top tags such as c# and php are predicted with very high accuracy, whereas lower-occurring tags which are not part of the Top500 set are either missed or predicted as a synonym from the Top500 set, for example makefile -> file and windows-phone-7 -> windows.

C. Classifiers performance

From Figure 4 we see that we can build a fairly accurate classifier, with a mean Precision of 0.76 and Recall of 0.72. These values were obtained from a test set of 100,000 samples which were not part of the training set.

The tags for which the precision was lowest in the Top100 set were file, windows, forms, list, api, oop and class. The precision and recall for them are shown in the table below.

Tag       Precision   Recall
file      0.2044      0.5350
windows   0.3284      0.7286
forms     0.3526      0.7404
list      0.2594      0.7447
oop       0.3620      0.7159
api       0.2142      0.7138

Table 4: Tags with lowest precision values in the Top100 tag set

Tags such as api, list and file can have multiple connotations and do not belong to any particular language or tag set.

D. Suggested tags performance

To measure the performance of the entire algorithm in predicting the suggested tags, we ran the test set of 100,000 samples through the Top500 classifier set. The result from the run is:

TP = 53219     P  = 0.6439
FP = 29426     R  = 0.2551
FN = 206221    F1 = 0.3648

Table 5: Final result of F1 score

The low recall is somewhat expected, as we are not classifying from the entire set of 43k tags.

E. Kaggle Submission

The algorithm described in the above section was submitted to Kaggle, where the test set contains over 2 million test questions. The competition is particularly intense, as Facebook is conducting it for recruiting. The competition ends on 12/20/2013, and as of 12/12 the algorithm described above had a standing of 74th out of 310 total teams.

The mean F1 score of the submission using the methods described above was 0.71132, compared to the top score of 0.81539. Note that the high F1 values compared to the above results are largely due to overlap of the test set with the training data set: for questions in the test set that also appeared in the training set, the same tags were predicted.

IV. CONCLUSION

Vowpal Wabbit was used extensively in the development of the classifiers; its sparse input format, hashing trick, and particularly the vw-varinfo wrapper were very useful for debugging the models and coming up with valid features. The hinge loss function works much better than the logistic loss function.

As discussed in the earlier sections, it is possible to build a highly accurate classifier for each of the tags in the training set. The precision is higher for specific tags such as php and python, and it decreases for generic tags such as file and java. The results show that an average precision of 0.76 is obtained for the tags in the Top500 set. The recall is particularly low in the results, since we are not predicting tags from the entire tag set.
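The F1 scores reported in this paper combine precision and recall over predicted tag sets. Per question, the computation can be sketched as below; this is a minimal sketch assuming tags are compared as sets, and the exact averaging used in the evaluation may differ:

```python
def f1_for_question(true_tags, predicted_tags):
    """F1 between the true and predicted tag sets of one question."""
    true_tags, predicted_tags = set(true_tags), set(predicted_tags)
    tp = len(true_tags & predicted_tags)   # tags predicted correctly
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_tags)   # fraction of predictions that are right
    recall = tp / len(true_tags)           # fraction of true tags recovered
    return 2 * precision * recall / (precision + recall)

def mean_f1(pairs):
    """Mean F1 over (true_tags, predicted_tags) pairs."""
    return sum(f1_for_question(t, p) for t, p in pairs) / len(pairs)

# One row from Table 3: two of three suggested tags match
print(round(f1_for_question(
    {"php", "api", "file-get-contents"},
    {"php", "api", "file"},
), 4))  # 0.6667
```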
If the classifier set is extended beyond the Top500 tags, Precision and Recall values will vary; the expectation is that the mean F1 score should go up by a few percentage points.

The other thing to try would be to add more features so as to improve the accuracy of the existing Top500 tags. Techniques such as LDA could also be used to produce a list of topics for each document, which could then be used as a feature.
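As an illustration of the Vowpal Wabbit sparse input format praised in the conclusion, each question can be turned into one training line per binary tag classifier. This is a sketch, not the paper's actual preprocessing; the `t`/`b` namespaces and the tokenizer are illustrative choices:

```python
import re

def to_vw_line(label, title, body):
    """Build a Vowpal Wabbit training line for one binary tag classifier.

    label: +1 if the question carries the tag, -1 otherwise.
    Tokens become sparse features in namespace 't' (title) and 'b' (body);
    VW hashes feature names internally (the "hashing trick"), so no
    vocabulary dictionary needs to be maintained.
    """
    def tokens(text):
        # keep programming-language punctuation such as 'c#' and '.net'
        return re.findall(r"[a-z0-9#+.-]+", text.lower())
    return "{} |t {} |b {}".format(
        label, " ".join(tokens(title)), " ".join(tokens(body)))

print(to_vw_line(1, "How to read a file in C#?", "Using File.ReadAllText"))
```

Such lines can then be fed to `vw` with a chosen loss function (for example `--loss_function hinge`), one model per tag.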
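The LDA idea above can be sketched with scikit-learn; this is an assumed library choice, since the paper does not name an implementation. Each document receives a topic-probability vector that could be appended to the classifier features:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for question texts
docs = [
    "php file upload mime type",
    "asp.net windows forms controls",
    "python list comprehension loop",
    "javascript json string encoding",
]

# Bag-of-words counts, then a small LDA model; each row of `topics`
# is a per-document topic distribution (rows sum to 1).
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)
print(topics.shape)  # (4, 2)
```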