handling of ties in score #18
It looks like trec_eval handles ties by sorting (descending) by doc id. It would be nice to document this and, if possible, to standardize the behavior across providers.

Comments
@eugene-yang brought this up yesterday. This is important because different providers handle ties differently, so they can essentially be scoring different rankings. @eugene-yang, @andrewyates -- what do you think the desired handling would be? Follow trec_eval's lead by always sorting by score and then docid?

Here are my thoughts: I think the easiest way to standardize this would be to reassign run scores based on sorting by score and docid before sending the run to the provider (see the sketch below). I believe all measures currently supported only consider the order, so this should only change the behaviour for ties. If different tie-breaking behaviour is desired, it can be done by whatever calls it. (E.g., pyterrier could provide its assigned ranks as the scores, rather than the scores themselves.)

Worth considering: maybe we don't want to discard the order explicitly provided in the run file. Suppose a run explicitly ranks doc 0 above doc 1 via the rank attribute while giving both the same score; in some sense it's then wrong for trec_eval to re-sort them. This was an explicit decision in trec_eval, though.
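For concreteness, a minimal sketch of that normalization, assuming the run sits in a pandas DataFrame with query_id / doc_id / score columns (the column names and function are illustrative, not an existing API):

```python
import pandas as pd

def normalize_ties(run: pd.DataFrame) -> pd.DataFrame:
    """Rewrite scores so every provider sees the same total order:
    score (desc), then doc_id (asc), within each query."""
    run = run.sort_values(['query_id', 'score', 'doc_id'],
                          ascending=[True, False, True])
    # Replace the raw scores with strictly decreasing pseudo-scores
    # per query, so no ties remain for a provider to break itself.
    return run.assign(score=-run.groupby('query_id').cumcount().astype(float))

# A tied pair: after normalization, doc0 deterministically outranks doc1.
tied = pd.DataFrame({'query_id': ['q1', 'q1'],
                     'doc_id':   ['doc1', 'doc0'],
                     'score':    [1.0, 1.0]})
print(normalize_ties(tied))
```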
And IIRC a prior version of trec_eval only used the provided ranks and discarded the scores. On the surface this would be an easy change: just replace the scores with the ranks when reading TREC run files. But then how would ties in ranks be broken -- by docid? By score (desc) then docid? Same problem. So maybe it's best to stick with the scores, since trec_eval sets the precedent for ignoring rank.
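For comparison, the rank-based reading discussed above would look roughly like this on the standard six-column run format (a sketch of the idea, not a proposal):

```python
def read_run_by_rank(path):
    """Read a TREC run file, substituting each document's (negated)
    rank for its score so the explicit ranks define the ordering."""
    run = {}
    with open(path) as f:
        for line in f:
            # TREC run format: qid Q0 docno rank score tag
            qid, _, docno, rank, _score, _tag = line.split()
            # Negate so rank 1 sorts above rank 2 under the usual
            # score-descending convention.
            run.setdefault(qid, {})[docno] = -int(rank)
    return run
```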
Classical trec_eval is indeed qid, score (desc), docno (asc); TREC pooling, coincidentally, is qid, score (desc), docno (desc). I don't support bringing rank attributes into this, as it means people would get unexpected performance at TREC if their scores don't match their ranks. Isn't the tie-breaking just an implementation decision by the underlying provider?
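To make the two orderings concrete on a tied pair (illustrative doc ids; Python's stable sort stands in for the C implementations):

```python
docs = [('doc0', 1.0), ('doc1', 1.0)]  # one query, tied scores

# classical trec_eval: score (desc), then docno (asc) -> doc0, doc1
trec_eval_order = sorted(sorted(docs), key=lambda d: d[1], reverse=True)

# TREC pooling: score (desc), then docno (desc) -> doc1, doc0
pooling_order = sorted(sorted(docs, reverse=True), key=lambda d: d[1], reverse=True)

print([d for d, _ in trec_eval_order])  # ['doc0', 'doc1']
print([d for d, _ in pooling_order])    # ['doc1', 'doc0']
```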
Fair point re: implementation details of the providers themselves. I suppose the risk is that providers A and B end up scoring essentially different rankings due to ties, so their results may not align. E.g., say P@1 calculated by trec_eval gives 0.0 for a particular query, while ERR provided by gdeval gives 1.0 due to different tie-breaking. To the user, these results would seem incompatible. Documenting the tie-breaking behaviour of each provider would be another option.
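A toy illustration of the risk with made-up qrels (not actual trec_eval/gdeval code): the same run yields P@1 of 0.0 or 1.0 depending only on the tie-break direction.

```python
# One query: doc1 is relevant, doc0 is not, and the run ties their scores.
qrels = {'doc0': 0, 'doc1': 1}
run = {'doc0': 1.0, 'doc1': 1.0}

def p_at_1(docno_descending: bool) -> float:
    # Stable sort: break by docno (asc or desc), then order by score (desc).
    docs = sorted(run, reverse=docno_descending)
    docs = sorted(docs, key=run.get, reverse=True)
    return float(qrels[docs[0]])

print(p_at_1(False))  # docno asc  -> doc0 ranked first -> P@1 = 0.0
print(p_at_1(True))   # docno desc -> doc1 ranked first -> P@1 = 1.0
```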
You could emit a warning if ties are present AND two different providers are involved in measuring effectiveness.
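Something like the following check could do it (a sketch; the run layout and providers argument are hypothetical):

```python
import warnings

def warn_on_ambiguous_ties(run, providers):
    """Warn when tied scores meet multiple measure providers.
    `run` maps qid -> {docno: score}; `providers` is a collection of
    provider names. Both shapes are hypothetical, for illustration."""
    has_ties = any(len(set(scores.values())) < len(scores)
                   for scores in run.values())
    if has_ties and len(set(providers)) > 1:
        warnings.warn(
            'Run contains tied scores and multiple providers are in use; '
            'providers may break ties differently, so measures could be '
            'computed over different orderings.')
```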
Good idea.