
handling of ties in score #18

Open
andrewyates opened this issue Jun 7, 2021 · 5 comments

@andrewyates

It looks like trec_eval handles ties by sorting (descending) by doc id. It would be nice to document this and also to standardize the behavior across dataset providers, if possible.

qrels:

```
0 0 0 0
0 0 1 1
```

run:

```
0 Q0 0 0 0 run
0 Q0 1 1 0 run
```

```
$ ./trec_eval qrels run -m P.1
P_1 all 1.0000
```
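The behaviour described above can be sketched in a few lines of Python. This is an illustration of the tie-breaking rule inferred from the example (score descending, then docid descending), not trec_eval's actual code; the function names are hypothetical.

```python
def trec_eval_order(run):
    """Order a run for one query as described above: score desc, docid desc.

    run: list of (docid, score) tuples.
    Uses two stable sorts: docid descending first, then score descending,
    so ties in score keep the docid-descending order.
    """
    by_docid = sorted(run, key=lambda d: d[0], reverse=True)
    return sorted(by_docid, key=lambda d: d[1], reverse=True)

def precision_at_1(ordered, qrels):
    """P@1: 1.0 if the top-ranked doc is relevant, else 0.0."""
    top_doc = ordered[0][0]
    return 1.0 if qrels.get(top_doc, 0) > 0 else 0.0

# The example from this issue: both docs score 0.0, so docid desc decides.
run = [("0", 0.0), ("1", 0.0)]
qrels = {"0": 0, "1": 1}
ordered = trec_eval_order(run)   # doc "1" sorts first
precision_at_1(ordered, qrels)   # -> 1.0, matching trec_eval's output above
```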

@seanmacavaney changed the title from "handling of ties" to "handling of ties in score" on Jun 7, 2021
@seanmacavaney
Collaborator

@eugene-yang brought this up yesterday.

This is important because different providers do this differently, so they can essentially be scoring based on different rankings.

@eugene-yang and @andrewyates -- what do you think would be the desired handling of this? Follow trec_eval's lead by always sorting by score (desc), docid (desc)? Score (desc), docid (asc)? Rank, if available? Something else?

Here are my thoughts:

I think the easiest way to standardize this would be to reassign run scores based on sorting by score and docid before sending to the provider. I believe all measures currently supported only consider the order, so this should only change the behaviour for ties. If different tie-breaking behaviour is desired, it can be done by whatever calls it. (E.g., pyterrier could provide its assigned ranks as the scores, rather than the scores themselves.)
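The normalisation proposed here could look something like the following sketch. This is not ir_measures' actual API; the function name and the run shape (`{qid: {docid: score}}`) are assumptions for illustration. It replaces each score with a rank-derived value so every provider sees the identical ordering.

```python
def reassign_scores(run):
    """Replace scores with negated ranks after sorting by score desc, docid desc.

    run: {qid: {docid: score}} (hypothetical shape).
    Returns a run of the same shape whose scores encode the canonical
    ordering, so all providers rank ties identically.
    """
    out = {}
    for qid, docs in run.items():
        # Two stable sorts: docid descending, then score descending.
        ordered = sorted(docs.items(), key=lambda kv: kv[0], reverse=True)
        ordered = sorted(ordered, key=lambda kv: kv[1], reverse=True)
        # Rank 0 gets score 0.0, rank 1 gets -1.0, etc.: strictly decreasing,
        # so no ties survive for downstream providers to break.
        out[qid] = {docid: -float(rank) for rank, (docid, _) in enumerate(ordered)}
    return out
```

With the example from this issue, `reassign_scores({"0": {"0": 0.0, "1": 0.0}})` gives doc `"1"` the higher (zero) score, freezing trec_eval's tie-breaking into the run itself.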

Worth considering: Maybe we don't want to discard the order explicitly provided in the run file. In the above example, doc 0 is explicitly ranked above doc 1, based on the rank attribute. So in some sense, it's wrong for trec_eval to re-sort them. This was an explicit decision in trec_eval though:

> In particular, note that the rank field is ignored here; internally ranks are assigned by sorting by the sim field with ties broken deterministicly [sic] (using docno).

And IIRC a prior version of trec_eval only used the provided ranks and discarded the score.

This would be an easy change on the surface. Basically, just replace the scores when reading trec run files with the ranks. But then would ties in ranks be broken by docid? Score (desc) then docid? Same problem. So maybe it's just best to stick with the scores, since trec_eval sets the precedent for ignoring rank.

@cmacdonald
Collaborator

classical trec_eval is indeed qid, score (desc), docno (asc); TREC pooling, coincidentally, is qid, score (desc), docno (desc). I don't support bringing rank attributes into this, as this means that people get unexpected performances at TREC if their scores don't match their ranks. Isn't the tie-breaking just an implementation decision by the underlying provider?
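The two conventions described in this comment diverge exactly on ties, which a small sketch makes concrete (document IDs here are made up for illustration):

```python
# Two documents with tied scores: only the docno tie-break distinguishes them.
docs = [("docA", 1.0), ("docB", 1.0)]

# "classical trec_eval" per this comment: score desc, docno ASC.
trec_eval_style = sorted(docs, key=lambda d: (-d[1], d[0]))

# "TREC pooling" per this comment: score desc, docno DESC
# (two stable sorts: docno descending, then score descending).
pooling_style = sorted(sorted(docs, key=lambda d: d[0], reverse=True),
                       key=lambda d: d[1], reverse=True)

trec_eval_style[0]  # ("docA", 1.0)
pooling_style[0]    # ("docB", 1.0) -- a different document at rank 1
```

So a measure like P@1 computed under the two conventions can disagree on the same run whenever the top scores are tied.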

@seanmacavaney
Collaborator

seanmacavaney commented Oct 12, 2021

Fair point RE implementation details of the providers themselves. I suppose the risk is that providers A and B end up scoring essentially different rankings due to ties, so the results may not align. E.g., let's say P@1 is calculated by trec_eval and gives a score of 0.0 for a particular query. Meanwhile, ERR provided by gdeval could give a score of 1.0 due to different tie-breaking. To the user, these results would seem to be incompatible.

Documenting the tie-breaking behaviour of each provider would be another option.

@cmacdonald
Collaborator

cmacdonald commented Oct 12, 2021

You could emit a warning if ties are present AND two different providers are involved in measuring effectiveness.
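A minimal sketch of that check, assuming a `{qid: {docid: score}}` run shape and a list of provider names (both hypothetical, not ir_measures' actual interfaces):

```python
import warnings

def warn_on_ties(run, providers):
    """Warn if the run has tied scores and multiple providers will score it.

    run: {qid: {docid: score}}; providers: list of provider names.
    With a single provider, tie-breaking is internally consistent, so no
    warning is needed.
    """
    if len(providers) < 2:
        return
    for qid, docs in run.items():
        scores = list(docs.values())
        if len(scores) != len(set(scores)):  # duplicate scores = ties
            warnings.warn(
                f"query {qid} has tied scores; providers {providers} "
                f"may break ties differently, so results may not align")
            return  # one warning is enough

# Example: tied scores plus two providers triggers the warning.
warn_on_ties({"0": {"a": 1.0, "b": 1.0}}, ["trec_eval", "gdeval"])
```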

@seanmacavaney
Collaborator

Good idea
