handling of ties in score #18
It looks like trec_eval handles ties by sorting (descending) by doc id. It would be nice to document this and, if possible, to standardize the behavior across providers.

Comments
@eugene-yang brought this up yesterday. This is important because different providers handle ties differently, so they can essentially be scoring different rankings. @eugene-yang, @andrewyates -- what do you think the desired handling would be? Follow trec_eval's lead by always sorting by score and then docid?

Here are my thoughts: I think the easiest way to standardize this would be to reassign run scores based on sorting by score and docid before sending the run to the provider (see the sketch below). I believe all measures currently supported only consider the order, so this should only change the behaviour for ties. If different tie-breaking behaviour is desired, it can be done by whatever calls it. (E.g., pyterrier could provide its assigned ranks as the scores, rather than the scores themselves.)

Worth considering: maybe we don't want to discard the order explicitly provided in the run file. Suppose a run explicitly ranks doc 0 above doc 1 via the rank attribute while giving both the same score; in some sense it's then wrong for trec_eval to re-sort them. This was an explicit decision in trec_eval, though.
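For concreteness, a minimal sketch of that normalization, assuming the run sits in a pandas DataFrame with query_id / doc_id / score columns (the column names and function are illustrative, not an existing API):

```python
import pandas as pd

def normalize_ties(run: pd.DataFrame) -> pd.DataFrame:
    """Rewrite scores so every provider sees the same total order:
    score (desc), then doc_id (asc), within each query."""
    run = run.sort_values(['query_id', 'score', 'doc_id'],
                          ascending=[True, False, True])
    # Replace the raw scores with strictly decreasing pseudo-scores
    # per query, so no ties remain for a provider to break itself.
    return run.assign(score=-run.groupby('query_id').cumcount().astype(float))

# A tied pair: after normalization, doc0 deterministically outranks doc1.
tied = pd.DataFrame({'query_id': ['q1', 'q1'],
                     'doc_id':   ['doc1', 'doc0'],
                     'score':    [1.0, 1.0]})
print(normalize_ties(tied))
```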
And IIRC a prior version of trec_eval only used the provided ranks and discarded the scores. On the surface this would be an easy change: just replace the scores with the ranks when reading TREC run files. But then how would ties in ranks be broken -- by docid? By score (desc) then docid? Same problem. So maybe it's best to stick with the scores, since trec_eval sets the precedent for ignoring rank.
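For comparison, the rank-based reading discussed above would look roughly like this on the standard six-column run format (a sketch of the idea, not a proposal):

```python
def read_run_by_rank(path):
    """Read a TREC run file, substituting each document's (negated)
    rank for its score so the explicit ranks define the ordering."""
    run = {}
    with open(path) as f:
        for line in f:
            # TREC run format: qid Q0 docno rank score tag
            qid, _, docno, rank, _score, _tag = line.split()
            # Negate so rank 1 sorts above rank 2 under the usual
            # score-descending convention.
            run.setdefault(qid, {})[docno] = -int(rank)
    return run
```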
Classical trec_eval is indeed qid, score (desc), docno (asc); TREC pooling, coincidentally, is qid, score (desc), docno (desc). I don't support bringing rank attributes into this, as it means people would get unexpected performance at TREC if their scores don't match their ranks. Isn't the tie-breaking just an implementation decision by the underlying provider?
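To make the two orderings concrete on a tied pair (illustrative doc ids; Python's stable sort stands in for the C implementations):

```python
docs = [('doc0', 1.0), ('doc1', 1.0)]  # one query, tied scores

# classical trec_eval: score (desc), then docno (asc) -> doc0, doc1
trec_eval_order = sorted(sorted(docs), key=lambda d: d[1], reverse=True)

# TREC pooling: score (desc), then docno (desc) -> doc1, doc0
pooling_order = sorted(sorted(docs, reverse=True), key=lambda d: d[1], reverse=True)

print([d for d, _ in trec_eval_order])  # ['doc0', 'doc1']
print([d for d, _ in pooling_order])    # ['doc1', 'doc0']
```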
Fair point re: implementation details of the providers themselves. I suppose the risk is that providers A and B end up scoring essentially different rankings due to ties, so their results may not align. E.g., say P@1 calculated by trec_eval gives 0.0 for a particular query, while ERR provided by gdeval gives 1.0 due to different tie-breaking. To the user, these results would seem incompatible. Documenting the tie-breaking behaviour of each provider would be another option.
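A toy illustration of the risk with made-up qrels (not actual trec_eval/gdeval code): the same run yields P@1 of 0.0 or 1.0 depending only on the tie-break direction.

```python
# One query: doc1 is relevant, doc0 is not, and the run ties their scores.
qrels = {'doc0': 0, 'doc1': 1}
run = {'doc0': 1.0, 'doc1': 1.0}

def p_at_1(docno_descending: bool) -> float:
    # Stable sort: break by docno (asc or desc), then order by score (desc).
    docs = sorted(run, reverse=docno_descending)
    docs = sorted(docs, key=run.get, reverse=True)
    return float(qrels[docs[0]])

print(p_at_1(False))  # docno asc  -> doc0 ranked first -> P@1 = 0.0
print(p_at_1(True))   # docno desc -> doc1 ranked first -> P@1 = 1.0
```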
You could emit a warning if ties are present AND two different providers are involved in measuring effectiveness.
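Something like the following check could do it (a sketch; the run layout and providers argument are hypothetical):

```python
import warnings

def warn_on_ambiguous_ties(run, providers):
    """Warn when tied scores meet multiple measure providers.
    `run` maps qid -> {docno: score}; `providers` is a collection of
    provider names. Both shapes are hypothetical, for illustration."""
    has_ties = any(len(set(scores.values())) < len(scores)
                   for scores in run.values())
    if has_ties and len(set(providers)) > 1:
        warnings.warn(
            'Run contains tied scores and multiple providers are in use; '
            'providers may break ties differently, so measures could be '
            'computed over different orderings.')
```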
Good idea.