yahoo_ltrc

Description:

The Yahoo Learning to Rank Challenge dataset (also called "C14") is a Learning-to-Rank dataset released by Yahoo. The dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels.

The dataset contains two versions:

set1: Containing 709,877 query-document pairs.
set2: Containing 172,870 query-document pairs.

You can specify whether to use the set1 or set2 version of the dataset as follows:

import tensorflow_datasets as tfds

ds = tfds.load("yahoo_ltrc/set1")
ds = tfds.load("yahoo_ltrc/set2")

If only yahoo_ltrc is specified, the yahoo_ltrc/set1 option is selected by default:

# This is the same as `tfds.load("yahoo_ltrc/set1")`
ds = tfds.load("yahoo_ltrc")
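
As a small illustration of the naming rule above, the sketch below builds the fully qualified config name and defers the actual tfds.load call. The helper names resolve_config and load_yahoo_ltrc are hypothetical (they are not part of TFDS), and the load itself can only succeed once the manually downloaded archive is in place.

```python
DEFAULT_CONFIG = "set1"  # per the text above, the bare name selects set1
KNOWN_CONFIGS = ("set1", "set2")


def resolve_config(name: str = "yahoo_ltrc") -> str:
    """Return the 'yahoo_ltrc/<config>' name (hypothetical helper, not a TFDS API)."""
    dataset, _, config = name.partition("/")
    if dataset != "yahoo_ltrc":
        raise ValueError(f"unexpected dataset name: {dataset!r}")
    config = config or DEFAULT_CONFIG
    if config not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return f"{dataset}/{config}"


def load_yahoo_ltrc(name: str = "yahoo_ltrc", split: str = "train"):
    """Load one split; needs tensorflow_datasets and the manual download."""
    import tensorflow_datasets as tfds  # lazy import so resolve_config runs without TFDS

    return tfds.load(resolve_config(name), split=split)
```

Calling resolve_config() with no argument mirrors the default described above; passing "yahoo_ltrc/set2" leaves the name unchanged.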

Homepage: https://research.yahoo.com/datasets
Source code: tfds.ranking.yahoo_ltrc.YahooLTRC
Versions:
- 1.0.0: Initial release.
- 1.1.0 (default): Add query and document identifiers.
Download size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Request access to the C14 Yahoo Learning To Rank Challenge dataset at https://research.yahoo.com/datasets. Extract the downloaded dataset.tgz file and place the ltrc_yahoo.tar.bz2 file in manual_dir/.
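
The manual download step can be sketched as follows. This is a minimal sketch, assuming the archive name and default directory given in the instructions above; the helper names archive_in_place and load_from_manual_dir are hypothetical. It checks that the archive is present before handing a custom manual_dir to TFDS through tfds.download.DownloadConfig.

```python
from pathlib import Path

ARCHIVE_NAME = "ltrc_yahoo.tar.bz2"  # file name given in the instructions above
DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"


def archive_in_place(manual_dir: Path = DEFAULT_MANUAL_DIR) -> bool:
    """Return True if the manually downloaded archive is in manual_dir."""
    return (Path(manual_dir) / ARCHIVE_NAME).is_file()


def load_from_manual_dir(manual_dir: Path, config: str = "yahoo_ltrc/set1"):
    """Load the dataset from a custom manual_dir (needs tensorflow_datasets)."""
    import tensorflow_datasets as tfds  # lazy import: the check above has no TFDS dependency

    if not archive_in_place(manual_dir):
        raise FileNotFoundError(f"{ARCHIVE_NAME} not found in {manual_dir}")
    download_config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    return tfds.load(
        config,
        download_and_prepare_kwargs={"download_config": download_config},
    )
```

Checking for the file first gives a clearer error than letting dataset preparation fail partway through.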
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{chapelle2011yahoo,
  title={Yahoo! learning to rank challenge overview},
  author={Chapelle, Olivier and Chang, Yi},
  booktitle={Proceedings of the learning to rank challenge},
  pages={1--24},
  year={2011},
  organization={PMLR}
}

yahoo_ltrc/set1 (default config)

Dataset size: 795.39 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	6,983
`'train'`	19,944
`'vali'`	2,994

Feature structure:

FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 699), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})
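
The feature structure above suggests that each example holds one query together with a variable number of documents (the None dimension). A minimal sketch of unpacking such an example into per-document rows follows. It assumes the example has already been converted to NumPy arrays or plain Python sequences (for instance with tfds.as_numpy); the helper name iter_documents and the toy data are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_documents(example: Dict[str, Any]) -> Iterator[Tuple[str, int, float, Any]]:
    """Yield (query_id, doc_id, label, features) for each document of one query."""
    query_id = example["query_id"]
    if isinstance(query_id, bytes):  # string features typically arrive as bytes
        query_id = query_id.decode("utf-8")
    doc_ids = example["doc_id"]
    labels = example["label"]
    features = example["float_features"]
    if not (len(doc_ids) == len(labels) == len(features)):
        raise ValueError("per-document fields must have the same length")
    for doc_id, label, row in zip(doc_ids, labels, features):
        yield query_id, int(doc_id), float(label), row


# Toy example with 2 documents and 3 features instead of 699, for illustration only.
toy = {
    "query_id": b"1",
    "doc_id": [0, 1],
    "label": [2.0, 0.0],
    "float_features": [[0.1, 0.2, 0.3], [0.0, 0.5, 0.9]],
}
rows = list(iter_documents(toy))
```

Each yielded row pairs one document's feature vector with its relevance label, which is the per-document view that pointwise ranking models usually consume.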

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
doc_id	Tensor	(None,)	int64
float_features	Tensor	(None, 699)	float64
label	Tensor	(None,)	float64
query_id	Text		string

yahoo_ltrc/set2

Dataset size: 194.92 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	3,798
`'train'`	1,266
`'vali'`	1,266

Feature structure:

FeaturesDict({
    'doc_id': Tensor(shape=(None,), dtype=int64),
    'float_features': Tensor(shape=(None, 700), dtype=float64),
    'label': Tensor(shape=(None,), dtype=float64),
    'query_id': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
doc_id	Tensor	(None,)	int64
float_features	Tensor	(None, 700)	float64
label	Tensor	(None,)	float64
query_id	Text		string
