MovieLens

GroupLens Research has collected and made available rating data sets from the MovieLens web site (https://movielens.org). The data sets were collected over various periods of time, depending on the size of the set. Before using these data sets, please review their README files for the usage licenses and other details.

Seeking permission? If you are interested in obtaining permission to use MovieLens datasets, please first read the terms of use that are included in the README file. Then, please fill out this form to request use. We typically do not permit public redistribution (see Kaggle for an alternative download location if you are concerned about availability).

recommended for new research

MovieLens 32M

MovieLens 32M movie ratings. Stable benchmark dataset. 32 million ratings and two million tag applications applied to 87,585 movies by 200,948 users. Collected 10/2023. Released 05/2024.

README.txt
ml-32m.zip (size: 239 MB, checksum)
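The checksum published next to an archive can be used to confirm that a download is intact. The following is a minimal sketch, assuming the published value is an MD5 hex digest (the listing does not name the algorithm, so pass a different `algorithm` if the checksum file indicates otherwise). The helper names `file_digest` and `matches_checksum` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    # Assumption: MD5 is the default only because it is a common choice for
    # published checksums; the source page does not state the algorithm.
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches_checksum("ml-32m.zip", published_value)` returns `True` only when the local file hashes to the published value, where `published_value` is the string copied from the checksum link.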

Permalink: https://grouplens.org/datasets/movielens/32m/

MovieLens Tag Genome Dataset 2021

10.5 million computed tag-movie relevance scores from a pool of 1,084 tags applied to 9,734 movies. Released 12/2021. This dataset also contains the input necessary to generate the tag genome using both the original process (Vig et al. 2012) and a more recent improvement (Kotkov et al. 2021).

genome_2021_readme.txt
genome_2021.zip (size: 1.8 GB)

Permalink: https://grouplens.org/datasets/movielens/tag-genome-2021

recommended for education and development

MovieLens Latest Datasets

These datasets will change over time, and are not appropriate for reporting research results. We will keep the download links stable for automated downloads. We will not archive or make available previously released versions.

Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.

README.html
ml-latest-small.zip (size: 1 MB)

Full: approximately 33,000,000 ratings and 2,000,000 tag applications applied to 86,000 movies by 330,975 users. Includes tag genome data with 14 million relevance scores across 1,100 tags. Last updated 9/2018.

README.html
ml-latest.zip (size: 335 MB)

Permalink: https://grouplens.org/datasets/movielens/latest/
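For education and development work, the ratings can be read straight from the downloaded archive without unpacking it. This is a minimal sketch using only the standard library. It assumes the small archive contains a member named `ml-latest-small/ratings.csv` with the header `userId,movieId,rating,timestamp`; neither detail is stated on this page, so check the README before relying on them. The function name `iter_ratings` is hypothetical.

```python
import csv
import io
import zipfile


def iter_ratings(zip_path, member="ml-latest-small/ratings.csv"):
    """Yield (user_id, movie_id, rating, timestamp) tuples from a zipped CSV.

    Assumption: the member path and column names follow the layout described
    in the dataset README; adjust them if your copy differs.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                yield (
                    int(row["userId"]),
                    int(row["movieId"]),
                    float(row["rating"]),
                    int(row["timestamp"]),
                )
```

Because the function is a generator, it streams rows one at a time, which keeps memory use low even for the full archive (pass that archive's member path as `member`).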

synthetic datasets

MovieLens 1B Synthetic Dataset

MovieLens 1B is a synthetic dataset that is expanded from the 20 million real-world ratings from ML-20M, distributed in support of MLPerf. Note that these data are distributed as .npz files, which you must read using Python and NumPy.
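As a starting point for working with .npz files, the sketch below lists the arrays stored in one archive along with their shapes and dtypes. It makes no assumption about the array names inside the MovieLens 1B files, since this page does not document them. The function name `describe_npz` is hypothetical.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype_name)} for every array in an .npz archive."""
    summary = {}
    # np.load returns a lazy archive object; each array is read only when indexed.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

Inspecting one file this way before loading the rest gives a sense of how much memory each array will need.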

The code for the expansion algorithm is available here: https://github.com/mlperf/training/tree/master/data_generation

To create the dataset above, we ran the algorithm (using commit 1c6ae725a81d15437a2b2df05cac0673fde5c3a4) as described in the README under the section “Running instructions for the recommendation benchmark”.

Permalink: https://grouplens.org/datasets/movielens/movielens-1b/

older datasets

MovieLens 100K Dataset

MovieLens 100K movie ratings. Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. Released 4/1998.

Permalink: https://grouplens.org/datasets/movielens/100k/

MovieLens 1M Dataset

MovieLens 1M movie ratings. Stable benchmark dataset. 1 million ratings from 6000 users on 4000 movies. Released 2/2003.

README.txt
ml-1m.zip (size: 6 MB, checksum)
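The older archives do not use comma-separated files. The sketch below parses rating lines under the assumption that each line has the form `UserID::MovieID::Rating::Timestamp`, as described in the 1M README; this page does not state the format, so verify it against your copy. The names `Rating` and `parse_ratings` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: float
    timestamp: int


def parse_ratings(lines: Iterable[str], sep: str = "::") -> Iterator[Rating]:
    """Parse lines of the assumed form UserID::MovieID::Rating::Timestamp."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        user_id, movie_id, rating, timestamp = line.split(sep)
        yield Rating(int(user_id), int(movie_id), float(rating), int(timestamp))
```

An open file handle can be passed directly, for example `parse_ratings(open("ratings.dat", encoding="latin-1"))`; both the file name and the encoding here are assumptions to check against the README.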

Permalink: https://grouplens.org/datasets/movielens/1m/

MovieLens 10M Dataset

MovieLens 10M movie ratings. Stable benchmark dataset. 10 million ratings and 100,000 tag applications applied to 10,000 movies by 72,000 users. Released 1/2009.

README.txt
ml-10m.zip (size: 63 MB, checksum)

Permalink: https://grouplens.org/datasets/movielens/10m/

MovieLens 20M Dataset

MovieLens 20M movie ratings. Stable benchmark dataset. 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags. Released 4/2015; updated 10/2016 to update links.csv and add tag genome data.

README.txt
ml-20m.zip (size: 190 MB, checksum)

Also see the MovieLens 20M YouTube Trailers Dataset for links between MovieLens movies and movie trailers hosted on YouTube.

Permalink: https://grouplens.org/datasets/movielens/20m/

MovieLens Tag Genome Dataset 2014

11 million computed tag-movie relevance scores from a pool of 1,100 tags applied to 10,000 movies. Released 3/2014.

Also consider using the MovieLens 20M or latest datasets, which contain more recent tag genome data, or the Tag Genome 2021 dataset.

README.html
tag-genome.zip (size: 41 MB)

Permalink: https://grouplens.org/datasets/movielens/tag-genome/

MovieLens 25M Dataset

MovieLens 25M movie ratings. Stable benchmark dataset. 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. Includes tag genome data with 15 million relevance scores across 1,129 tags. Released 12/2019.

README.txt
ml-25m.zip (size: 250 MB, checksum)
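To illustrate how tag genome relevance scores might be used, the sketch below picks the most relevant tags for one movie. It assumes score rows with the columns `movieId`, `tagId`, and `relevance`, plus a mapping from tag ID to tag name; these column names follow the README's description of the genome files and are not stated on this page. The function name `top_tags` is hypothetical.

```python
import csv
import heapq


def top_tags(score_rows, tag_names, movie_id, k=5):
    """Return the k (tag_name, relevance) pairs with the highest relevance.

    score_rows: iterable of dicts with assumed keys movieId, tagId, relevance
                (for example, a csv.DictReader over the genome scores file).
    tag_names:  dict mapping integer tag IDs to tag strings.
    """
    scores = (
        (float(row["relevance"]), int(row["tagId"]))
        for row in score_rows
        if int(row["movieId"]) == movie_id
    )
    # nlargest keeps only k items in memory while scanning the rows once.
    best = heapq.nlargest(k, scores)
    return [(tag_names[tag_id], relevance) for relevance, tag_id in best]
```

A single pass over the scores file is enough for one movie; for many movies it is usually cheaper to load the scores into a table once and group by movie.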

Permalink: https://grouplens.org/datasets/movielens/25m/