A Global Book Reading Dataset
Abstract
1. Introduction
2. Data Collection and Exploration
- Split the string on commas and check only the last part against a list of country and state names, labeling the country if the value is on that list. This works because most people write their country as the last part of their address. A total of 96% of locations are detected in this manner.
- Split the string on commas and check only the first part against a list of country names, labeling the country if the value is on that list. This mirrors the previous step and covers users who begin their address with the country name. A total of 0.07% of locations are detected in this manner.
- Input the entire string to GeoPy. A total of 0.06% of locations are detected in this manner.
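The three-step cascade above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `KNOWN_PLACES` and `COUNTRY_NAMES` are tiny hypothetical stand-ins for the full lists of country and state names, and `geocode_fallback` is a hypothetical callable standing in for the GeoPy lookup so that the example runs offline.

```python
from typing import Callable, Optional

# Hypothetical, heavily truncated lookup: lower-cased place name -> country.
# A real list would cover all countries and state names.
KNOWN_PLACES = {
    "united states": "United States",
    "california": "United States",
    "iran": "Iran",
    "germany": "Germany",
}
# Subset holding only country names, used for the "first part" rule.
COUNTRY_NAMES = {"united states", "iran", "germany"}


def detect_country(
    location: str,
    geocode_fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return a country for a free-text location, or None if undetected."""
    parts = [p.strip().lower() for p in location.split(",") if p.strip()]
    if not parts:
        return None
    # Step 1: last comma-separated part against countries and states.
    if parts[-1] in KNOWN_PLACES:
        return KNOWN_PLACES[parts[-1]]
    # Step 2: first part against country names only.
    if parts[0] in COUNTRY_NAMES:
        return KNOWN_PLACES[parts[0]]
    # Step 3: hand the whole string to a geocoder (GeoPy in the text above).
    if geocode_fallback is not None:
        return geocode_fallback(location)
    return None
```

Ordering the steps from cheapest to most expensive means the geocoder is only consulted for the small share of strings that the two list lookups cannot resolve.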
Anonymity
3. Potential Use Cases
Data Limitations
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Field | Description | Included in Public Dataset |
---|---|---|
User ID | A unique, numerical identifier for the user on the website. | A hashed version of the ID is made available. |
Name | Name of this user. In contrast to many other pseudonymous social networks, Goodreads users tend to use real names and even full names, as the input form has separate first, middle, and last name fields. | No |
Username | The username that the user has selected. This field is optional; name is the field each user must fill in to create an account. | No |
Profile Image | URL of the user’s profile picture. | No |
Friend Count | The number of friends that the user has. Being friends on Goodreads is a bidirectional property, independent of uni-directional following; 62% of users do not have any friends. | Yes |
Review Count | The total number of books added to any of the user’s shelves, in other words, the total number of books in the user’s automatically generated “all” shelf. Only 4.5% of users have more than 100 books in their shelves. | Yes |
Groups Count | The number of groups the user is part of. Some groups can be freely joined, for others the user needs to be admitted. | Yes |
Location | An optional self-reported location of the user. By default, Goodreads seems to infer a user’s country, presumably based on IP address. This selection can be changed later and a drop-down list of countries is available. Only 3.7% of users have left the field empty. | Self-reported locations are not released, but detected country-level values are. |
Age | Self-reported age of the user; 97% of users have not filled in this value. | No |
Gender | Self-reported gender of the user. Only 7735 users have filled in this value. Options, from a drop-down, include male, female, and custom, which supports free text. | Inferred gender values are included, but not the self-reported ones. |
About | An optional self-description of the user. | The numerical length of this section is included, but not the textual content. |
Favorite Authors | Favorite authors of the user. | Yes, but author IDs are replaced by hashed values. |
Website | An optional field, allowing users to share their website or any other link. | No |
Joined | The month and year in which the user joined the platform. | Yes |
Last Active | The month and year that the user was last active on this website (since our collection was conducted in 2020, dates within this year do not necessarily indicate that the user has abandoned the website). | Yes |
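The table above states that user IDs and author IDs are released only in hashed form, without naming the hashing scheme. One way this could be done is sketched below, assuming a keyed SHA-256 hash (HMAC); the function name `hash_id` and the key `SECRET_KEY` are hypothetical and not taken from the dataset.

```python
import hashlib
import hmac

# Hypothetical secret key; it would be kept private and never released.
SECRET_KEY = b"replace-with-a-private-random-key"


def hash_id(raw_id: int, key: bytes = SECRET_KEY) -> str:
    """Map a numeric ID to a stable, hard-to-reverse hexadecimal token."""
    message = str(raw_id).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

A keyed hash is used in this sketch because numeric IDs are easy to enumerate: with an unkeyed hash, anyone could hash every candidate ID and match the results against the released values.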
Field | Description | Included in Public Dataset |
---|---|---|
Book ID | A unique identifier for the book this review is about. | A hashed version of the ID is provided. |
Rating | A numerical rating, taking integer values from 1 to 5. Ratings are optional and can be left empty. Only 47.8% of book additions include ratings. | Yes |
Shelf Names | Users can create their own shelves. While there are no restrictions on the shelves a user may create, shelf names are at times treated as genres or tags used for recommendations. | Yes, but shelf names used by fewer than 200 distinct users are replaced by small-count to prevent the tracking of users with a certain taste. |
Spoiler Flag | A Boolean flag indicating if the review contains spoilers (the flag is set by the user). | Yes |
Review Body | The text of the review. | No, but the character length of the text is provided. |
Likes | Number of likes for the review. | Yes |
Date Added | A system-generated date of when the user first added this book to one of their shelves. | Yes |
Date Updated | A system-generated date of the last time the user updated this book. | Yes |
Started At | An optional user-inputted date indicating when the user started reading the book. | Yes |
Read At | An optional user-inputted date indicating when the user finished reading the book. | Yes |
Owned | Whether the user owns the book. | Yes |
Read Count | Number of times this book was read by this user (re-reads are possible on the platform). | Yes |
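The shelf-name rule in the table above (names used by fewer than 200 distinct users are replaced by small-count) can be illustrated with a short sketch. The function name `mask_rare_shelves` and the input layout of `(user_id, shelf_name)` pairs are assumptions made for this example, not the authors' pipeline.

```python
from collections import defaultdict

MIN_USERS = 200              # threshold stated in the table above
PLACEHOLDER = "small-count"  # replacement label stated in the table above


def mask_rare_shelves(records, min_users=MIN_USERS):
    """Replace rare shelf names in (user_id, shelf_name) pairs.

    A shelf name is kept only if at least `min_users` distinct users use it.
    """
    records = list(records)
    users_per_shelf = defaultdict(set)
    for user_id, shelf in records:
        users_per_shelf[shelf].add(user_id)
    return [
        (user_id, shelf if len(users_per_shelf[shelf]) >= min_users else PLACEHOLDER)
        for user_id, shelf in records
    ]
```

Counting distinct users with a set, rather than counting rows, ensures that one prolific user who applies the same custom shelf to many books cannot push that shelf over the threshold.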
Instance | Count |
---|---|
Users | 1,872,677 |
Books | 3,594,304 |
Book Additions (Referred to as Reviews) | 41,253,535 |
Book Additions (Reviews) with Rating | 19,852,290 |
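A few summary ratios follow directly from the counts in the table above. The short calculation below is only a convenience sketch; the variable names are illustrative, and the ratios are simple means that say nothing about how skewed the underlying distributions are.

```python
# Counts copied from the table above.
users = 1_872_677
books = 3_594_304
reviews = 41_253_535
reviews_with_rating = 19_852_290

additions_per_user = reviews / users          # mean book additions per user
additions_per_book = reviews / books          # mean book additions per book
rated_share = reviews_with_rating / reviews   # fraction of additions with a rating

print(f"Book additions per user: {additions_per_user:.1f}")
print(f"Book additions per book: {additions_per_book:.1f}")
print(f"Share of additions with a rating: {rated_share:.1%}")
```

The counts work out to roughly 22 book additions per user and a rated share of about 48%, close to the 47.8% quoted for the Rating field. Because only 4.5% of users have more than 100 books on their shelves, the per-user mean should not be read as a typical value.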
Female | Male |
---|---|
read | read |
to-read | to-read |
currently-reading | currently-reading |
favorites | fiction |
fiction | fantasy |
fantasy | favorites |
romance | owned |
own | history |
non-fiction | own |
young-adult | science-fiction |
Shelf | Number of Users | Shelf | Number of Users | Shelf | Number of Users |
---|---|---|---|---|---|
to-read | 1,087,410 | manga | 550 | cookbooks | 313 |
read | 758,974 | humor | 546 | didn-t-finish | 304 |
currently-reading | 309,831 | my-books | 542 | school | 301 |
favorites | 5959 | 2015 | 531 | dystopia | 301 |
fantasy | 2420 | favourites | 525 | childrens | 300 |
fiction | 2037 | psychology | 523 | plays | 299 |
nan | 1968 | business | 521 | library | 286 |
non-fiction | 1831 | books | 511 | economics | 272 |
classics | 1561 | memoir | 508 | chick-lit | 267 |
history | 1322 | comics | 506 | want-to-read | 266 |
poetry | 1309 | owned | 486 | suspense | 265 |
romance | 1214 | self-help | 449 | children | 263 |
mystery | 1197 | wishlist | 429 | sports | 262 |
historical-fiction | 1184 | to-buy | 422 | series | 261 |
2020 | 1101 | 2014 | 418 | drama | 258 |
dnf | 1078 | travel | 407 | novels | 251 |
2019 | 1035 | contemporary | 406 | urban-fantasy | 247 |
young-adult | 959 | audiobook | 404 | parenting | 245 |
biography | 898 | politics | 402 | mythology | 240 |
2018 | 879 | historical | 400 | 2012 | 239 |
sci-fi | 869 | dystopian | 397 | kids | 236 |
science-fiction | 848 | crime | 394 | literature | 233 |
horror | 846 | religion | 392 | couldn-t-finish | 232 |
abandoned | 823 | re-read | 382 | favorite | 225 |
nonfiction | 746 | kindle | 378 | maybe | 224 |
philosophy | 740 | graphic-novel | 373 | feminism | 222 |
2017 | 730 | on-hold | 371 | education | 219 |
science | 711 | books-i-own | 360 | ebook | 219 |
1 | 683 | paranormal | 359 | writing | 218 |
did-not-finish | 667 | adventure | 355 | vampires | 217 |
book-club | 661 | audiobooks | 355 | food | 215 |
thriller | 658 | unfinished | 350 | first-reads | 214 |
2016 | 615 | art | 344 | comedy | 214 |
own | 590 | music | 332 | picture-books | 212 |
short-stories | 590 | classic | 323 | children-s-books | 209 |
graphic-novels | 562 | reference | 317 | health | 204 |
ya | 553 | 2013 | 314 | true-crime | 201 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sabri, N.; Weber, I. A Global Book Reading Dataset. Data 2021, 6, 83. https://doi.org/10.3390/data6080083