John Little 2021-06-17
Using the `rvest` library to learn about web crawling and HTML parsing in R.
- Introduce just enough HTML/CSS
- Introduce the `library(rvest)` package for harvesting websites/HTML
- Tidyverse iteration with `purrr::map`
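
The three topics above can be sketched together in a few lines. This is a minimal illustration and not the workshop's own code: the HTML fragment, the `li a` selector, and the `https://example.com` base URL are all hypothetical, chosen so the example runs without a network connection.

```r
library(rvest)
library(purrr)

# Hypothetical HTML fragment standing in for a downloaded page;
# in practice the page would come from read_html("<some url>")
page <- minimal_html('
  <ul>
    <li><a href="/page1.html">First page</a></li>
    <li><a href="/page2.html">Second page</a></li>
  </ul>
')

# The CSS selector "li a" matches every <a> nested inside an <li>
links <- html_elements(page, "li a")

# Extract the visible text and the href attribute of each matched link
link_text <- html_text2(links)
link_href <- html_attr(links, "href")

# Iterate with purrr: turn each relative href into an absolute URL
# (base_url is an assumed placeholder, not a site from the workshop)
base_url  <- "https://example.com"
full_urls <- map_chr(link_href, ~ paste0(base_url, .x))
```

In a real crawl, the same `map()` pattern could be applied to `full_urls` to call `read_html()` on each page in turn, ideally with a pause between requests.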
Workshop Video: https://youtu.be/8ISc8V9GDAg
See Also: What to know about law & ethics when archiving & mining data, by Rachael Samberg, J.D., MLIS, and Timothy Vollmer, MIS, and the UC Berkeley Office of Scholarly Communication Services YouTube playlists on navigating intellectual property, copyright, and fair use. Please note that the Samberg/Vollmer slides are in this GitHub repo’s slides folder and are redistributed with permission from the slide authors.
John Little https://JohnLittle.info https://Rfun.library.duke.edu https://library.duke.edu/data
Creative Commons Attribution-NonCommercial https://creativecommons.org/licenses/by-nc/4.0