A work in progress!
A workshop presented as part of Kent State University Libraries' 2023 Digital Scholarship Series Showcase. Written by Kristin Yeager (Head) and Moira O'Neill (GA) from the Statistical Consulting Office.
The web is a rich source of data for many kinds of researchers, including applied mathematicians, natural and social scientists, literary scholars, historians, and artists. These data can be found in online newspapers and journals, on social media sites, in government databases, and in website metadata. Often these data are bound up in a website’s HTML structure and are not easily accessible or downloadable, so we need an alternative, systematic way to retrieve them.
Web scraping is an automated process in which your computer (1) contacts a website or web resource, (2) makes a copy of that website’s HTML or XML, and (3) extracts the content of that HTML or XML as data.
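The process described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the workshop's own method: the names `fetch_html`, `HeadingExtractor`, and `extract_headings`, the user-agent string, and the sample page are all hypothetical, and the choice to pull out heading text is just one example of "extracting content as data".

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str) -> str:
    """Contact a web resource and return a copy of its HTML as text."""
    # The user-agent value is a made-up placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "workshop-example/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class HeadingExtractor(HTMLParser):
    """Collect the text found inside <h1>, <h2>, and <h3> tags."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # text pieces of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3") and self._current is not None:
            self.headings.append("".join(self._current).strip())
            self._current = None


def extract_headings(html: str) -> list[str]:
    """Extract heading text from an HTML string as a list of strings."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings


# A small hard-coded page stands in for a downloaded one, so the
# extraction step can be tried without any network access.
sample_html = (
    "<html><body><h1>Results</h1><p>Intro</p>"
    "<h2>Table <em>1</em></h2></body></html>"
)
print(extract_headings(sample_html))  # ['Results', 'Table 1']
```

To run all three steps against a live page, you would pass the result of `fetch_html(some_url)` to `extract_headings` in place of `sample_html`. Before doing so on a real site, check its terms of use and its robots.txt file.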