Digital Narratives of COVID-19: A Twitter Dataset for Text Analysis in Spanish

Our project Digital Narratives of COVID-19, funded by the University of Miami, has been working on several fronts, and part of that work has now been published in the Journal of Open Humanities Data under the title Digital Narratives of COVID-19: A Twitter Dataset for Text Analysis in Spanish. This is a data paper in which we discuss the process behind our dataset and its nature. Here is the abstract of the paper:

Digital Narratives of COVID-19 (DHCovid) offers a curated Twitter corpus of digital conversations about the Coronavirus pandemic. The dataset is collected through a script via Twitter’s Application Programming Interface (API) starting on April 24th, 2020, and stored on GitHub as an open access repository of tweet identifiers that can be consulted, downloaded, and reused by scholars interested in Natural Language Processing (NLP), topic modelling, and other quantitative methods. A stable version of the dataset has also been released through Zenodo. Twitter datasets are structured in three main collections: tweets in Spanish worldwide; geolocated tweets in six Spanish-speaking areas spanning North and Central America (Mexico, Colombia, Ecuador), South America (Argentina, Peru), and Europe (Spain); and geolocated tweets in English and Spanish from the greater Miami area in South Florida.