A Machine-Readable Dataset on U.S. Policies in Response to the COVID-19 Epidemic

Allison Muir, MHA Operations Manager

The volunteer-driven COVID-19 Data Project is committed to procuring the most reliable data available in order to educate and to build knowledge that will benefit our future. As the nature of COVID-19 continues to change in the United States, state and local policies change in response. The Policy Track of the Project aims to generate a machine-readable, longitudinal dataset of these COVID-19 policies to educate and inform researchers, policy analysts, and community members.

In April 2020, our policy team adapted the process of a past collaborator, the Temple University Center for Public Health Law Research, to create a crowd-sourced structure for a COVID-19 policy review. Each volunteer is trained to read documents individually and record key identifiers, including the document's name or number and its stated effective period. Each document is then reviewed against the following scope categories: stay-at-home requirements; curfew restrictions; mask requirements; public gathering bans; social distancing requirements; operational orders for businesses, restaurants, and schools; medical procedure restrictions; travel constraints; and measures related to correctional facilities. The presence of each category is coded as a binary variable (0 if the document does not address it; 1 if it does). Ambiguous information that falls outside the scope criteria is also noted.
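The binary coding described above can be illustrated with a minimal Python sketch. The column names, the `encode_document` helper, and the record layout are hypothetical and chosen only for illustration; they are not the Project's actual schema. The sketch also assumes that operational orders for businesses, restaurants, and schools are coded as a single category.

```python
# Hypothetical column names for the nine scope categories listed above.
CATEGORIES = [
    "stay_at_home",
    "curfew",
    "mask_requirement",
    "public_gathering_ban",
    "social_distancing",
    "operational_orders",  # businesses, restaurants, and schools (assumed single column)
    "medical_procedure_restriction",
    "travel_constraint",
    "correctional_facility_measure",
]


def encode_document(identifier, effective_start, effective_end,
                    categories_found, notes=""):
    """Build one dataset row for a reviewed document.

    categories_found is the set of scope categories the reviewer
    identified in the document. Each category becomes a 0/1 column.
    """
    unknown = set(categories_found) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")

    row = {
        "identifier": identifier,
        "effective_start": effective_start,
        "effective_end": effective_end,
    }
    for category in CATEGORIES:
        row[category] = 1 if category in categories_found else 0
    # Free-text field for ambiguous information outside the scope criteria.
    row["notes"] = notes
    return row
```

A reviewer's findings for a single order would then become one row, for example `encode_document("EO-0000", "2020-04-01", "2020-04-30", {"mask_requirement", "curfew"})`, where the identifier and dates are placeholders rather than a real document.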

The documents reviewed include executive orders, health directives, proclamations, and policies related to COVID-19 that were released by U.S. states, territories, counties, and cities. Documents released by states and territories are reviewed within a week of online publication; those released by cities and counties are reviewed within a month. The list of included cities and counties is expanded monthly. As of January 2021, 144 cities have been reviewed, and 108 have been included. When a new city is identified, all applicable COVID-19 documents are reviewed regardless of their effective period or original publication date.

The original documents are published in non-standardized formats. The majority of them are not published in a machine-readable format. The use of "amendments" and "extensions" is not standardized. There is no apparent standard for the frequency of publication, the inclusion of topics, the complexity of the language used, or the detail level.

The Project has created a machine-readable longitudinal dataset and a collection of machine-readable COVID-19 policies that can serve as a resource and tool for researchers, policy analysts, community members, and policymakers. The dataset can be used in conjunction with COVID-19 case data to examine the impact of the policies on the virus's spread. More than 100 users outside the Project have accessed the dataset, and it remains unique in its level of detail and the number of documents it includes.
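As a rough illustration of how policy records with effective periods might be lined up against case data, the following Python sketch expands the records into one 0/1 flag per day and pairs each day with a case count. The function names, field names, and record layout are assumptions made for this example, not the Project's published format.

```python
from datetime import date, timedelta


def daily_policy_flags(policies, category, start, end):
    """Return {day: 0 or 1} for every day from start to end inclusive.

    A day is flagged 1 if at least one policy record codes the given
    category as 1 and its effective period covers that day.
    """
    flags = {}
    day = start
    while day <= end:
        flags[day] = int(any(
            p[category] == 1
            and p["effective_start"] <= day <= p["effective_end"]
            for p in policies
        ))
        day += timedelta(days=1)
    return flags


def join_with_cases(flags, cases_by_date):
    """Pair each day's policy flag with that day's case count.

    Days with no case count available are paired with None.
    """
    return [
        (day, flag, cases_by_date.get(day))
        for day, flag in sorted(flags.items())
    ]
```

The resulting list of (date, flag, case count) tuples for a single jurisdiction is only a starting point for analysis; lining the two series up in this way does not by itself establish a causal effect of a policy.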

The success of our processes demonstrates that crowd-sourced policy analysis is feasible. The work of the Project also revealed the need for government bodies to adopt standardized methods and to collaborate with open data initiatives in order to improve the accessibility and interoperability of their published documents.