COVID-19 Scenarios Data

Data preprocessing scripts and preprocessed data storage for COVID-19 Scenarios project

Overview

This directory serves as the source of observational data for covid19_scenarios. It ingests data from the sources listed in sources.json. Each source has a parser, written in Python, in the parsers directory. The data is stored as tsv (tab-separated values) files, one per location or country. These tabular files are mainly meant for data curation and storage; the web application needs json files as input.
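To illustrate the relationship between the curated tsv tables and the json input, here is a minimal sketch that turns one tab-separated table into a JSON-serializable list of records. The function name `tsv_to_records`, the sample table, and the output layout are assumptions made for illustration; this is not the format that generate_data.py actually emits.

```python
import csv
import io
import json


def tsv_to_records(text):
    """Parse tab-separated text with a header row into a list of dicts.

    The time column is kept as a string; every other column is read as an
    integer, and empty cells become None.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        record = {}
        for key, value in row.items():
            if key == "time":
                record[key] = value
            else:
                record[key] = int(value) if value not in ("", None) else None
        records.append(record)
    return records


# Made-up sample table: the deaths cell of the second row is missing.
sample = "time\tcases\tdeaths\n2020-03-20\t1\t0\n2020-03-21\t2\t\n"
records = tsv_to_records(sample)
print(json.dumps(records, indent=2))
```

Missing cells map to None in Python and to null in the JSON output, so gaps in the curated tables stay visible instead of being read as zero.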

To run the parsers, call

python generate_data.py --fetch

This will update the tables in the directory case-counts. For each parser there is a separate directory which contains individual case counts for each location covered by the parser.

To only run specific parsers, run

python generate_data.py --fetch --parsers netherlands switzerland

To generate the json files for the app, specify the path to the target location. This can be done either together with updating the tsv files or separately, depending on whether the command is run with --fetch or not.

python generate_data.py \
    --output-cases path/case-counts.json \
    --output-population path/population.json

To generate the integrated scenario json, run

python generate_data.py \
    --output-cases path/case-counts.json \
    --output-scenarios path/scenarios.json
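The flags used in the commands above combine freely. The following sketch shows one way such a command line could be declared with argparse. The flag names are taken from this README; the help texts, the defaults, and the function name `build_parser` are assumptions, and the real generate_data.py may be organized differently.

```python
import argparse


def build_parser():
    """Declare the options mentioned in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="sketch of the generate_data.py options")
    parser.add_argument("--fetch", action="store_true",
                        help="run the parsers and update the tsv tables")
    parser.add_argument("--parsers", nargs="+", default=None,
                        help="restrict the run to the named parsers")
    parser.add_argument("--output-cases", default=None,
                        help="target path for the case counts json")
    parser.add_argument("--output-population", default=None,
                        help="target path for the population json")
    parser.add_argument("--output-scenarios", default=None,
                        help="target path for the integrated scenario json")
    return parser


# Equivalent to: python generate_data.py --fetch --parsers netherlands switzerland
args = build_parser().parse_args(["--fetch", "--parsers", "netherlands", "switzerland"])
print(args.fetch, args.parsers, args.output_cases)
```

With `nargs="+"`, the parser names are collected into a list, and any output option left out stays None, so fetching and json generation can be requested independently.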

Contributing and curating data:

Adding parser or case count data for a new region:

The steps to follow are:

Identify a source for case count data that is updated frequently (at least daily) as the outbreak evolves.

Write a script that downloads the raw data and converts it into a dict of lists of lists: {'': [['2020-03-20', 1, 0, ...], ['2020-03-21', 2, 0, ...]]}
- Columns: [time, cases, deaths, hospitalized, ICU, recovered]
Important: cases, deaths, and recovered must be cumulative counts. The fields hospitalized and ICU should give the current number of patients.
- The time column must be a string formatted as YYYY-MM-DD
Try to keep the same order of columns for hygiene, although it should not ultimately matter.
If data is missing, please leave the entry empty, i.e. None (e.g., ['2020-03-20', 1, None, None, ...])
Use the store_data() function in utils to store the data into .tsv automatically
Ensure that the data provided to store_data() is well formatted
- The keys of the data structure passed to store_data() should be
  - For countries: U.N. country names (see country_codes.csv), or
  - For states within countries: the three-letter code for the country (see country_codes.csv), followed by a hyphen and the state name
- The second parameter is the string identifying your parser (see sources.json entry below)
Place the script into the parsers directory
- The name should correspond to the region name desired in the scenario.
- There must be a function parse() defined that calls store_data() from utils
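The requirements in the list above can be sketched as follows. The column order, the date format, and the cumulative-count rule come from this README. Everything else is an assumption for illustration: the helper `check_rows`, the hard-coded sample numbers, the parser name "exampleparser", and the local `store_data`, which is only a stand-in that validates the data instead of writing .tsv files. A real parser would import store_data() from utils and download its raw data.

```python
import re

COLUMNS = ["time", "cases", "deaths", "hospitalized", "ICU", "recovered"]
CUMULATIVE = ("cases", "deaths", "recovered")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_rows(rows):
    """Return a list of problems; an empty list means the rows look fine.

    Assumes the rows are already sorted by date, oldest first.
    """
    problems = []
    last_seen = {}  # most recent non-missing value per cumulative column
    for row in rows:
        if len(row) != len(COLUMNS):
            problems.append(f"{row!r}: expected {len(COLUMNS)} columns")
            continue
        if not isinstance(row[0], str) or not DATE_PATTERN.match(row[0]):
            problems.append(f"{row[0]!r}: time must be a YYYY-MM-DD string")
        for name in CUMULATIVE:
            value = row[COLUMNS.index(name)]
            if value is None:  # missing entries are allowed
                continue
            if value < last_seen.get(name, value):
                problems.append(f"{row[0]}: {name} decreased, so it is not cumulative")
            last_seen[name] = value
    return problems


def store_data(data, parser_name):
    """Stand-in for store_data() from utils: validates instead of writing files."""
    for region, rows in data.items():
        problems = check_rows(rows)
        if problems:
            raise ValueError(f"{parser_name}/{region}: {problems}")
    return sorted(data)


def parse():
    # Hard-coded stand-in for the download step: (date, cases, deaths).
    raw = [("2020-03-20", 1, 0), ("2020-03-21", 2, None)]
    # Fill the columns this made-up source does not report with None.
    rows = [[day, cases, deaths, None, None, None] for day, cases, deaths in raw]
    return store_data({"Switzerland": rows}, "exampleparser")


print(parse())
```

Running a check like `check_rows` before storing the data catches the most common mistake, namely reporting daily new cases where cumulative counts are expected.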

Update the sources.json file to contain all relevant metadata.

The three fields are:
- primarySource = The URL/path to the raw data
- dataProvenance = The organization behind the data collection
- license = The license governing the usage of data
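As an illustration, a sources.json entry with the three fields could be assembled and checked like this. The field names come from the list above. The URL, organization, and license values are placeholders, and keying the entry by the parser name ("exampleparser") is an assumption about the file layout, so compare with the existing entries in sources.json before copying it.

```python
import json

REQUIRED_FIELDS = ("primarySource", "dataProvenance", "license")


def missing_fields(entry):
    """Return the required fields that are absent or empty in an entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


entry = {
    "primarySource": "https://example.org/cases.csv",  # placeholder URL
    "dataProvenance": "Example Health Agency",          # placeholder organization
    "license": "CC-BY-4.0",                             # placeholder license
}

# Assumed layout: one entry per parser, keyed by the parser name.
sources = {"exampleparser": entry}

print(missing_fields(entry))
print(json.dumps(sources, indent=2))
```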

Test your parser and create a Pull Request

Create the appropriate directory in case-counts/
Test your parser from the directory above (outside your covid19_scenario_data folder) using

python generate_data.py --fetch --parsers <yourparsername>

Check the resulting output in your parser's directory under case-counts/, and add the files to your Pull Request together with the parser and sources.json

Add population data for the additional regions/states.

Case count data is most useful when tied to data on the population it refers to. To ensure new case counts are correctly included in the population presets, add a line to the populationData.tsv for each new region (see Adding/editing population data for a country and/or region below).

Updating/editing case count data for an existing region:

This option is less preferred than a script that updates the data automatically, as outlined above. However, if no data source is accessible, the data can be entered manually. To do so

Commit a manually entered file into the "manuals" directory

Please use only the U.N. designated name for the country; the file name should be that name followed by .tsv.

Adding/editing population data for a country and/or region:

As of now, all data used to initialize the scenarios used by our model is found within populationData.tsv. It has the following form:

name	populationServed	ageDistribution	hospitalBeds	ICUBeds	hemisphere	srcPopulation	srcHospitalBeds	srcICUBeds
Switzerland	...

name: the U.N. designated name found within country_codes.csv
- For a sub-region or city, please prefix the name with the three-letter code of the containing country. See country_codes.csv for the correct code.
populationServed: the population size, as a number
ageDistribution: name of the country that contains the region. Must be the U.N. designated name
hospitalBeds: number of hospital beds within the region
ICUBeds: number of ICU beds
hemisphere: either 'Northern', 'Southern', or 'Tropical', used to determine parameters for the epidemiology
srcPopulation: string commenting on the source of population data. Ideally a URL or reference to official source. Alternatively, comment on how data was estimated. Must not be empty
srcHospitalBeds: string commenting on the source of hospital bed data. Ideally a URL or reference to official source. Alternatively, comment on how data was estimated. Must not be empty
srcICUBeds: string commenting on the source of ICU bed data. Ideally a URL or reference to official source. Alternatively, comment on how data was estimated. Must not be empty
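Before adding a line to populationData.tsv, the rules above can be checked with a short script. This is a minimal sketch: the column names and constraints are taken from this README, while the function name `check_population_row` and the sample row ("Exampleland" with made-up numbers) are purely illustrative.

```python
import csv
import io

HEADER = ["name", "populationServed", "ageDistribution", "hospitalBeds", "ICUBeds",
          "hemisphere", "srcPopulation", "srcHospitalBeds", "srcICUBeds"]
HEMISPHERES = {"Northern", "Southern", "Tropical"}
NUMERIC_FIELDS = ("populationServed", "hospitalBeds", "ICUBeds")
SOURCE_FIELDS = ("srcPopulation", "srcHospitalBeds", "srcICUBeds")


def check_population_row(row):
    """Return a list of problems for one row given as a dict of strings."""
    problems = []
    for field in NUMERIC_FIELDS:
        try:
            float(row.get(field) or "")
        except ValueError:
            problems.append(f"{field} must be a number")
    if row.get("hemisphere") not in HEMISPHERES:
        problems.append("hemisphere must be Northern, Southern, or Tropical")
    for field in SOURCE_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"{field} must not be empty")
    return problems


# Fictional row, not real data.
values = ["Exampleland", "1000", "Exampleland", "10", "2", "Northern",
          "made-up value", "made-up value", "made-up value"]
sample = "\t".join(HEADER) + "\n" + "\t".join(values) + "\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print(check_population_row(rows[0]))
```

The check does not verify that name and ageDistribution appear in country_codes.csv; that lookup would need the file itself.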

Quick Start

Run natively

Install the requirement:

pipenv

Then, in your terminal, type

pipenv install

This should recreate the required Python environment. To enter the environment, type

pipenv shell
