BLS - U.S. Bureau of Labor Statistics
- As a system
- I want to acquire data from the BLS
- in order to provide users with these data
Tasks
- download CSV files for 29 datasets MUST
- acquire metadata for each dataset MUST
- convert the data in the CSV files into DBnomics TSV (the structure of the CSV files is nearly identical) MUST
- use the date of the data files to determine whether a dataset needs to be updated SHOULD
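The date-based update check in the last task could be sketched as follows. This is only an illustration: it assumes the file date is available as an HTTP-style date string (for example from a `Last-Modified` header), which this issue does not confirm, and the names `parse_http_date` and `needs_update` are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_http_date(value: str) -> datetime:
    """Parse an HTTP-style date such as 'Wed, 21 Oct 2015 07:28:00 GMT'.

    Assumption: the publisher exposes file dates in this format.
    """
    parsed = parsedate_to_datetime(value)
    # Treat naive results as UTC so comparisons never mix naive and aware values.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def needs_update(remote_date: datetime, last_fetched: Optional[datetime]) -> bool:
    """Return True when the publisher's file is newer than the stored copy."""
    if last_fetched is None:
        # The dataset was never downloaded, so it must be fetched.
        return True
    return remote_date > last_fetched
```

A dataset that was never fetched is always treated as outdated, so the first run downloads everything.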
Acceptance criteria
- metadata and data for 29 datasets must be downloaded
- metadata for area, items, footnotes and other specific dimensions must be recorded
- for each dataset, the number of stored series must equal the number of lines in its *series* file minus one (the header)
- various errors in the files must be corrected (see fetcher description)
- updates must be completed within one hour of the update by the publisher. Please report if it takes longer for the largest datasets
The above checks don't need to be included in the production code
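The series-count criterion above could be verified outside the production code with a small helper like the one below. It is a sketch under assumptions: the helper names are hypothetical, the first line is taken to be the header, and blank lines (for example a trailing empty line) are ignored, which the criterion itself does not specify.

```python
from typing import Iterable


def expected_series_count(lines: Iterable[str]) -> int:
    """Count the series described by a *series* file.

    Assumption: the first line is a header and blank lines are ignored.
    """
    line_count = sum(1 for line in lines if line.strip())
    # Subtract the header line; an empty file yields zero series.
    return max(line_count - 1, 0)


def series_count_matches(lines: Iterable[str], stored_count: int) -> bool:
    """Return True when the stored series count equals the expected count."""
    return expected_series_count(lines) == stored_count
```

Because an open text file is itself an iterable of lines, the file object can be passed directly to `expected_series_count` without reading the whole file into memory.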
Resources
- Fetcher description: https://git.nomics.world/dbnomics-fetchers/management/wikis/Fetcher-Description/BLS
- Existing code:
  - beginning of fetcher in new infrastructure: https://git.nomics.world/dbnomics-fetchers/bls-fetcher/blob/master/bls_parser.py
  - fetcher in old infrastructure: https://git.nomics.world/dbnomics/dlstats/blob/master/dlstats/fetchers/bls.py
Technical steps
Edited by Christophe Benz