Ensure commit consistency for datasets
- As a data consumer
- I want to be sure that every series of a dataset comes from the same revision on the provider side
- so that I can be confident about the data.
Acceptance criteria
-
MUST
Analysis
- perfect consistency may be impossible to achieve if the provider does not offer snapshots: the download takes time, and the provider's database can change between the start and the end of the download
- if at least one series fails while downloading/converting a dataset, roll back the entire dataset
- find a way to automatically restart the download for the failed datasets only, after some waiting time, to avoid having to wait for the next scheduled run
- this waiting time should be configurable, because some providers may ban us
- ability to say "if 3 datasets are skipped because of errors, quit the job with a failure status"
- example: NBS is very strict and blocks our IP if we keep downloading after some datasets have been blocked
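The rollback, delayed retry and failure threshold described above could be sketched as follows. This is only a rough illustration, not an existing implementation: `download_dataset_or_rollback`, `run_job`, `TooManyFailedDatasets`, the `fetch_series` callback, the one-file-per-series layout and all parameter names and defaults are hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterable


class TooManyFailedDatasets(Exception):
    """Raised when too many datasets were skipped because of errors."""


def download_dataset_or_rollback(
    dataset_code: str,
    series_codes: Iterable[str],
    fetch_series: Callable[[str, str], bytes],  # hypothetical provider call
    target_dir: Path,
) -> None:
    """Write all series to a staging directory; keep it only if every series succeeded."""
    target_dir.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(prefix=f"{dataset_code}-", dir=target_dir))
    try:
        for series_code in series_codes:
            data = fetch_series(dataset_code, series_code)
            (staging / f"{series_code}.json").write_bytes(data)
    except Exception:
        # Rollback: discard the partial download, the previous version stays untouched.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    final = target_dir / dataset_code
    if final.exists():
        shutil.rmtree(final)
    # Note: remove-then-rename is not strictly atomic, it is good enough for a sketch.
    staging.rename(final)


def run_job(
    datasets: dict,  # maps dataset code -> list of series codes
    fetch_series: Callable[[str, str], bytes],
    target_dir: Path,
    retry_delay_seconds: float = 600.0,  # assumed default, to be tuned per provider
    max_retry_rounds: int = 1,
    max_failed_datasets: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Download all datasets, retry only the failed ones, return those still failing."""
    pending = list(datasets)
    for round_index in range(max_retry_rounds + 1):
        failed = []
        for code in pending:
            try:
                download_dataset_or_rollback(code, datasets[code], fetch_series, target_dir)
            except Exception:
                failed.append(code)
                if len(failed) >= max_failed_datasets:
                    # Stop insisting, so that the provider does not ban us.
                    raise TooManyFailedDatasets(f"skipped datasets: {failed}")
        if not failed:
            return []
        pending = failed
        if round_index < max_retry_rounds:
            sleep(retry_delay_seconds)  # wait before retrying the failed datasets only
    return pending
```

A job entry point could catch `TooManyFailedDatasets` and exit with a non-zero status, so that the scheduler reports the run as failed. The threshold is counted per round here; whether it should instead be counted across the whole job is an open design choice.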
Not directly related, but also important (may belong in a separate issue):
- if the delay between 2 downloads is too long then we may miss a revision on the provider side...
Tasks
Edited by Christophe Benz