Port fetchers using pipeline v1 to v5
Related to #948 (closed)
Tasks
For all the fetchers concerned (cf section below):
- try it in pre-production
- review with team
- if OK, deploy to production
- remove the fetcher from pre-production to lighten it (put deploy: false in the fetcher item in fetchers.yml of the pre-production instance)
Start with simple fetchers (non-incremental) to quickly port the majority of fetchers to pipeline v5, then do the more complex ones.
Technical tasks:
- on pre-prod, do not add schedules during dbnomics-fetcher-ops configure job
- replace deploy: false in fetchers.yml by removing the item from the list
- remove webhooks from fetcher projects that trigger indexation and validation jobs, in dbnomics-fetcher-ops (there is a TODO) (example)
Fetchers concerned
All fetchers declaring pipeline: v1 in fetchers.yml
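A minimal sketch of how the concerned fetchers could be listed, assuming fetchers.yml has already been parsed (for instance with a YAML library) into a list of dicts that each carry a pipeline key. The slug key, the function name and the sample items are hypothetical and only serve the illustration.

```python
from typing import Any


def fetchers_on_pipeline(fetchers: list[dict[str, Any]], version: str = "v1") -> list[str]:
    """Return the sorted slugs of the fetcher items declaring the given pipeline version."""
    return sorted(
        item["slug"]  # "slug" is an assumed key name, not taken from fetchers.yml
        for item in fetchers
        if item.get("pipeline") == version
    )


# Hypothetical items, standing in for the parsed content of fetchers.yml.
sample_fetchers = [
    {"slug": "provider-a", "pipeline": "v5"},
    {"slug": "provider-c", "pipeline": "v1"},
    {"slug": "provider-b", "pipeline": "v1"},
]

print(fetchers_on_pipeline(sample_fetchers))  # ['provider-b', 'provider-c']
```

Items without a pipeline key are simply skipped, so the helper does not decide what the default pipeline is.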
Extend the following list to reflect progression:
done
- BCEAO
- BI
- BOJ
- EIA
- FHFA
- NAR
- ELSTAT
- ND_GAIN
- NBS
- ONS
- pole-emploi
- cbo
- dares
- fao
- fh
- ilo
- indec
Incremental fetchers
Some fetchers read data from Git to implement incremental mode. Pipelines v1 and v2 gave the fetcher access to the cloned Git repository, but v5 gives it an empty directory, so these fetchers must be ported.
Note: to help detect those fetchers, it is advised to run the following command: find -name requirements.txt -exec grep dulwich {} +
However, this grep is not enough: each fetcher must be checked manually.
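The same search can be sketched in Python, for example to feed a checklist of fetchers to review. This is only an illustration of the command above, assuming the fetcher repositories are cloned under a common directory; the function name is hypothetical.

```python
from pathlib import Path


def requirements_mentioning(root: Path, needle: str = "dulwich") -> list[Path]:
    """Return the requirements.txt files under root with a line containing needle."""
    matches = []
    for path in sorted(root.rglob("requirements.txt")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Plain substring match per line, like grep without options.
        if any(needle in line for line in text.splitlines()):
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Assumes the current directory contains the cloned fetcher repositories.
    for path in requirements_mentioning(Path(".")):
        print(path)
```

As with the grep, a match only flags a candidate: a fetcher can depend on dulwich without being incremental.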
Checks to be done:
- whether download is incremental: in this case, replace the old date comparison strategy by reading FROM_DATETIME
- whether convert is incremental: in this case, simply remove it and convert all the downloaded datasets
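The first check can be sketched as follows. This is a sketch under assumptions, not the actual fetcher code: it supposes that FROM_DATETIME is an environment variable holding an ISO 8601 value, and that an unset or empty value means a full download. The function names and the updated_at field are hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def read_from_datetime(environ: Mapping[str, str] = os.environ) -> Optional[datetime]:
    """Parse FROM_DATETIME; return None when it is unset or empty (full download)."""
    raw = environ.get("FROM_DATETIME", "").strip()
    if not raw:
        return None
    # Assumption: the value is ISO 8601, e.g. 2020-01-15T00:00:00+00:00.
    value = datetime.fromisoformat(raw)
    # Treat naive values as UTC so comparisons never mix naive and aware datetimes.
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value


def datasets_to_download(datasets, from_datetime):
    """Keep datasets updated at or after from_datetime; keep all of them when it is None."""
    if from_datetime is None:
        return list(datasets)
    # "updated_at" is a hypothetical field holding a timezone-aware datetime.
    return [d for d in datasets if d["updated_at"] >= from_datetime]
```

With this shape the download step no longer needs the Git history: the date to compare against comes from the environment instead of the last commit.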
Fetchers:
- Destatis: incremental download using get_last_commit_date, cf branch 821-read-datetime-from-env
- Eurostat: incremental download
- IMF: incremental download
- INSEE: incremental download using LAST_DOWNLOAD_STARTED_AT env var, managed by .gitlab-ci.yml
  - waiting for current developments by @MichelJuillard before moving to v5
- UNCTAD: incremental download reading existing file in get_old_source_json_dict
Edited by Christophe Benz