Try S3 object storage for converted data
Following #821 (comment 24382)
Description
As a follow-up, let's quickly explore using object storage:
- use object storage to store provider data, both source-data and json-data
- use the TSV representation for series (we don't need JSON Lines, since S3 handles many small files better than file systems do)
- store one provider per bucket or one dataset per bucket, depending on Scaleway limits
- use S3 object versions
- update dbnomics-data-model and dbnomics-api to read from object storage instead of Git repositories
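To make the target concrete, here is a minimal read sketch, assuming boto3 against Scaleway's S3-compatible endpoint; the bucket name and object key below are hypothetical, just one possible mapping of a provider to a bucket and a series to a TSV object:

```python
import boto3

# Assumption: fr-par Scaleway endpoint; credentials come from the environment.
s3 = boto3.client("s3", endpoint_url="https://s3.fr-par.scw.cloud")

BUCKET = "dbnomics-imf"  # hypothetical: one bucket per provider
KEY = "json-data/WEO/NGDP_RPCH.FR.tsv"  # hypothetical: one TSV object per series

# Fetch the latest revision of the series and parse the period/value columns.
response = s3.get_object(Bucket=BUCKET, Key=KEY)
for line in response["Body"].read().decode("utf-8").splitlines():
    period, value = line.split("\t", 1)
    print(period, value)
```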
This would be a nice solution with many advantages over Git repositories:
- object storage is cheaper than block storage
- object storage is more secure against data loss
- revisions can be implemented more efficiently than by reading Git history (see the versions sketch after this list)
- more efficient than file systems: fewer problems with many small files or with huge files
- no more problems with the GitLab server (slow git clones, pushes, etc.)
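On the revisions point: with bucket versioning enabled, every overwrite of a series object becomes a new object version, so listing revisions is a single API call instead of a walk through Git history. A hedged sketch, reusing the same hypothetical bucket and key as above:

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.fr-par.scw.cloud")
BUCKET = "dbnomics-imf"  # hypothetical bucket
KEY = "json-data/WEO/NGDP_RPCH.FR.tsv"  # hypothetical key

# Enumerate every stored revision of the series object.
paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket=BUCKET, Prefix=KEY):
    for version in page.get("Versions", []):
        print(version["VersionId"], version["LastModified"], version["IsLatest"])

# Read the series as it was at a given revision.
old = s3.get_object(Bucket=BUCKET, Key=KEY, VersionId="some-version-id")
```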
Tasks
- sync json-data after pipeline runs to S3 (@eraviart) (see the sync sketch after this list)
- adapt dbnomics-api and dbnomics-data-model to read from S3 (@cbenz)
- check that dbnomics-solr indexation script works (@cbenz)
- import Git repositories history for source-data and json-data as S3 versions
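For the first task, a minimal sketch of what syncing json-data to S3 after a pipeline run could look like with boto3; the local directory and bucket name are hypothetical, and with bucket versioning enabled each run creates new object versions rather than overwriting data:

```python
from pathlib import Path

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.fr-par.scw.cloud")
BUCKET = "dbnomics-imf"  # hypothetical: one bucket per provider
JSON_DATA_DIR = Path("/srv/dbnomics/json-data/imf")  # hypothetical pipeline output

# Upload every file produced by the pipeline run, keyed by its relative path.
for path in sorted(JSON_DATA_DIR.rglob("*")):
    if path.is_file():
        key = path.relative_to(JSON_DATA_DIR).as_posix()
        s3.upload_file(str(path), BUCKET, key)
```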
Before closing issue
- merge imf-fetcher!8 (merged)