Create a data storage library

Part of #554

Related to #818

Description

Goals:

  • abstract the source code of fetchers and services (Web API, indexation script, Python client) away from storage-specific details, letting them manipulate a domain-level data model instead of a storage-level one
  • concentrate the source code handling data storage in one place
  • improve the documentation of the storage model, with all its variants
    • git+tsv, git+jsonl, filesystem+tsv, filesystem+jsonl
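The four variants combine two independent choices: where the files live (git repository or plain filesystem) and how series are encoded (tsv or jsonl). A minimal sketch of how the library could document and model this. The names Container, SeriesFormat and StorageVariant are hypothetical and not taken from dbnomics_data_model:

```python
from dataclasses import dataclass
from enum import Enum


class Container(Enum):
    """Where the storage files live (hypothetical naming)."""
    GIT = "git"
    FILESYSTEM = "filesystem"


class SeriesFormat(Enum):
    """How series observations are encoded (hypothetical naming)."""
    TSV = "tsv"
    JSONL = "jsonl"


@dataclass(frozen=True)
class StorageVariant:
    """One storage variant = one container + one series format."""
    container: Container
    series_format: SeriesFormat

    @classmethod
    def parse(cls, text: str) -> "StorageVariant":
        """Parse a label such as "git+jsonl"; raise ValueError on unknown parts."""
        container, _, series_format = text.partition("+")
        return cls(Container(container), SeriesFormat(series_format))

    def __str__(self) -> str:
        return f"{self.container.value}+{self.series_format.value}"


# Every combination of the two axes, in the order listed in this issue.
ALL_VARIANTS = [StorageVariant(c, f) for c in Container for f in SeriesFormat]
```

Treating the variants as a product of two axes would let the documentation describe each axis once instead of describing four storage models separately.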

Features:

  • serialize and deserialize data model instances to and from multiple storage backends (e.g. git+tsv, git+jsonl)
  • provide capabilities such as accessing past revisions
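A minimal sketch of the abstraction described by the goals and features, assuming a typing.Protocol as the storage interface and a filesystem+jsonl backend storing one JSON object per line in <dataset>/series.jsonl. All names (Series, Storage, FilesystemJsonlStorage, count_series) and the file layout are hypothetical illustrations, not the actual dbnomics-storage or dbnomics_data_model API. Revision access is left out because it depends on a git backend:

```python
from __future__ import annotations

import json
from collections.abc import Iterator
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass
class Series:
    """Domain-level series (hypothetical): clients only ever see this type."""
    code: str
    observations: list[tuple[str, float | None]] = field(default_factory=list)


class Storage(Protocol):
    """Storage-agnostic interface (hypothetical, not the real library API)."""

    def save_series(self, dataset_code: str, series: list[Series]) -> None: ...

    def iter_series(self, dataset_code: str) -> Iterator[Series]: ...


class FilesystemJsonlStorage:
    """One possible backend: one JSON object per line in <root>/<dataset>/series.jsonl."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save_series(self, dataset_code: str, series: list[Series]) -> None:
        dataset_dir = self.root / dataset_code
        dataset_dir.mkdir(parents=True, exist_ok=True)
        with (dataset_dir / "series.jsonl").open("w", encoding="utf-8") as f:
            for s in series:
                # Serialize: domain object -> storage record (None becomes JSON null).
                record = {"code": s.code, "observations": [list(o) for o in s.observations]}
                f.write(json.dumps(record) + "\n")

    def iter_series(self, dataset_code: str) -> Iterator[Series]:
        path = self.root / dataset_code / "series.jsonl"
        with path.open(encoding="utf-8") as f:
            for line in f:
                # Deserialize: storage record -> domain object.
                record = json.loads(line)
                yield Series(
                    code=record["code"],
                    observations=[(period, value) for period, value in record["observations"]],
                )


def count_series(storage: Storage, dataset_code: str) -> int:
    """Example client code: depends only on the Storage interface."""
    return sum(1 for _ in storage.iter_series(dataset_code))
```

In this sketch, a convert script would only call save_series and the indexation script or Web API only iter_series, so swapping in another backend (e.g. git+tsv) would not require changing their code.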

Use cases:

  • fetchers' convert scripts write data model instances to storage
  • indexation script and web API read data model instances from storage
  • converge towards a single storage model
  • get rid of bare repositories by converting to git+jsonl the repositories that cannot be checked out because they contain too many tsv files
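The last use case could look roughly like the sketch below, which merges many per-series tsv files into the content of a single JSON Lines file, so that a repository holds one file per dataset instead of one per series. It assumes that each tsv file holds one series with a header row followed by one row per observation, and that the JSON record keeps those rows as a list of lists under an "observations" key; the file layout, field names and function names are assumptions for illustration, not the documented DBnomics format:

```python
from __future__ import annotations

import csv
import io
import json


def tsv_to_jsonl_record(series_code: str, tsv_text: str) -> str:
    """Turn the content of one per-series tsv file into one JSON Lines record."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)  # e.g. ["PERIOD", "VALUE"], kept as the first row
    if header is None:
        raise ValueError(f"empty tsv file for series {series_code!r}")
    rows = [row for row in reader if row]  # skip blank lines
    record = {"code": series_code, "observations": [header, *rows]}
    return json.dumps(record, ensure_ascii=False)


def convert_dataset(tsv_files: dict[str, str]) -> str:
    """Merge {series_code: tsv_text} into the content of a single series.jsonl file."""
    return "".join(
        tsv_to_jsonl_record(code, text) + "\n" for code, text in sorted(tsv_files.items())
    )
```

Values are kept as strings, so markers for missing values pass through unchanged. Sorting by series code makes the output deterministic, which should avoid spurious diffs when a converted dataset is committed again.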

Tasks

  • create a new Python package: dbnomics-storage
  • move and adapt parts of dbnomics_data_model.storage.* to the new package
  • adapt clients of dbnomics_data_model.storage to the new package
  • remove dbnomics_data_model.storage
Edited Nov 18, 2020 by Christophe Benz