Consume data locally

EPIC: #519

  • As a data consumer
  • I want to consume data locally
  • in order to save bandwidth (and server resources).

Acceptance criteria

  • a client MUST be able to build a data-frame from a local dataset (downloaded without using the web API)

Description

Problems:

  • Downloading a large number of series through the web API consumes too many server resources
    • We must find a way to let clients replicate data locally and work on it without relying on the DBnomics infrastructure

Goals:

  • download a single dataset locally

Ideas:

  • load DataFrame from a Git bare repo
    # `./ameco-json-data.git` has been cloned from DBnomics GitLab instance in bare mode
    ameco = DBnomics(provider_dir="./ameco-json-data.git")
    ameco.to_df(dataset="ZUTN")
    ameco.to_df(dataset="ZUTN", series="AUS.1.0.0.0.ZUTN")
    # access revisions
    ameco.to_df(dataset="ZUTN", series="AUS.1.0.0.0.ZUTN", revisions=True)
    ameco.to_df(dataset="ZUTN", series="AUS.1.0.0.0.ZUTN", revision="{SHA1}")
  • instrument git clone step also
    # `./data` is an opaque directory managed by the client
    client = DBnomics(data_dir="./data")
    ameco = client.fetch(provider="AMECO")
    ameco.to_df(dataset="ZUTN", series="AUS.1.0.0.0.ZUTN")
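The first idea above could be prototyped with plain file access. The sketch below is a minimal, hypothetical illustration only: the on-disk layout (one JSON file per series with `period` and `value` arrays) and the `load_series` helper are assumptions for illustration, not the actual DBnomics storage format or client API.

```python
import json
import tempfile
from pathlib import Path

def load_series(dataset_dir, series_code):
    """Read one series file from a local dataset directory and return
    (period, value) rows. Assumes a hypothetical layout where each series
    is stored as `<series_code>.json` with "period" and "value" arrays."""
    path = Path(dataset_dir) / f"{series_code}.json"
    doc = json.loads(path.read_text())
    return list(zip(doc["period"], doc["value"]))

# Demo: a temporary directory stands in for a locally cloned dataset.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"period": ["2017", "2018"], "value": [5.6, 5.3]}
    (Path(tmp) / "AUS.1.0.0.0.ZUTN.json").write_text(json.dumps(sample))
    rows = load_series(tmp, "AUS.1.0.0.0.ZUTN")
    print(rows)  # [('2017', 5.6), ('2018', 5.3)]
```

A real implementation would additionally need to read series out of a bare Git repository (e.g. via `git show` or a library such as pygit2) to support the `revisions`/`revision` options sketched above.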

Questions:

  • if the server is restarted while a client is downloading a dataset from the API, using pagination, will the client miss some data pages, or will the request fail?

Tasks

  • ...
Edited Oct 28, 2019 by Christophe Benz