Commit 13535c10 authored by Pierre Dittgen

Adding data model description

parent 81369744
Pipeline #101894 failed with stage in 360 minutes and 5 seconds
Download and convert data from NBS.
## Code quality
## Download
## Download URLs
- monthly
  - `LATEST36`
- quarterly
  - `LATEST18`
- annual
  - `LATEST20`
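The following is a minimal sketch of how these per-frequency values could be kept in one place and paired with the folder codes (`A`, `Q`, `M`) used in the data model. The `FREQUENCIES` table and the `period_for_folder` helper are hypothetical. This README does not describe how each value is inserted into an actual NBS download URL, so the sketch stops at the lookup.

```python
# Hypothetical table: pairs each frequency with its folder code (from the
# data model) and the query value listed under "Download URLs".
FREQUENCIES = {
    "annual": {"folder": "A", "period": "LATEST20"},
    "quarterly": {"folder": "Q", "period": "LATEST18"},
    "monthly": {"folder": "M", "period": "LATEST36"},
}


def period_for_folder(folder: str) -> str:
    """Return the query value for a frequency folder code ("A", "Q" or "M")."""
    for spec in FREQUENCIES.values():
        if spec["folder"] == folder:
            return spec["period"]
    raise KeyError(f"unknown frequency folder: {folder!r}")
```

Keeping the folder code and the query value in a single table means that adding a frequency touches only one place.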
## Data model
### Source-data
- Downloaded data is divided into 3 different folders:
  - `A` for annual data
  - `Q` for quarterly data
  - `M` for monthly data
- Each folder contains 2 different kinds of JSON files:
  - category tree node file
    - named `{frequency}_nodes_{nodeid}.json` (root nodes are stored both in a file called `{frequency}_nodes_root.json` and in a file called `{frequency}_root_nodes.json`; TODO: choose one of the two)
    - content: list of the child nodes of a given node
    - for each child node, we get:
      - identifier
      - name
      - parent node id
      - is this child node a parent?
        - yes: it's a node of the category tree
        - no: it's a leaf node describing a dataset
  - data node file
    - named `{frequency}_datanode_{datasetid}.json`
    - content: observation values of the different time series of the dataset
    - note: a data node file is downloaded only if it is referenced by a category tree node file
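The file naming rules and child node fields above can be captured in a short sketch. It makes two assumptions. First, `{frequency}` is the folder code (`A`, `Q` or `M`). Second, each entry in a node file is a JSON object with the keys `id`, `name`, `pid` and `isParent`. Those key names are a guess and are not confirmed by this README. `FileInfo`, `Node`, `classify_file` and `parse_children` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns follow the naming rules above; {frequency} is assumed to be A, Q or M.
ROOT_NODES_RE = re.compile(r"^(?P<frequency>[AQM])_root_nodes\.json$")
NODE_FILE_RE = re.compile(r"^(?P<frequency>[AQM])_nodes_(?P<node_id>.+)\.json$")
DATA_NODE_RE = re.compile(r"^(?P<frequency>[AQM])_datanode_(?P<dataset_id>.+)\.json$")


@dataclass
class FileInfo:
    kind: str       # "nodes" (category tree node file) or "datanode"
    frequency: str  # folder code
    ident: str      # node id ("root" for root nodes) or dataset id


@dataclass
class Node:
    id: str
    name: str
    parent_id: str
    is_parent: bool  # True: category tree node; False: leaf describing a dataset


def classify_file(filename: str) -> Optional[FileInfo]:
    """Map a downloaded file name to its kind, or None if it matches no rule."""
    m = ROOT_NODES_RE.match(filename)
    if m:
        # Both root file names describe the same root nodes.
        return FileInfo("nodes", m["frequency"], "root")
    m = NODE_FILE_RE.match(filename)
    if m:
        return FileInfo("nodes", m["frequency"], m["node_id"])
    m = DATA_NODE_RE.match(filename)
    if m:
        return FileInfo("datanode", m["frequency"], m["dataset_id"])
    return None


def parse_children(items: List[dict]) -> List[Node]:
    """Turn the child node list of a node file into Node records (assumed keys)."""
    return [Node(i["id"], i["name"], i["pid"], bool(i["isParent"])) for i in items]
```

Both root file names map to the same `FileInfo`, so the two copies can be treated as one until the TODO above is resolved.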
### Convert task
For each frequency folder, the convert script:
- reads all category tree node files to extract leaf node information
- for each leaf node, looks for the matching data node file
  - if found, generates the dataset content (folder, `dataset.json`, `*.tsv` files) and keeps track of the dataset
- then reads all category tree node files again to build the DBnomics category tree, including only the previously processed datasets
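The two passes described above can be sketched as follows. The sketch makes three assumptions. The folder name equals the `{frequency}` prefix of the file names. Node entries use the keys `id`, `pid` and `isParent`. The category tree is represented as a mapping from parent id to child ids. The function names are hypothetical, and the actual generation of `dataset.json` and `*.tsv` files is left as a placeholder comment.

```python
import json
from pathlib import Path
from typing import Dict, List, Set


def load_node_files(folder: Path) -> List[dict]:
    """Read every category tree node file of one frequency folder.

    The glob matches `{frequency}_nodes_{nodeid}.json` (including
    `{frequency}_nodes_root.json`) but not `{frequency}_root_nodes.json`,
    so root nodes are read only once.
    """
    items: List[dict] = []
    for path in sorted(folder.glob("*_nodes_*.json")):
        items.extend(json.loads(path.read_text(encoding="utf-8")))
    return items


def convert_folder(folder: Path) -> Dict[str, List[str]]:
    """Two-pass conversion sketch; returns parent id -> child ids."""
    # Pass 1: find leaf nodes and keep those with a matching data node file.
    processed: Set[str] = set()
    for item in load_node_files(folder):
        if item["isParent"]:
            continue  # category node, not a dataset
        data_file = folder / f"{folder.name}_datanode_{item['id']}.json"
        if data_file.exists():
            # Hypothetical step: write the dataset folder, dataset.json and
            # *.tsv files here.
            processed.add(item["id"])

    # Pass 2: re-read the node files and build the category tree,
    # keeping category nodes and only the datasets processed above.
    tree: Dict[str, List[str]] = {}
    for item in load_node_files(folder):
        if item["isParent"] or item["id"] in processed:
            tree.setdefault(item["pid"], []).append(item["id"])
    return tree
```

This sketch keeps every category node, even one whose datasets were all skipped. This README does not say whether such empty categories should be pruned from the DBnomics tree.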