# Write a new converter

This page describes the *conversion* process from `source_data` to a `json_data` tree, starting from a dummy `dataset` TSV file.

Categories are not covered here.

## Source data

Consider the following data, stored in a TSV file:

    Country  ccode  Flow    fcode  year  total
    France   FR     Import  I      2010  83791
    France   FR     Import  I      2011  83332
    France   FR     Import  I      2012  82001
    Belgium  BE     Import  I      2010  33290
    Belgium  BE     Import  I      2011  36002
    Belgium  BE     Import  I      2012  39332
    Italy    IT     Import  I      2009  78992
    Italy    IT     Import  I      2010  77300
    Italy    IT     Import  I      2011  77266
    Italy    IT     Import  I      2012  89022
    France   FR     Export  E      2010  23982
    France   FR     Export  E      2011  23777
    France   FR     Export  E      2012  24000
    Belgium  BE     Export  E      2010  13922
    Belgium  BE     Export  E      2011  13277
    Belgium  BE     Export  E      2012  14002
    Italy    IT     Export  E      2009  56299
    Italy    IT     Export  E      2010  57200
    Italy    IT     Export  E      2011  59288
    Italy    IT     Export  E      2012  61300

This dataset contains only 2 dimensions:

* Country
* Flow

Fixing a value for each of those dimensions makes it possible to extract a `series`. For example, the series corresponding to Country = 'Belgium' and Flow = 'Export' is:

    2010  13922
    2011  13277
    2012  14002

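
The extraction of series from the source rows can be sketched as a grouping by (Country, Flow). This is a minimal illustration, not DB.nomics code: the names `SOURCE_TSV` and `extract_series` are hypothetical, and only a few rows of the dummy dataset are embedded to keep the example short.

```python
import csv
import io
from collections import defaultdict

# A few rows of the dummy dataset, tab-separated (hypothetical inline sample).
SOURCE_TSV = (
    "Country\tccode\tFlow\tfcode\tyear\ttotal\n"
    "France\tFR\tExport\tE\t2010\t23982\n"
    "Belgium\tBE\tExport\tE\t2010\t13922\n"
    "Belgium\tBE\tExport\tE\t2011\t13277\n"
    "Belgium\tBE\tExport\tE\t2012\t14002\n"
    "Belgium\tBE\tImport\tI\t2010\t33290\n"
)


def extract_series(tsv_text):
    """Group (year, total) observations by their (Country, Flow) values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    series = defaultdict(list)
    for row in reader:
        series[(row["Country"], row["Flow"])].append((row["year"], row["total"]))
    return dict(series)


all_series = extract_series(SOURCE_TSV)
belgium_exports = all_series[("Belgium", "Export")]
print(belgium_exports)
```
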
## Files tree

The output tree for this dataset should be:

    category_name
    ├── dataset.json
    ├── Export-Belgium
    │   ├── observations.tsv
    │   └── series.json
    ├── Export-France
    │   ├── observations.tsv
    │   └── series.json
    ├── Export-Italy
    │   ├── observations.tsv
    │   └── series.json
    ├── Import-Belgium
    │   ├── observations.tsv
    │   └── series.json
    ├── Import-France
    │   ├── observations.tsv
    │   └── series.json
    └── Import-Italy
        ├── observations.tsv
        └── series.json

=> Remember that this is only a part of the total tree produced by a parser; categories are not discussed here.

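
As a rough sketch of how a converter might lay out this tree, the following creates one directory per (Flow, Country) pair inside a temporary directory. The helper names `series_dir_name` and `create_tree` are hypothetical, and the files are created empty; their contents are described in the following sections.

```python
import tempfile
from pathlib import Path

FLOWS = ["Export", "Import"]
COUNTRIES = ["Belgium", "France", "Italy"]


def series_dir_name(flow, country):
    """Directory name of a series, following the '<Flow>-<Country>' pattern."""
    return f"{flow}-{country}"


def create_tree(root):
    """Create dataset.json plus one directory per series under `root`."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "dataset.json").touch()
    for flow in FLOWS:
        for country in COUNTRIES:
            series_dir = root / series_dir_name(flow, country)
            series_dir.mkdir(exist_ok=True)
            (series_dir / "observations.tsv").touch()
            (series_dir / "series.json").touch()
    # Return the series directory names, sorted for a stable listing.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    created = create_tree(Path(tmp) / "category_name")
    print(created)
```
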
## DB.nomics vocabulary

For dimensions and dimension values ("France" is a value of the "Country" dimension), we use the terms `label` and `code`:

* `label`: human-readable version
* `code`: used for indexing (slugified label if the provider gives no code)

Note: in DB.nomics, we use `geo` as the **code** for the "Country" dimension. So the "Country" dimension has:

* `dimension_label`: "Country"
* `dimension_code`: "geo"

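
The fallback from provider code to slugified label could look like the sketch below. The exact slugification rules used by DB.nomics are not stated on this page, so `slugify` here is an assumption (ASCII-folded, lowercased, hyphen-separated), and `dimension_value_code` is a hypothetical helper name.

```python
import re
import unicodedata


def slugify(label):
    """Fold a label to ASCII, lowercase it, and join its words with hyphens."""
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def dimension_value_code(label, provider_code=None):
    """Prefer the provider's code; fall back to the slugified label."""
    return provider_code or slugify(label)


print(dimension_value_code("Import", "I"))
print(dimension_value_code("Full-time equivalents"))
```
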
## series.json

The `series.json` file for Country = 'Belgium' and Flow = 'Export' should be:

```json
{
  "dimensions": {
    "geo": "bel",
    "flow": "E"
  },
  "frequency": "A",
  "key": "Export-Belgium",
  "name": "Belgium exports"
}
```

Notes:

* `dimensions` is a dict mapping each `dimension_code` to a `dimension_value_code`; it lists the dimension **values** that identify this series.

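
A converter could assemble this file from the dimension value codes, as in the following sketch. The function name `build_series_json` is hypothetical, and the `key` passed in the usage line is a placeholder chosen to match the series directory name, which is an assumption.

```python
import json


def build_series_json(geo_code, flow_code, key, name, frequency="A"):
    """Return the dict to be serialized as series.json for one series."""
    return {
        # dimension_code -> dimension_value_code
        "dimensions": {"geo": geo_code, "flow": flow_code},
        "frequency": frequency,
        "key": key,
        "name": name,
    }


series = build_series_json("bel", "E", key="Export-Belgium", name="Belgium exports")
series_json_text = json.dumps(series, indent=2, sort_keys=True)
print(series_json_text)
```
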
## observations.tsv

The `observations.tsv` file for this series should be:

```tsv
2010	13922
2011	13277
2012	14002
```

**TODO**: header

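
Writing the observations with Python's standard `csv` module might look like this. It is only a sketch: `write_observations` is a hypothetical name, and no header row is written because the header format is still a TODO on this page.

```python
import csv
import io


def write_observations(observations, fileobj):
    """Write (period, value) pairs as tab-separated lines, without a header."""
    writer = csv.writer(fileobj, delimiter="\t", lineterminator="\n")
    writer.writerows(observations)


# An in-memory buffer stands in for the real observations.tsv file.
buffer = io.StringIO()
write_observations(
    [("2010", "13922"), ("2011", "13277"), ("2012", "14002")], buffer
)
observations_tsv = buffer.getvalue()
print(observations_tsv, end="")
```
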
## dataset.json

The dataset's `dataset.json` should be:

> Comments have been added to aid understanding, even though they make the JSON invalid.

```json
{
  "codelists": {
    // dimension_code: {dimension_value_code: dimension_value_label}
    "flow": {
      // dimension_value_code: dimension_value_label
      "I": "Import",
      "E": "Export"
    },
    "geo": {
      // dimension_value_code: dimension_value_label
      "fra": "France",
      "ita": "Italy",
      "bel": "Belgium"
    }
  },
  "concepts": {
    // dimension_code: dimension_label
    "flow": "Flow",
    "geo": "Country"
  },
  "dataset_code": "FWTD",
  "dimension_keys": [
    // dimension_codes
    "flow",
    "geo"
  ],
  // Human-readable dataset name
  "name": "Employees, full-time equivalents: total economy (National accounts)",
  "series": [
    // List of series in this dataset (equal to the list of sub-directories)
    "Export-Belgium",
    "Export-France",
    "Export-Italy",
    "Import-Belgium",
    "Import-France",
    "Import-Italy"
  ]
}
```

Notes:

* The terms *codelists* and *concepts* come from the [SDMX standard](https://en.wikipedia.org/wiki/SDMX).
* We didn't use the `dimension_code` given by the provider for the "Country" dimension (i.e. the dimension with `dimension_label` = "Country"): we used "geo" in DB.nomics.
* We didn't use the `dimension_value_code`s given by the provider for the `dimension_value_label`s "France", "Italy" and "Belgium": we used "fra", "ita" and "bel" (not "FR", "IT" and "BE" as given in the source file).
* We used the `dimension_value_code`s given by the provider for the "flow" dimension ("I" and "E"). We could have chosen something else.

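
The choices described in these notes can be sketched as follows: flow codes are taken from the provider as-is, while country codes go through a mapping to the codes used in DB.nomics. The names `GEO_CODES` and `build_codelists` are hypothetical, and the mapping is hard-coded here only for illustration.

```python
# Provider country code -> code used in DB.nomics (illustrative mapping).
GEO_CODES = {"FR": "fra", "BE": "bel", "IT": "ita"}


def build_codelists(rows):
    """Build the 'codelists' part of dataset.json from source rows.

    Each row is a (Country, ccode, Flow, fcode) tuple.
    """
    codelists = {"flow": {}, "geo": {}}
    for country, ccode, flow, fcode in rows:
        codelists["flow"][fcode] = flow  # provider code kept as-is
        codelists["geo"][GEO_CODES[ccode]] = country  # provider code replaced
    return codelists


codelists = build_codelists([
    ("France", "FR", "Import", "I"),
    ("Belgium", "BE", "Import", "I"),
    ("Italy", "IT", "Export", "E"),
])
print(codelists)
```
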
## In a nutshell

To summarize the terms introduced here:

* a `dimension` (example: "Country") has:
  * a `dimension_code`: "geo"
  * a `dimension_label`: "Country"
* a `dimension_value` (example: "France") has:
  * a `dimension_value_code`: "fra"
  * a `dimension_value_label`: "France"