Handle hierarchical dimensions
Some dimensions have values organized as a tree. This is called hierarchical dimensions in SDMX.
Source data
The source data can contain either:
- a tree with fixed depth (e.g. /continent/country)
- a tree with variable depth (e.g. /path/to/a/node)
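Both shapes can be reduced to the same parent-to-children mapping. The following is a minimal sketch, assuming the source exposes slash-separated paths and that value codes are unique within the tree; `tree_from_paths` is a hypothetical helper, not part of any existing script.

```python
from collections import defaultdict


def tree_from_paths(paths):
    """Build a {parent: [children]} mapping from slash-separated paths.

    Works for fixed-depth and variable-depth paths alike.
    Assumption: a code appears at a single place in the tree.
    """
    children = defaultdict(list)
    for path in paths:
        parts = [part for part in path.split("/") if part]
        # Link each path component to the one that follows it.
        for parent, child in zip(parts, parts[1:]):
            if child not in children[parent]:
                children[parent].append(child)
    return dict(children)


fixed_depth = tree_from_paths(["/EU/DE", "/EU/FR", "/AF/AL"])
variable_depth = tree_from_paths(["/path/to/a/node"])
```

Here `fixed_depth` has the same shape as the `GEO` entry of the `dimension_tree` key shown in the data model example.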
Data model
The idea is to encode the dimension tree in dataset.json by adding a new key, keeping the existing data as-is.
The dimension tree can either:
- be encoded by a single dimension (e.g. 1 dimension country with continents and countries mixed)
- (not chosen) be spread across several dimensions (e.g. 1 dimension continent and 1 dimension country)
We choose the first solution.
Example:
# dataset
{
"code": "dataset1",
"dimensions_codes_order": ["SUBJECT", "GEO"],
"dimensions_labels": {
"SUBJECT": "Subject",
"GEO": "Geo"
},
"dimensions_values_labels": {
"SUBJECT": {
"S1": "Subject 1",
"S2": "Subject 2"
},
"GEO": {
"AF": "Africa",
"AL": "Algeria",
"DE": "Germany",
"EU": "Europe",
"FR": "France"
}
},
"dimension_tree": {
"GEO": {
"EU": ["DE", "FR"],
"AF": ["AL"]
}
}
}
# series
{
"code": "S1.FR",
"dimensions": {
"SUBJECT": "S1",
"GEO": "FR"
},
"observations": []
}
Note: depending on the provider data, the intermediary tree nodes ("EU", "AF") can have series or just encode a classification. For example some providers may distribute time series for Europe or Africa, and some others may just distribute series for countries.
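Since the tree lives next to the existing value labels, a consistency check could catch codes that are referenced in the tree but never declared. This is a sketch only, assuming the `dimension_tree` layout from the example above; `check_dimension_tree` is a hypothetical name.

```python
def check_dimension_tree(dataset):
    """Return a list of problems found in the dataset's "dimension_tree" key."""
    errors = []
    for dim, tree in dataset.get("dimension_tree", {}).items():
        known = set(dataset["dimensions_values_labels"].get(dim, {}))
        parent_of = {}
        for parent, children in tree.items():
            # Every code used in the tree must be a declared dimension value.
            for code in [parent, *children]:
                if code not in known:
                    errors.append(f"{dim}: unknown value code {code!r}")
            # A node listed under two parents would make the tree ambiguous.
            for child in children:
                if child in parent_of:
                    errors.append(f"{dim}: {child!r} has more than one parent")
                parent_of[child] = parent
    return errors


dataset = {
    "code": "dataset1",
    "dimensions_values_labels": {
        "GEO": {
            "AF": "Africa",
            "AL": "Algeria",
            "DE": "Germany",
            "EU": "Europe",
            "FR": "France",
        },
    },
    "dimension_tree": {"GEO": {"EU": ["DE", "FR"], "AF": ["AL"]}},
}
print(check_dimension_tree(dataset))  # prints []
```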
Faceted search
The script that indexes data into Solr has to be updated. A series must know which intermediate nodes of the tree it is classified under.
Example:
# Solr document for a series
{
[...]
"code": "S1.FR",
"type": "series",
"dimension_value_code.SUBJECT": "S1",
"dimension_value_code.GEO": ["FR", "EU"]
}
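One possible way to compute these values is to walk up the tree from the series value to its root. The sketch below assumes the `dimension_tree` layout from the data model and the `dimension_value_code.<DIM>` field naming shown above; `with_ancestors` and `solr_dimension_fields` are hypothetical helpers, not the actual indexing code.

```python
def with_ancestors(code, tree):
    """Return [code, parent, grandparent, ...] for a {parent: [children]} tree."""
    parent_of = {
        child: parent for parent, children in tree.items() for child in children
    }
    values = [code]
    while values[-1] in parent_of:
        parent = parent_of[values[-1]]
        if parent in values:  # guard against accidental cycles in the data
            break
        values.append(parent)
    return values


def solr_dimension_fields(series, dimension_tree):
    """Build the dimension fields of a Solr document for one series."""
    fields = {}
    for dim, code in series["dimensions"].items():
        key = f"dimension_value_code.{dim}"
        if dim in dimension_tree:
            # Hierarchical dimension: index the value and all its ancestors.
            fields[key] = with_ancestors(code, dimension_tree[dim])
        else:
            fields[key] = code
    return fields


series = {"code": "S1.FR", "dimensions": {"SUBJECT": "S1", "GEO": "FR"}}
dimension_tree = {"GEO": {"EU": ["DE", "FR"], "AF": ["AL"]}}
fields = solr_dimension_fields(series, dimension_tree)
```

With this approach, filtering on the facet value "EU" would match every series classified under Europe, because "EU" is stored on each of them.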
Question:
- is it feasible? should the schema be evolved?
UX
The facet widget has to become a tree, visually indented, where:
- each node and leaf is selectable via a checkbox
- each node is collapsible/expandable
When selecting a node or a leaf:
- the normal mechanism of facets would be used
- for now we keep displaying the selected node in the select box (and not its full path)
Optimizations:
- lazy DOM nodes creation
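The indented, lazily built list can be sketched independently of the UI framework: only the children of expanded nodes are visited, so collapsed subtrees cost nothing. This is an illustration in Python rather than the actual widget code; `visible_rows` and the `expanded` set are assumptions made for the example.

```python
def visible_rows(tree, labels, expanded, roots=None, depth=0):
    """Yield (depth, code, label, has_children) for the rows to display.

    tree: {parent: [children]}, labels: {code: label},
    expanded: set of node codes currently unfolded by the user.
    """
    if roots is None:
        # Root nodes are the values that are nobody's child.
        all_children = {c for children in tree.values() for c in children}
        roots = [code for code in labels if code not in all_children]
    for code in roots:
        children = tree.get(code, [])
        yield depth, code, labels[code], bool(children)
        # Descend only into expanded nodes: collapsed subtrees are skipped.
        if children and code in expanded:
            yield from visible_rows(tree, labels, expanded, children, depth + 1)


labels = {"AF": "Africa", "AL": "Algeria", "DE": "Germany", "EU": "Europe", "FR": "France"}
tree = {"EU": ["DE", "FR"], "AF": ["AL"]}
rows = list(visible_rows(tree, labels, expanded={"EU"}))
for depth, code, label, has_children in rows:
    print("  " * depth + label)
```

Each row carries its depth for indentation and a flag telling whether a collapse/expand control is needed.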
Inspiration
- https://www.econdb.com/dataset/NAMQ_10_GDP/gdp-and-main-components-output-expenditure-and-income/
- an example of JSON-stat using tree-like dimensions: https://json-stat.org/samples/hierarchy.json
Tasks
- encode dimension tree in dataset.json (cf above)
- propose an evolution of dataset.json schema in data model
- enhance the indexation script to set multiple values for hierarchical dimensions
- enhance the web API to output this tree when requesting a dataset
- release a new version of the API, deployed under /v22.1
- enhance the dimension widget in the UI