Validate observations stored in a series.jsonl that have an additional column
In case you have stored your observations inside a series.jsonl:
- "attributes_values_labels" in dataset.json is the aggregation of every attribute value that can appear in the observations of any series
- it is declared as applicable to every series, with no distinction between series and no indication of which attributes a given series actually uses
When series are stored in a series.jsonl and their observations carry an additional column, validating the JSON data raises the following error:
ERROR:/mnt/1tb/dbnomics-json-data/ilo-json-data/EAP_DWA1_SEX_AGE_RT/series.jsonl:220: line has 2 columns but header has 3 columns
The EAP_DWA1_SEX_AGE_RT dataset declares every possible observation status:
"attributes_labels": {
"OBSV_STATUS": "Observation Status"
},
"attributes_values_labels": {
"OBSV_STATUS": {
"B": "Break in series",
"E": "Estimate",
"N": "Not available",
"P": "Provisional",
"U": "Unreliable"
}
},
- not every series has an observation status, and not every series uses every observation status declared in the dataset
- other series have only PERIOD and VALUE and no attribute such as OBSV_STATUS
I suggest:
- adding the corresponding 'attributes' at the series.json element level, following the same pattern as dimensions, if applicable
- adding a complementary check at series level:
  - if the 'attributes' key is present in the series element of the jsonl file, the script should validate that observations have three columns
  - else, observations should have two columns
- adding a complementary control at series level: values stored in the third column should correspond to the list of values declared in 'attributes'
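
To make the suggestion concrete, here is a minimal sketch of the two checks in Python. It is not the existing validation script. It assumes that each line of series.jsonl is a JSON object whose "observations" value is a list of rows with the header row first, that the proposed 'attributes' key maps an attribute code to the list of values used by that series (a hypothetical shape, nothing in the current data model defines it), and that an empty string in the third column means "no status for this period". The function names are made up for the example.

```python
import json


def check_series(series):
    """Return error messages for one series element (one line of series.jsonl)."""
    errors = []
    # 'attributes' is the key proposed in this issue; its shape is an assumption:
    # {"OBSV_STATUS": ["E", "P"]}
    has_attributes = "attributes" in series
    expected = 3 if has_attributes else 2
    allowed = set()
    if has_attributes:
        for values in series["attributes"].values():
            allowed.update(values)
    # Row 0 is assumed to be the header, e.g. ["PERIOD", "VALUE", "OBSV_STATUS"].
    for index, row in enumerate(series.get("observations", [])):
        if len(row) != expected:
            errors.append(
                f"row {index} has {len(row)} columns but {expected} expected"
            )
        elif index > 0 and has_attributes and row[2] != "" and row[2] not in allowed:
            errors.append(f"row {index} has undeclared attribute value {row[2]!r}")
    return errors


def check_jsonl_lines(lines):
    """Check each non-empty JSON line, prefixing errors with its 1-based line number."""
    errors = []
    for line_number, line in enumerate(lines, start=1):
        if line.strip():
            for error in check_series(json.loads(line)):
                errors.append(f"{line_number}: {error}")
    return errors
```

With this rule, a series without 'attributes' keeps its two-column PERIOD and VALUE observations, while a series that declares 'attributes' must provide the third column on every row, including the header.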