Handle unknown values of series
Related to this technical committee
This is a proposal and should be voted on during planning before implementation starts!
- As a web API
- I want to return unknown values in series using a unified code
- in order to let the user distinguish unknown values from real values
Acceptance criteria
- unknown_values=keep in the query string MUST keep unknown values as is
- unknown_values=convert in the query string MUST convert unknown values to the unified value NaN
- unknown_values=convert:<given_value> in the query string MUST convert unknown values to the given value
- unknown_values=remove in the query string MUST remove the (period, value) pairs for which value is unknown
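The four modes above could be sketched as follows. This is only an illustration, not the API's actual implementation: the function name `apply_unknown_values`, the representation of observations as `(period, value)` string pairs, and the `unknown_codes` argument are all assumptions made for the example.

```python
from typing import Collection, List, Tuple

# Hypothetical representation: one observation is (period, raw value as text).
Observation = Tuple[str, str]


def apply_unknown_values(
    observations: List[Observation],
    unknown_codes: Collection[str],
    mode: str = "keep",
) -> List[Observation]:
    """Apply one `unknown_values` mode to a list of observations."""
    if mode == "keep":
        # Leave unknown values exactly as the provider wrote them.
        return list(observations)
    if mode == "remove":
        # Drop the (period, value) pairs whose value is unknown.
        return [(p, v) for p, v in observations if v not in unknown_codes]
    if mode == "convert":
        replacement = "NaN"  # the unified value
    elif mode.startswith("convert:"):
        replacement = mode[len("convert:"):]  # the value given by the caller
    else:
        raise ValueError(f"unsupported unknown_values mode: {mode!r}")
    return [(p, replacement if v in unknown_codes else v) for p, v in observations]


observations = [("2018-01", "1.5"), ("2018-02", "n.a."), ("2018-03", "")]
unknown_codes = {"n.a.", ""}
converted = apply_unknown_values(observations, unknown_codes, "convert")
removed = apply_unknown_values(observations, unknown_codes, "remove")
```

Rejecting an unsupported mode with an error is a choice made for this sketch. The proposal does not say how invalid values of the parameter should be handled.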
Resources
Providers have various ways to represent a missing value. We would like to distinguish unknown values from real values.
First, store in the metadata of each series the values that should be interpreted as unknown (e.g. NaN, N/A, Null, -1, 9999). A single series may contain several such codes because of potentially dirty data, so let's use an array.
Example in series.json:
{
"unknown_values": ["n.a.", ""]
}
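As a rough sketch of how this metadata might be consumed, the snippet below reads the `unknown_values` array and tests raw values against it. The helper name `is_unknown` and the inline JSON string (standing in for a real series.json file) are assumptions made for illustration.

```python
import json

# Stand-in for the content of a series.json file (hypothetical sample).
series_metadata = json.loads('{"unknown_values": ["n.a.", ""]}')

# A series without the key is assumed to declare no unknown codes.
unknown_codes = set(series_metadata.get("unknown_values", []))


def is_unknown(raw_value: str) -> bool:
    """Return True if the raw value is one of the series' unknown codes."""
    return raw_value in unknown_codes
```

Comparing raw strings exactly keeps a code such as -1 from matching a legitimate numeric value that is written differently. Whether matching should be exact is left open by the proposal.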
Clients, in turn, should convert NaN to the native missing-value representation of the target language or software.
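For instance, a Python client might map the unified value to the language's floating-point NaN. This is a minimal sketch assuming the API returns values as text. The function name `parse_value` is hypothetical.

```python
import math


def parse_value(raw_value: str) -> float:
    """Convert a value received from the API into a native float."""
    if raw_value == "NaN":
        # The unified unknown value becomes the native floating-point NaN.
        return math.nan
    return float(raw_value)


values = [parse_value(v) for v in ["1.5", "NaN", "2.0"]]
```

Since NaN never compares equal to itself, client code should test for it with `math.isnan` rather than `==`.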
Example in a real-world dataset: http://datapipes.okfnlabs.org/csv%20-t/html/?url=http://api-next.db.nomics.world/dares/CVS-CJO/fr_categorie-b_milliers_M.tsv
Technical tasks
Edited by Christophe Benz