# 17-09-29 DB.nomics - Technical Committee

September 29, 2017, 16:00-17:00

## Attendees

- Christophe Benz, DB.nomics
- Thomas Brand, Cepremap
- Michel Juillard, Banque de France
- Julien Lasselot, Banque de France
- Constance de Quatrebarbes, DB.nomics
- Johan Richer, DB.nomics

[Meeting preparations](https://git.nomics.world/dbnomics-fetchers/management/issues/22)

## Outstanding issues

Decisions and proposed solutions

### Can we factor out common code for Excel file parsing?

The developer has the last word on this issue. It is not a matter to be handled during Analysis.

### ONS

To be discussed during the next Technical Committee.

### Destatis

API: 50€ per year for access to tables; 500€ per year for access to linear files. Metadata is not guaranteed to be in English.

Question: do we want to spend this kind of money and end up with a segment of the database in German?

Decisions:

- Look into the feasibility of using just the website (scraping).
- Contact Destatis to find out exactly what we get access to by paying 500€ per year.

### What to do with missing and unknown values?

Problem: how does the API know which values should be interpreted as missing or unknown?

Propositions:

- Store in the metadata of a series the values that should be interpreted as 'missing' or 'unknown' (e.g. NaN, N/A, Null, -1, 9999, etc.)
- Keep as is (period and symbol of value)
- Convert to a standard value for unknown (e.g. NaN)
- Substitute a value of your choice
- Remove the missing value

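
The propositions above can be sketched as a small Python function. This is only an illustration, not a decision of the committee: the `markers` set stands for the values stored in the series metadata, and the policy names (`keep`, `nan`, `fill`, `drop`) are hypothetical labels for the four handling options.

```python
import math


def normalize_observations(observations, markers, policy="nan", fill_value=None):
    """Apply one handling policy to the observations of a series.

    observations: list of (period, raw_value) pairs, raw_value being a string.
    markers: raw values declared as missing/unknown (assumed to come from
             the series metadata; the storage format is not decided).
    policy: "keep" (leave the symbol as is), "nan" (standard value),
            "fill" (value of your choice), "drop" (remove the observation).
    """
    result = []
    for period, raw in observations:
        text = raw.strip()
        if text not in markers:
            # Regular observation: parse it as a number.
            result.append((period, float(text)))
        elif policy == "keep":
            result.append((period, text))
        elif policy == "nan":
            result.append((period, math.nan))
        elif policy == "fill":
            result.append((period, fill_value))
        elif policy == "drop":
            continue
        else:
            raise ValueError(f"unknown policy: {policy!r}")
    return result


# Hypothetical metadata for one series.
markers = {"NaN", "N/A", "Null", "-1", "9999"}
observations = [("2015", "1.5"), ("2016", "N/A"), ("2017", "9999")]
kept = normalize_observations(observations, markers, policy="keep")
dropped = normalize_observations(observations, markers, policy="drop")
```

Because the markers are declared per series, a value such as `-1` is only treated as missing where the provider uses it that way.
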
### Should we store web pages for categories?

To be decided fetcher by fetcher. The developer has the last word on this issue. Either hard-code the information or extract it from the source (HTML or file).

### Should we use a numbering for `categories_code` like AMECO or let each fetcher choose?

Use the number given by the provider if one exists; otherwise make one up or use a slug of the label.

**Other decision:** abandon readme.md.

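
As an illustration of the `categories_code` rule decided above, here is a minimal Python sketch. The function names and the slug format (lowercase ASCII, hyphen-separated) are assumptions; the minutes do not specify them, and the "make one up" option is left out.

```python
import re
import unicodedata


def slugify(label):
    """Turn a category label into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def category_code(label, provider_number=None):
    """Prefer the provider's own number; fall back to a slug of the label."""
    if provider_number is not None and str(provider_number).strip():
        return str(provider_number).strip()
    return slugify(label)


code_from_provider = category_code("National accounts", provider_number="01")
code_from_label = category_code("Prices & Wages")
```
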
### How to standardize `observations.tsv` file header (`YEAR\t???`)?

The header is up to the developer for now.

### Is it relevant to store an order for dimensions (i.e. `dataset.json` property `dimension_keys`)? Or should we forget about it and display dimensions by lexicographic order in the UI?

Keep the order of dimensions. When a key exists in the series, use the same order as in the key. If not, put the most significant dimensions first. The order of dimensions should always be the order of the key. A last possibility is to have no specific dimension order.

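
The ordering rule above can be sketched in Python as follows. Several points are assumptions made for the example only: that a series key is the dimension values joined by a separator such as `.`, that each value identifies exactly one dimension, and that the developer supplies a hypothetical `significance` list when no key exists.

```python
def infer_dimension_keys(dimensions, series_key=None, separator=".", significance=None):
    """Return dimension codes in the order to store in `dimension_keys`.

    dimensions: mapping of dimension code -> value for one series.
    series_key: provider key, assumed to be the values joined by `separator`.
    significance: optional developer-chosen list, most significant first.
    """
    if series_key is not None:
        parts = series_key.split(separator)
        value_to_code = {value: code for code, value in dimensions.items()}
        key_is_usable = (
            len(parts) == len(dimensions)
            and len(set(parts)) == len(parts)
            and len(value_to_code) == len(dimensions)
            and all(part in value_to_code for part in parts)
        )
        if key_is_usable:
            # Same order as in the key.
            return [value_to_code[part] for part in parts]
    # No usable key: most significant dimensions first, then the remaining
    # ones in lexicographic order (the "no specific order" fallback).
    rank = {code: index for index, code in enumerate(significance or [])}
    return sorted(dimensions, key=lambda code: (rank.get(code, len(rank)), code))


dimensions = {"GEO": "FR", "FREQ": "A", "INDICATOR": "GDP"}
from_key = infer_dimension_keys(dimensions, series_key="A.FR.GDP")
from_significance = infer_dimension_keys(dimensions, significance=["INDICATOR", "GEO"])
```
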