dbnomics-data-model issues
https://git.nomics.world/dbnomics/dbnomics-data-model/-/issues
2022-12-13T16:28:55Z

Filenames with ':' make the repository unusable on Windows
https://git.nomics.world/dbnomics/dbnomics-data-model/-/issues/7
2022-12-13T16:28:55Z, Gyorgy Gyomai

Windows paths cannot contain ':'. In the current repository, test case 6 for example contains files with a colon in their name. When cloning on a Windows machine, the Git repository fails to check out and becomes unusable.

Difficulty to build the datamodel package for Windows
https://git.nomics.world/dbnomics/dbnomics-data-model/-/issues/4
2021-07-15T09:01:46Z, Gyorgy Gyomai

Would it be possible to provide the datamodel and/or its dependencies as packages for Windows machines?
They are hard to import in environments where users have no admin access to their machines, and hence do not have access to the Windows build tools.
Would it also be possible to drop the backports-datetime-fromisoformat package, or at least to ignore it when a 3.7 or higher version of Python executes the fetchers?

Christophe Benz (christophe.benz@nomics.world)

Validate observations stored in a series.jsonl that have an additional column
https://git.nomics.world/dbnomics/dbnomics-data-model/-/issues/3
2018-07-02T14:49:36Z, Constance de Quatrebarbes (constance.24barbes@jailbreak.paris)

In case you have stored your observations inside a series.jsonl:
* `attributes_values_labels` of a dataset.json is the aggregation of every observation attribute value that can occur in its series,
* and it is declared as applicable to every series.json, with no distinction between series and no regard for the specific attributes each series uses.

When series are stored in a series.jsonl and observations have an additional column, validating the JSON data raises the following error:
```
ERROR:/mnt/1tb/dbnomics-json-data/ilo-json-data/EAP_DWA1_SEX_AGE_RT/series.jsonl:220: line has 2 columns but header has 3 columns
```
The dataset EAP_DWA1_SEX_AGE_RT declares every possible observation status:
```
"attributes_labels": {
  "OBSV_STATUS": "Observation Status"
},
"attributes_values_labels": {
  "OBSV_STATUS": {
    "B": "Break in series",
    "E": "Estimate",
    "N": "Not available",
    "P": "Provisional",
    "U": "Unreliable"
  }
},
```
- Not every series has an observation status, and a series that has one does not necessarily use every status declared in the dataset.
- Other series have only PERIOD and VALUE and no attribute such as OBSV_STATUS.
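The situation can be illustrated with a minimal Python sketch of a per-series column check. It assumes that each line of series.jsonl is a JSON object whose `observations` list starts with a header row, and that a series may carry an `attributes` key naming its attribute columns. Both the layout and the `attributes` key are assumptions made for illustration, and `check_observations` is a hypothetical helper, not part of the actual validation script.

```python
import json


def check_observations(series_line, allowed_values):
    """Return a list of error messages for one line of series.jsonl.

    Assumed layout (not confirmed by the real validator): the line is a JSON
    object with an "observations" list whose first row is the header, and an
    optional, hypothetical "attributes" key naming the attribute columns.
    """
    series = json.loads(series_line)
    header, *rows = series["observations"]
    # Two base columns (PERIOD, VALUE) plus one per declared attribute.
    expected = 2 + len(series.get("attributes", []))
    errors = []
    if len(header) != expected:
        errors.append(f"header has {len(header)} columns, expected {expected}")
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            errors.append(f"row {number} has {len(row)} columns, expected {expected}")
            continue
        # Each attribute column must hold a code declared for that attribute.
        for name, value in zip(header[2:], row[2:]):
            if value not in allowed_values.get(name, {}):
                errors.append(f"row {number}: unknown {name} value {value!r}")
    return errors


# Codes taken from the dataset.json excerpt above.
allowed = {"OBSV_STATUS": {"B": "Break in series", "E": "Estimate",
                           "N": "Not available", "P": "Provisional",
                           "U": "Unreliable"}}

# Illustrative series, not real data.
plain = json.dumps({"code": "S1",
                    "observations": [["PERIOD", "VALUE"], ["2010", 1.5]]})
flagged = json.dumps({"code": "S2", "attributes": ["OBSV_STATUS"],
                      "observations": [["PERIOD", "VALUE", "OBSV_STATUS"],
                                       ["2010", 1.5, "E"], ["2011", 1.7]]})

print(check_observations(plain, allowed))    # []
print(check_observations(flagged, allowed))  # ['row 2 has 2 columns, expected 3']
```

In this sketch a series without attributes passes with two columns, while a series that declares `OBSV_STATUS` is held to three columns and to the declared status codes.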
I suggest:
1. adding the corresponding `attributes` at the series.json element level, following the same pattern as dimensions, where applicable
2. adding a **complementary check** at series level:
   - if the `attributes` key is present in a series element of the jsonl file, the script should validate that observations have three columns
   - else: observations should have two columns
3. adding a complementary control at series level: the values stored in the third column should belong to the list declared in `attributes`

Validate code for timeseries
https://git.nomics.world/dbnomics/dbnomics-data-model/-/issues/2
2018-07-02T13:51:08Z, Constance de Quatrebarbes (constance.24barbes@jailbreak.paris)

Script `validate_json_data_repository.py` raises an error for the series code:
```
'series.88795.code': "'HIGHRT.DMG_POP_65+_NA_NA_NA_NA_MILL.A' does not match '^[-0-9A-Za-z._:@%$]+$'"
AEO2015REF.DMG_POP_65+_NA_NA_NA_NA_MILL.A
```
The regex that validates a series code should also accept the special character `+`.
Proposal:
- extend the regex to ```^[-0-9A-Za-z._:@%$\+]+$```
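
A quick way to compare the two patterns is to run them against the rejected code with Python's `re` module. This is only a sketch: the variable names are illustrative, and it does not reproduce how `validate_json_data_repository.py` applies the pattern.

```python
import re

# Pattern quoted in the error message, and the extended pattern proposed above.
current = re.compile(r"^[-0-9A-Za-z._:@%$]+$")
proposed = re.compile(r"^[-0-9A-Za-z._:@%$\+]+$")

code = "HIGHRT.DMG_POP_65+_NA_NA_NA_NA_MILL.A"

print(bool(current.match(code)))   # False: '+' is not in the character class
print(bool(proposed.match(code)))  # True
```

Inside a character class `+` has no special meaning, so the backslash is optional in Python: `[-0-9A-Za-z._:@%$+]` matches the same strings.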
Could you review this proposal, @cbenz?

missing value should be given a unique code
https://git.nomics.world/dbnomics/dbnomics-data-model/-/issues/1
2017-09-29T08:37:53Z, Michel Juillard

Providers have various ways to represent a missing value. This is a case where we should aim for homogeneity: we should use a unique code for missing values at the time we import the data from the provider. My preference goes to NaN.
The plugins, in turn, should convert NaN to whatever represents missing values in each target software.
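
As a rough sketch of this idea in Python, import-time normalization and plugin-side conversion could look as follows. The marker set and the helper `to_value` are hypothetical: each provider would need its own list of markers, and nothing here reflects existing fetcher or plugin code.

```python
import math

# Hypothetical markers; the real list depends on each provider.
MISSING_MARKERS = {"", "NA", "N/A", "-", "..", ":"}


def to_value(raw):
    """Convert a provider's raw observation value to a float, NaN if missing."""
    if raw is None:
        return math.nan
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return math.nan
    return float(text)


# Import side: every provider-specific marker becomes NaN.
values = [to_value(v) for v in ["1.5", "NA", "..", "2"]]
print(values)  # [1.5, nan, nan, 2.0]

# Plugin side: NaN is mapped to the target's own convention, here None.
exported = [None if math.isnan(v) else v for v in values]
print(exported)  # [1.5, None, None, 2.0]
```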