Christophe Benz · 3f1e766d
--- a/write-a-new-converter.md
+++ b/write-a-new-converter.md
+# Write a new converter
+
+This page describes the *conversion* process from `source_data` to `json_data`, starting from a dummy `dataset` TSV file.
+
+Categories won't be covered here.
+
+## Source data
+
+Let's consider the following data, in a TSV file:
+
+    Country	ccode	Flow	fcode	year	total
+    France	FR	Import	I	2010	83791
+    France	FR	Import	I	2011	83332
+    France	FR	Import	I	2012	82001
+    Belgium	BE	Import	I	2010	33290
+    Belgium	BE	Import	I	2011	36002
+    Belgium	BE	Import	I	2012	39332
+    Italy	IT	Import	I	2009	78992
+    Italy	IT	Import	I	2010	77300
+    Italy	IT	Import	I	2011	77266
+    Italy	IT	Import	I	2012	89022
+    France	FR	Export	E	2010	23982
+    France	FR	Export	E	2011	23777
+    France	FR	Export	E	2012	24000
+    Belgium	BE	Export	E	2010	13922
+    Belgium	BE	Export	E	2011	13277
+    Belgium	BE	Export	E	2012	14002
+    Italy	IT	Export	E	2009	56299
+    Italy	IT	Export	E	2010	57200
+    Italy	IT	Export	E	2011	59288
+    Italy	IT	Export	E	2012	61300
+
+This dataset contains only 2 dimensions:
+* Country
+* Flow
+
+Fixing a value for each of those dimensions makes it possible to extract a `series`. For example, the series corresponding to Country = 'Belgium' and Flow = 'Export' is:
+
+    2010	13922
+    2011	13277
+    2012	14002
+
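As an illustration of the extraction described above, here is a minimal sketch that groups the rows of the source file by their (Flow, Country) values. The function name `group_series` and the abbreviated `SOURCE_TSV` sample are assumptions made for this example; they are not part of any DB.nomics API.

```python
import csv
import io
from collections import defaultdict

# A few rows of the dummy source data shown above (tab-separated).
SOURCE_TSV = (
    "Country\tccode\tFlow\tfcode\tyear\ttotal\n"
    "France\tFR\tImport\tI\t2010\t83791\n"
    "Belgium\tBE\tExport\tE\t2010\t13922\n"
    "Belgium\tBE\tExport\tE\t2011\t13277\n"
    "Belgium\tBE\tExport\tE\t2012\t14002\n"
)


def group_series(tsv_text):
    """Group observations by their (Flow, Country) dimension values."""
    series = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Each distinct pair of dimension values identifies one series.
        series[(row["Flow"], row["Country"])].append((row["year"], row["total"]))
    return dict(series)


series_by_dimensions = group_series(SOURCE_TSV)
print(series_by_dimensions[("Export", "Belgium")])
# → [('2010', '13922'), ('2011', '13277'), ('2012', '14002')]
```

Values are kept as strings here; a real converter may want to parse them or handle missing observations.
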
+## Files tree
+
+The output tree for this dataset should be:
+
+    category_name
+    ├── dataset.json
+    ├── Export-Belgium
+    │   ├── observations.tsv
+    │   └── series.json
+    ├── Export-France
+    │   ├── observations.tsv
+    │   └── series.json
+    ├── Export-Italy
+    │   ├── observations.tsv
+    │   └── series.json
+    ├── Import-Belgium
+    │   ├── observations.tsv
+    │   └── series.json
+    ├── Import-France
+    │   ├── observations.tsv
+    │   └── series.json
+    └── Import-Italy
+        ├── observations.tsv
+        └── series.json
+
+=> Remember that this is only part of the full tree produced by a parser; categories are not covered here.
+
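The layout above can be sketched as a small path-planning helper. This is only an illustration: `plan_tree` is a hypothetical name, and the `Flow-Country` directory naming is inferred from the tree shown above rather than from a documented rule.

```python
from pathlib import PurePosixPath


def plan_tree(category_name, series_keys):
    """Return the relative paths a converter would write for one dataset."""
    root = PurePosixPath(category_name)
    paths = [root / "dataset.json"]
    for flow, country in sorted(series_keys):
        # One sub-directory per series, named after its dimension values.
        series_dir = root / f"{flow}-{country}"
        paths.append(series_dir / "observations.tsv")
        paths.append(series_dir / "series.json")
    return [str(path) for path in paths]


paths = plan_tree("category_name", [("Import", "France"), ("Export", "Belgium")])
for path in paths:
    print(path)
```

Working with pure paths keeps the sketch free of side effects; an actual converter would create the directories and write the files.
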
+## DB.nomics vocabulary
+
+For dimensions and values of dimensions ("France" is a value of dimension "Country"), we use the terms `label` and `code`:
+* `label`: human-readable version
+* `code`: used for indexing (slugified label if the provider gives no code)
+
+Note: in DB.nomics, we use `geo` as the **code** for the "Country" dimension. So the "Country" dimension has:
+* `dimension_label`: "Country"
+* `dimension_code`: "geo"
+
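The fallback from provider code to slugified label could look like the sketch below. The exact slug rules used by DB.nomics are not stated on this page, so `slugify` (lowercase ASCII, runs of other characters replaced by `-`) and `dimension_value_code` are assumptions made for illustration.

```python
import re
import unicodedata


def slugify(label):
    """Turn a human-readable label into a lowercase ASCII slug (assumed rule)."""
    ascii_label = (
        unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single dash.
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def dimension_value_code(label, provider_code=None):
    """Use the provider's code when there is one, else fall back to a slug."""
    return provider_code if provider_code else slugify(label)


print(dimension_value_code("Export", "E"))    # → E
print(dimension_value_code("Côte d'Ivoire"))  # → cote-d-ivoire
```
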
+## series.json
+
+The `series.json` file for Country = 'Belgium' and Flow = 'Export' should be:
+
+```json
+{
+  "dimensions": {
+    "geo": "bel",
+    "flow": "E"
+  },
+  "frequency": "A",
+  "key": "Export-Belgium",
+  "name": "Belgium exports"
+}
+```
+
+Notes:
+* "dimensions" is a dict mapping each `dimension_code` to a `dimension_value_code`; it gives the **value** of every dimension for this series
+
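A converter might assemble this document as a plain dictionary before serializing it. In the sketch below, `build_series_json` is a hypothetical helper, and the key is assumed to match the series directory name (`Export-Belgium`).

```python
import json


def build_series_json(key, name, dimensions, frequency="A"):
    """Assemble the content of series.json for one series."""
    return {
        "dimensions": dimensions,  # dimension_code -> dimension_value_code
        "frequency": frequency,    # "A" as in the example above
        "key": key,
        "name": name,
    }


series = build_series_json(
    key="Export-Belgium",  # assumption: same as the directory name
    name="Belgium exports",
    dimensions={"geo": "bel", "flow": "E"},
)
print(json.dumps(series, indent=2, sort_keys=True))
```
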
+## observations.tsv
+
+The `observations.tsv` file for this series should be:
+
+```tsv
+2010	13922
+2011	13277
+2012	14002
+```
+
+**TODO**: header
+
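Until the header question is settled, the file body can be produced with the standard `csv` module. This is a sketch only: `format_observations` is a hypothetical name, and no header row is written because its format is still a TODO.

```python
import csv
import io


def format_observations(observations):
    """Serialize (period, value) pairs as tab-separated lines, without header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(observations)
    return buffer.getvalue()


tsv_text = format_observations([("2010", "13922"), ("2011", "13277"), ("2012", "14002")])
print(tsv_text, end="")
```
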
+## dataset.json
+
+The dataset's `dataset.json` should be:
+
+> Comments have been added for clarity, although they make the JSON invalid.
+
+```json
+{
+  "codelists": {
+    // dimension_code: {dimension_value_code: dimension_value_label}
+    "flow": {
+      // dimension_value_code: dimension_value_label
+      "I": "Import",
+      "E": "Export"
+    },
+    "geo": {
+      // dimension_value_code: dimension_value_label
+      "fra": "France",
+      "ita": "Italy",
+      "bel": "Belgium"
+    }
+  },
+  "concepts": {
+    // dimension_code: dimension_label
+    "flow": "Flow",
+    "geo": "Country"
+  },
+  "dataset_code": "FWTD",
+  "dimension_keys": [
+    // dimension_codes
+    "flow",
+    "geo"
+  ],
+  // Human-readable dataset name
+  "name": "Employees, full-time equivalents: total economy (National accounts)",
+  "series": [
+    // List of series in this dataset (equal to the list of sub-directories)
+    "Export-Belgium",
+    "Export-France",
+    "Export-Italy",
+    "Import-Belgium",
+    "Import-France",
+    "Import-Italy"
+  ]
+}
+```
+
+Notes:
+* The terms *codelists* and *concepts* come from the [SDMX standard](https://en.wikipedia.org/wiki/SDMX)
+* We didn't use the `dimension_code` given by the provider for the "Country" dimension (i.e. the dimension whose `dimension_label` is "Country"): we used "geo" in DB.nomics
+* We didn't use the `dimension_value_code`s given by the provider for the `dimension_value_label`s "France", "Italy" and "Belgium": we used "fra", "ita" and "bel" (not "FR", "IT" and "BE" as given in the source file)
+* We used the `dimension_value_code`s given by the provider for the "flow" dimension ("I" and "E"). We could have chosen something else.
+
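The `codelists` part of `dataset.json` can be derived from the source rows. The sketch below assumes a hand-written mapping, `GEO_CODES`, from the provider's country codes to the codes used in DB.nomics; both that table and the `build_codelists` helper are hypothetical.

```python
# Hypothetical mapping from provider country codes to DB.nomics "geo" codes.
GEO_CODES = {"FR": "fra", "IT": "ita", "BE": "bel"}


def build_codelists(rows):
    """Collect dimension_value_code -> dimension_value_label for each dimension."""
    codelists = {"flow": {}, "geo": {}}
    for row in rows:
        # Provider codes are kept as-is for "flow" and remapped for "geo".
        codelists["flow"][row["fcode"]] = row["Flow"]
        codelists["geo"][GEO_CODES[row["ccode"]]] = row["Country"]
    return codelists


rows = [
    {"Country": "France", "ccode": "FR", "Flow": "Import", "fcode": "I"},
    {"Country": "Italy", "ccode": "IT", "Flow": "Export", "fcode": "E"},
    {"Country": "Belgium", "ccode": "BE", "Flow": "Export", "fcode": "E"},
]
codelists = build_codelists(rows)
print(codelists["geo"])
# → {'fra': 'France', 'ita': 'Italy', 'bel': 'Belgium'}
```
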
+## In a nutshell
+
+To summarize the terms introduced here:
+* a `dimension` (example: "Country") has:
+  * a `dimension_code`: "geo"
+  * a `dimension_label`: "Country"
+* a `dimension_value` (example: "France") has:
+  * a `dimension_value_code`: "fra"
+  * a `dimension_value_label`: "France"