# Data Catalogue

_Fetch, validate, convert & serve CESSDA-compliant DDI repositories._

## Installation

```bash
git clone https://git.nomics.world/progedo/data-catalogue.git
cd data-catalogue/
ln -s example.env .env
```

### Database Creation

#### Using _Debian GNU/Linux_

As `root` user:

```bash
apt install postgresql
apt install libpq-dev
su - postgres
psql
```

#### Using _MacOS_

```bash
brew install postgresql
psql postgres
```

#### For everybody

```sql
CREATE USER data_catalogue WITH PASSWORD 'data_catalogue';
CREATE DATABASE data_catalogue WITH OWNER data_catalogue;
\connect data_catalogue
CREATE EXTENSION IF NOT EXISTS pg_trgm;
\q
logout # For Debian only
```
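
Once back in a normal user shell, it can be worth checking that the new role can log in and that `pg_trgm` is installed. The following is a minimal sketch, not part of the project: the `database_url` and `check_database` helpers and the `DB_*` variables are hypothetical names, and it assumes PostgreSQL listens on `localhost:5432` and accepts password authentication over TCP.

```bash
# Hypothetical helpers (not part of data-catalogue); defaults match the SQL above.
database_url() {
  echo "postgresql://${DB_USER:-data_catalogue}:${DB_PASSWORD:-data_catalogue}@${DB_HOST:-localhost}:${DB_PORT:-5432}/${DB_NAME:-data_catalogue}"
}

check_database() {
  # Prints one row for pg_trgm if the extension exists in the database.
  psql "$(database_url)" -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trgm';"
}
```

Calling `check_database` should print a single row for `pg_trgm`; an empty result suggests that `CREATE EXTENSION` was run in a different database.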

As a normal user, install dependencies:

```bash
npm install
```

As a normal user, compile the scripts and server middlewares to be able to use them:

```bash
npm run package
```

As a normal user, create the database tables:

```bash
npm run configure
```

As a normal user, compile the middlewares to be able to use them:

```bash
npm run build
```

## Usage

### Updating the database when the drop-box directory changes

```bash
node --experimental-specifier-resolution=node -- package/scripts/update_ddis_on_directory_changes.js --output ../public_data/ --verbose adisp
```

### Fetching DDI Files

#### Fetching DDI Files from OAI-PMH Servers

```bash
# ADISP (OAI-PMH): Contains all French DDIs
# node --experimental-specifier-resolution=node -- package/scripts/retrieve_oai-pmh_ddis.js --url http://www.progedo-adisp.fr/oai/oai2.php ../public_data/adisp-oai-pmh-ddi/
```

#### Fetching DDI Files from Dataverse Servers

```bash
# data.sciencespo
node --experimental-specifier-resolution=node -- package/scripts/retrieve_dataverse_ddis.js --tree cdsp --url https://data.sciencespo.fr/ --verbose ../public_data/sciencespo-dataverse-ddi/
```

#### Fetching DDI Files from Nesstar Servers

```bash
# ADISP (public Nesstar) : Contains some French & English DDIs
# node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.progedo-adisp.fr/ ../public_data/adisp-nesstar-ddi/
# CDSP Sciences Po (obsolete & closed)
# node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.sciences-po.fr/ ../public_data/cdsp-nesstar-ddi/
# INED
# node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.ined.fr/ ../public_data/ined-nesstar-ddi/
# INED - Generations and Gender Survey
node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://ggpsurvey.ined.fr/ ../public_data/ined-gpgsurvey-nesstar-ddi/
# UK Data Service
node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.ukdataservice.ac.uk/ ../public_data/ukdataservice-nesstar-ddi/
# Norwegian Centre for Research Data
node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nsddata.nsd.uib.no ../public_data/nsddata-nesstar-ddi/
```
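
Since the Nesstar commands differ only in the server URL and the target directory, they could be driven from a small list. The following is only a sketch: `retrieve_all_nesstar` and the `DRY_RUN` switch are hypothetical names introduced here, and the list holds just the three servers whose commands are not commented out above. With `DRY_RUN=1` (the default) the commands are printed rather than executed.

```bash
# Hypothetical wrapper (not part of data-catalogue).
retrieve_all_nesstar() {
  # One "target-directory server-url" pair per line, copied from the commands above.
  while read -r dir url; do
    [ -n "$dir" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: only show the command that would be executed.
      echo "node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url $url ../public_data/$dir/"
    else
      # stdin is redirected so that node cannot consume the server list.
      node --experimental-specifier-resolution=node \
        package/scripts/retrieve_nesstar_ddis.js --url "$url" "../public_data/$dir/" < /dev/null
    fi
  done <<'EOF'
ined-gpgsurvey-nesstar-ddi http://ggpsurvey.ined.fr/
ukdataservice-nesstar-ddi http://nesstar.ukdataservice.ac.uk/
nsddata-nesstar-ddi http://nsddata.nsd.uib.no
EOF
}
```

Running `retrieve_all_nesstar` prints the three commands; `DRY_RUN=0 retrieve_all_nesstar` executes them one after the other from the repository root.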

### Repairing DDI Files

```bash
# ADISP
node --experimental-specifier-resolution=node package/scripts/repair_adisp_oai-pmh_ddis.js --source=../public_data/adisp-oai-pmh-ddi/ ../public_data/adisp-oai-pmh-ddi-repaired/
node --experimental-specifier-resolution=node package/scripts/repair_adisp_nesstar_ddis.js --source=../public_data/adisp-nesstar-ddi/ ../public_data/adisp-nesstar-ddi-repaired/
# node --experimental-specifier-resolution=node package/scripts/repair_ined_nesstar_ddis.js --source=../public_data/ined-nesstar-ddi/ ../public_data/ined-nesstar-ddi-repaired/
```

### Indexing DDI Files

#### Indexing Progedo DDI Files

```bash
# node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=adisp ../public_data/adisp-oai-pmh-ddi-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=adisp ../public_data/adisp-ddis-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=cdsp ../public_data/sciencespo-dataverse-ddi/
# node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=ined ../public_data/ined-nesstar-ddi-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=ined ../public_data/ined-manual-ddi/
```

#### Indexing Other (non Progedo-related) DDI Files

```bash
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=adisp-nesstar ../public_data/adisp-nesstar-ddi-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=cdsp ../public_data/sciencespo-dataverse-ddi/
# Obsolete
# node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=cdsp-obsolete ../public_data/cdsp-nesstar-ddi/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=ined/gpgsurvey ../public_data/ined-gpgsurvey-nesstar-ddi/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=en --path=ukdataservice ../public_data/ukdataservice-nesstar-ddi/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=no --path=nsddata ../public_data/nsddata-nesstar-ddi/
```

### Extracting Words from CodeBooks for Autocompletion

```bash
node --experimental-specifier-resolution=node -- package/scripts/index_words.js
```

## Development

### Extracting TypeScript Raw Types from DDI Files

#### Extracting TypeScript Raw Types from Progedo DDI Files

```bash
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/

node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/ --version=1.2.2
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/ --version=1.3
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/ --version=2.5

# Prettify generated TypeScript file:
npm run format
```

#### Extracting TypeScript Raw Types for Other Tests

```bash
node --experimental-specifier-resolution=node --max-old-space-size=8192 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-manual-ddi/ --target=src/raw_types/codebooks_adisp_manual.ts
node --experimental-specifier-resolution=node --max-old-space-size=8192 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-nesstar-ddi/ --target=src/raw_types/codebooks_adisp_nesstar.ts
node --experimental-specifier-resolution=node -- package/scripts/raw_types_from_ddi_files.js ../public_data/sciencespo-dataverse-ddi/ --target=src/raw_types/codebooks_sciencespo_dataverse.ts
node --experimental-specifier-resolution=node -- package/scripts/raw_types_from_ddi_files.js ../public_data/ined-nesstar-ddi/ --target=src/raw_types/codebooks_ined_nesstar.ts

# Prettify generated TypeScript files:
npm run format
```

### Generating a CSV file of all organizations

```sql
COPY (
  SELECT name, acronym, relation
  FROM organizations
  INNER JOIN study_organization_associations
    ON organizations.name = study_organization_associations.organization_name
  GROUP BY name, acronym, relation
  ORDER BY name, relation
) TO '/tmp/organizations.csv' DELIMITER ',' CSV HEADER;

COPY (
  SELECT name, acronym, relation, study_path
  FROM organizations
  INNER JOIN study_organization_associations
    ON organizations.name = study_organization_associations.organization_name
  GROUP BY name, acronym, relation, study_path
  ORDER BY name, relation
) TO '/tmp/organizations_with_studies.csv' DELIMITER ',' CSV HEADER;
```
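
`COPY ... TO '/tmp/...'` writes the file on the database server and requires superuser rights or membership in the `pg_write_server_files` role. When connecting as `data_catalogue`, psql's client-side `\copy` may be more convenient. The following is a sketch under assumptions: `export_organizations` and `CATALOGUE_DB_URL` are hypothetical names, and the default connection string reuses the credentials created during installation.

```bash
# Hypothetical helper: exports the first query above to a client-side CSV file.
export_organizations() {
  # \copy is a psql meta-command, so the whole statement must stay on one line.
  psql "${CATALOGUE_DB_URL:-postgresql://data_catalogue:data_catalogue@localhost/data_catalogue}" \
    -c "\\copy (SELECT name, acronym, relation FROM organizations INNER JOIN study_organization_associations ON organizations.name = study_organization_associations.organization_name GROUP BY name, acronym, relation ORDER BY name, relation) TO '${1:-organizations.csv}' DELIMITER ',' CSV HEADER"
}
```

For example, `export_organizations /tmp/organizations.csv` writes the file on the machine where psql runs, with the permissions of the current user.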