# Data Catalogue

_Fetch, validate, convert & serve CESSDA-compliant DDI repositories._

## Installation

```bash
git clone https://git.nomics.world/progedo/data-catalogue.git
cd data-catalogue/
ln -s example.env .env
```

### Database Creation

#### Using _Debian GNU/Linux_

As `root` user:

```bash
apt install postgresql
apt install libpq-dev
su - postgres
psql
```

#### Using _MacOS_

```bash
brew install postgresql
psql postgres
```

#### For everybody

```sql
CREATE USER data_catalogue WITH PASSWORD 'data_catalogue';
CREATE DATABASE data_catalogue WITH OWNER data_catalogue;
\connect data_catalogue
CREATE EXTENSION IF NOT EXISTS pg_trgm;
\q
logout # For Debian only
```

As a normal user, install dependencies:

```bash
npm install
```

As a normal user, compile scripts and server middlewares to be able to use them:

```bash
npm run package
```

As a normal user, create database tables:

```bash
npm run configure
```

As a normal user, compile middlewares to be able to use them:

```bash
npm run build
```

## Usage

### Updating database when drop-box directory changes

```bash
node --experimental-specifier-resolution=node -- package/scripts/update_ddis_on_directory_changes.js --output ../public_data/ --verbose adisp
```

### Fetching DDI Files

#### Fetching DDI Files from OAI-PMH Servers

```bash
# ADISP (OAI-PMH): Contains all French DDIs
# node --experimental-specifier-resolution=node -- package/scripts/retrieve_oai-pmh_ddis.js --url http://www.progedo-adisp.fr/oai/oai2.php ../public_data/adisp-oai-pmh-ddi/
```

#### Fetching DDI Files from Dataverse Servers

```bash
# data.sciencespo
node --experimental-specifier-resolution=node -- package/scripts/retrieve_dataverse_ddis.js --tree cdsp --url https://data.sciencespo.fr/ --verbose ../public_data/sciencespo-dataverse-ddi/
```

#### Fetching DDI Files from Nesstar Servers

```bash
# ADISP (public Nesstar): Contains some French & English DDIs
# node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.progedo-adisp.fr/ ../public_data/adisp-nesstar-ddi/

# CDSP Sciences Po (obsolete & closed)
# node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.sciences-po.fr/ ../public_data/cdsp-nesstar-ddi/

# INED
# node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.ined.fr/ ../public_data/ined-nesstar-ddi/

# INED - Generations and Gender Survey
node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://ggpsurvey.ined.fr/ ../public_data/ined-gpgsurvey-nesstar-ddi/

# UK Data Service
node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nesstar.ukdataservice.ac.uk/ ../public_data/ukdataservice-nesstar-ddi/

# Norwegian Centre for Research Data
node --experimental-specifier-resolution=node package/scripts/retrieve_nesstar_ddis.js --url http://nsddata.nsd.uib.no ../public_data/nsddata-nesstar-ddi/
```

### Repairing DDI Files

```bash
# ADISP
node --experimental-specifier-resolution=node package/scripts/repair_adisp_oai-pmh_ddis.js --source=../public_data/adisp-oai-pmh-ddi/ ../public_data/adisp-oai-pmh-ddi-repaired/
node --experimental-specifier-resolution=node package/scripts/repair_adisp_nesstar_ddis.js --source=../public_data/adisp-nesstar-ddi/ ../public_data/adisp-nesstar-ddi-repaired/
# node --experimental-specifier-resolution=node package/scripts/repair_ined_nesstar_ddis.js --source=../public_data/ined-nesstar-ddi/ ../public_data/ined-nesstar-ddi-repaired/
```

### Indexing DDI files

#### Indexing Progedo DDI Files

```bash
# node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=adisp ../public_data/adisp-oai-pmh-ddi-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=adisp ../public_data/adisp-ddis-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=cdsp ../public_data/sciencespo-dataverse-ddi/
# node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=ined ../public_data/ined-nesstar-ddi-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=ined ../public_data/ined-manual-ddi/
```

#### Indexing Other (non Progedo-related) DDI Files

```bash
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=adisp-nesstar ../public_data/adisp-nesstar-ddi-repaired/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=cdsp ../public_data/sciencespo-dataverse-ddi/
# Obsolete
# node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=cdsp-obsolete ../public_data/cdsp-nesstar-ddi/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=fr --path=ined/gpgsurvey ../public_data/ined-gpgsurvey-nesstar-ddi/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=en --path=ukdataservice ../public_data/ukdataservice-nesstar-ddi/
node --experimental-specifier-resolution=node -- package/scripts/index_codebooks.js --language=no --path=nsddata ../public_data/nsddata-nesstar-ddi/
```

### Extracting Words from CodeBooks for Autocompletion

```bash
node --experimental-specifier-resolution=node -- package/scripts/index_words.js
```

## Development

### Extracting TypeScript Raw Types from DDI Files

#### Extracting TypeScript Raw Types from Progedo DDI Files

```bash
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/ --version=1.2.2
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/ --version=1.3
node --experimental-specifier-resolution=node --max-old-space-size=10240 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-oai-pmh-ddi-repaired/ ../public_data/sciencespo-dataverse-ddi/ ../public_data/ined-nesstar-ddi/ --version=2.5

# Prettify generated TypeScript file:
npm run format
```

#### Extracting TypeScript Raw Types for Other Tests

```bash
node --experimental-specifier-resolution=node --max-old-space-size=8192 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-manual-ddi/ --target=src/raw_types/codebooks_adisp_manual.ts
node --experimental-specifier-resolution=node --max-old-space-size=8192 -- package/scripts/raw_types_from_ddi_files.js ../public_data/adisp-nesstar-ddi/ --target=src/raw_types/codebooks_adisp_nesstar.ts
node --experimental-specifier-resolution=node -- package/scripts/raw_types_from_ddi_files.js ../public_data/sciencespo-dataverse-ddi/ --target=src/raw_types/codebooks_sciencespo_dataverse.ts
node --experimental-specifier-resolution=node -- package/scripts/raw_types_from_ddi_files.js ../public_data/ined-nesstar-ddi/ --target=src/raw_types/codebooks_ined_nesstar.ts

# Prettify generated TypeScript files:
npm run format
```

### Generating a CSV file of all organizations

```sql
COPY (
  SELECT name, acronym, relation
  FROM organizations
  INNER JOIN study_organization_associations
    ON organizations.name = study_organization_associations.organization_name
  GROUP BY name, acronym, relation
  ORDER BY name, relation
) TO '/tmp/organizations.csv' DELIMITER ',' CSV HEADER;

COPY (
  SELECT name, acronym, relation, study_path
  FROM organizations
  INNER JOIN study_organization_associations
    ON organizations.name = study_organization_associations.organization_name
  GROUP BY name, acronym, relation, study_path
  ORDER BY name, relation
) TO '/tmp/organizations_with_studies.csv' DELIMITER ',' CSV HEADER;
```
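
`COPY ... TO '<file>'` writes the file on the database server and needs superuser rights (or membership in `pg_write_server_files`), which the `data_catalogue` user created above does not have by default. As a possible alternative, the sketch below builds the equivalent client-side `\copy` meta-command for `psql`. The helper name `build_copy_command` and the output file name are illustrative assumptions, not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch only: build a psql \copy meta-command that exports a query to a CSV
# file on the *client* machine. build_copy_command is a hypothetical helper.

# \copy must fit on a single line, so the query is passed as one string.
build_copy_command() {
  local query="$1" target="$2"
  printf '\\copy (%s) TO %s CSV HEADER' "$query" "'$target'"
}

# Same query as the first COPY statement above, written on one line.
ORGANIZATIONS_QUERY="SELECT name, acronym, relation FROM organizations INNER JOIN study_organization_associations ON organizations.name = study_organization_associations.organization_name GROUP BY name, acronym, relation ORDER BY name, relation"

# Print the meta-command so it can be reviewed before running it.
build_copy_command "$ORGANIZATIONS_QUERY" "organizations.csv"
echo
```

To run the export, the printed command can be passed to `psql`, assuming it is on the `PATH` and the database is reachable with the credentials from the installation step: `psql -U data_catalogue -d data_catalogue -c "$(build_copy_command "$ORGANIZATIONS_QUERY" organizations.csv)"`.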