root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.py]
max_line_length = 88
indent_style = space
indent_size = 4
[*.{rst,ini,cfg}]
indent_style = space
indent_size = 4
[*.{md,yml}]
indent_style = space
indent_size = 2
# Base directory where all DBnomics data is stored
JSON_DATA_BASE_DIR=
# Don't check auto-generated stuff into git
coverage
build
dist
generated
node_modules
generated
stats.json
yarn-error.log
wallay.config.js
# Cruft
.DS_Store
npm-debug.log
.idea
jest.ide.config
.keys
.githash
tmp
.cache
json-data
.env
......@@ -2,14 +2,12 @@
Docker Compose stack for DBnomics development environment
## Prerequisite
Get a Linux computer with a running [docker](https://docs.docker.com/install/) environment.
Install [docker-compose](https://docs.docker.com/compose/install/) version >= 1.24
## Install
Clone the repo on your local computer:
......@@ -17,113 +15,93 @@ Clone the repo on your local computer:
```
$ git clone https://git.nomics.world/dbnomics/dbnomics-docker.git
$ cd dbnomics-docker
$ chmod +x bin/dbnomics-dev*
```
## Usage
Three scripts let you test generated json-data graphically through a local dbnomics-website instance.
Testing is done in three steps:
* setup of the environment (solr, api, website)
* loading of data; data for several providers can be loaded, each as many times as needed
* teardown of the environment
## Configure
Before you start, make sure the json-data folder of at least one provider is available on your computer.
### Upload
For example, let's say you want to work with data from [AMECO](https://db.nomics.world/AMECO) and [RBA](https://db.nomics.world/RBA) providers.
```bash
mkdir -p ~/dbnomics/dbnomics-json-data
cd ~/dbnomics/dbnomics-json-data
git clone https://git.nomics.world/dbnomics-json-data/ameco-json-data.git
git clone https://git.nomics.world/dbnomics-json-data/rba-json-data.git
```
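The two clone commands above follow the same URL pattern, so fetching several providers can be scripted. This is a minimal sketch, assuming every provider repository is named `<lowercase-code>-json-data` under the `dbnomics-json-data` group, as the AMECO and RBA URLs suggest. The function names `provider_repo_url` and `clone_providers` are hypothetical and not part of this repository.

```shell
# Hypothetical helpers, not part of this repository.

# Print the clone URL for a provider code (e.g. "AMECO" or "rba").
# Assumes the <lowercase-code>-json-data naming seen in the commands above.
provider_repo_url() {
    code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'https://git.nomics.world/dbnomics-json-data/%s-json-data.git\n' "$code"
}

# Clone each given provider into the target directory, skipping existing clones.
clone_providers() {
    target_dir=$1
    shift
    mkdir -p "$target_dir"
    for provider in "$@"; do
        name=$(printf '%s' "$provider" | tr '[:upper:]' '[:lower:]')-json-data
        if [ -d "$target_dir/$name" ]; then
            echo "Skipping $name: already cloned"
        else
            git clone "$(provider_repo_url "$provider")" "$target_dir/$name" || return 1
        fi
    done
}
```

For example, `clone_providers ~/dbnomics/dbnomics-json-data AMECO RBA` would reproduce the two clones shown above.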
$ dbnomics-docker/bin/dbnomics-dev-up
```
Synopsis:
* creates one docker volume for solr config
* creates one docker volume for json-data
* runs `solr`, available under http://localhost:18983
* runs `dbnomics-api` server, available under http://localhost:15000
* runs `dbnomics-website` server, available under http://localhost:13000
At this point, all servers are installed, but the index and data are empty.
You can now check `dbnomics-website`.
NB:
* wait for the standard output to stop scrolling before loading data.
* the first launch takes much longer than later ones because the Docker images have to be built and downloaded
## Load and test your json data
So you have this file tree:
```
$ dbnomics-docker/bin/dbnomics-dev-load :somewhere/:provider-json-data
$HOME
|- dbnomics
|- dbnomics-json-data <= this is JSON_DATA_BASE_DIR
|- ameco-json-data
|- rba-json-data
```
Before you start, make sure the json-data folder of at least one provider is available on your computer.
Run the previous command: it copies the data to the json-data Docker volume, which is shared by `solr` and `dbnomics-api`, then runs the `dbnomics-importer` script.
Copy `.env.example` to `.env` and define the environment variable:
Check the solr logs and the dbnomics-dev-load output, then open http://localhost:13000 in your browser: your data should be visible.
```env
JSON_DATA_BASE_DIR=~/dbnomics/dbnomics-json-data
```
You can repeat the previous operations for other providers' data; loaded data is kept until dbnomics-dev is stopped.
## Launch Docker stack
You can even rerun the loader for the same provider; the new data is copied over the previous data.
To launch a local instance of DBnomics:
```bash
docker-compose up -d
```
### Teardown
Note: the first launch takes much longer than later ones because the Docker images have to be downloaded and built.
To access logs:
```
$ dbnomics-docker/bin/dbnomics-dev-down
```bash
docker-compose logs
```
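The Compose file mounts `${JSON_DATA_BASE_DIR:?}`, so the stack refuses to start when that variable is unset or empty. If the launch fails, checking `.env` is a reasonable first step. This is a minimal sketch, assuming `.env` holds a plain, unquoted `JSON_DATA_BASE_DIR=...` line. `check_json_data_base_dir` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper, not part of this repository.
# Read JSON_DATA_BASE_DIR from an env file (default: .env), check that it
# points to an existing directory, and print the resolved path.
check_json_data_base_dir() {
    env_file=${1:-.env}
    if [ ! -f "$env_file" ]; then
        echo "Missing $env_file (copy .env.example to .env)" >&2
        return 1
    fi
    # Only lines starting with the variable name match, so comments are skipped.
    # If the variable is assigned several times, keep the last assignment.
    value=$(sed -n 's/^JSON_DATA_BASE_DIR=//p' "$env_file" | tail -n 1)
    # Expand a leading "~/" by hand, since the value is read as plain text.
    case $value in
        "~/"*) value=$HOME/${value#??} ;;
    esac
    if [ -z "$value" ]; then
        echo "JSON_DATA_BASE_DIR is empty in $env_file" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "JSON_DATA_BASE_DIR is not a directory: $value" >&2
        return 1
    fi
    printf '%s\n' "$value"
}
```

One possible use is `check_json_data_base_dir && docker-compose up -d`, so the stack only starts when the data directory exists.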
This is the end: servers are shut down, Docker containers are removed, and volumes are dropped.
## Index DBnomics data
### Recipe
By default, the DBnomics instance contains no data.
First terminal:
Data indexation is required for full-text search and faceted search (used when searching by dimension).
Data has to follow the DBnomics data model and be located under the `JSON_DATA_BASE_DIR` you configured in `.env` (see above).
```
$ ~/dbnomics-docker/bin/dbnomics-dev-up
```
Data is indexed provider per provider:
Second terminal:
```bash
./index-provider.sh /path/to/provider-json-data
# Example:
./index-provider.sh ~/dbnomics/dbnomics-json-data/rba-json-data
```
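Since indexing works one provider at a time, indexing everything under `JSON_DATA_BASE_DIR` comes down to a loop. This is a minimal sketch, assuming provider directories are named `<provider>-json-data` as in the file tree above and that the loop runs from the directory containing `index-provider.sh`. The function names are hypothetical.

```shell
# Hypothetical helpers, not part of this repository.

# Print every "<provider>-json-data" directory directly under the base directory.
list_provider_dirs() {
    for dir in "$1"/*-json-data; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    return 0
}

# Index every provider found under the base directory, stopping at the first failure.
index_all_providers() {
    for dir in "$1"/*-json-data; do
        [ -d "$dir" ] || continue
        echo "Indexing $dir"
        ./index-provider.sh "$dir" || return 1
    done
}
```

`list_provider_dirs ~/dbnomics/dbnomics-json-data` works as a dry run before calling `index_all_providers` with the same path.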
$ git clone https://git.nomics.world/dbnomics-json-data/dares-json-data.git
$ ~/dbnomics-docker/bin/dbnomics-dev-load dares-json-data
$ git clone https://git.nomics.world/dbnomics-json-data/oecd-json-data.git
$ ~/dbnomics-docker/bin/dbnomics-dev-load oecd-json-data
```
Launch your browser to localhost:13000
Very late in the night:
You can repeat the previous operation for other providers' data, or rerun it for the same provider if its data has changed.
```
$ ~/dbnomics-docker/bin/dbnomics-dev-down
```
Indexed data are kept in a Docker volume and will remain available as long as you don't delete it.
### Implementation details
## Access services
`dbnomics-dev` relies heavily on Docker (for Linux). It uses docker-compose and docker commands to launch a local stack of dbnomics and solr servers.
After the different services have started, you can view the imported data by opening the [local DBnomics website](http://localhost:3000)!
First, the `dbnomics-dev-up` script pulls the dbnomics-api and dbnomics-website images. Those images are built from the `redpelicans/dbnomics-*` repos and pushed to the redpelicans Docker Hub account.
In the short term, those images should be generated directly by the dbnomics-api and dbnomics-website pipelines and pushed to the dbnomics Docker repos.
You have access to those services:
Then, `dbnomics-dev-up` initializes two Docker volumes: `solr-conf`, which sets up solr with the `dbnomics` core and its dedicated config files, and `json-data`, which stores incoming json data. The latter is shared between `solr` and `api`.
* DBnomics Web site, available under http://localhost:3000
* DBnomics Web API, available under http://localhost:5000 with docs available under http://localhost:5000/apidocs
* Apache Solr, available under http://localhost:8983
Finally, the Docker images are run:
## Shutdown Docker stack
* `dbnomics-dev-solr`: based on solr:7, extended with Python 3.5 and the embedded `dbnomics-importer` sources.
* `dbnomics-dev-api`: dockerisation of `dbnomics-api`, see https://gitlab.com/oecd/dbnomics-api
* `dbnomics-dev-website`: dockerisation of `dbnomics-website`, see https://gitlab.com/oecd/dbnomics-ui
To shutdown the local instance of DBnomics:
NB:
```bash
docker-compose down
```
* Python 3.5 is used for the importer because the solr image is based on Debian 9, whose apt repos only provide 3.5
This is the end: servers are shut down and Docker containers are removed, but volumes are kept.
If you really want to remove volumes, you can pass the `--volumes` option.
That's all folks ...
Keeping the volumes lets you restart the DBnomics Docker stack without indexing the data again.
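After a restart, it can be useful to confirm from a script that the three services listed under "Access services" respond before opening the browser. This is a minimal sketch, assuming `curl` is installed and that each service answers a plain GET on the URL given in this README. The function names, the 30 attempts, and the 2-second delay are illustrative choices, not project defaults.

```shell
# Hypothetical helpers, not part of this repository.

# Print the local service URLs listed in the README, one per line.
service_urls() {
    printf '%s\n' \
        "http://localhost:3000" \
        "http://localhost:5000/apidocs" \
        "http://localhost:8983"
}

# Poll a URL until it answers or the attempts run out
# (defaults: 30 attempts, 2 seconds apart).
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    while [ "$attempts" -gt 0 ]; do
        if curl --silent --fail --output /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "no answer: $url" >&2
    return 1
}

# Check every service in turn; returns non-zero at the first one that stays down.
wait_for_stack() {
    for service_url in $(service_urls); do
        wait_for_url "$service_url" || return 1
    done
}
```

A typical sequence would be `docker-compose up -d && wait_for_stack`.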
#!/bin/sh
CURRENT_DIR=$(dirname $(readlink -f $0))
CONFIG="${CURRENT_DIR}"/../config
docker-compose -f $CONFIG/docker-compose.yaml down
docker volume rm -f solr-conf
docker volume rm -f json-data
#!/bin/sh
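export JSON_DATA_PATH=$1
# Quote the path: with an empty argument, the unquoted test is always false
# and the usage message would never be printed.
if [ ! -d "$JSON_DATA_PATH" ]
then
    echo "Usage: dbnomics-dev-load PATH"
    echo
    echo "PATH is a relative or absolute path to a provider json-data folder"
    exit 1
fi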
JSON_DATA_DIR=$(basename $1)
echo "Copy json data from ${JSON_DATA_PATH} "
docker run -d --rm --name json-data-loader -v json-data:/home alpine tail -f /dev/null
docker cp ${JSON_DATA_PATH} json-data-loader:/home
docker exec json-data-loader chmod -R +rX /home
docker rm -f json-data-loader
echo "Import data from provider ${JSON_DATA_DIR} "
docker exec dbnomics-dev-solr bash -c "/opt/importer/import_storage_dir.py --full /opt/json-data/${JSON_DATA_DIR}"
#!/bin/sh
CURRENT_DIR=$(dirname $(readlink -f $0))
CONFIG="${CURRENT_DIR}"/../config
CACHE="${CONFIG}"/.cache
IMPORTER_REPO=dbnomics-importer
IMPORTER_PATH=$CACHE/$IMPORTER_REPO
URL=https://git.nomics.world/dbnomics/$IMPORTER_REPO.git
echo $CONFIG
if [ ! -d $IMPORTER_PATH ]
then
echo "Cloning dbnomics-importer"
git clone --quiet $URL $IMPORTER_PATH
else
echo "Updating dbnomics-importer repo"
cd $IMPORTER_PATH
git pull origin master
cd ..
fi
echo "Update images"
docker pull redpelicans/dbnomics-api
docker pull redpelicans/dbnomics-website
echo "Create Docker volumes"
docker volume rm -f solr-conf
docker volume rm -f json-data
docker volume create --name=solr-conf
docker volume create --name=json-data
echo "Precreate solr core"
docker run --rm -v solr-conf:/opt/solr/server/solr/mycores solr:7 bash -c "precreate-core dbnomics"
docker run -d --rm --name init-solr -v solr-conf:/root alpine tail -f /dev/null
docker exec -it init-solr rm /root/dbnomics/conf/managed-schema
docker cp $IMPORTER_PATH/solr_core_config/schema.xml init-solr:/root/dbnomics/conf
docker cp $IMPORTER_PATH/solr_core_config/solrconfig.xml init-solr:/root/dbnomics/conf
docker rm -f init-solr
echo "Run docker composition setup"
docker-compose -f $CONFIG/docker-compose.yaml build
docker-compose -f $CONFIG/docker-compose.yaml up
#!/bin/bash
# mydir=$(dirname "$0")
mydir=$(dirname $(readlink -f $0))
echo $mydir
cat ${mydir}/../config/docker-compose.yaml
FROM python:3.8
RUN mkdir -p /opt
WORKDIR /opt
COPY .cache/dbnomics-importer .
RUN pip3 install --requirement requirements.txt
CMD ./import_storage_dir.py --full /opt/data/$PROVIDER-json-data
FROM solr:7
USER root
RUN apt-get update && \
apt-get install -yq python3 python3-pip && \
apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /opt/importer
RUN mkdir -p /opt/json-data
RUN chown -R solr:solr /opt/json-data
RUN pip3 install virtualenv
WORKDIR /opt/importer
COPY .cache/dbnomics-importer .
RUN pip3 install --requirement /opt/importer/requirements.txt
USER solr
WORKDIR /opt/solr
ENV SOLR_POST="/opt/solr/bin/post"
version: '3.7'
services:
  solr:
    container_name: dbnomics-dev-solr
    build:
      context: .
      dockerfile: Dockerfile.solr
    volumes:
      - solr-conf:/opt/solr/server/solr/mycores
      - json-data:/opt/json-data:rw
    ports:
      - "0.0.0.0:18983:8983"
  website:
    container_name: dbnomics-dev-website
    image: redpelicans/dbnomics-website
    ports:
      - "0.0.0.0:13000:3000"
  api:
    container_name: dbnomics-dev-api
    image: redpelicans/dbnomics-api
    ports:
      - "0.0.0.0:15000:5000"
    volumes:
      - json-data:/json-data:rw
volumes:
  solr-conf:
    external: true
  json-data:
    external: true
version: "3.7"
services:
  api:
    image: git.nomics.world:4567/dbnomics/dbnomics-api:latest
    # build:
    #   context: ../dbnomics-api
    environment:
      SOLR_BASE_URL: http://solr:8983
      SOLR_CORE_NAME: dbnomics
      JSON_DATA_BASE_DIR: /json-data
    depends_on:
      - solr
    ports:
      - "3000:3000"
      - "5000:5000"
    volumes:
      - ${JSON_DATA_BASE_DIR:?}:/json-data:ro
  solr:
    image: solr:8.4
    volumes:
      - ./solr/dbnomics-configset:/opt/solr/server/solr/configsets/dbnomics:ro
      - solr-data:/var/solr
    ports:
      - "8983:8983"
    # environment:
    #   VERBOSE: "yes" # let solr-precreate be more verbose
    command: solr-precreate dbnomics /opt/solr/server/solr/configsets/dbnomics
  website:
    image: git.nomics.world:4567/dbnomics/dbnomics-website:latest
    # build:
    #   context: ../dbnomics-website
    #   args:
    #     DBNOMICS_UI_API_URL: http://localhost:5000
    #     DBNOMICS_UI_BASE_URL: http://localhost:3000
    #     DBNOMICS_UI_NAV_BAR_BANNER: local via Docker
    depends_on:
      - api
    network_mode: service:api
volumes:
  solr-data:
version: "3.7"
services:
  importer:
    build:
      context: ../dbnomics-importer
    depends_on:
      - solr
    environment:
      SOLR_HOST: solr
    volumes:
      - ${PROVIDER_DATA_DIR}:/provider-data:ro
#!/bin/bash
set -e
if [ -z "$1" ]; then
    echo "usage: $0 <provider_data_dir>" >&2
    exit 1
fi
export PROVIDER_DATA_DIR="$1"
shift
# Quote "$@" so that extra arguments containing spaces reach the importer intact.
docker-compose -f docker-compose.yml -f ./importer/docker-compose.import.yml run importer import-provider /provider-data "$@"
# Set of Catalan contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
d
l
m
n
s
t
# Set of French contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
l
m
t
qu
n
s
j
d
c
jusqu
quoiqu
lorsqu
puisqu
# Set of Irish contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
d
m
b
# Set of Italian contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
c
l
all
dall
dell
nell
sull
coll
pell
gl
agl
dagl
degl
negl
sugl
un
m
t
s
v
d
# Set of Irish hyphenations for StopFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
h
n
t
# Set of overrides for the dutch stemmer
# TODO: load this as a resource from the analyzer and sync it in build.xml
fiets fiets
bromfiets bromfiets
ei eier
kind kinder
# This file was created by Jacques Savoy and is distributed under the BSD license.
# See http://members.unine.ch/jacques.savoy/clef/index.html.
# Also see http://www.opensource.org/licenses/bsd-license.html
# Cleaned on October 11, 2009 (not normalized, so use before normalization)
# This means that when modifying this list, you might need to add some
# redundant entries, for example containing forms with both أ and ا
من
ومن
منها
منه
في
وفي
فيها
فيه
و
ف
ثم
او
أو
ب
بها
به
ا
أ
اى
اي
أي
أى
لا
ولا
الا
ألا
إلا
لكن
ما
وما
كما
فما
عن
مع
اذا
إذا
ان
أن
إن
انها
أنها
إنها
انه
أنه
إنه
بان
بأن
فان
فأن
وان
وأن
وإن
التى
التي
الذى
الذي
الذين
الى
الي
إلى
إلي
على
عليها
عليه
اما
أما
إما
ايضا
أيضا
كل
وكل
لم
ولم
لن
ولن
هى
هي
هو
وهى
وهي
وهو
فهى
فهي
فهو
انت
أنت
لك
لها
له
هذه
هذا
تلك
ذلك
هناك
كانت
كان
يكون
تكون
وكانت
وكان
غير
بعض
قد
نحو
بين
بينما
منذ
ضمن
حيث
الان
الآن
خلال
بعد
قبل
حتى
عند
عندما
لدى
جميع
# This file was created by Jacques Savoy and is distributed under the BSD license.
# See http://members.unine.ch/jacques.savoy/clef/index.html.
# Also see http://www.opensource.org/licenses/bsd-license.html
а
аз
ако
ала
бе
без
беше
би
бил
била
били
било
близо
бъдат
бъде
бяха
в
вас
ваш
ваша
вероятно
вече
взема
ви
вие
винаги
все
всеки
всички
всичко
всяка
във
въпреки
върху
г
ги
главно
го
д
да
дали
до
докато
докога
дори
досега
доста
е
едва
един
ето
за
зад
заедно
заради
засега
затова
защо
защото
и
из
или
им
има
имат
иска
й
каза
как
каква
какво
както
какъв
като
кога
когато
което
които
кой
който
колко
която
къде
където
към
ли
м
ме
между
мен
ми
мнозина
мога
могат
може
моля
момента
му
н
на
над
назад
най
направи
напред
например
нас
не
него
нея
ни
ние
никой
нито
но
някои
някой
няма
обаче
около
освен
особено
от
отгоре
отново
още
пак
по
повече
повечето
под
поне
поради
после
почти
прави
пред
преди
през
при
пък
първо
с
са
само
се
сега
си
скоро
след
сме
според
сред
срещу
сте
съм
със
също
т
тази
така
такива
такъв
там
твой
те
тези
ти
тн
то
това
тогава
този
той
толкова
точно
трябва
тук
тъй
тя
тях
у
харесва
ч
че
често
чрез
ще
щом
я
# Catalan stopwords from http://github.com/vcl/cue.language (Apache 2 Licensed)
a
abans
ací
ah
així
això
al
als
aleshores
algun
alguna
algunes
alguns
alhora
allà
allí
allò
altra
altre
altres
amb
ambdós
ambdues
apa
aquell
aquella
aquelles
aquells
aquest
aquesta
aquestes
aquests
aquí
baix
cada
cadascú
cadascuna
cadascunes
cadascuns
com
contra
d'un
d'una
d'unes
d'uns
dalt
de
del
dels
des
després
dins
dintre
donat
doncs
durant
e
eh
el
els
em
en
encara
ens
entre
érem
eren
éreu
es
és
esta
està
estàvem
estaven
estàveu
esteu
et
etc
ets
fins
fora
gairebé
ha
han
has
havia
he
hem
heu
hi
ho
i
igual
iguals
ja
l'hi
la
les
li
li'n
llavors
m'he
ma
mal
malgrat
mateix
mateixa
mateixes
mateixos
me
mentre
més
meu
meus
meva
meves
molt
molta
moltes
molts
mon
mons
n'he
n'hi
ne
ni
no
nogensmenys
només
nosaltres
nostra
nostre
nostres
o
oh
oi
on
pas
pel
pels
per
però
perquè
poc
poca
pocs
poques
potser
propi
qual
quals
quan
quant
que
què
quelcom
qui
quin
quina
quines
quins
s'ha
s'han
sa
semblant
semblants
ses
seu
seus
seva
seva
seves
si
sobre
sobretot
sóc
solament
sols
son
són
sons
sota
sou
t'ha
t'han
t'he
ta
tal
també
tampoc
tan
tant
tanta
tantes
teu
teus
teva
teves
ton
tons
tot
tota
totes
tots
un
una
unes
uns
us
va
vaig
vam
van
vas
veu
vosaltres
vostra
vostre
vostres
a
s
k
o
i
u
v
z
dnes
cz
tímto
budeš
budem
byli
jseš
můj
svým
ta
tomto
tohle
tuto
tyto
jej
zda
proč
máte
tato
kam
tohoto
kdo
kteří
mi
nám
tom
tomuto
mít
nic
proto
kterou
byla
toho
protože
asi
ho
naši
napište
re
což
tím
takže
svých
její
svými
jste
aj
tu
tedy
teto
bylo
kde
ke
pravé
ji
nad
nejsou
či
pod
téma
mezi
přes
ty
pak
vám
ani
když
však
neg
jsem
tento
článku
články
aby
jsme
před
pta
jejich
byl
ještě
bez
také
pouze
první
vaše
která
nás
nový
tipy
pokud
může
strana
jeho
své
jiné
zprávy
nové
není
vás
jen
podle
zde
být
více
bude
již
než
který
by
které
co
nebo
ten
tak
při
od
po
jsou
jak
další
ale
si
se
ve
to
jako
za
zpět
ze
do
pro
je
na
atd
atp
jakmile
přičemž
on
ona
ono
oni
ony
my
vy
ji
mne
jemu
tomu
těm
těmu
němu
němuž
jehož
jíž
jelikož
jež
jakož
načež
| From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
| A Danish stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
| This is a ranked list (commonest to rarest) of stopwords derived from
| a large text sample.
og | and
i | in
jeg | I
det | that (dem. pronoun)/it (pers. pronoun)
at | that (in front of a sentence)/to (with infinitive)
en | a/an
den | it (pers. pronoun)/that (dem. pronoun)
til | to/at/for/until/against/by/of/into, more
er | present tense of "to be"
som | who, as
på | on/upon/in/on/at/to/after/of/with/for, on
de | they
med | with/by/in, along
han | he
af | of/by/from/off/for/in/with/on, off
for | at/for/to/from/by/of/ago, in front/before, because
ikke | not
der | who/which, there/those
var | past tense of "to be"
mig | me/myself
sig | oneself/himself/herself/itself/themselves
men | but
et | a/an/one, one (number), someone/somebody/one
har | present tense of "to have"
om | round/about/for/in/a, about/around/down, if
vi | we
min | my
havde | past tense of "to have"
ham | him
hun | she
nu | now
over | over/above/across/by/beyond/past/on/about, over/past
da | then, when/as/since
fra | from/off/since, off, since
du | you
ud | out
sin | his/her/its/one's
dem | them
os | us/ourselves
op | up
man | you/one
hans | his
hvor | where
eller | or
hvad | what
skal | must/shall etc.
selv | myself/youself/herself/ourselves etc., even
her | here
alle | all/everyone/everybody etc.
vil | will (verb)
blev | past tense of "to stay/to remain/to get/to become"
kunne | could
ind | in
når | when
være | present tense of "to be"
dog | however/yet/after all
noget | something
ville | would
jo | you know/you see (adv), yes
deres | their/theirs
efter | after/behind/according to/for/by/from, later/afterwards
ned | down
skulle | should
denne | this
end | than
dette | this
mit | my/mine
også | also
under | under/beneath/below/during, below/underneath
have | have
dig | you
anden | other
hende | her
mine | my
alt | everything
meget | much/very, plenty of
sit | his, her, its, one's
sine | his, her, its, one's
vor | our
mod | against
disse | these
hvis | if
din | your/yours
nogle | some
hos | by/at
blive | be/become
mange | many
ad | by/through
bliver | present tense of "to be/to become"
hendes | her/hers
været | be
thi | for (conj)
jer | you
sådan | such, like this/like that
| From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
| A German stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
| The number of forms in this list is reduced significantly by passing it
| through the German stemmer.
aber | but
alle | all
allem
allen
aller
alles
als | than, as
also | so
am | an + dem
an | at
ander | other
andere
anderem
anderen
anderer
anderes
anderm
andern
anderr
anders
auch | also
auf | on
aus | out of
bei | by
bin | am
bis | until
bist | art
da | there
damit | with it
dann | then
der | the
den
des
dem
die
das
daß | that
derselbe | the same
derselben
denselben
desselben
demselben
dieselbe
dieselben
dasselbe
dazu | to that
dein | thy
deine
deinem
deinen
deiner
deines
denn | because
derer | of those
dessen | of him
dich | thee
dir | to thee
du | thou
dies | this
diese
diesem
diesen
dieser
dieses
doch | (several meanings)
dort | (over) there
durch | through
ein | a
eine
einem
einen
einer
eines
einig | some
einige
einigem
einigen
einiger
einiges
einmal | once
er | he
ihn | him
ihm | to him
es | it
etwas | something
euer | your
eure
eurem
euren
eurer
eures
für | for
gegen | towards
gewesen | p.p. of sein
hab | have
habe | have
haben | have
hat | has
hatte | had
hatten | had
hier | here
hin | there
hinter | behind
ich | I
mich | me
mir | to me
ihr | you, to her
ihre
ihrem
ihren
ihrer
ihres
euch | to you
im | in + dem
in | in
indem | while
ins | in + das
ist | is
jede | each, every
jedem
jeden
jeder
jedes
jene | that
jenem
jenen
jener
jenes
jetzt | now
kann | can
kein | no
keine
keinem
keinen
keiner
keines
können | can
könnte | could
machen | do
man | one
manche | some, many a
manchem
manchen
mancher
manches
mein | my
meine
meinem
meinen
meiner
meines
mit | with
muss | must
musste | had to
nach | to(wards)
nicht | not
nichts | nothing
noch | still, yet
nun | now
nur | only
ob | whether
oder | or
ohne | without
sehr | very
sein | his
seine
seinem
seinen
seiner
seines
selbst | self
sich | herself
sie | they, she
ihnen | to them
sind | are
so | so
solche | such
solchem
solchen
solcher
solches
soll | shall
sollte | should
sondern | but
sonst | else
über | over
um | about, around
und | and
uns | us
unse
unsem
unsen
unser
unses
unter | under
viel | much
vom | von + dem
von | from
vor | before
während | while
war | was
waren | were
warst | wast
was | what
weg | away, off
weil | because
weiter | further
welche | which
welchem
welchen
welcher
welches
wenn | when
werde | will
werden | will
wie | how
wieder | again
will | want
wir | we
wird | will
wirst | willst
wo | where
wollen | want
wollte | wanted
würde | would
würden | would
zu | to
zum | zu + dem
zur | zu + der
zwar | indeed
zwischen | between
# Lucene Greek Stopwords list
# Note: by default this file is used after GreekLowerCaseFilter,
# so when modifying this file use 'σ' instead of 'ς'
ο
η
το
οι
τα
του
τησ
των
τον
την
και
κι
κ
ειμαι
εισαι
ειναι
ειμαστε
ειστε
στο
στον
στη
στην
μα
αλλα
απο
για
προσ
με
σε
ωσ
παρα
αντι
κατα
μετα
θα
να
δε
δεν
μη
μην
επι
ενω
εαν
αν
τοτε
που
πωσ
ποιοσ
ποια
ποιο
ποιοι
ποιεσ
ποιων
ποιουσ
αυτοσ
αυτη
αυτο
αυτοι
αυτων
αυτουσ
αυτεσ
αυτα
εκεινοσ
εκεινη
εκεινο
εκεινοι
εκεινεσ
εκεινα
εκεινων
εκεινουσ
οπωσ
ομωσ
ισωσ
οσο
οτι
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# a couple of test stopwords to test that the words are really being
# configured from this file:
stopworda
stopwordb
# Standard english stop words taken from Lucene's StopAnalyzer
a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or
such
that
the
their
then
there
these
they
this
to
was
will
with
| From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
| A Spanish stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
| The following is a ranked list (commonest to rarest) of stopwords
| deriving from a large sample of text.
| Extra words have been added at the end.
de | from, of
la | the, her
que | who, that
el | the
en | in