Bitextor Docker
To install Docker, refer to the Docker documentation (recommended), or install it via snap:
sudo snap install docker
A Docker image of Bitextor is available on Docker Hub as bitextor/bitextor. Both release and nightly versions are provided:
docker pull bitextor/bitextor # latest release
docker pull bitextor/bitextor:edge # nightly builds from the GitHub master branch
After pulling the Bitextor image, Bitextor can be run with:
docker run bitextor/bitextor
The command above will launch bitextor.sh and show the help message.
Any arguments provided to this command will be passed to bitextor.sh.
To run Bitextor you need to provide some resources to it, such as the config file, translation system, dictionaries, bicleaner model, etc. The Docker container must be able to read these files. Likewise, the output of Bitextor ideally should also be shared with the host system in a similar way. One way to achieve this is to use Docker volumes.
The following snippet is an example of Docker volume usage. The folder /home/user/bitextor-files of the host system will be mounted to the /home/docker/data folder of the container.
docker run -v /home/user/bitextor-files:/home/docker/data bitextor/bitextor
# multiple volumes are also allowed:
docker run -v /home/user/bitextor_input:/home/docker/bitextor_input -v /home/user/bitextor_output:/home/docker/bitextor_output bitextor/bitextor
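One detail worth keeping in mind with the commands above: Docker treats a `-v` source that is not an absolute path as a named volume rather than a host folder. The following is a minimal sketch of a helper that avoids this; the function name `volume_arg` is hypothetical and not part of Bitextor. It creates the host folder if it is missing and prints a `HOST:CONTAINER` pair with the host side resolved to an absolute path.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bitextor): print a HOST:CONTAINER pair
# suitable for "docker run -v". The host folder is created if missing and
# resolved to an absolute path, since Docker treats a non-absolute source
# as a named volume instead of a folder on the host.
volume_arg() {
    host_dir=$1
    container_dir=$2
    mkdir -p "$host_dir" || return 1
    # cd runs inside the command substitution, so the caller's
    # working directory is left unchanged.
    abs_host=$(cd "$host_dir" && pwd) || return 1
    printf '%s:%s\n' "$abs_host" "$container_dir"
}

# Example usage (requires Docker; not executed here):
#   docker run \
#     -v "$(volume_arg ./bitextor_input /home/docker/bitextor_input)" \
#     -v "$(volume_arg ./bitextor_output /home/docker/bitextor_output)" \
#     bitextor/bitextor
```

Creating the output folder up front also means it is owned by the host user instead of being created by the Docker daemon on first mount.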
In the image, the Bitextor folder and other relevant dependencies are located at /home/docker.
All dependencies are already installed and compiled.
$ ls /home/docker
bitextor go heritrix-3.4.0-SNAPSHOT protobuf-3.10.1
It is important to note that the paths to the input files and scripts specified in the configuration file must refer to the folders as they are mounted inside the Docker container. The path of the config file passed to Bitextor on the command line must also be the path as seen inside the container, not the path on the host.
$ # ~/bitextor-data folder contains the config file and the bicleaner model
$ ls ~/bitextor-data
bitextor_config.yaml en-es
$ # config file argument relative to the container
$ docker run -v ~/bitextor-data:/home/docker/corpus bitextor/bitextor -s /home/docker/corpus/bitextor_config.yaml
# in bitextor_config.yaml:
# make sure output files are in the volume,
# so that they can be accessed directly from the host machine
permanentDir: /home/docker/corpus/permanent
transientDir: /home/docker/corpus/transient
dataDir: /home/docker/corpus/data
# bicleaner model path is also relative to the container
bicleaner: /home/docker/corpus/en-es/en-es.yaml
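Because every path in the config file has to be written as the container sees it, it can help to translate host paths mechanically. The sketch below assumes the same mapping as the example above (a host folder mounted at /home/docker/corpus); the function name `to_container_path` is hypothetical and not part of Bitextor.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bitextor): translate a path under a
# mounted host folder into the path seen inside the container.
#   $1: host folder given on the left side of -v
#   $2: container folder given on the right side of -v
#   $3: host path to translate
to_container_path() {
    host_root=$1
    container_root=$2
    host_path=$3
    case $host_path in
        "$host_root"/*)
            # Strip the host prefix and prepend the container prefix.
            printf '%s%s\n' "$container_root" "${host_path#"$host_root"}"
            ;;
        "$host_root")
            printf '%s\n' "$container_root"
            ;;
        *)
            echo "error: $host_path is not inside $host_root" >&2
            return 1
            ;;
    esac
}

# Example usage:
#   to_container_path "$HOME/bitextor-data" /home/docker/corpus \
#       "$HOME/bitextor-data/en-es/en-es.yaml"
```

A path that falls outside the mounted folder is reported as an error, which is a useful early warning: such a file would not be visible to Bitextor inside the container.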
By default, the Bitextor Docker image launches the bitextor.sh script. To launch an interactive shell instead, you have to change the entrypoint of the container.
When launching the container for the first time:
docker pull bitextor/bitextor
docker run -it --entrypoint /bin/bash bitextor/bitextor # will open an interactive shell
To run a shell in an existing Bitextor container (i.e. after you have already run Bitextor in the default way), you first have to create an image from it, and then run that image with a different entrypoint, as in the snippet above. To create an image based on an existing container, first find out its name or ID.
docker ps -a # will list your containers with some basic info
docker commit <CONTAINER_ID> bitextor/new_image # create new image
docker run -it --entrypoint /bin/bash bitextor/new_image # run shell in the new image
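Looking up the container ID by hand can be replaced with a small filter. This is a sketch only: the function name `first_container_for_image` is hypothetical, and it assumes the container was started from an image named exactly bitextor/bitextor (a container started from bitextor/bitextor:edge would be listed under that name instead). It reads "ID IMAGE" lines, as produced by `docker ps -a --format '{{.ID}} {{.Image}}'`, and prints the first matching ID.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bitextor): read "ID IMAGE" lines on
# stdin and print the ID of the first container whose image name equals $1.
# docker ps normally lists the most recently created containers first,
# so the first match is usually the latest run.
first_container_for_image() {
    awk -v image="$1" '$2 == image { print $1; exit }'
}

# Example usage (requires Docker; not executed here):
#   id=$(docker ps -a --format '{{.ID}} {{.Image}}' \
#        | first_container_for_image bitextor/bitextor)
#   docker commit "$id" bitextor/new_image
#   docker run -it --entrypoint /bin/bash bitextor/new_image
```

If no container matches, the function prints nothing, so it is worth checking that the captured ID is non-empty before calling docker commit.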