Welcome to Joshua
-----------------

Joshua is a statistical machine translation toolkit for both
phrase-based (new in version 6.0) and syntax-based decoding. It can be
run with pre-built language packs available for download, and can also
be used to build models for new language pairs. Among the many features of
Joshua are:

- Support for both phrase-based and syntax-based decoding models
- Translation of weighted input lattices
- [Thrax](http://joshua-decoder.org/6.0/thrax.html): a Hadoop-based, scalable
  grammar extractor
- A [sparse feature architecture](http://cs.jhu.edu/~post/joshua-docs/md_sparse_features.html)
  supporting an arbitrary number of features

The latest release of Joshua is 6.0, released in January of 2014.

New in 6.0
----------

Joshua 6.0 includes the following new features:

- A fast phrase-based decoder with the ability to read [Moses](http://statmt.org/moses) 
  phrase tables
- Large speed improvements compared to the previous syntax-based decoder
- Special input handling
- A host of bugfixes and stability improvements

Working with "language packs"
-----------------------------

Joshua provides a number of "language packs": pre-built models that let you
use the translation system as a black box, without needing to know much about
how machine translation works. You can browse the models available for
download on the [Joshua
website](http://joshua-decoder.org/language-packs/).

Building new models
-------------------

Joshua includes a pipeline script that allows you to build new models, provided
you have training data.  This pipeline can be run (more or less) by invoking a
single command, which handles data preparation, alignment, phrase-table or
grammar construction, and tuning of the model parameters. See [the
documentation](http://joshua-decoder.org/pipeline.html)
for a walkthrough and more information about the many available options.

Quick start
-----------

Running the decoder in any form requires setting a few basic environment
variables: `$JAVA_HOME`, `$JOSHUA`, and potentially `$MOSES`.

    export JAVA_HOME=/path/to/java  # maybe /usr/java/home
    export JOSHUA=/path/to/joshua

You might also find it helpful to set these:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8
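
Before compiling, it can help to confirm that these variables are actually set.
The following is a minimal sketch, not part of the Joshua distribution:
`check_joshua_env` is a hypothetical helper name, and the only assumption it
makes is that `$JAVA_HOME` and `$JOSHUA` should each name an existing directory.

```shell
# check_joshua_env: hypothetical helper, not shipped with Joshua.
# Returns 0 if $JAVA_HOME and $JOSHUA are set and name existing directories,
# 1 otherwise. A missing $MOSES only produces a warning, since it is listed
# as only potentially required.
# Plain POSIX sh, so the underscore-prefixed variables are not function-local.
check_joshua_env() {
    _status=0
    for _var in JAVA_HOME JOSHUA; do
        # Indirect lookup: copy the value of the variable named in $_var.
        eval "_val=\${$_var:-}"
        if [ -z "$_val" ]; then
            echo "error: \$$_var is not set" >&2
            _status=1
        elif [ ! -d "$_val" ]; then
            echo "error: \$$_var is not a directory: $_val" >&2
            _status=1
        fi
    done
    if [ -z "${MOSES:-}" ]; then
        echo "warning: \$MOSES is not set" >&2
    fi
    return "$_status"
}
```

If you source this function in your shell, running `check_joshua_env` before
the build step reports any missing or mistyped paths up front.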

Then, compile Joshua by typing:

    cd $JOSHUA
    ant

The basic method for invoking the decoder looks like this:

    cat SOURCE | $JOSHUA/bin/joshua-decoder -m MEM -c CONFIG OPTIONS > OUTPUT

Some example usage scenarios and scripts can be found in the `examples/`
directory.
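
For repeated runs, the invocation pattern above can be wrapped in a small shell
function. This is only a sketch under stated assumptions: `translate_file` and
the `JOSHUA_DECODER` override are hypothetical names, the launcher is assumed
to live at `$JOSHUA/bin/joshua-decoder`, the memory and config flags are
assumed to be `-m` and `-c`, and the `4g` default is an arbitrary placeholder
rather than a recommended value.

```shell
# translate_file SOURCE CONFIG OUTPUT [MEM] [OPTIONS...]
# Hypothetical wrapper, not shipped with Joshua. It feeds SOURCE to the
# decoder on stdin and writes the decoder's stdout to OUTPUT.
# Plain POSIX sh, so the underscore-prefixed variables are not function-local.
translate_file() {
    if [ "$#" -lt 3 ]; then
        echo "usage: translate_file SOURCE CONFIG OUTPUT [MEM] [OPTIONS...]" >&2
        return 2
    fi
    _src=$1
    _cfg=$2
    _out=$3
    _mem=${4:-4g}   # assumed default; pick a value that suits your model
    if [ "$#" -ge 4 ]; then shift 4; else shift 3; fi

    # Assumed launcher location; set JOSHUA_DECODER to use a different path.
    _decoder=${JOSHUA_DECODER:-${JOSHUA:-}/bin/joshua-decoder}

    for _f in "$_src" "$_cfg"; do
        if [ ! -r "$_f" ]; then
            echo "error: cannot read $_f" >&2
            return 1
        fi
    done

    # Any remaining arguments are passed through unchanged as OPTIONS.
    "$_decoder" -m "$_mem" -c "$_cfg" "$@" < "$_src" > "$_out"
}
```

A call might look like `translate_file input.txt joshua.config output.txt 8g`,
where all three file names are placeholders for your own files.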
