hfst-proc prints tag parts with special symbols for ungeneratables in -g · Issue #573 · hfst/hfst · GitHub
Open
@flammie

Description

This happens in basically any Apertium pair with a GiellaLT target, e.g.:

echo  "Satu: Mist lii sávnumeehid!" | hfst-proc --weight-classes 1 -w -e '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.automorf-untrimmed.hfst' | cg-proc -w '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.mor.rlx.bin' | cg-proc -n -1 -w '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.syn.rlx.bin' | apertium-pretransfer | lt-proc -b '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.autobil.bin' | lrx-proc -m '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.autolex.bin' | rtx-proc '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.rtx.bin' 
!! Warning: Transducer contains one or more multi-character symbols made up of
ASCII characters which are also available as single-character symbols. The
input stream will always be tokenised using the longest symbols available.
Use the -t option to view the tokenisation. The problematic symbol(s):
Ä Ö ä ö
^Satu<np><ant><f><sg><nom>$^:<punct>$ ^Minä<prn><pl><ine>$ ^olla<vaux><iv><indic><pres><p3><sg><@+FMAINV>$ ^saunoa<vblex><actv>$ ^ilta<n><sg><nom>$^!<punct>$
echo "Satu: Mist lii sávnumeehid!" | hfst-proc --weight-classes 1 -w -e '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.automorf-untrimmed.hfst' | cg-proc -w '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.mor.rlx.bin' | cg-proc -n -1 -w '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.syn.rlx.bin' | apertium-pretransfer | lt-proc -b '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.autobil.bin' | lrx-proc -m '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.autolex.bin' | rtx-proc '/home/flammie/github/apertium/apertium-fin-smn/smn-fin.rtx.bin' | hfst-proc -d smn-fin.autogen.hfst 
!! Warning: Transducer contains one or more multi-character symbols made up of
ASCII characters which are also available as single-character symbols. The
input stream will always be tokenised using the longest symbols available.
Use the -t option to view the tokenisation. The problematic symbol(s):
Ä Ö ä ö
Satu: #Minä\<prn\>\<pl\>\<ine\> \@+FMAINV\> #saunoa\<vblex\>\<actv\> ilta!

So the unhandled syntax tag @+FMAINV seems to eat the whole word in the generation phase: instead of the ungeneratable lexical unit for olla, only \@+FMAINV\> is printed. This happens to all such words in GiellaLT pairs in Apertium, until one finds a way to hide syntax tags in transfer. The expected result would be for this word to be output like the other ungeneratable words in the sentence, e.g. #olla\<vaux\>...\<\@\+FMAINV\>.
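As a stopgap until syntax tags can be hidden in transfer, one could strip them from the stream just before generation. The following is a minimal sketch, assuming that syntax tags always have the form `<@...>` inside lexical units and that dropping them is acceptable for generation. `strip_syntax_tags` is a hypothetical helper name, not part of Apertium or HFST. This only hides the symptom; it does not change how hfst-proc reports a word it cannot generate.

```shell
#!/bin/sh
# Hypothetical stopgap, not part of Apertium or HFST: remove syntax tags of the
# form <@...> (e.g. <@+FMAINV>) from an Apertium stream read on stdin.
# Assumption: "<@" never appears escaped (as "\<@") in surface text; if it did,
# this filter would leave a stray backslash behind.
strip_syntax_tags() {
  sed -E 's/<@[^>]*>//g'
}

# Usage: insert the filter between transfer and generation, e.g.
#   ... | rtx-proc smn-fin.rtx.bin | strip_syntax_tags | hfst-proc -d smn-fin.autogen.hfst
```

This has not been tested against smn-fin.autogen.hfst; the tag-less unit for olla may still be ungeneratable for other reasons.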

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests