`hfst-tokenise` returns unknown when weight ≠ 0, and --weight-classes=1 · Issue #562 · hfst/hfst · GitHub
Open
@snomos

Compare these two commands:

echo viessogirji | hfst-tokenize --giella-cg --weight-classes=1 tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"viessogirji" ?
:\n
echo viessogirji | hfst-tokenize --giella-cg tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"girji" N Sem/Txt Sg Nom <W:10.0>
		"viessu" N Sem/Build Cmp/SgNom Cmp <W:10.0>
:\n
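
To check a longer word list for the same symptom, the two outputs can be compared mechanically. The following is a minimal Python sketch, not part of HFST: `parse_cohorts` and `unknown_only` are hypothetical helper names. It assumes the `--giella-cg` output shape shown above, that is, a `"<wordform>"` line followed by tab-indented readings, with a trailing `?` (or `??`) marking an unknown reading.

```python
# Sample outputs copied from the two commands above.
WITH_WEIGHT_CLASSES = '"<viessogirji>"\n\t"viessogirji" ?\n:\\n\n'
WITHOUT_WEIGHT_CLASSES = (
    '"<viessogirji>"\n'
    '\t"girji" N Sem/Txt Sg Nom <W:10.0>\n'
    '\t\t"viessu" N Sem/Build Cmp/SgNom Cmp <W:10.0>\n'
    ':\\n\n'
)


def parse_cohorts(cg_text):
    """Split CG-style output into (wordform, [reading, ...]) pairs."""
    cohorts = []
    for line in cg_text.splitlines():
        if line.startswith('"<') and line.endswith('>"'):
            # New cohort: strip the surrounding "< and >" markers.
            cohorts.append((line[2:-2], []))
        elif line.startswith("\t") and cohorts:
            # Reading or sub-reading of the current cohort.
            cohorts[-1][1].append(line.strip())
        # Any other line (for example the ":\n" separator) is ignored.
    return cohorts


def unknown_only(cohorts):
    """Return the wordforms whose readings are all marked unknown."""
    result = []
    for wordform, readings in cohorts:
        if readings and all(r.split()[-1] in ("?", "??") for r in readings):
            result.append(wordform)
    return result


if __name__ == "__main__":
    print(unknown_only(parse_cohorts(WITH_WEIGHT_CLASSES)))
    print(unknown_only(parse_cohorts(WITHOUT_WEIGHT_CLASSES)))
```

Words reported for the `--weight-classes=1` run but not for the plain run are the ones affected by this issue.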

It is not restricted to the --giella-cg mode:

echo viessogirji | hfst-tokenize -x --weight-classes=1 tokeniser-disamb-gt-desc.pmhfst
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??


echo viessogirji | hfst-tokenize -c --weight-classes=1 tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??

The output in CG mode is also a bit strange, though: why are all the unknown analyses printed when there is a known analysis (with a non-zero weight)?

echo viessogirji | hfst-tokenize -c tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	viessu N Sem/Build Cmp/SgNom Cmp#girji N Sem/Txt Sg Nom
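
For reference, the result one would expect from `--weight-classes=1` can be sketched as follows. This is only an assumption about the intended semantics (analyses with equal weight form a class, and only the N best classes are kept); it is not HFST's actual code, and `keep_weight_classes` is a hypothetical name. The weights of the unknown readings are not visible in the output above, so the example uses only the known analysis with its weight of 10.0.

```python
def keep_weight_classes(analyses, n):
    """Keep the analyses whose weight is among the n lowest distinct weights.

    `analyses` is a list of (analysis_string, weight) pairs. Analyses that
    share a weight belong to the same class and are kept or dropped together.
    """
    if n < 1:
        return []
    best = set(sorted({weight for _, weight in analyses})[:n])
    return [(a, w) for a, w in analyses if w in best]


if __name__ == "__main__":
    known = [("viessu N Sem/Build Cmp/SgNom Cmp#girji N Sem/Txt Sg Nom", 10.0)]
    # With a single analysis, its weight is by definition the best class,
    # so it should survive the filter regardless of being non-zero.
    print(keep_weight_classes(known, 1))
```

Under this reading, a lone analysis with weight 10.0 is its own best class and should be printed, which is what the run without `--weight-classes` shows.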

Environment: HFST tools from Tino's nightly package of November 8, 2021, on macOS 11.6.2.

Tokeniser fst is too big to be included, but can be found here for a limited time.
