`hfst-tokenise` returns unknown when weight ≠ 0, and --weight-classes=1 · Issue #562 · hfst/hfst · GitHub


snomos opened this issue Jan 19, 2022 · 0 comments
snomos commented Jan 19, 2022

Compare these two commands:

echo viessogirji | hfst-tokenize --giella-cg --weight-classes=1 tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"viessogirji" ?
:\n
echo viessogirji | hfst-tokenize --giella-cg tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"girji" N Sem/Txt Sg Nom <W:10.0>
		"viessu" N Sem/Build Cmp/SgNom Cmp <W:10.0>
:\n
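
For anyone who wants to script this comparison, here is a minimal sketch of a check on the `--giella-cg` output. The helper name `only_unknown` is hypothetical, and it assumes that reading lines start with a tab and that unknown readings end in ` ?` or ` ??`, as in the outputs in this report. The sample input is copied from the output above rather than produced by running `hfst-tokenize`.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 when the CG-style output on stdin contains
# at least one reading line and every reading line is unknown ("?" or "??").
# Assumption: reading lines start with a tab, as in the outputs in this issue.
only_unknown() {
    awk '
        /^\t/ { readings++; if ($0 !~ / \?+$/) known++ }
        END   { exit !(readings > 0 && known == 0) }
    '
}

# Sample with --weight-classes=1 (copied from the report): one unknown reading.
printf '"<viessogirji>"\n\t"viessogirji" ?\n' \
    | only_unknown && echo "only unknown readings"

# Sample without --weight-classes (copied from the report): known readings.
printf '"<viessogirji>"\n\t"girji" N Sem/Txt Sg Nom <W:10.0>\n\t\t"viessu" N Sem/Build Cmp/SgNom Cmp <W:10.0>\n' \
    | only_unknown || echo "has a known reading"

# With HFST installed, the real command could be piped in the same way:
#   echo viessogirji | hfst-tokenize --giella-cg --weight-classes=1 \
#       tokeniser-disamb-gt-desc.pmhfst | only_unknown && echo "bug reproduced"
```

The exit status makes the check usable in a regression script: it should succeed for the first command only as long as the bug is present.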

It is not restricted to the --giella-cg mode:

echo viessogirji | hfst-tokenize -x --weight-classes=1 tokeniser-disamb-gt-desc.pmhfst
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??
viessogirji	viessogirji ??


echo viessogirji | hfst-tokenize -c --weight-classes=1 tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??

The output in CG mode (-c) is also a bit strange, even without --weight-classes: why are all the unknown analyses printed when there is a known analysis (with a non-zero weight)?

echo viessogirji | hfst-tokenize -c tokeniser-disamb-gt-desc.pmhfst
"<viessogirji>"
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	"viessogirji" ??
	viessu N Sem/Build Cmp/SgNom Cmp#girji N Sem/Txt Sg Nom
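
Until the behaviour is clarified, a post-processing filter can hide the redundant unknown readings. The following is only a sketch of a workaround, not how `hfst-tokenize` works or should work internally. The function name `drop_redundant_unknowns` is hypothetical, and it assumes that readings are tab-indented lines, that unknown readings end in ` ??` (or ` ?`), and that any other line starts a new cohort.

```shell
#!/bin/sh
# Hypothetical workaround: within each cohort, drop unknown readings
# ("?" / "??") whenever the same cohort also has at least one known reading.
# Cohorts that have only unknown readings are passed through unchanged.
drop_redundant_unknowns() {
    awk '
        # Print the buffered readings of the current cohort, then reset.
        function flush(   i) {
            for (i = 1; i <= n; i++)
                if (!(known > 0 && unknown[i])) print line[i]
            n = 0; known = 0
        }
        # Reading line: buffer it and record whether it is unknown.
        /^\t/ {
            n++; line[n] = $0
            unknown[n] = ($0 ~ / \?+$/)
            if (!unknown[n]) known++
            next
        }
        # Any other line (e.g. a cohort header) ends the previous cohort.
        { flush(); print }
        END { flush() }
    '
}

# Shortened sample based on the -c output in this report.
printf '"<viessogirji>"\n\t"viessogirji" ??\n\t"viessogirji" ??\n\tviessu N Sem/Build Cmp/SgNom Cmp#girji N Sem/Txt Sg Nom\n' \
    | drop_redundant_unknowns
```

On the sample, only the cohort header and the `viessu ... #girji ...` reading remain.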

Environment: HFST tools from Tino's nightly package from November 8, 2021, on macOS 11.6.2.

The tokeniser FST is too big to be attached, but can be found here for a limited time.

snomos added the bug label Jan 19, 2022
flammie self-assigned this Jan 19, 2022