Tokenizer should split on apostrophe · Issue #9113 · tutao/tutanota · GitHub
Tokenizer should split on apostrophe #9113
Closed
@paw-hub

Description

In French, the apostrophe marks elision: it joins a subject to a verb, or an article to a noun, when the first word ends with a vowel and the second starts with one.

For example:

  • Je aime -> J'aime
  • Tu aimes -> T'aimes

The word before the apostrophe should therefore not be treated as part of the word after it, even though the two are written together. Users may still want to search for the second word, which is unmodified.

Related pull requests:

Test notes:

  • Write an email that says T'aimes. Verify that searching for aimes finds it.
  • Write an email that says J’aime (note the typographic apostrophe ’, U+2019, rather than the ASCII '). Verify that searching for aime finds it.
  • Write an email that says tutamail.com. Verify that searching for tutamail finds it.
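The three checks above can be sketched as a small tokenizer. This is only an illustration of the requested behavior, not the actual tutanota implementation: the names `SEPARATORS`, `tokenize` and `searchFinds` are hypothetical, and lowercasing the tokens and treating the dot as a separator are assumptions made so that the test notes can be expressed as code.

```typescript
// Hypothetical sketch, not the tutanota tokenizer.
// Assumed separators: whitespace, dot, ASCII apostrophe (U+0027)
// and typographic apostrophe (U+2019).
const SEPARATORS = /[\s.'\u2019]+/

// Splits text into lowercase tokens (lowercasing is an assumption)
// and drops the empty strings that leading/trailing separators produce.
function tokenize(text: string): string[] {
	return text
		.toLowerCase()
		.split(SEPARATORS)
		.filter((token) => token.length > 0)
}

// Returns true if the query matches one whole token of the body.
function searchFinds(body: string, query: string): boolean {
	return tokenize(body).indexOf(query.toLowerCase()) !== -1
}

console.log(tokenize("T'aimes")) // ["t", "aimes"]
console.log(tokenize("J\u2019aime")) // ["j", "aime"]
console.log(tokenize("tutamail.com")) // ["tutamail", "com"]
```

With this split, `searchFinds("T'aimes", "aimes")`, `searchFinds("J\u2019aime", "aime")` and `searchFinds("tutamail.com", "tutamail")` all return `true`, which mirrors the three manual checks.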

Metadata

Assignees

No one assigned

    Labels

    dev bug: unpublished bugs, found during our development/test cycle (excluded from release notes)
    state:tested: We tested it and are about to release it

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
