Install by yzhang123 · Pull Request #36 · NVIDIA/NeMo-text-processing · GitHub

Install #36

Merged: 8 commits into main from install, Feb 7, 2023

Conversation

yzhang123
Contributor

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
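
For reviewers checking the import-guard item above, here is a minimal sketch of one way an optional dependency such as Pynini could be guarded. The helper names `optional_import` and `require_pynini` are hypothetical and used only for illustration; they are not taken from this repository, and the project's actual guards may look different.

```python
import importlib


def optional_import(module_name):
    """Return (module, True) if the optional dependency is installed, else (None, False).

    Hypothetical helper for illustration; not part of NeMo-text-processing.
    """
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        return None, False


# Resolve the optional dependency once, at import time, without failing.
pynini, PYNINI_AVAILABLE = optional_import("pynini")


def require_pynini():
    """Return the pynini module, or raise a readable error if it is missing."""
    if not PYNINI_AVAILABLE:
        raise ModuleNotFoundError(
            "This feature needs the optional 'pynini' package; install it and retry."
        )
    return pynini
```

With a guard like this, the package can still be imported when Pynini is absent, and the error only appears when a Pynini-dependent feature is actually called.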

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

yzhang123 and others added 7 commits February 7, 2023 10:07
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
@yzhang123 yzhang123 requested a review from ekmb February 7, 2023 19:09
@yzhang123 yzhang123 closed this Feb 7, 2023
@yzhang123 yzhang123 reopened this Feb 7, 2023
@ekmb ekmb merged commit 34350d9 into main Feb 7, 2023
@ekmb ekmb deleted the install branch February 7, 2023 20:09
BuyuanCui pushed a commit to BuyuanCui/NeMo-text-processing that referenced this pull request Jul 6, 2023
* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
mgrafu pushed a commit that referenced this pull request Jul 18, 2023
* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
BuyuanCui pushed a commit that referenced this pull request Sep 26, 2024
* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
mgrafu added a commit that referenced this pull request Oct 1, 2024
* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue, changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes
…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 24, 2024
* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; qdd minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserver_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to resolve SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to resolve SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue, changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; qdd minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempt for SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fix attempt for SH test errors, will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* with pynini closure had errors, changing back to no closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will fix back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

resolving conflicts

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* with pynini closure had errors, changing back to no closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue, changing narrow space to regular normal space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserver_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv pushed a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempt for SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fix attempt for SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors, changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors, will revert

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.no…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fix attempt for SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fix attempt for SH test errors, will change back

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure had errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue; changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…
ankitnv added a commit to ankitnv/NeMo-text-processing that referenced this pull request Oct 28, 2024
* temporary fixes attempting to fix SH test errors; will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary changes; will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn date

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving conflict

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* updates on Jenkins

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changing the data format, to align to the blind test data

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding one more test item

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* temporary fixes attempting to fix SH test errors; will revert later

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* adding grammars back in the tokenizer

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing ci test cases
resolving conflicts
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* pynini closure caused errors; changing back to the no-closure version

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space issue on fraction, added NEMO_NARROW_NON_BREAK_SPACE

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolving space fraction issue added NEMO_NARROW_NON_BREAK_SPACE and NEMO_SPACES_AND_ALHPANUMERICS

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixed typo on decimaltext

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused grammar

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* changed regular space to narrow space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports error fixing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* imports errors

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Jenkins update for jp itn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* reverting

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update for fraction space issue: changing narrow space to regular space

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fixing style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* jp tn date update

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update test_cases_fraction.txt

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* removing previously created nemo imports

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* test order arrangement

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* resolve fraction space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* style fix

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix style

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* space issue

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update jp tn

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing unused import

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Update post_processing.py

Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com>

* empty file

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* to delete

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* removing

Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add contributing (#21)

* add contributing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add jenkins file (#23)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish TN (#12)

* test now runs, but getting ordinal instead of cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* force ordinals to either have :a/:e or "." at the end

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal ordinal data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for ordinals, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add // to symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test cases for electronic; transformed with sed from spanish, so I expect errors

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for electronic, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to make electronic verbaliser work (not yet)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for fractions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for fraction, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix language

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix graph construction to make pluralisation work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for decimal, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for whitelist, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal test case for whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for word, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for date, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for measure, adapted from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a pair of test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix plurals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix number, but this whole thing is only partially adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add usd$

Signed-off-by: Jim O'Regan <joregan@kth.se>

* insert "komma"

Signed-off-by: Jim O'Regan <joregan@kth.se>

* "pund" is neuter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* towards proper graphs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* GBP

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make komma non-det

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more money tagger fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more minor words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* do a bit better with en/ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use the correct list

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* make sure the numbers have no 1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* abbreviations for million and milliard

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add year suffixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add minimal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* expansions of era abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use eras in verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix examples in comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix extension

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* date verbaliser is broken, this does not fix it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* load labels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* right first time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missing space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year in test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* getting closer to getting dates working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a (failing) test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* date working now

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also handle decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* years where -00 is -hundra

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for telephone (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changes to telephone tagger/verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add partially incomplete test data

Signed-off-by: Jim O'Regan <joregan@kth.se>

* mostly fixed test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more in progress changes to telephone parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* much prodding later, turns out I forgot a space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed wrapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "no difference"

This reverts commit 29680925bebd65d489f3b1a5415607c12bb7e3b9.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* telephone tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try adding brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try adding more brackets

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a comment, because I confused myself

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a version for fraction that does what I intended: 2 & 3 digit numbers without leading 0 are read as cardinals, everything else as digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* single digit

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case/remove a duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* use the nice variable I just added to cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* this is not right; leading zeros fail

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "this is not right; leading zeros fail"

This reverts commit 5997e95e0cb08ffee9cf21a9c82697ed7beb042f.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, this seems to work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* drop the tests starting with comma

Signed-off-by: Jim O'Regan <joregan@kth.se>

* decimal tagger works

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* lower case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add klockan and variants as a prompt, so they are not silently deleted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a very minimal test case for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite with less ambiguity, hms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy the roman handling from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* greek letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some fixes to the time tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test runner for time (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* test runner for time (adapted from es) ((actually adapted))

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* |=, not =

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt verbaliser a little

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some test cases from module comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables to check

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small fix

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment some stuff that needs major changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try doing this here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try doing this here"

This reverts commit ebdba0e3da5cdde19eae24f268ee0edd21f298b5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix errors in tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minimal test cases for measure

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq everything, see what the difference is

Signed-off-by: Jim O'Regan <joregan@kth.se>

* merge different tsvs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix casing to avoid conflicts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export some variables for testing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* need an en/ett split here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix decimal subgraph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove todo, I've just done it

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove missing integer test, does not work elsewhere

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek letters in maths

Signed-off-by: Jim O'Regan <joregan@kth.se>

* include greek here too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor sg/pl

Signed-off-by: Jim O'Regan <joregan@kth.se>

* dedup

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put these under if, too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no; there are no minor neuters, so that is not relevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove greek from here, interferes with delimiter

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export variables to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix some test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* here is one error

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* put ensure_space in graph_utils

Signed-off-by: Jim O'Regan <joregan@kth.se>

* handle cases where unit follows amount

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export a variable

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . is not a cardinal separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add yen

Signed-off-by: Jim O'Regan <joregan@kth.se>

* final fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove English roman tagger

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_lm.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove some unused pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add tokenize_and_classify_with_audio.py (adapted from en)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test pieces for audio (recopied from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add audio test (adapted from es)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in non-deterministic mode, generate both en and ett

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal non-deterministic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* warnings about missing whitelist

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove commented pieces/things that will not be used

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* some Riksdag specific titles

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright to the other files with non-trivial changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Swedish support in pynini_export

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add Swedish support for sparrowhawk tests -- untested (:

Signed-off-by: Jim O'Regan <joregan@kth.se>

* address codeql comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change decade to year; sparrowhawk enforces categories

Signed-off-by: Jim O'Regan <joregan@kth.se>

* shoehorn this stuff into the overly narrow sparrowhawk classes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "shoehorn this stuff into the overly narrow sparrowhawk classes"

This reverts commit a3cf3d5de1702366b2bf9c12ebf7e5d26634c688.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* read out the AM/PM words, they are not read as letters anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change date verbaliser to manage isolated decades

Signed-off-by: Jim O'Regan <joregan@kth.se>

* redo changes to get rid of 'prompt' for 'klockan'

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove broken duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a case for hours without minutes (which should not happen)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* time tests now pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a time test case that also passes here, but not in sparrowhawk

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error in dates, add more tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* import delete_preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql feedback

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some ambiguous abbreviations for non-deterministic mode (more as a demonstration than anything deeply useful)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move to the correct subdirectory

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add swedish

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error with 1000 in non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix here also

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also generate a string of digits if not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a date case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove duplication

Signed-off-by: Jim O'Regan <joregan@kth.se>

* boost n_tagged

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also copyright this year

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1500 only fixes one, boost again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 2500 does nothing, going to -1

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "remove audio normalisation; I do not have the time to get this working right now, and there was a bunch more to do for it anyway"

This reverts commit 383a096083061b0c79457e815a65e55563c7ac74.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try setting a low weight to everything non-default

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put n_tagged back to 500

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* days of the week

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* setting weights did not work for cardinals, but it did push the test from taking 11 minutes to something more than 40. Re-reverting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove blank line

Signed-off-by: Jim O'Regan <joregan@kth.se>

* forgot to remove this piece in the merge conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove erroneously added copyright notice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add __init__.py in a few places it was missing

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the google notice required by the incoming contributing document

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CI setup (#25)

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci _cr

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert setup tool

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove pytest-runner from setup.py

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Merge EN riva release 22.10 (#26)

* Merge EN riva release 22.10

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng TN - update urls to handle dictionary words (#27)

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip el words

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* electronic pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add deterministic option normalized options

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins grammar folder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, update for SH

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reduce cardinal graph

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add weight for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Tn en astronomical no (#28)

* Add support for large numbers (>999,999,999,999,999)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder in Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Increase mem size for CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Updating shmem for docker to deal with memory overflow

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Ensure large au cardinal graph is used only if deterministic

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Make comma mandatory in cardinals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run FST cache generation and Pytests in separate stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix stage

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add whitelist param to ITN (#30)

* add whitelist param to itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist to export

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Eng tn itn (#31)

* Add additional units and plurals

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for financial periods (1H22, 2Q19)

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing plural for "gigabit per second"

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix for measures

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use environment variables to set path of fst cache

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix environment variable

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Fix parse "None" as string (#33)

* Fix parse "None" as string

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* read double digits for telephone grammar (#32)

* read double digits for telephone grammar

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import zero graph instead of hard coding

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>

---------

Signed-off-by: Larisa Kempbell <lkempbell@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#35)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Install (#36)

* remove conda pynini install

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added pynini install note

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix text

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.6rc0 (#37)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Add ci (#39)

* Add additional languages to CI Pipeline

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix Jenkinsfile

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add missing 'ar' in lang options

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix missing 'ar' in normalize.py

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Correct name of verbalizer far

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Run language tests in stages

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update DE cache folder

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add VI, RU, SV CI tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix missing bracket, add ZH

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Use non-deterministic TN for RU

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* support the use of phonetic superscript letters for ordinals, because there are maniacs on the internet who think that because you can, you should (#41)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update fr cache path for ci (#44)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update ITN to work after Punctuation capitalization model (#22)

* add cases with capitalization, cardinal, decimal pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, ordinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restarting ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback, update whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rename capitalize func

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix SH tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins folder name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added cased arg to ITN

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input_case arg to other lang

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dirs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix codeql errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix default value

Signed-off-by: ekmb <ebakhturina@nvidia.com>

---------

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#42)

* Add support for Financial year and for years between 1000 BC and 1000 AD

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add support for product names and add abbreviations to whitelist

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add weights for some sequences, exclude 'a' before numeric sequence

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Add tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update cache folder for EN

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update FR Cache path

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Move text to TSV files, and some code cleanup

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add additional vocabulary, allow singular usage of units to support adjective phrases

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix issue with whitelist loader not handling weights correctly
Move cased loader file to graph_utils

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* insert space between value and unit

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Insert space between measurement and unit. Adjust weight for ordinal

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* update doc and fix alignment for itn (#47)

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* extend alignment for itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Align ci test (#51)

* added jenkins tests for alignment

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test to pr doc

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Audio-based TN for Swedish (#49)

* Audio-based TN for Swedish, for Språkbanken Tal

Replaces #48

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating cache directory

(Not entirely sure what the pattern is)

Signed-off-by: Jim O’Regan <joregan@kth.se>

* Delete tokenize_and_classify_lm.py

Signed-off-by: Jim O’Regan <joregan@kth.se>

* fraction fix from ITN branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix sv tests (#52)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* 0.1.7 release (#53)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* En names (#56)

* Rename "period" tag to "text" tag for date to avoid changes to sparrowhawk proto

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update Jenkinsfile

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* fix bug for hh:mm:ss normalization (#57)

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* rewrite regex to silence deprecation warning (#55)

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Hungarian TN ✅ (#9)

* additional exports from cardinal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for quantities

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change integer

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fixes to verbaliser for decimal

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more test cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add superessive forms (powers of)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* superscript to superessive

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add vowels

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum electronic test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test case

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a symbol

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add incomplete time tagger (partially adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix error with some inflected abbreviations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some alternative measure forms

Signed-off-by: Jim O'Regan <joregan@kth.se>

* hour, minute, second; whichever is last can be inflected

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test runner for time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add very minimal time test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* will want cardinal here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add inflection for things like GBP, where inflection is based on pé

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docstring

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move two letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add my copyright

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted number tagger (adapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small changes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add unadapted measure tagger (from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* other ways of reading w

Signed-off-by: Jim O'Regan <joregan@kth.se>

* for non deterministic, a bunch of these symbols can be read as letters

Signed-off-by: Jim O'Regan <joregan@kth.se>

* currency

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more inflection

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get the abbreviation expanded as letters for non-deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* working now, add a comment

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also integer, and preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also accept the full words

Signed-off-by: Jim O'Regan <joregan@kth.se>

* deduplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* reorder to make a bit more sense

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* explicitly make tuples elsewhere; this works from what I see of the function output, but not in the resulting fst

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt comments

Signed-off-by: Jim O'Regan <joregan@kth.se>

* commenting out weighted part makes this work

Signed-off-by: Jim O'Regan <joregan@kth.se>

* duplicate space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* partially adapted money verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* actually saving the adaptations

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add time_zone data (copy from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* delete commented code, irrelevant here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* small modifications, still thinking about how to tackle this

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change year of copyright in empty files, they aren't eligible anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix missing tabs

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove pynini checks from tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix typo

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for measure (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for telephone (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add verbaliser for time (unadapted from de)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* uncomment everything. yolo.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix cache dir

Signed-off-by: Jim O'Regan <joregan@kth.se>

* tagger for telephone (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add basic tests (native verified)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add components for read digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example with a different separator

Signed-off-by: Jim O'Regan <joregan@kth.se>

* start adapting

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add 2-digit area codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Bp to area codes, no need to be that specific

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export var

Signed-off-by: Jim O'Regan <joregan@kth.se>

* in progress

Signed-off-by: Jim O'Regan <joregan@kth.se>

* country codes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy/paste errors abound

Signed-off-by: Jim O'Regan <joregan@kth.se>

* put in a function rather than duplicate

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nominal digits

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add IP prompt

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add google copyright notice; probably meaningless

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more work on telephone

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix path

Signed-off-by: Jim O'Regan <joregan@kth.se>

* minor adaptation; more needed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace time verbaliser with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* adapt more

Signed-off-by: Jim O'Regan <joregan@kth.se>

* nearly there

Signed-off-by: Jim O'Regan <joregan@kth.se>

* replace with version from sv

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some tweaks

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an IP test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add a couple more ordinal tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move variables

Signed-off-by: Jim O'Regan <joregan@kth.se>

* filter ordinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic fraction tests

Signed-off-by: Jim O'Regan <joregan@kth.se>

* . and / both clash, so only make year optional if it is not deterministic

Signed-off-by: Jim O'Regan <joregan@kth.se>

* using the other word for two, that test cannot pass

Signed-off-by: Jim O'Regan <joregan@kth.se>

* numerator and denominator can compound; add minus

Signed-off-by: Jim O'Regan <joregan@kth.se>

* form fractionals in ordinal, because something about bare_ordinals does not work when exported

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add another test, including spaces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* works in the repl, not in reality

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* copy fraction symbols from es

Signed-off-by: Jim O'Regan <joregan@kth.se>

* copy two lines from es to handle fraction symbols

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a test for that

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* extend

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ah, I was forgetting to delete preserve order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add pieces from swedish itn, adapted

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add a function to give from/to minutes for 15/30/45 subdivision

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add functions, but some pieces came from ITN, so are backwards

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, should change the quarter word to a cardinal, or something

Signed-off-by: Jim O'Regan <joregan@kth.se>

* swapping order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more swapping

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove import

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add an example

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change some things

Signed-off-by: Jim O'Regan <joregan@kth.se>

* some things fixed

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more adjustments to time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* more todo, but working for this subset

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more time

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missing endings

Signed-off-by: Jim O'Regan <joregan@kth.se>

* sort|uniq

Signed-off-by: Jim O'Regan <joregan@kth.se>

* timezone can be inflected too

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add sparrowhawk test (todo)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add test_cases_word (copy from sv)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add some word cases with Hungarian accents

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add Hungarian to Jenkinsfile. This may cause much distress and wailing and gnashing of teeth

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix the commented ITN part

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add hu

Signed-off-by: Jim O'Regan <joregan@kth.se>

* basic test cases for the last two parts

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix measure cardinals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* a couple more tests, last still not working

Signed-off-by: Jim O'Regan <joregan@kth.se>

* missed removing preserve_order

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix test

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* codeql

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the variables I may wish to use later (codeql)

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix decimals

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* incorporate feedback from @Laszlo-Weber

Signed-off-by: Jim O'Regan <joregan@kth.se>

* bare minimum tests + fix verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add öre (also for NOK)

Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>

* Comment line, for now

Signed-off-by: Jim O’Regan <joregan@kth.se>

* try breaking this into pieces

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add missing __init__.py

Signed-off-by: Jim O'Regan <joregan@kth.se>

* revert 0c6823e111a876495702d347cf7b347106388ed4

Signed-off-by: Jim O'Regan <joregan@kth.se>

* fix a bug in cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at no point is 000 being deleted; probably why the tests are weird

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert a0d031a861fcd7b5750027f2887f3344f39b6616

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add more spaced alternatives to the non-deterministic cases

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add the hyphen before or-ing with 000

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change money handling to keep sparrowhawk happy

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add [be]os_or_space

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try just rewriting the offending pieces to see if they are coming from here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try just rewriting the offending pieces to see if they are coming from here"

This reverts commit bc06b1162703354fe7bd5efaff7f58ed981d81c0.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add extra spaced versions

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok... seems to not be happening here either

Revert "try here"

This reverts commit 801c5f1c28d234c8b47a1d6f52f662b909fbb1c2.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try moving a test to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try duplicating to see if it fails twice

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Ok, fails both times

Revert "try duplicating to see if it fails twice"

This reverts commit 908cddc7b3453fb2deaaa201881a304c853a746a.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* 1 fails in some places, 2 in others, so add 2 here and see if that also fails

Signed-off-by: Jim O'Regan <joregan@kth.se>

* see if this makes a difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* It does not

Revert "see if this makes a difference"

This reverts commit dacc61281c4efbfd2d5ce1e91386cbd234392d28.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rewrite regex to silence deprecation warning

Signed-off-by: Jim O'Regan <joregan@kth.se>

* REVERTME: change to see what is happening

Signed-off-by: Jim O'Regan <joregan@kth.se>

* that missing bracket cannot have been good

Signed-off-by: Jim O'Regan <joregan@kth.se>

* no difference, try just deleting leading zero

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move that thing, merge some lines

Signed-off-by: Jim O'Regan <joregan@kth.se>

* at least it fails quickly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export original

Signed-off-by: Jim O'Regan <joregan@kth.se>

* move things around for no real reason

Signed-off-by: Jim O'Regan <joregan@kth.se>

* add in the clean_cardinal from the tutorial

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "add in the clean_cardinal from the tutorial"

This reverts commit 4f06c885a0bfe1acc183c3560d88c4e2e76574ac.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try this again

Signed-off-by: Jim O'Regan <joregan@kth.se>

* pretty sure this should work. As should the other

Signed-off-by: Jim O'Regan <joregan@kth.se>

* comment the ugly kludges to make them easier to remove. They do not work anyway

Signed-off-by: Jim O'Regan <joregan@kth.se>

* ok, try here

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "rewrite regex to silence deprecation warning"

This reverts commit b8a923db57d27c5b3353be4b85bac9efb6e2d220.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "REVERTME: change to see what is happening"

This reverts commit c73e4ef384ec29a2b5877dac9d4fe617a5c681b6.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* remove unused imports

Signed-off-by: Jim O'Regan <joregan@kth.se>

* export unfiltered version of cardinal graph

Signed-off-by: Jim O'Regan <joregan@kth.se>

* change the variable names

Signed-off-by: Jim O'Regan <joregan@kth.se>

* get rid of duplicate input print

Signed-off-by: Jim O'Regan <joregan@kth.se>

* BUGHUNT: check if string has been escaped

Signed-off-by: Jim O'Regan <joregan@kth.se>

* changing variable, because I am getting tired of looking at that overly long name

Signed-off-by: Jim O'Regan <joregan@kth.se>

* try deleting the normaliser to see if that makes any difference

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "BUGHUNT: check if string has been escaped"

This reverts commit 70f83241d47b0c73fa41e395eee193cc1685e056.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Revert "try deleting the normaliser to see if that makes any difference"

This reverts commit 78f4ded93375308dadb9b5e247f030da2efbecb5.

Signed-off-by: Jim O'Regan <joregan@kth.se>

* moving globals into __init__ fixes the problem

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_sparrowhawk_normalization.sh

Signed-off-by: Jim O’Regan <joregan@kth.se>

* prompt: is not part of the ontology sparrowhawk recognises

Signed-off-by: Jim O'Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* these two now conflict

Signed-off-by: Jim O'Regan <joregan@kth.se>

* rearrange slightly

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update telephone.py

remove unused import

Signed-off-by: Jim O’Regan <joregan@kth.se>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Es bugfix (#59)

* improve shortest path for decimals and currency

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix sh tn test files for telephone

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* replace non-breaking space

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve ambiguous test cases

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* refine weights for decimal

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* improve testing when there are multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* revert ES TN for measures with mixed fractions

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* fix formatting

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add comment for testing multiple shortest paths

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Store input_case in Normalizer (#65)

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* Swedish telephone fix (#60)

* port fix for telephone from swedish-itn branch

Signed-off-by: Jim O'Regan <joregan@kth.se>

* extend cardinal in non-deterministic mode

Signed-off-by: Jim O'Regan <joregan@kth.se>

* whitespace fixes

Signed-off-by: Jim O'Regan <joregan@kth.se>

* also fix in the verbaliser

Signed-off-by: Jim O'Regan <joregan@kth.se>

* Update Jenkinsfile

Signed-off-by: Jim O’Regan <joregan@kth.se>

---------

Signed-off-by: Jim O'Regan <joregan@kth.se>
Signed-off-by: Jim O’Regan <joregan@kth.se>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* log instead of print in graph_utils.py (#68)

Signed-off-by: Enno Hermann <enno.hermann@idiap.ch>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* CER estimation speedup for audio-based text normalization (#73)

* Replaced jiwer with editdistance to speed up CER estimation

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* add measure coverage for TN and ITN (#62)

* add measure coverage for TN and ITN

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* Remove unused imports

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update measure.py

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>

---------

Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>

* upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63)

* upload es-ES and fr-FR g2p dicts

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add inits

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add NALA Spanish dict

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* rename Spanish and French dictionaries

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

* add Italian dictionary

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[…